Causal diagrams in systems epidemiology
© Joffe et al; licensee BioMed Central Ltd. 2012
Received: 6 August 2011
Accepted: 19 March 2012
Published: 19 March 2012
Methods of diagrammatic modelling have been greatly developed in the past two decades. Outside the context of infectious diseases, systematic use of diagrams in epidemiology has been mainly confined to the analysis of a single link: that between a disease outcome and its proximal determinant(s). Transmitted causes ("causes of causes") tend not to be systematically analysed.
The infectious disease epidemiology modelling tradition models the human population in its environment, typically with the exposure-health relationship and the determinants of exposure being considered at individual and group/ecological levels, respectively. Some properties of the resulting systems are quite general, and are seen in unrelated contexts such as biochemical pathways. Confining analysis to a single link misses the opportunity to discover such properties.
The structure of a causal diagram is derived from knowledge about how the world works, as well as from statistical evidence. A single diagram can be used to characterise a whole research area, not just a single analysis - although this depends on the degree of consistency of the causal relationships between different populations - and can therefore be used to integrate multiple datasets.
Additional advantages of system-wide models include: the use of instrumental variables - now emerging as an important technique in epidemiology in the context of Mendelian randomisation, but under-used in the exploitation of "natural experiments"; the explicit use of change models, which have advantages with respect to inferring causation; and the detection and elucidation of feedback.
Keywords: Epidemiological methodology; Causation; DAGs; Diagrammatic methods; Infectious disease epidemiology models; Web of causation; Instrumental variables; Change models; Feedback
"Could one of the problems of modern epidemiology ... be that we have drifted back to a posteriori methods - fitting black box equations to data, rather than working out predictions from mathematical modeling of underlying processes?" Norman E Breslow, 2003 .
"... narrowness of thinking ... pervades much of modern science and leads to inaccurate assessments and prescriptions in many fields. The narrowness itself stems from a perennial challenge with which every scientist must grapple: many phenomena we'd like to understand are highly complex and have multiple, interacting causes." Paul Epstein, 2011 .
The role of causation in epidemiology
Causation is very important in epidemiology. Epidemiologists are traditionally cautious in using causal concepts: the basic method of epidemiology is to observe and quantify associations, whereas causal relationships cannot be directly observed. Causal inference is then a distinct step which is not unproblematic, but which cannot be ignored because the two main purposes of epidemiological evidence are to provide understanding and the basis for intervention, and for both of these it is necessary to know about the causal status of the observed associations.
More abstractly, a causal relationship is one that has a mechanism that by its operation makes a difference [5, 6]. The scientific process of discovery of causal relationships can proceed using either of these features. Epidemiology employs difference-making, i.e. how much effect one variable has on another; the other approach, which has a complementary role, is uncovering the mechanism, i.e. explaining how it exerts that effect [5, 6].
Causal relationships operate over time, so that difference-making is distinct from non-causal differences that exist between categories of background variables, such as sex differences in disease risk. For example, the higher rate of breast cancer in women than men can be traced to metabolic differences between the two sexes (e.g. high endogenous estrogens in females), which do play a causal role over time. The observed sex difference is due to differences between processes in the two sexes that are themselves causal.
Diagrams consisting of variables connected by arrows or lines are widely used in epidemiology, either formally as in the Directed Acyclic Graph (DAG) literature, or informally as influence diagrams, to depict relationships that are relatively complicated and so are considered to deserve illustrating in this way. In this paper we consider the use of diagrams that denote causation, not merely association: one variable alters the probability, timing, magnitude and/or severity of the next variable; or alternatively they represent the "flow" of, for example, individuals from the status of susceptible to infected and thence to recovered (or dead).
In particular, we review the types of diagram that go beyond the depiction of a single link, e.g. a disease and its proximal causal factor, to focus on a larger causal system that is important to health. A "system" in this context is made up of multiple causal relationships, each one of which can be considered as a "link"; and each of the links is considered potentially important, as it could influence how the system as a whole behaves. Because it can be difficult to envisage such multiple links intuitively, and in more complicated cases errors are likely, diagrams are very valuable in showing the inter-relationships. Some of these uses are already well established, especially in infectious disease epidemiology, but we believe that this perspective could be further developed in epidemiology more generally - what could be called "systems epidemiology", by analogy with the recent development of systems biology (see below). Such causal systems could include biochemical pathways, e.g. in relation to biomarkers, or the social/environmental context in which people live that could affect their disease risk.
Directed Acyclic Graphs (DAGs)
The use of DAGs has gained increasing recognition within epidemiology in recent years, following the work of Pearl, Robins, Greenland and others. DAGs are simple to use, and in addition it has been shown that if certain simple rules are followed, they provide a rigorous guide to such issues as confounding and selection effects. In general, the procedures associated with DAGs correspond to traditional statistical methods, including informal "rules of thumb" such as not adjusting for a covariate that is on the causal pathway, but they are less error-prone in complicated situations.
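The rule of thumb about not adjusting for a covariate on the causal pathway can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the three-variable diagram (X → M → Y plus a direct X → Y arrow), the coefficients 0.8 and 0.5, and all function names are hypothetical and do not come from any study discussed here.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Population covariance of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def slope_simple(x, y):
    """OLS slope of y regressed on x alone."""
    return cov(x, y) / cov(x, x)

def slope_adjusted(x, m, y):
    """OLS coefficient of x when y is regressed on both x and m."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    return (sxy * smm - smy * sxm) / (sxx * smm - sxm ** 2)

def simulate(n=20000, seed=1):
    """Hypothetical diagram: X -> M -> Y and X -> Y, all links linear."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [0.8 * xi + rng.gauss(0, 1) for xi in x]          # X -> M
    y = [0.5 * xi + 0.5 * mi + rng.gauss(0, 1)            # X -> Y and M -> Y
         for xi, mi in zip(x, m)]
    return x, m, y

x, m, y = simulate()
total_effect = slope_simple(x, y)       # expected near 0.5 + 0.8 * 0.5 = 0.9
direct_only = slope_adjusted(x, m, y)   # expected near 0.5: the mediated part is lost
print(round(total_effect, 2), round(direct_only, 2))
```

Under these assumptions, adjusting for the mediator M removes the part of the effect of X that is transmitted through M, so the adjusted coefficient estimates only the direct link rather than the total effect.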
However, the DAG tradition has its limitations: once one goes beyond the technical issues of inferring the causal status of a particular observed association, other considerations come into play. These require the use of other diagram-associated methods, including the modelling of infectious disease outbreaks with differential equations, fitting statistical models to causal networks, and analysing systems characterised by feedback. The wider properties of such systems are scientifically and practically important, yet are insufficiently appreciated in most of epidemiology.
Systems epidemiology and the use of diagrams
In this paper, we discuss different types of causal system that are relevant to epidemiology: models of infectious disease transmission, in which the human population is located within a broader system with which it interacts; models that integrate the emission and dispersion of pollutants with their impacts on health; and the relationship of social factors to specific risk factors and to selection effects. We describe how diagrams can be employed to improve the analysis of such systems, and in the course of doing so we note that generic properties of the systems can be observed that are independent of the specific content, even though the diagrams themselves have been constructed solely from empirical evidence - no structure has been imposed on them.
We draw on a number of traditions that have analysed systems and/or that have used causal diagrams. The most important of these are:
path diagram analysis, which was devised by the geneticist Sewall Wright but which has mainly been employed in quantitative social science analysis, and
the similar but more general method of structural equation modelling, which also systematically analyses measurement error - including the use of latent variables that represent theoretical constructs, estimated from several measured variables;
econometrics, in which the structure of a system is represented by an equation for each link, albeit without the systematic use of causal diagrams;
diagrams of metabolic pathways in biochemistry;
the tradition of infectious disease epidemiology modelling, which is based on demographic and ecological models involving the relationship between different species.
Modelling the larger system
Models and diagrams in infectious disease epidemiology
In 1897, Ronald Ross established that malaria is spread by the Anopheles mosquito, and subsequently received the second Nobel prize for medicine. He then defined a mathematical model describing the time-dependent dynamics of infection and recovery in human and mosquito populations. The major terms in the differential equations describing this human-mosquito-parasite ecology were (unless otherwise stated, these terms are numbers per unit time): the number of newly infected humans arising due to bites from infected mosquitoes, the number of new mosquito infections due to biting infected humans, and the rate of recovery of both humans and mosquitoes from infection. The explicit expression of these differential equations as an a priori model - i.e. a model in which the sole causative agent of disease was assumed from the outset to be the protozoan parasite, which was acquired by mosquito biting - led to the startling conclusion that there existed a critical value for the number of mosquitoes per person that needed to be present in order to allow the parasite to persist locally. Ross estimated this critical number of mosquitoes per person to be 40 - implying that Anopheles did not need to be eradicated for the disease to die out.
Ross reached this conclusion by modelling the whole system: the human population within its environment. It was built on evidence at the individual level, but with some of the (implied) interventions at group or environmental level. His method was not expressed as a diagram, but it represents a sequential causal relationship, the key outcome being whether the number of infected people in one period is higher or lower than that in the previous one. The method was feasible because he focused on the single cause, malaria transmission by mosquito which had already been established, and omitted other relevant factors, e.g. that nutritional status might affect susceptibility.
Models of this type can be more complex, for example if vector transmission is involved, but the principle remains the same. The equivalent of Ross's critical mosquito density is the basic reproduction number R0: if R0 is greater than unity, this indicates that the number of new cases in one period is higher than that in the previous one, and therefore that the outbreak can propagate itself; if it is less than unity then the epidemic will fade out. Most such models are deterministic in that they do not consider stochastic causation, but probabilistic elements are increasingly being incorporated.
Compartmental models rely on the existence of a single characteristic that can be used to partition the whole population. In the SIR (susceptible-infected-recovered) case, the partitioning characteristic is the status of each person with respect to susceptibility and infectiousness. The model is thus mono-causal, neglecting other factors such as nutritional status and the existence of other infections that may influence the recovery rate; models can be modified to take these into account, e.g. stratifying the population into high and low risk groups.
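The threshold behaviour at R0 = 1 can be reproduced with a few lines of code. The following is a minimal sketch of the textbook deterministic SIR model, integrated with a simple Euler scheme; the recovery rate, initial infected fraction, time step and the two R0 values are illustrative assumptions, and in this simple model R0 is taken to be the transmission rate divided by the recovery rate.

```python
def simulate_sir(r0, recovery_rate=0.2, i0=0.001, days=300, dt=0.1):
    """Euler integration of the SIR model in population fractions.

    Returns (peak infected fraction, final susceptible fraction).
    All parameter values are hypothetical and chosen for illustration only.
    """
    beta = r0 * recovery_rate            # transmission rate implied by R0
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt      # flow S -> I
        new_recoveries = recovery_rate * i * dt  # flow I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak, s

peak_high, s_high = simulate_sir(r0=2.5)   # above threshold: outbreak grows
peak_low, s_low = simulate_sir(r0=0.8)     # below threshold: outbreak fades out
print(peak_high, s_high)
print(peak_low, s_low)
```

With R0 above unity the infected compartment accumulates before declining, whereas with R0 below unity the infected fraction never rises above its starting value and almost the whole population remains susceptible.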
Single-chain models outside infectious disease epidemiology
This approach is no longer used only for modelling infectious diseases. For example, it has been applied to cervical cancer, involving carcinogenic HPV transmission dynamics and the natural history of the disease. It involved comparing scenarios of vaccination against HPV-16, either of 12-year-old girls alone or of both sexes, and of the no-vaccination scenario. Thus, the distinction between infectious and non-infectious disease is somewhat artificial, given that the same modelling methodology can be used in situations where the infectious agent is but one factor contributing to the development of the disease.
More generally, compartmental models can be viewed as a sub-type of diagrammatic models: flow diagrams in which the population is subdivided into ordered states. They are also of interest in chronic disease epidemiology, where they can be used to represent the evolution of health status among known steps of disease progression. These stages can either be observed or hidden (e.g. if the prevalence of the asymptomatic disease cannot be measured) [28, 29]. In addition to quantifying the impact of risk factors/exposures on the disease risk, these approaches also give an insight into the dynamics of disease progression at the individual level and, at the population level, into the dynamics of the epidemic.
Compartmental models aim at reconstructing the individual or population natural history of the disease progression amongst disease states, based on - potentially longitudinal - exposure or complex mixtures of exposures. Hence, by nature, they incorporate a temporal component in their causal inference, and in accordance with the recently formalised exposome concept [30, 31], they allow the disease risk to be driven not only by exposure level itself but also by its evolution in time and by potential temporal patterns in the exposure history.
A similar use of diagrams has long been standard practice in another branch of biology: biochemical pathways. These are flow diagrams in which at each stage, the molecule is modified by an enzyme belonging to that step in the pathway. In practice they are often drawn as cartoons that include also a spatial element, indicating the location of the different chemical processes within the cell.
The situation here is directly analogous to the SIR model, where the tendency of an outbreak to increase or decrease depends on the balance between inflow and outflow. In that situation this balance depends on the force of infection as measured by R0: if greater than unity, the outflow is the rate-limiting step and infected individuals will tend to accumulate in the population, like acetaldehyde, and vice versa for values lower than unity. Although both these diagrams have been constructed in radically different contexts, their structure as well as the type of results they provide are comparable, thus highlighting the potential general use of these models. While their formulation is general, the way transitions from one compartment to another are defined is highly specific to the modelled phenomenon. This type of approach relies on the modelling of the whole system rather than focusing on a single link within the system of interest.
Multiple causation: diagrams with multiple and branching chains
The models considered so far have been concerned with only one causal pathway. However, epidemiology of non-infectious diseases usually deals with a situation of multiple causation, in which all (or most) links are analysed as stochastic - there are no necessary or sufficient causes, and Koch's postulates do not apply. Under such conditions, diagrammatic models are no longer confined to a single chain.
It is simple to draw a diagram that contains branches, but this introduces new issues that go beyond the scope of the present paper. In principle, causal diagrams and DAGs can readily cope with multiple causation, but further methodological work is needed on effect modification [34–36].
In social epidemiology, a classic question is, how much of the observed social gradient is mediated by known risk factors? It is possible to investigate this question on the simple assumption that no effect modification or other complicating factor is present, in which case a diagram is probably not necessary. However, such an assumption may not be justified. For example, an econometric analysis of the Whitehall II Study has shown that if allowance is made for selection effects, the findings change. Whilst childhood socioeconomic circumstances are still found to impact on adult health, it emerges that the association of current civil service grade with health status reflects the tendency for healthier people to be promoted. And employment grade is also predicted by childhood socioeconomic position, which thus influences adult health both directly and via job success - for example, promotion is more likely for taller people, and height is an indicator of childhood wellbeing.
By making the pathways explicit in a web of causation, a diagram deepens understanding and provides a framework for statistical analysis. In addition, it serves as a valuable practical guide: it not only provides multiple entry points for intervention, but also has the capacity to demonstrate and quantify the inter-relationship of different factors - including unpredicted and possibly undesirable side-effects. Strangely, although influence diagrams have been used informally to clarify hypotheses on the particular pathways that may be operating, it is rare to find causal diagrams being used as the basis for the statistical analysis of a system, as has been proposed in the context of setting out the evidence base for Health Impact Assessment or Strategic Health Assessment [41, 42].
Modelling multiple and branching chains is more complicated than in the example of a whole-chain approach to exposure assessment as in Figure 5, because it involves the assumption that the chains are independent; in addition, intervention may involve multiple actions affecting more than one pathway, e.g. combining the use of "carrot" and "stick". Such diagrams are best organised by economic or policy sector; but the criterion for including variables and pathways in the diagram is that they are relevant to health - the content of the diagram is "driven by the bottom line". An additional layer can also be included below that for health outcomes, if so desired, on the economic costs of each of the adverse health outcomes. The analysis of a diagram of this type, and indeed confirmation of its structure, requires bringing together information from a number of different sources; and some aspects (such as "community severance" in Figure 6) may not be readily quantifiable. Multi-disciplinary research projects to integrate the relevant areas are currently underway.
Properties and functions of causal diagrams
Causal diagrams are distinct from "mental maps", because they set out to describe relationships in the real world. The appropriate structure for a particular application is always driven by the content, so that the diagram is constructed by knowledge of the actual and possible pathways. For most people this is an intuitive and rather simple process, and informal diagrams have been used in non-academic situations, for example in stakeholder consultation in the context of Health Impact Assessment. In fact their flexibility and ease of use could lead to misuse, and one purpose of this paper is to make the case for the explicit further development of rigorous diagrammatic methods and associated statistical analysis.
A diagram can be used as the basis for a single study using a single dataset, but is not limited to this. As it conceptually maps out the research topic, it can have the function of synthesising the evidence from several distinct studies, including integration of multiple datasets that cover different parts of the causal web, and representation of qualitative as well as quantitative links. Thus, the diagram can be updated with new evidence as it accumulates.
A corollary is that a diagram can even be constructed when the evidence for some of the links is only tentative. The most important part is the structure, which is derived from substantive knowledge of a subject, as this is more difficult to modify later than the existence and strengths of individual component links. It may happen that more than one structure is possible, if different investigators have different conceptions of a system's causal relationships. This of course happens whether or not a diagram is used, and the advantage of using one is that it makes the different options explicit. They can then be discussed, and if appropriate, rival conceptions can be tested against the data. It is important that such a diagram is clearly indicated as being only conjectural; as evidence accumulates, the diagram can then evolve from having conjectural to well-supported status. Even at the conjectural stage, a diagram can have several important functions:
to make assumptions and hypotheses explicit for discussion;
to place hypotheses in the public domain prior to testing - a conjecture that is open to refutation;
to plan data collection;
to structure the statistical analysis of the hypothesised pathways;
to identify evidence gaps and thereby to generate a research agenda.
Publishing the hypothesis of each study in advance of carrying out the research would remove the temptation for epidemiologists to adjust it once they have seen the data, which is an inevitable hazard of the rich datasets that are now available, and threatens to erode the distinction between hypothesis-testing and hypothesis-generating studies. This could conveniently be done in the form of a causal diagram, or more than one if disagreement is present between the researchers.
Depending on the degree of stability across different contexts, the application of a given model to different populations may require its modification. For instance, if the causal parameter for each component link varies between populations, and if its variation is systematic, the source of such variations can be included in the causal diagram, yielding a "hierarchical" structure.
Once a structure (or a set of rival structures) has been constructed, it can be used as the framework for statistical analysis of the component links. If in doubt, a postulated link should be included, as it can always be deleted in the light of evidence suggesting its magnitude is zero, whereas discovery of a link that was omitted in error is more difficult - although this can be achieved by algorithms incorporated in software, e.g. in the context of DAGs used in genetics. The same applies to variables: they should be included, with all the pathways thought to be possibly relevant, unless and until analysis shows them to be unimportant. (This corresponds to how candidates for confounding variables are conventionally handled.) Thus the most conservative diagram contains all possible variables and pathways. The statistical analysis then results in deletion of some links, and quantification of those that remain.
In the deletion of links, it is clearly inappropriate to use a simple criterion such as striking out those that do not reach statistical significance. This is because a relationship could fail to reach significance merely due to small sample size. A better method is to use model comparison/selection methods such as those based on likelihood ratio (e.g. Akaike Information Criterion (AIC), or its Bayesian alternative the Deviance Information Criterion (DIC)). However, this process is fallible, especially in the presence of measurement error. An alternative is the use of structural equation modelling, in which latent variables can be introduced to deal with measurement error. The addition of a hierarchical layer modelling the relationship between observations and "true" values of a parameter could be considered, thus defining a hidden Markov model.
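As a rough illustration of deciding whether to retain a link by model comparison rather than by a significance test, the sketch below computes the AIC for two Gaussian linear models of a single hypothetical link X → Y: one without the link (intercept only) and one with it. The data are simulated, the true slope of 0.5 and sample size of 200 are arbitrary assumptions, and measurement error is deliberately ignored.

```python
import math
import random

def gaussian_aic(residuals, n_params):
    """AIC = 2k - 2 ln(L) for a Gaussian model, using the ML variance estimate."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return 2 * n_params - 2 * log_lik

def compare_link(x, y):
    """Return (AIC without the link X -> Y, AIC with the link)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    res_without = [b - my for b in y]                        # intercept-only model
    res_with = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Parameters counted: intercept + variance (2), plus the slope (3).
    return gaussian_aic(res_without, 2), gaussian_aic(res_with, 3)

rng = random.Random(11)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]   # hypothetical true link
aic_without, aic_with = compare_link(x, y)
print(aic_without, aic_with)   # the lower AIC indicates the preferred model
```

The link is retained when the model that includes it has the lower AIC. The same comparison could in principle be repeated for each doubtful link in a diagram, bearing in mind the fallibility noted above.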
One of the distinctive features of a diagrammatic approach is that a causal pathway can be modelled using any parametric form, therefore separating the two key questions "does a link exist?", and "if so, what is its functional form?". This has an advantage over the specification of the system in terms of equations, where the elision of these two questions may be harder to avoid. For example, it is rather straightforward to draw a diagram such as that shown in Figures 6 or 7 from existing knowledge, but many of the causal relationships may be difficult to specify with any confidence. Another implication is that the use of causal diagramming clarifies the distinction between effect modification and statistical interaction; the latter may arise merely because e.g. linearity has been assumed in a situation where it does not correspond to the real functional form. Effect modification, on the other hand, corresponds to the situation where the relationship between two variables is altered by a third variable [34–36].
On the other hand, it is necessary to be cautious - diagrams may make the situation look simpler than it really is. An example of this is transitivity: it may appear that if X → Y and Y → Z, then it is necessarily true that X → Y → Z. Logically it seems undeniable, but in real life this is not always the case. For example, in toxicology it is typically found that the dose-response relationship has a threshold: below a certain dose of the substance it has no impact on the organism. If this is represented by Y → Z, and the pathway X → Y does not result in the accumulation of Y to the threshold level, then X → Y → Z will not be true. This has fundamental implications even for basic data handling. For example, in studying the possible effects of disinfection by-products on the outcome of pregnancy, it was found that swimming led to infrequent but very high exposure levels. If the exposure was coded as e.g. a weekly average, this was implicitly assuming that the actual exposure-response relationship is linear, which is not necessarily the case. The implication is that the assessment of one link cannot legitimately be considered separately from the characteristics of the neighbouring ones. This is easily missed if the inter-connections between links are not given their due weight.
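The point about averaging can be made concrete with a toy calculation. In the sketch below, the threshold, the slope and the two weekly exposure patterns are entirely hypothetical numbers in arbitrary units; they are not taken from the disinfection by-products study.

```python
def threshold_response(dose, threshold=5.0, slope=1.0):
    """Hypothetical dose-response: no effect at or below the threshold, linear above it."""
    return max(0.0, dose - threshold) * slope

# Two hypothetical people with identical weekly average exposure.
steady = [2.0] * 7             # the same low dose every day
peaky = [0.0] * 6 + [14.0]     # one very high dose, e.g. a single swimming session

avg_steady = sum(steady) / 7   # 2.0
avg_peaky = sum(peaky) / 7     # 2.0

effect_steady = sum(threshold_response(d) for d in steady)   # 0.0
effect_peaky = sum(threshold_response(d) for d in peaky)     # 9.0

print(avg_steady, avg_peaky, effect_steady, effect_peaky)
```

Coding both people by their weekly average would treat them as identically exposed, although under a threshold relationship only the second experiences any effect.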
Extensions of the method
One of the most intractable problems with epidemiology, other than in the (rather rare and special) situations where randomisation can be used, is that it is difficult to reliably infer causation from observational studies, because the upstream causal pathways are complicated and may introduce confounding or selection. One approach is to try to map out these pathways and analyse them in their own right. An alternative, which also involves analysing a system that is conveniently portrayed by means of a diagram, is to use the instrumental variable approach. The basic idea is to find something that is outside the system being studied, and that influences the putative causal variable (actually "influences" here is misleading - the relationship does not have to be causal, only associational).
The principle is that one or more additional variables - "instruments" - are introduced, associated with the putative causal variables, but not directly with the outcome variable or any potential confounders. Further assumptions are that effect modification and alternative pathways are absent. All these assumptions need to be checked, and a convincing case made that they are satisfied; it is impossible for this to be conclusively established, a similar situation to the familiar case of unmeasured confounding. This approach is the equivalent in observational studies of analysis by intention to treat in randomised controlled trials.
A frequently-used method of statistical analysis has two stages: first, the instrument is used in a regression to obtain an estimated value of the putative causal variable, and then this estimated value is plugged into a second regression equation that contains the variables of substantive interest. The estimated value is an unconfounded measure if the above assumptions are met.
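The two-stage procedure described above can be sketched on simulated data. Everything in this example is an assumption made for illustration: the instrument Z, the unmeasured confounder U, the true causal effect of 0.5 and the other coefficients are hypothetical, and the instrument is constructed so that it satisfies the stated assumptions by design.

```python
import random

def ols(x, y):
    """Simple OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

rng = random.Random(7)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]    # instrument: affects X only
u = [rng.gauss(0, 1) for _ in range(n)]    # unmeasured confounder of X and Y
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [0.5 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]   # true effect 0.5

# Naive regression of Y on X is confounded by U.
_, naive_slope = ols(x, y)

# Stage 1: regress the putative cause X on the instrument Z.
a1, b1 = ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress the outcome Y on the stage-1 fitted values.
_, iv_slope = ols(x_hat, y)

print(round(naive_slope, 2), round(iv_slope, 2))
```

In this constructed setting the naive slope is biased upwards by the confounder, while the two-stage estimate should lie close to the true value of 0.5. Note that standard errors taken directly from the second-stage regression are not valid, so dedicated routines are normally used for inference.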
Such approaches can also be used in observational studies that do not involve genetics, as has long been routine in econometrics. A nice example is a study of the effect of family size on the mother's work status: to distinguish a direct causal effect from confounding (e.g. her preference for career as against childbearing) and from reverse causation (e.g. promotion leading to a decision not to have a further child, or not yet), the authors used the sex of the first two children as a natural experiment. If they were of the same sex, the parents are more likely to want another child, for reasons unconnected with the labour market, so this plays the same role as deliberate assignment would if it were possible. Using this type of analysis in the context of a natural experiment could produce valuable evidence with a better grasp on the issue of causality than is often the case in observational epidemiology, but as far as we are aware this has not yet been attempted. An example might be the introduction of an alcohol tax that influenced consumption in an analogous way to the ALDH2 polymorphism - if the assumption is sustainable that there is no effect modification with other variables in the system.
It is usual to construct diagrams in terms of the levels of the relevant variables, but an alternative is to use their changes instead. The mathematics of a change (or "first difference") model are different from one in terms of levels, a distinction that is very familiar in econometrics. One advantage is that any elements that remain invariant do not feature in a change model, so it can be a great deal simpler and thus more tractable. This invariance condition can be violated, for example in the presence of effect modification, or when the variable itself has a time-varying effect, such as the differing effect of maternal education on a child's IQ at different ages.
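The way invariant elements drop out of a change model can be shown with a simulated two-period example. This is a minimal sketch under strong assumptions: a hypothetical time-invariant characteristic C affects both exposure and outcome with a constant effect, the true effect of exposure is 0.3, and the change in exposure between periods is unrelated to C.

```python
import random

def ols_slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(3)
n = 20000
beta = 0.3                                   # hypothetical true effect of X on Y
c = [rng.gauss(0, 1) for _ in range(n)]      # time-invariant confounder
x1 = [ci + rng.gauss(0, 1) for ci in c]      # exposure at time 1 depends on C
x2 = [xi + rng.gauss(0, 1) for xi in x1]     # exposure change is unrelated to C
y1 = [beta * xi + 2 * ci + rng.gauss(0, 1) for xi, ci in zip(x1, c)]
y2 = [beta * xi + 2 * ci + rng.gauss(0, 1) for xi, ci in zip(x2, c)]

# Levels model at time 1: confounded by C.
levels_slope = ols_slope(x1, y1)

# Change model: C is identical at both times and cancels in the differences.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
change_slope = ols_slope(dx, dy)

print(round(levels_slope, 2), round(change_slope, 2))
```

Here the levels estimate is heavily inflated by C, while the change estimate should lie close to 0.3. If the effect of C varied over time, it would no longer cancel, which corresponds to the violation of the invariance condition described above.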
A second benefit is that interpretation is clearer: for example, it is relatively straightforward to think about the health impacts of a factory closure, whereas a discussion of the effects of (un)employment on health is more complicated, e.g. due to (self-)selection effects. Evidence derived from a change perspective may also carry more weight causally: for example, a controlled before-after study of a coal ban in Dublin showed the change in pollutant levels and in subsequent mortality there, but not in the rest of Ireland that was unaffected by the ban. This is more convincing than when causation is inferred from cross-sectional studies. In Bradford Hill's classic paper on inferring causation, he considered "experiment" - whether when "some preventive action is taken does it in fact prevent [the disease]?" - as "the strongest support for the causation hypothesis". However, caution is required: for example in the factory closure example, the health deficit that results is not necessarily the same as the health benefit that would occur in the reverse situation, i.e. if the same number of jobs were created (a possibility that is frequently put forward by proponents of capital projects, and which therefore is a recurring issue in Health Impact Assessment).
Feedback and cyclical models
Reinforcing (positive) feedback may also occur, for example, people who are physically inactive may tend to become obese and have other physical changes that further discourage them from exercising, and conversely, more active people have physiological changes that encourage them to take more exercise (this feedback mechanism seems plausible although there is no clear evidence for it). Reinforcing feedback also frequently occurs in models of population growth, and therefore in infectious disease modelling which is derived from demography and ecology. For example, this occurs when parasites sexually reproduce (such as worms that cause chronic tropical diseases): as the parasite population size increases, assuming that the sex ratio remains fixed, so does the likelihood that males and females will discover and mate with one another and therefore population growth occurs faster .
More generally, as humans tend to respond systematically to their situation, if the response (e.g. policy or other intervention) is included in the model, then feedback may have to be taken into account. This is true also of conditions (or social issues with health consequences) that involve a large behavioural element, such as obesity, mental health and homicide.
In general, analysing systems with feedback requires a different approach, with diagrams that contain cycles. In infectious disease epidemiology, this is explicit because feedback loops are the rule and acyclic diagrams the exception. Infectious disease epidemiology models are, in general, system dynamical in this sense, and off-the-shelf software such as Vensim is often used to construct them. An excellent account of the issues and methods, in the context of business studies, can be found in Sterman. An important feature of systems containing feedback is that they tend to generate their own endogenous causal processes [63, 64]. A simple biological example is homeostasis, a compensating feedback system that keeps a variable such as potassium or cortisol concentration, or temperature, at an appropriate level.
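Compensating feedback of the homeostatic kind can be illustrated with a very simple simulation in which the corrective response is proportional to the deviation from a set point. This is a minimal sketch rather than a physiological model: the proportional form, the function and parameter names, and the numerical values are all assumptions made for illustration.

```python
def simulate_homeostasis(initial, set_point, gain, steps,
                         disturbance=0.0, dt=0.1):
    """Euler simulation of d(level)/dt = -gain * (level - set_point) + disturbance."""
    level = float(initial)
    trajectory = [level]
    for _ in range(steps):
        correction = -gain * (level - set_point)  # pushes back towards the set point
        level += (correction + disturbance) * dt
        trajectory.append(level)
    return trajectory


# Hypothetical units: a variable displaced to 6.0 returns to its set point of 4.0.
undisturbed = simulate_homeostasis(initial=6.0, set_point=4.0, gain=1.0, steps=200)

# With a constant external load the same loop settles at a nearby level instead.
loaded = simulate_homeostasis(initial=4.0, set_point=4.0, gain=1.0, steps=200,
                              disturbance=0.5)

print(round(undisturbed[-1], 3), round(loaded[-1], 3))  # 4.0 4.5
```

The cycle here is that the level determines the correction and the correction changes the level, so the behaviour of the variable over time is generated by the loop itself and not only by external inputs. With purely proportional correction, a constant disturbance shifts the resting level by disturbance divided by gain, which is why the second run settles at 4.5.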
Explicitly causal methods of diagramming and modelling have been greatly developed in the past two decades. However, use of such methods in epidemiology has been mainly confined to the analysis of a single link: that between a disease outcome and its proximal determinant(s). Apart from in the context of infectious diseases, they have been under-exploited in their potential to model the larger system in which health is generated or undermined.
This approach would accord with wider developments in biology. The Human Genome Project has revealed that the number of protein-coding genes is far fewer than was previously thought, and that they are influenced by upstream genes in that large proportion of DNA that was previously referred to as "junk". "Causes of causes" are therefore relevant outside the realm of epidemiology. In addition, more complicated networks, consisting of multiple and interacting causal chains, sometimes with regulatory feedback, are the focus of the increasingly important interdisciplinary field of systems biology.
Diagrams and models are constructed to fit each situation, from a combination of substantive knowledge and statistical evidence - but can then take on properties that result from their abstract structures. By constructing diagrams of a larger system, inter-relationships of different factors can readily be visualised, and then analysed statistically. As well as its scientific function, this has practical advantages in terms of designing interventions.
Such methods are applicable to all branches of epidemiology, including infectious disease epidemiology, chronic disease epidemiology, environmental and occupational epidemiology, and social epidemiology - and especially to their inter-relationship, e.g. simultaneous consideration of social and environmental influences.
This paper grew out of a wider discussion group with colleagues, including Sara Geneletti and Ben Lopman. We have also benefitted from discussions with Nicky Best, David Briggs, Philip Dawid, Jenny Mindell and Eva Rehfuess. In addition, reviewers' comments from Tony McMichael and from the ETE corresponding editor greatly helped us improve the paper.
- Breslow NE: Are statistical contributions to medicine undervalued? Biometrics. 2003, 59: 1-8. 10.1111/1541-0420.00001
- Epstein P: Changing planet, changing health. University of California Press; 2011.
- Pearl J: Causality: models, reasoning and inference. New York: Cambridge University Press; 2000.
- Pearl J: Causal inference in the health sciences: a conceptual introduction. Health services and outcomes research methodology. 2002, 2: 189-220.
- Joffe M: Causality and evidence discovery in epidemiology. In Explanation, Prediction, and Confirmation. New Trends and Old Ones Reconsidered. Edited by: Dieks D, Wenceslao JG, Hartmann S, Uebel T, Weber M. Springer; 2011.
- Joffe M: The gap between evidence discovery and actual causal relationships. Preventive Medicine. 2011, 53: 246-49. 10.1016/j.ypmed.2011.07.011
- Greenland S, Pearl J, Robins JM: Causal diagrams for epidemiologic research. Epidemiology. 1999, 10: 37-48. 10.1097/00001648-199901000-00008
- Robins JM: Data, design, and background knowledge in etiologic inference. Epidemiology. 2001, 12: 313-20.
- Lauritzen SL, Richardson TS: Chain graph models and their causal interpretations. J R Statist Soc B. 2002, 64: 321-61. 10.1111/1467-9868.00340
- Maldonado G, Greenland S: Estimating causal effects. Int J Epidemiol. 2002, 31: 422-29. 10.1093/ije/31.2.422
- Hernán MA, Hernández-Díaz S, Werler MM, Mitchell AA: Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol. 2002, 155: 176-84. 10.1093/aje/155.2.176
- Hernán MA, Hernández-Díaz S, Robins JM: A structural approach to selection bias. Epidemiology. 2004, 15: 615-25. 10.1097/01.ede.0000135174.63482.43
- Howards PP, Schisterman EF, Heagerty PJ: Potential confounding by exposure history and prior outcomes - an example from perinatal epidemiology. Epidemiology. 2007, 18: 544-51. 10.1097/EDE.0b013e31812001e6
- Glymour MM, Greenland S: Causal diagrams. In Modern epidemiology. Edited by: Rothman KJ, Greenland S, Lash TL. Philadelphia: Wolters Kluwer/Lippincott Williams; 2008.
- VanderWeele TJ, Hernán MA, Robins JM: Causal directed acyclic graphs and the direction of unmeasured confounding bias. Epidemiology. 2008, 19: 720-28. 10.1097/EDE.0b013e3181810e29
- Hogan JW: Bringing causal models into the mainstream. Epidemiology. 2009, 20: 431-32. 10.1097/EDE.0b013e3181a0997a
- Dawid AP: Beware of the DAG. JMLR: workshop and conference proceedings. 2009, 6: 59-86. http://jmlr.csail.mit.edu/proceedings/papers/v6/dawid10a/dawid10a.pdf [accessed 1 February 2012].
- Spirtes P, Glymour C, Scheines R: Causation, prediction and search. 2nd edition. New York: Springer-Verlag.
- Wright S: The method of path coefficients. Annals of Mathematical Statistics. 1934, 5: 161-215. 10.1214/aoms/1177732676
- Kennedy P: A guide to econometrics. 4th edition. Oxford: Blackwell Publishers Ltd; 1998.
- Anderson RM, May RM: Infectious diseases of humans: dynamics and control. Oxford: Oxford University Press; 1992.
- Sterman JD: Business dynamics. Boston: Irwin McGraw-Hill; 2000.
- von Bertalanffy L: General system theory. New York: George Braziller; 1968.
- Ross R: The prevention of malaria. New York: E.P. Dutton & company; 1910.
- Isham V: Stochastic Models for Epidemics with Special Reference to AIDS. Ann Appl Probab. 1993, 3: 1-27. 10.1214/aoap/1177005505
- Garnett GP, Anderson RM: Balancing sexual partnerships in an age and activity stratified model of HIV transmission in heterosexual populations. IMA J Math Appl Med Biol. 1994, 11: 161-92. 10.1093/imammb/11.3.161
- Baussano I, Garnett G, Segnan N, Ronco G, Vineis P: Modelling patterns of clearance of HPV-16 infection and vaccination efficacy. Vaccine. 2011, 29: 1270-77. 10.1016/j.vaccine.2010.11.082
- Chadeau-Hyam M, Guihenneuc-Jouyaux C, Cousens SN, et al: An application of hidden Markov models to the French variant Creutzfeldt-Jakob disease epidemic. J Roy Stat Soc C (App Stat). 2010, 59: 839-53. 10.1111/j.1467-9876.2010.00714.x
- Vineis P, Chadeau-Hyam M: Integrating biomarkers into molecular epidemiological studies. Current Opinion in Oncology. 2011, 23: 100-05. 10.1097/CCO.0b013e3283412de0
- Wild CP: Complementing the genome with an 'exposome': The outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol Biomarkers Prev. 2005, 14: 1847-50. 10.1158/1055-9965.EPI-05-0456
- Rappaport SM, Smith MT: Environment and disease risks. Science. 2010, 330: 460-61. 10.1126/science.1192603
- Briggs DJ: A framework for integrated environmental health impact assessment of systemic risks. Environ Health. 2008, 7: 61. 10.1186/1476-069X-7-61
- Rose G: The strategy of preventive medicine. Oxford: Oxford University Press; 1992.
- VanderWeele TJ, Robins JM: Four types of effect modification: a classification based on directed acyclic graphs. Epidemiology. 2007, 18: 561-68. 10.1097/EDE.0b013e318127181b
- Weinberg CR: Can DAGs clarify effect modification? Epidemiology. 2007, 18: 569-72. 10.1097/EDE.0b013e318126c11d
- VanderWeele TJ: On the distinction between interaction and effect modification. Epidemiology. 2009, 20: 863-71. 10.1097/EDE.0b013e3181ba333c
- Case A, Paxson C: The long reach of childhood health and circumstance: evidence from the Whitehall II Study. Economic Journal. 2011, 121: F183-F204. 10.1111/j.1468-0297.2011.02447.x
- MacMahon B, Pugh TF: Epidemiology: principles and methods. Boston: Little Brown and Co; 1970.
- Krieger N: Epidemiology and the web of causation: has anyone seen the spider? Soc Sci Med. 1994, 39: 887-903. 10.1016/0277-9536(94)90202-X
- Joffe M, Mindell J: Complex causal process diagrams for analyzing the health impacts of policy interventions. Am J Public Health. 2006, 96: 473-79. 10.2105/AJPH.2005.063693
- Joffe M: The need for strategic health assessment. Eur J Public Health. 2008, 18: 439-40. 10.1093/eurpub/ckn088
- Joffe M: The role of strategic health impact assessment in sustainable development and green economics. International Journal of Green Economics. 2010, 4: 1-16.
- Sacerdote C, Ricceri F, Rolandsson O, Baldi I, Chirlaque MD, Feskens E: Education level is a strong predictor of the risk of type 2 diabetes. The EPIC-InterAct Study. Int J Epidemiol, in revision.
- Rehfuess EA, Best N, Briggs DJ, Joffe M: Use of causal diagrams in systems epidemiology: elucidating the inter-relationships between determinants of acute lower respiratory infections among children in sub-Saharan Africa. Submitted to Emerging Themes in Epidemiology.
- de Nazelle A, Nieuwenhuijsen MJ, Antó JM, Brauer M, Briggs DJ, Braun-Fahrlander C, et al: Improving health through policies that promote active travel: a review of evidence to support integrated health impact assessment. Environment International, doi: 10.1016/j.envint.2011.02.003
- Best N, Joffe M, Key J, Keiding N, Jensen TK: Social variation in biological fertility. (manuscript in preparation).
- Guihenneuc-Jouyaux C, Richardson S, Longini IM Jr: Modeling markers of disease progression by a hidden Markov process: application to characterizing CD4 cell decline. Biometrics. 2000, 56: 733-41. 10.1111/j.0006-341X.2000.00733.x
- Best N, Jackson C, Richardson S: Modelling complexity in health and social sciences: Bayesian graphical models as a tool for combining multiple sources of information. In Proceedings of the 3rd ASC International Conference on Survey Research Methods. Edited by: Banks R, Cornelius R, Evans S, Manners T. 2005.
- Smith R: Assessment and validation of exposure to disinfection byproducts during pregnancy, in an epidemiological study examining associated risk of adverse fetal growth outcomes. PhD thesis. Imperial College London, Department of Epidemiology and Biostatistics; 2011.
- Greenland S: An introduction to instrumental variables for epidemiologists. Int J Epidemiol. 2000, 29: 722-29. 10.1093/ije/29.4.722
- Davey Smith G, Ebrahim S: What can mendelian randomisation tell us about modifiable behavioural and environmental exposures? BMJ. 2005, 330: 1076-79. 10.1136/bmj.330.7499.1076
- Chen L, Davey Smith G, Harbord RM, Lewis SJ: Alcohol intake and blood pressure: a systematic review implementing a Mendelian randomization approach. PLoS Medicine. 2008, 5: e52:0461-71. http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0050052 [accessed 1 February 2012].
- Allin KH, Nordestgaard BG, Zacho J, Tybjaerg-Hansen A, Bojesen SE: C-reactive protein and the risk of cancer: a mendelian randomization study. J Natl Cancer Inst. 2010, 102: 202-06. 10.1093/jnci/djp459
- Marott SC, Nordestgaard BG, Zacho J, Friberg J, Jensen GB, Tybjaerg-Hansen A, Benn M: Does elevated C-reactive protein increase atrial fibrillation risk? A Mendelian randomization of 47,000 individuals from the general population. J Am Coll Cardiol. 2010, 56: 789-95. 10.1016/j.jacc.2010.02.066
- Angrist J, Evans W: Children and their parents' labor supply: Evidence from exogenous variation in family size. American Economic Review. 1998, 88: 450-77.
- Clancy L, Goodman P, Sinclair H, Dockery DW: Effect of air-pollution control on death rates in Dublin, Ireland: an intervention study. Lancet. 2002, 360: 1210-14. 10.1016/S0140-6736(02)11281-5
- Wilkinson R, Pickett K: The Spirit Level: why equality is better for everyone. London: Penguin; 2010.
- Hill AB: The environment and disease: association or causation? Proc Royal Soc Med. 1965, 58: 295-300.
- Joffe M, Mindell J: A framework for the evidence base to support Health Impact Assessment. J Epidemiol Community Health. 2002, 56: 132-38. 10.1136/jech.56.2.132
- Adams J: Risk. London: UCL Press; 1995.
- Galea S, Riddle M, Kaplan GA: Causal thinking and complex system approaches in epidemiology. Int J Epidemiol. 2010, 39: 97-106. 10.1093/ije/dyp296
- Vensim. Ventana Systems, Inc. [http://www.vensim.com/], [accessed 1 February 2012].
- Forrester JW: Counterintuitive behaviour of social systems. In Collected papers of Jay W Forrester: collectio. Volume 1970. Cambridge, MA: Wright-Allen Press; 1975:211-44.
- Lane DC: The power of the bond between cause and effect. System Dynamics Review. 2007, 23: 95-118. 10.1002/sdr.370
- Joffe M: Health, livelihoods, and nutrition in low-income rural systems. Food Nutr Bull. 2007, 28 (suppl.): S227-36.
- Elgar G, Vavouri T: Tuning in to the signals: noncoding sequence conservation in vertebrate genomes. Trends Genet. 2008, 24: 344-52. 10.1016/j.tig.2008.04.005
- di Bernardo D, Thompson MJ, Gardner TS, Chobot SE, Eastwood EI, Wojtovich AP, Elliott SJ, Schaus SE, Collins JJ: Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nature Biotechnology. 2005, 23: 377-83. 10.1038/nbt1075
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.