Simpson's Paradox, Lord's Paradox, and Suppression Effects are the same phenomenon – the reversal paradox
© Tu et al; licensee BioMed Central Ltd. 2008
Received: 02 April 2007
Accepted: 22 January 2008
Published: 22 January 2008
This article discusses three statistical paradoxes that pervade epidemiological research: Simpson's paradox, Lord's paradox, and suppression. These paradoxes have important implications for the interpretation of evidence from observational studies. This article uses hypothetical scenarios to illustrate how the three paradoxes are different manifestations of one phenomenon – the reversal paradox – depending on whether the outcome and explanatory variables are categorical, continuous or a combination of both; this renders the issues and remedies for any one to be similar for all three. Although the three statistical paradoxes occur in different types of variables, they share the same characteristic: the association between two variables can be reversed, diminished, or enhanced when another variable is statistically controlled for. Understanding the concepts and theory behind these paradoxes provides insights into some controversial or contradictory research findings. These paradoxes show that prior knowledge and underlying causal theory play an important role in the statistical modelling of epidemiological data, where incorrect use of statistical models might produce consistent, replicable, yet erroneous results.
This article discusses three statistical paradoxes that pervade epidemiological research: Simpson's paradox, Lord's paradox, and suppression. These paradoxes are not just tantalising puzzles of purely academic interest; potentially, they have serious implications for the interpretation of evidence from observational studies. Scenarios which are associated with and can be explained by these paradoxes are discussed. A concise explanation of these paradoxes and a historical overview are also provided. Simulated data based upon the foetal origins of adult diseases hypothesis [1, 2] are used to illustrate how the three paradoxes are different manifestations of one phenomenon – the reversal paradox – depending on whether the outcome and explanatory variables are categorical, continuous or a combination of both; this renders the issues and remedies for any one to be similar for all three. All statistical analyses were performed within SPSS 15.0 (SPSS Inc, Chicago, USA).
Foetal origins hypothesis
The 'foetal origins of adult disease' hypothesis (FOAD), which has evolved into the 'developmental origins of health and disease' (DOHaD) hypothesis [1, 2], was proposed to explain the associations observed between low birth weight and a range of diseases in later life. These associations have been interpreted as evidence that growth retardation in utero has adverse long-term effects on the development of vital organ systems which predispose the individuals to a range of metabolic and related disorders in later life. Nevertheless, although an inverse association between birth weight and disease in later life was found in some studies, this relationship was only established in many studies after the current body size variables such as body mass index (BMI), body weight and/or body height were adjusted for in the regression analysis. As body sizes may be in the causal pathway from birth weight to health outcomes in later life, the justification of this adjustment of current body sizes has been questioned recently [3–8].
Figures 1a, 1c and 1d all explain the observed correlation structure amongst birth weight, current body weight and blood pressure equally well, and it is not possible to judge which one is true based upon the observed data. For example, researchers may argue that current body weight is a genuine confounder in Figure 1d and therefore should be adjusted for. This can only be confirmed when the unobserved confounder (be it parental, genetic, or environmental factors) is identified and the conditional independence between birth weight and current weight is satisfied.
Nevertheless, the adjustment of current body weight in the statistical analysis will change the estimated relationship between birth weight and blood pressure, as the adjusted relationship is a conditional relationship. Differences between the unadjusted and adjusted (i.e. unconditional and conditional) relationships frequently cause confusion in the interpretations of statistical analyses and they also give rise to three statistical paradoxes, which we shall explain in the next section.
Simpson's paradox, or Yule's paradox, is a well known statistical phenomenon. It is observed when the relationship between two categorical variables is reversed after a third variable is introduced to the analysis of their association, or alternatively where the relationship between two variables differs within subgroups compared to that observed for the aggregated data. Although first discussed by Karl Pearson in 1899, it was George Udny Yule, once Pearson's assistant, who provided a detailed assessment of this problem in 1903.
A numerical example
Table 1. Summary of the analysis of simulated systolic blood pressure, birth weight and current body weight data for 1000 adult males. Variables summarised: current weight (kg), systolic BP (mmHg), birth weight (kg).
Table 2. Numbers and percentages of subjects with high blood pressure (> 135 mmHg) according to their birth weight and current body weight. Rows: low and high birth weight, for all subjects and within the subgroups with current weight ≤ 90 kg and > 90 kg. Column: percentage of subjects with high BP.
In statistical language, adjustment for current body weight yields a conditional relationship: the relationship between birth weight and blood pressure is conditional on current body weight. Although the numbers of subjects with low birth weight differ substantially between the two subgroups of current weight, the adjustment for the binary current weight variable (CWb) indicates that, if all subjects had the same level of current body weight, subjects with low birth weight would have a greater risk of developing high blood pressure. In other words, because people born smaller in general grow into smaller adults, the adjustment for CWb removes the disproportionate influence of subjects with both low birth weight and lower current weight on the association between birth weight and blood pressure.
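The reversal between the unconditional and conditional risks can be reproduced with a small stratified table. The following Python sketch uses hypothetical counts chosen purely for illustration (they are not the simulated data analysed in this article), and the names `counts`, `risk`, `stratum_risk` and `crude_risk` are assumptions introduced here.

```python
# Hypothetical counts (NOT the article's data):
# (subjects with high BP, all subjects) per birth weight group and stratum.
counts = {
    "CW <= 90 kg": {"low BW": (80, 400), "high BW": (10, 100)},
    "CW > 90 kg": {"low BW": (50, 100), "high BW": (160, 400)},
}


def risk(cases, total):
    """Proportion of subjects with high blood pressure."""
    return cases / total


# Conditional risks: birth weight groups compared within each stratum.
stratum_risk = {
    stratum: {bw: risk(*cell) for bw, cell in groups.items()}
    for stratum, groups in counts.items()
}

# Unconditional risks: strata pooled before comparing birth weight groups.
crude_risk = {}
for bw in ("low BW", "high BW"):
    cases = sum(groups[bw][0] for groups in counts.values())
    total = sum(groups[bw][1] for groups in counts.values())
    crude_risk[bw] = risk(cases, total)

for stratum, risks in stratum_risk.items():
    print(stratum, risks)
print("all subjects", crude_risk)
```

With these made-up counts, low birth weight carries the higher risk within each current weight stratum, yet the lower risk once the strata are pooled, because most low birth weight subjects fall in the lighter (lower risk) stratum.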
Simpson's paradox has broad implications for epidemiological research because it shows that making causal inference from any non-randomised study (e.g. cohort studies, case-control studies) can be difficult. Whilst it is possible to control for observed differences between cases and controls, there will always be the possibility that an unobserved, and therefore unadjusted, confounder might attenuate the association between exposure and outcome (or even reverse its direction) because its mean value or distribution differs between the case and control groups. Nevertheless, whether or not there is any unobserved (and therefore unadjusted) confounder may not always be an issue of debate, because in most epidemiologic studies, the important confounders are generally known. The controversy in making causal inference arises in situations where the adjusted variable may not be a genuine confounder [6, 7, 19, 20]. Within epidemiology, Simpson's paradox is closely linked to the concepts of confounding and non-collapsibility.
Lord's paradox was named after two short articles in the psychology literature by Frederick M Lord regarding the use of analysis of covariance (ANCOVA) within non-experimental studies [21, 22]. In contrast to Simpson's paradox, little discussion of Lord's paradox can be found in the statistical and epidemiological literature, though social scientists have shown a great interest in this phenomenon [24–28]. Lord's paradox refers to the relationship between a continuous outcome and a categorical exposure being reversed when an additional continuous covariate is introduced to the analysis. One specific example is where the additional covariate is a measure made at baseline within a longitudinal study, and the outcome is the same variable measured some time later (e.g. following an intervention). The aim is then to measure change in the outcome by adjusting for the baseline measurements, and the categorical covariate might be the exposure/control groups; this is the familiar design for ANCOVA. This controversy was first discussed in 1910 between Karl Pearson and Arthur C Pigou when they debated the role of parental alcoholism and its impact on the performance of children.
A numerical example
Considering the previous numerical example for Simpson's paradox, we examine current body weight (CW) and blood pressure (BP) as continuous variables, retaining birth weight as a binary variable (BWb). The two-sample t-test shows that, on average, the blood pressure of subjects with higher birth weight is 2.49 mmHg (95%CI: 1.12, 3.87) greater than that of subjects with lower birth weight. However, using ANCOVA (i.e. linear regression with a categorical group-allocating variable and with adjustment for a continuous confounding variable), adjusting for current weight as a covariate, the blood pressure of subjects with higher birth weight becomes 2.94 mmHg (95%CI: 1.12, 3.87) lower than that of subjects with lower birth weight.
Differences in the results of the two analyses are due to adjustment in the second analysis for current body weight (CW). As current weight is positively associated with both BP and BWb, it is expected that the relation between BP and BWb will change when current weight is adjusted for. In randomised controlled trials, mean values of the adjusted baseline covariate are expected to be approximately equal across treatment and control groups since, assuming randomisation has been achieved, baseline variation should be within groups rather than between groups; that is, there is no correlation between the group variable and the adjusted covariate (in our numerical example, no correlation between BWb and current weight). In such circumstances it is well known that using ANCOVA achieves the same estimated treatment difference across groups as found by the t-test, though the former will generally have greater power [30, 31]. Recall our previous discussion of two scenarios in the section on Simpson's paradox, where the adjustment for CWb will not change the relationship between BWb and BP. Randomised controlled trials may thus be seen as a special case of scenario (a) where there is no difference in the mean current weight between the two sub-groups of birth weight.
In statistical language, results from the regression analyses are conditional on both birth weight groups having equal mean current weight in later life, and if true there would be a benefit from low birth weight in terms of blood pressure. However, since the two groups have a different mean current weight in later life, results from the regression analysis need to be interpreted with caution. In Simpson's paradox, the discussion surrounds the differences in results between unconditional and conditional risk/probability, and in Lord's paradox, discussion is around the differences in results between unconditional and conditional means.
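The contrast between the unadjusted comparison (equivalent to the t-test estimate) and the ANCOVA estimate can be sketched with simulated data. The Python example below is a minimal illustration under stated assumptions: the data-generating values (intercepts, slopes, standard deviations), the sample size and the function names `simulate` and `group_effects` are all hypothetical and are not the simulation used in this article. Setting `cw_shift=0` mimics a randomised trial in which the covariate mean does not differ between groups.

```python
import random


def simulate(n, cw_shift, seed=0):
    """Hypothetical data: binary birth weight, current weight (kg), BP (mmHg)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        bwb = rng.randint(0, 1)                           # 1 = high birth weight
        cw = 80 + cw_shift * bwb + rng.gauss(0, 8)        # heavier if cw_shift > 0
        bp = 90 + 0.5 * cw - 2.5 * bwb + rng.gauss(0, 8)  # assumed true model
        rows.append((bwb, cw, bp))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def group_effects(rows):
    """Return (unadjusted, CW-adjusted) least-squares effect of BWb on BP."""
    g = [r[0] for r in rows]
    c = [r[1] for r in rows]
    y = [r[2] for r in rows]
    # Slope on a binary indicator equals the difference in group means.
    unadjusted = cov(g, y) / cov(g, g)
    # Two-predictor least squares (ANCOVA) coefficient for the group variable.
    det = cov(g, g) * cov(c, c) - cov(g, c) ** 2
    adjusted = (cov(g, y) * cov(c, c) - cov(c, y) * cov(g, c)) / det
    return unadjusted, adjusted


observational = group_effects(simulate(5000, cw_shift=10))
randomised = group_effects(simulate(5000, cw_shift=0))
print("groups differ in CW:", observational)
print("groups balanced on CW:", randomised)
```

Under these assumptions the group difference changes sign after adjustment only when the groups differ in mean current weight; when the covariate is balanced, the two estimates nearly coincide.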
Of the three paradoxes, suppression effects within multiple regression are probably the least recognised amongst clinical and epidemiological researchers, though the suppression phenomenon has been extensively discussed by statisticians [32–34] and methodologists from the social sciences [35, 36]. The classical definition of suppression is that a potential covariate that is unrelated to the outcome variable (i.e. has a bivariate correlation of zero) increases the overall model fit within regression (as assessed by $R^2$, for instance) when this covariate is added to the model. This seems counter-intuitive and needs some explanation.
Consider a multiple regression of y on two covariates, x1 and x2, in which the bivariate correlation between y and x1 is zero ($r_{y1} = 0$). The model then has $R^2 = r_{y2}^2 / (1 - r_{12}^2)$, where $r_{y2}$ is the correlation between y and x2 and $r_{12}$ is the correlation between x1 and x2. Since $1 - r_{12}^2$ will always be smaller than 1 whenever x1 and x2 are correlated, $R^2$ will always be greater than $r_{y2}^2$. By including x1 in the regression model, more variance of y is 'explained', i.e. the predictability of the model increases. However, this seems counterintuitive, since the zero bivariate correlation between y and x1 ($r_{y1} = 0$) indicates that no more variance in y can be explained by x1. So where does the additional 'explained variance' in y come from when x1 is entered in the regression model? The answer is that the additional explained variance in y comes from x2.
Although x1 is not correlated with y, it is positively correlated with x2, which in turn is positively correlated with y. When x1 is entered in the model, it 'suppresses' the part of x2 that is uncorrelated with y, thereby increasing overall predictability. In other words, the role of x1 in the model is to suppress (reduce) the noise (the uncorrelated component of x2) within the correlation between y and x2, as though any uncertainty in x2 'predicting' y is 'explained' by x1.
It is not only $R^2$ that is increased; the standardised coefficient for x2, $\beta_2 = r_{y2} / (1 - r_{12}^2)$, becomes greater than $r_{y2}$. Furthermore, although $r_{y1}$ is equal to zero, $\beta_1$ is not zero and becomes negative: $\beta_1 = -r_{12} r_{y2} / (1 - r_{12}^2)$. In general, the greater the positive correlation between x1 and x2, the greater the absolute value of $\beta_1$ and $\beta_2$. However, having $r_{y1}$ equal zero (or being negative) is not necessary to observe suppression; $r_{y1}$ may be positive and x1 may still be a suppressor.
It was Paul Horst, in 1941, who first explored this curious phenomenon within educational research, and in the last few decades, many statisticians have been interested in this topic [33–35]. There are still very few discussions within the clinical and epidemiological literature regarding the impact of suppression (i.e. the impact on the changes in the regression coefficients and $R^2$) on the interpretation of non-randomised studies whilst making statistical adjustment for covariates within regression [12, 43].
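The classical suppression case can be checked numerically from the standard formulas for a two-covariate regression on standardised variables. In the Python sketch below, the function name `two_predictor_fit` and the correlation values (0, 0.5 and 0.6) are illustrative assumptions, not values from this article.

```python
def two_predictor_fit(r_y1, r_y2, r_12):
    """Standardised coefficients and R^2 for y regressed on x1 and x2."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared


# Assumed correlations: x1 is uncorrelated with y but correlated with x2.
beta1, beta2, r_squared = two_predictor_fit(r_y1=0.0, r_y2=0.5, r_12=0.6)
print(beta1, beta2, r_squared)
```

With these assumed correlations, the coefficient for x2 exceeds its bivariate correlation with y, the coefficient for x1 is negative despite its zero correlation with y, and the model $R^2$ exceeds $r_{y2}^2 = 0.25$.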
A numerical example
Table 3. Simple and multiple regression models for simulated hypothetical data on birth weight (BW), blood pressure (BP), and current body weight (CW); the dependent variable in all three models is BP. Entries are regression coefficients (standard errors).
It is noteworthy that not only is the association of birth weight with BP reversed (the coefficient changes from 1.861 to -3.708 mmHg/kg), but the impact of current weight also increases from 0.382 to 0.465 mmHg/kg. The $R^2$ for multiple regression is 0.283, which is greater than the sum of the squared correlations for birth weight ($0.105^2 = 0.011$) and current weight ($0.501^2 = 0.251$), i.e. 0.262. Therefore, the explained variance of BP is greater than the sum of the explained variances for the two simple regression models.
In the hypothetical foetal origins example, the strength of association between BP and birth weight differs considerably between simple regression and multiple regression. Which model genuinely reflects their true causal relationship depends on whether or not current weight should be adjusted for, i.e. whether or not current weight is a confounder for the relationship between BP and birth weight; this depends upon biological and clinical knowledge, not ad hoc statistical analyses and changes in the estimated effects. The question is whether or not it is also biologically and clinically feasible to isolate the independent effect of birth weight on BP by removing the impact of current weight on BP [3, 5–7, 44]. In other words, changes in the regression coefficient for birth weight caused by current weight being adjusted for in multiple regression are irrelevant to whether or not current weight is viewed to be a confounder. The definition of confounding depends upon the a priori causal model assumed by the investigator [8, 11], which then dictates which statistical model is adopted.
In statistical language, results from adjustment for current weight are conditional on all babies growing to the same size in adulthood. In Simpson's paradox, the 'paradox' is due to differences in the results between unconditional and conditional risk/probability, and in Lord's paradox, it is due to differences in the results between unconditional and conditional means. In suppression, the paradox is due to differences in the results between the marginal (i.e. unconditional) BP-birth weight relation and the BP-birth weight relation conditional on current weight.
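The comparison of explained variances can be verified directly from the figures quoted in the text. The short Python sketch below only re-does that arithmetic; the variable names are assumptions introduced here.

```python
# Values quoted in the text for the simulated example.
r_bw = 0.105         # correlation between BP and birth weight
r_cw = 0.501         # correlation between BP and current weight
r2_multiple = 0.283  # R^2 of the multiple regression (BP on BW and CW)

# Sum of the variances explained by the two simple regressions.
sum_of_squares = r_bw ** 2 + r_cw ** 2

# Additional variance explained only when both covariates are in the model.
excess = r2_multiple - sum_of_squares
print(round(sum_of_squares, 3), round(excess, 3))
```

A positive `excess` is the numerical signature of suppression here: the two covariates together explain more of the variance in BP than the sum of what each explains alone.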
Table 4. Comparison of Simpson's paradox, Lord's paradox, and suppression (illustrated example in parentheses)
Type of reversal paradox | Outcome | Exposure | Covariate/'confounder'
Simpson's paradox | Categorical (blood pressure: high vs. low) | Categorical (birth weight: high vs. low) | Categorical (current weight: high vs. low)
Lord's paradox | Continuous (blood pressure) | Categorical (birth weight: high vs. low) | Continuous (current weight)
Suppression | Continuous (blood pressure) | Continuous (birth weight) | Continuous (current weight)
In non-randomised studies, the reversal paradox can often occur due to 'controlling' for what is typically termed a confounder, even though a clear definition of what is meant by 'confounder' is rarely provided (contingent on understanding its role in the biological/clinical process being modelled). Differences in the strength or even direction of any association between outcome and exposure might give rise to contradictory interpretations regarding potential causal relationships. Furthermore, it is very difficult, if not impossible, to compare results across studies where many varied attempts are made to control for different confounders, especially in the absence of any consistent reasoning given for the choice of confounders. In some situations, statistical adjustment might introduce bias rather than eliminate it.
It might be suggested that the adjustment for current weight in our foetal origins example can be viewed as the estimation of direct and indirect effects, as in path analysis or structural equation modelling. In Figure 1a, the path from birth weight to BP represents the direct effect (birth weight → BP), and the path birth weight → current weight → BP represents the indirect effect. For instance, in model 3 of Table 3, the regression coefficient for birth weight, -3.708, is the direct effect, and the indirect effect is 0.465 (the regression coefficient for current weight in model 3) multiplied by 11.976 (the simple regression coefficient for birth weight when current weight is regressed on birth weight), i.e. 5.569. The total effect is therefore -3.708 + 5.569 = 1.861, which is the simple regression coefficient for birth weight in model 1 of Table 3. Our reservation about interpreting the results from model 3 as a partition of the total effect into direct and indirect effects is that many variables, such as current height and current BMI, can be placed between birth weight and BP, and it can be claimed that there is more than one indirect effect. Furthermore, any body size measured after birth, for example body weight at year one, year two etc., can be adjusted for in the model and presumably used to estimate the indirect and direct effects. Whilst the total effect of birth weight on BP is not affected by the number of intermediate body size variables in the model, the estimate of the 'direct' effect differs when different intermediate variables are adjusted for. Unless there is experimental evidence to support the notion that there are indeed different paths of direct and indirect effects from birth weight to BP, we are cautious about using such terminology to label the results from multiple regression, as with model 3.
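The decomposition into direct and indirect effects quoted above is simple arithmetic and can be re-checked in a few lines. The Python sketch below uses only the coefficients reported in the text; the variable names are assumptions introduced here.

```python
# Coefficients quoted in the text (Table 3 and the CW-on-BW regression).
direct = -3.708    # BW coefficient in model 3 (BP on BW and CW)
cw_on_bp = 0.465   # CW coefficient in model 3
bw_on_cw = 11.976  # slope from regressing current weight on birth weight

# Indirect effect along the path birth weight -> current weight -> BP.
indirect = cw_on_bp * bw_on_cw

# Total effect, which should match the BW coefficient in model 1.
total = direct + indirect
print(round(indirect, 3), round(total, 3))
```

In ordinary least squares this identity holds exactly whichever intermediate variable is adjusted for, which is consistent with the caution expressed above: the total is fixed, whereas the split into 'direct' and 'indirect' parts depends on which variables are placed in the model.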
In other words, to determine whether the unconditional or conditional relationship reflects the true physiological relationship between birth weight and blood pressure, experiments in which birth weight and current weight can be manipulated are required in order to estimate the impact of birth weight on blood pressure.
Although the three statistical paradoxes occur in different types of variables, they share the same characteristic: the association between two variables can be reversed, diminished, or enhanced when another variable is statistically controlled for. Understanding the concepts and theory behind these paradoxes will provide insights into some of the controversial or contradictory results from previous research. Prior knowledge and theory play an important role in the statistical modelling of non-randomised data. Incorrect use of statistical models might produce consistent, replicable, yet erroneous results.
We are very grateful for the constructive comments of two reviewers. One reviewer brought to our attention the excellent paper by Cox and Wermuth. YKT conceived the ideas of this study and wrote the first draft. DG and MSG contributed to the discussion of these ideas and the writing of the final draft.
1. Barker DJ: Fetal origins of coronary heart disease. BMJ. 1995, 311: 171-4.
2. Barker DJ, Eriksson JG, Forsen T, Osmond C: Fetal origins of adult disease: strength of effects and biological basis. Int J Epidemiol. 2002, 31: 1235-9. doi:10.1093/ije/31.6.1235
3. Paneth N, Ahmed F, Stein AD: Early nutritional origins of hypertension: a hypothesis still lacking support. Journal of Hypertension. 1996, 14 (5): S121-S129.
4. Lucas A, Fewtrell MS, Cole TJ: Fetal origins of adult disease-the hypothesis revisited. BMJ. 1999, 319: 245-9.
5. Huxley RR, Neil A, Collins R: Unravelling the fetal origins hypothesis: is there really an inverse association between birth weight and subsequent blood pressure?. Lancet. 2002, 360: 659-65. doi:10.1016/S0140-6736(02)09834-3
6. Tu YK, West R, Ellison GTH, Gilthorpe MS: Why evidence for the fetal origins of adult disease might be a statistical artifact: the "reversal paradox" for the relation between birth weight and blood pressure in later life. Am J Epidemiol. 2005, 161: 27-32. doi:10.1093/aje/kwi002
7. Weinberg CR: Invited commentary: Barker meets Simpson. Am J Epidemiol. 2005, 161: 33-5. doi:10.1093/aje/kwi003
8. De Stavola BL, Nitsch D, dos Santos Silva I, McCormack V, Hardy R, Mann V, Cole TJ, Morton S, Leon DA: Statistical issues in life course epidemiology. Am J Epidemiol. 2006, 163: 84-96. doi:10.1093/aje/kwj003
9. Pearl J: Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press; 2000.
10. Greenland S, Robins JM, Pearl J: Confounding and collapsibility in causal inference. Stat Sci. 1999, 14: 29-46. doi:10.1214/ss/1009211805
11. Jewell NP: Statistics for Epidemiology. London: Chapman & Hall; 2004.
12. Cox DR, Wermuth N: A general condition for avoiding effect reversal after marginalisation. J R Statist Soc B. 2003, 65: 937-941. doi:10.1111/1467-9868.00424
13. Simpson EH: The interpretation of interaction in contingency tables. J R Stat Soc Ser B. 1951, 13: 238-41.
14. Yule GU: Notes on the theory of association of attributes in statistics. Biometrika. 1903, 2: 121-34. doi:10.1093/biomet/2.2.121
15. Pearson K, Lee A, Bramley-Moore L: Mathematical contributions to the theory of evolution: VI – Genetic (reproductive) selection: Inheritance of fertility in man, and of fecundity in thoroughbred racehorses. Philos Trans R Soc Lond A. 1899, 192: 257-330. doi:10.1098/rsta.1899.0006
16. Hennessy E, Alberman E: Intergenerational influences affecting birth outcome. II. Preterm delivery and gestational age in the children of the 1958 British birth cohort. Paediatr Perinat Epidemiol. 1998, 12 (1): 61-75. doi:10.1046/j.1365-3016.1998.0120s1061.x
17. Website of the Department of Health, United Kingdom. http://www.doh.gov.uk
18. Paik M: A graphical representation of a three-way contingency table: Simpson's paradox and correlation. Am Stat. 1985, 39: 53-54. doi:10.2307/2683907
19. Hernandez-Diaz S, Schisterman EF, Hernan M: The "birth weight" paradox uncovered?. Am J Epidemiol. 2006, 164: 1115-1120. doi:10.1093/aje/kwj275
20. Wilcox A: Invited Commentary: The perils of birth weight – a lesson from directed acyclic graphs. Am J Epidemiol. 2006, 164: 1121-1123. doi:10.1093/aje/kwj276
21. Lord FM: A paradox in the interpretation of group comparisons. Psychol Bull. 1967, 68: 304-5. doi:10.1037/h0025105
22. Lord FM: Statistical adjustments when comparing preexisting groups. Psychol Bull. 1969, 72: 337-8. doi:10.1037/h0028108
23. Glymour MM, Weuve J, Berkman LF, Kawachi I, Robins JM: When is baseline adjustment useful in analysis of change? An example with education and cognitive change. Am J Epidemiol. 2005, 162: 267-278. doi:10.1093/aje/kwi187
24. Hand D: Deconstructing statistical questions. J R Stat Soc Ser A Stat Soc. 1994, 157: 317-56. doi:10.2307/2983526
25. Campbell DT, Kenny DA: A primer on regression artifacts. New York: The Guilford Press; 1999.
26. Mohr LB: Regression artifacts and other customs of dubious desert. Eval Program Plann. 2000, 23: 397-409. doi:10.1016/S0149-7189(00)00029-X
27. Reichardt CS: Regression facts and artifacts. Eval Program Plann. 2000, 23: 411-4. doi:10.1016/S0149-7189(00)00030-6
28. Wainer H: Adjusting for differential base rates: Lord's paradox again. Psychol Bull. 1991, 109: 147-51. doi:10.1037/0033-2909.109.1.147
29. Stigler SM: Statistics on the Table. Cambridge, Massachusetts: Harvard University Press; 1999.
30. Vickers AJ, Altman DG: Analysing controlled trials with baseline and follow up measurements. BMJ. 2001, 323: 1123-4. doi:10.1136/bmj.323.7321.1123
31. Tu YK, Blance A, Clerehugh V, Gilthorpe MS: Statistical power for analyses of changes in randomized controlled trials. J Dent Res. 2005, 84: 283-287.
32. Lewis JW, Escobar LA: Suppression and enhancement in bivariate regression. Statistician. 1986, 35: 17-26. doi:10.2307/2988294
33. Bertrand PV, Holder RL: A quirk in multiple regression: the whole regression can be greater than the sum of its parts. Statistician. 1988, 37: 371-4. doi:10.2307/2348761
34. Sharpe NR, Roberts RA: The relationship among sums of squares, correlation coefficients and suppression. Am Stat. 1997, 51: 46-48. doi:10.2307/2684693
35. Friedman L, Wall M: Graphical views of suppression and multicollinearity in multiple linear regression. Am Stat. 2005, 127-136.
36. Cohen J, Cohen P: Applied multiple regression/correlation analysis for the behavioural sciences. London: LEA; 1983.
37. Pedhazur EJ: Multiple regression in behavioral research: Explanation and prediction. Fort Worth: Harcourt; 1997.
38. Stocks NP, Davey Smith G: Blood pressure and birth weight in the first year university student aged 18–25. Public Health. 1999, 113: 273-7. doi:10.1016/S0033-3506(99)00179-1
39. Williams S, Poulton R: Birth size, growth, and blood pressure between the ages of 7 and 26 years: failure to support the fetal origins hypothesis. Am J Epidemiol. 2002, 155: 849-52. doi:10.1093/aje/155.9.849
40. McNeill G, Tuya C, Campbell DM, Haggarty P, Smith WCS, Masson LF, Cumming A, Broom I, Haites N: Blood pressure in relation to birth weight in twins and singleton controls matched for gestational age. Am J Epidemiol. 2003, 158: 150-5. doi:10.1093/aje/kwg130
41. Tu YK, Gilthorpe MS, Ellison GTH: What is the effect of adjusting for more than one measure of current body size on the relation between birth weight and blood pressure?. J Hum Hypertens. 2006, 20: 646-657. doi:10.1038/sj.jhh.1002044
42. Horst P: The role of prediction variables which are independent of the criterion. The Prediction of Personal Adjustment. Edited by: Horst P. New York: Social Science Research Council; 1941, 431-6.
43. MacKinnon DP, Krull JL, Lockwood CM: Equivalence of the mediation, confounding and suppression effect. Prev Sci. 2000, 1: 173-81. doi:10.1023/A:1026595011371
44. Tu YK, Ellison GTH, Gilthorpe MS: Growth, current size and the role of the 'reversal paradox' in the foetal origins of adult disease: an illustration using vector geometry. Epidemiol Perspect Innov. 2006, 3: 9. doi:10.1186/1742-5573-3-9
45. Von Elm E, Egger M: The scandal of poor epidemiological research. BMJ. 2004, 329: 868-9. doi:10.1136/bmj.329.7471.868
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.