# The role of causal reasoning in understanding Simpson's paradox, Lord's paradox, and the suppression effect: covariate selection in the analysis of observational studies

- Onyebuchi A Arah

*Emerging Themes in Epidemiology* **5**:5

https://doi.org/10.1186/1742-7622-5-5

© Arah; licensee BioMed Central Ltd. 2008

**Received: **06 February 2008

**Accepted: **26 February 2008

**Published: **26 February 2008

## Abstract

Tu *et al* present an analysis of the equivalence of three paradoxes, namely, Simpson's paradox, Lord's paradox, and the suppression effect. They conclude that all three simply reiterate the occurrence of a change in the association of any two variables when a third variable is statistically controlled for. This is not surprising, because reversal or change in magnitude is common in conditional analysis. At the heart of the phenomenon of change in magnitude, with or without reversal of the effect estimate, is the question of which estimate to use: the unadjusted (combined-table) or the adjusted (sub-table) estimate. Hence, Simpson's paradox and related phenomena are a problem of covariate selection and adjustment (when to adjust or not) in the causal analysis of non-experimental data. It cannot be overemphasized that although these paradoxes reveal the perils of using statistical criteria to guide causal analysis, they hold neither the explanations of the phenomena they depict nor pointers on how to avoid them. The explanations and solutions lie in causal reasoning, which relies on background knowledge, not statistical criteria.

## Commentary

Simpson's paradox, Lord's paradox, and the suppression effect are examples of the perils of the statistical interpretation of a real but complex world. By rearing their heads intermittently in the literature, they remind us of the inadequacy of statistical criteria for causal analysis. Those who believe in letting the data speak for themselves are in for a disappointment.

Tu *et al* present an analysis of the equivalence of three paradoxes, concluding that all three simply reiterate the unsurprising change in the association of any two variables when a third variable is statistically controlled for [1]. I call this unsurprising because reversal or change in magnitude is common in conditional analysis; to avoid either, we would have to avoid conditional analysis altogether. What is it about Simpson's and Lord's paradoxes or the suppression effect, beyond their pointing out the obvious, that attracts the intermittent and sometimes alarmist interest seen in the literature? Why are they paradoxes? A paradox is a seemingly absurd or self-contradictory statement or proposition that may in fact be true [2]. What is so self-contradictory about the Simpson's, Lord's, and suppression phenomena that may turn out to be true? After reading the paper by Tu *et al* one still gets the uneasy feeling that the paradoxes are anything but surprising, that the statistical phenomena they purport to represent are in fact causal in nature, requiring a causal language rather than a statistical one, and that the problem can be resolved only with causal reasoning. So, why bother with the statistics of these paradoxes, much less their equivalence, in the first place if both the correct language and the resolution lie elsewhere? Although we are given a glimpse of the appropriate tools (such as the implied causal calculus of directed acyclic graphs [3–6]), we must look beyond the authors' paper for satisfactory answers.

Now, suppose there are no other unmeasured covariates given the DAGs in Figures 1 to 7, which relate birth weight (BW), current weight (CW), and blood pressure (BP). If Figure 1 is the true state of affairs, to estimate the *total* effect of BW on BP, the unadjusted analysis will suffice. If, however, Figure 2 or 3 applies, then the adjusted estimate (that is, conditional on CW) is needed to estimate the total effect of BW on BP. This is because conditioning on CW will block the back-door path from BW to BP: BW←CW→BP in Figure 2 or BW←U→CW→BP in Figure 3. The reader could by now have doubts about the correctness of Figure 2, where the later observation CW is a confounder of the effect of BW on BP, since one could argue that, by occurring after BW, CW could not be a common cause of both BW and BP. (See Hernán *et al* [8] for an accessible defence of the structural approach to confounding and selection bias using DAGs.) Nonetheless, while temporality *seemingly* excludes CW as a confounder in Figure 2, it does not exclude CW from ever being part of a confounding path, as seen in Figure 3. Both BW and CW are more likely to be the result of a common cause (U), possibly genetic. Based on background knowledge and common sense, Figure 3 is more plausible than Figure 2. Therefore, temporality alone cannot be used to judge whether or not a variable is a confounder, or part of a sufficient subset of covariates needed to block a back-door path [5].
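The back-door logic above can be checked by simulation under a linear version of the Figure 3 structure. The sketch below is purely illustrative: the variable names, coefficients, and sample size are my assumptions, not quantities from Tu *et al*'s data. It shows that, when BW←U→CW→BP is the true structure, the CW-adjusted regression recovers the assumed effect of BW on BP while the unadjusted one does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical linear version of the Figure 3 DAG: BW <- U -> CW -> BP, plus BW -> BP.
U = rng.normal(size=n)                 # unobserved common cause (e.g. genetic)
BW = U + rng.normal(size=n)            # birth weight, influenced by U
CW = U + rng.normal(size=n)            # current weight, influenced by U
beta = -0.5                            # assumed true effect of BW on BP
BP = beta * BW + 1.0 * CW + rng.normal(size=n)

def ols(y, *xs):
    """Least-squares slopes of y on xs (first coefficient is the intercept)."""
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

crude = ols(BP, BW)[1]          # unadjusted: back-door path via U and CW is open
adjusted = ols(BP, BW, CW)[1]   # conditioning on CW blocks BW <- U -> CW -> BP
```

Under the Figure 1 structure the same adjustment would be unnecessary; the point is that which analysis is correct is dictated by the assumed DAG, not by the data.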

Figure 4 presents a scenario where the unadjusted effect of BW on BP is the correct estimate, since CW is a collider (that is, without conditioning, it already acts as a blocker) in the DAG depicting two unobserved common causes: one of BW and CW, the other of CW and BP. This scenario is closely related to that in Figure 5, where BW has no effect on, but shares an unobserved common cause (U_{3}) with, BP. In all scenarios, our choice of which estimate to use (unadjusted or adjusted) is based not on the magnitude or direction of the estimate but on the governing causal relations. Put this way, Simpson's paradox becomes a problem of covariate adjustment (when to adjust or not) in the causal analysis of non-experimental or observational data. The paradox arises from giving a causal interpretation to the observation that the proportion of a given level of BW is evidence for predicting the proportion of a given BP level in an observed sample *if* the status of the third related covariate CW is unknown [5]. What we really want to answer is "Does BW cause BP?", *not* "Does observing BW allow us to predict BP?".
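The collider scenario can be sketched the same way. In this hypothetical linear simulation (structure and coefficients are assumptions for illustration, not from the original data), U_{1} causes BW and CW, U_{2} causes CW and BP, and BW has an assumed effect on BP, as in Figure 4. The unadjusted estimate is approximately unbiased, while conditioning on the collider CW opens the path BW←U_{1}→CW←U_{2}→BP and distorts it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical linear version of the Figure 4 DAG:
# U1 -> BW, U1 -> CW <- U2, U2 -> BP, and BW -> BP.
U1 = rng.normal(size=n)
U2 = rng.normal(size=n)
BW = U1 + rng.normal(size=n)
CW = U1 + U2 + rng.normal(size=n)      # collider: blocks U1 -> CW <- U2 when not conditioned on
beta = -0.5                            # assumed true effect of BW on BP
BP = beta * BW + U2 + rng.normal(size=n)

def ols(y, *xs):
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

crude = ols(BP, BW)[1]          # unadjusted: the spurious path is already blocked at CW
adjusted = ols(BP, BW, CW)[1]   # conditioning on the collider opens it and biases the estimate
```

Here adjustment, far from being harmless, is what creates the bias, again underscoring that the DAG, not a statistical criterion, decides whether to adjust.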

The sure-thing principle [5], applied to the variables in Tu *et al*'s paper, would then go as follows: an action *do*{BW} which decreases the probability of the event BP in each CW subpopulation must also decrease the probability of BP in the whole population, provided that the action *do*{BW} does not change the distribution of the CW subpopulations. See Pearl [5] for a formal proof, although the sure-thing principle follows naturally from the semantics of actions as modifiers of mechanisms, as embodied by the *do*(·) operator. In the present notation, if increasing BW lowers the probability of high BP within each CW stratum,

P(BP=high | *do*{BW=high}, CW=high) < P(BP=high | *do*{BW=low}, CW=high)     (1)

P(BP=high | *do*{BW=high}, CW=low) < P(BP=high | *do*{BW=low}, CW=low)     (2)

then it must lower it in the population as a whole:

P(BP=high | *do*{BW=high}) < P(BP=high | *do*{BW=low})     (3)

What is numerically observed in Simpson's paradox, however, is

P(BP=high | BW=high) > P(BP=high | BW=low)     (4)

which goes against our causal intuition or inclination to think "causes". If the DAG represented in Figure 3 – or, for the sake of argument, Figure 2 – applies, then we must consult the conditional analysis represented by inequalities 1 and 2, not the observed unconditional analysis in inequality 4. In this context, inequality 4 can only be seen as evidence about BP that BW provides in the absence of information on CW, not as a statement of the causal effect of BW on BP, which is what inequality 3 captures [5]. That is, Simpson's paradox arises because, when CW is unknown to us and we observe, for instance, {BW=high} rather than {BW=low}, we have evidence for predicting (as in inequality 4) that {BP=high} is the more probable observation in the non-experimental data, but we cannot take this to imply that {BW=high} causes {BP=high}, which would go against our causal knowledge that *doing* {BW=low} causes {BP=high}, as depicted in inequality 3. Hence, prediction does not imply aetiology. The former tends to deal with usually transitory proportions, whereas the latter deals with invariant causal relations.
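The coexistence of the stratum-specific and pooled comparisons is easy to reproduce with counts. The numbers below are invented purely for illustration: within each CW stratum the proportion of {BP=high} is lower when {BW=high}, yet pooling over CW reverses the comparison.

```python
# Hypothetical counts: (number with BP=high, total) for each (CW stratum, BW level).
counts = {
    ("CW=low",  "BW=high"): (10, 100),    # 10% BP=high
    ("CW=low",  "BW=low"):  (60, 400),    # 15% BP=high
    ("CW=high", "BW=high"): (200, 400),   # 50% BP=high
    ("CW=high", "BW=low"):  (60, 100),    # 60% BP=high
}

def rate(events, total):
    return events / total

# Stratum-specific proportions of BP=high: high BW looks protective in both strata...
low_stratum = [rate(*counts[("CW=low", bw)]) for bw in ("BW=high", "BW=low")]
high_stratum = [rate(*counts[("CW=high", bw)]) for bw in ("BW=high", "BW=low")]

# ...but the pooled (combined-table) proportions reverse the comparison,
# because BW=high individuals are concentrated in the high-risk CW stratum.
pooled = {}
for bw in ("BW=high", "BW=low"):
    events = sum(counts[(cw, bw)][0] for cw in ("CW=low", "CW=high"))
    total = sum(counts[(cw, bw)][1] for cw in ("CW=low", "CW=high"))
    pooled[bw] = events / total
```

The arithmetic alone cannot say whether the stratified or the pooled comparison answers the causal question; only the assumed DAG can.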

A further illustration of the futility of the continued statistical discussion of the paradoxes is captured in the discussion of the suppression effect: how an unrelated covariate (CW) "increases the overall model fit ... assessed by *R*^{2} ..." [1]. Tu *et al* should not be surprised that suppression is little known in epidemiology, because epidemiologists do not, and should not, use the squared multiple correlation coefficient *R*^{2} as a measure of goodness of fit. As Tu *et al* algebraically admit, *R*^{2} is only an indication of the proportion of the variance in BP, the outcome, that is attributable to variation in the fitted mean of BP [9]. It is known that the expected value of *R*^{2} can increase as more variables, even unrelated ones, are added to the model, making it a useless criterion for guiding covariate selection [10].
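That *R*^{2} cannot fall, and typically rises, as arbitrary covariates are added is simple to verify numerically. In this sketch (simulated data; the sample size and number of covariates are my assumptions), ten covariates drawn independently of the outcome are added one at a time, and the fitted *R*^{2} never decreases.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
y = rng.normal(size=n)                 # outcome, unrelated to every covariate

r2s = []
X = np.ones((n, 1))                    # start from the intercept-only model
for _ in range(10):
    # Append a covariate that is pure noise, independent of y.
    X = np.column_stack([X, rng.normal(size=n)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2s.append(1 - resid.var() / y.var())   # R^2 = 1 - SSR/SST (intercept included)
```

A criterion that improves mechanically with every added junk regressor clearly cannot adjudicate which covariates belong in a causal model.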

Furthermore, Tu *et al* make a passing mention of direct versus indirect effects (as might be the case in the consideration of adjustment in Figure 1). This is, of course, beyond the scope of their paper and, therefore, of my commentary. I refer the curious reader to the important work on the complex issues involved in the estimation of direct effects [3, 5, 11–14]. Suffice it to say that the direct effect may be unidentifiable even in common situations where total effect estimation is possible. For instance, although all effects of BW on BP can still be consistently estimated even in a scenario where there is an additional unobserved common cause (U) of CW and BP, as in Figure 6 (modified from Figure 2), the direct effect of BW on BP cannot be identified without measuring U in Figure 7, which is a similar modification of Figure 1. Like Pearl [5] and Holland and Rubin [15], I take these paradoxes to be related to causal concepts which are, thus, best understood in the context of causal analysis.
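This identification gap can also be seen by simulation. Below is a hypothetical linear version of the Figure 7 structure (all coefficients assumed for illustration): BW affects BP directly and through the mediator CW, while an unmeasured U affects both CW and BP. The unadjusted regression still recovers the total effect of BW, but the naive "adjust for the mediator" estimate of the direct effect is badly biased because conditioning on CW opens the path BW→CW←U→BP.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical linear version of the Figure 7 DAG:
# BW -> CW -> BP, BW -> BP (direct), with unmeasured U -> CW and U -> BP.
U = rng.normal(size=n)
BW = rng.normal(size=n)
CW = 1.0 * BW + U + rng.normal(size=n)     # mediator, also caused by U
direct, indirect = 0.5, 1.0                # assumed direct effect and CW -> BP effect
BP = direct * BW + indirect * CW + U + rng.normal(size=n)

def ols(y, *xs):
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

total_hat = ols(BP, BW)[1]           # total effect: identified, since U does not affect BW
naive_direct = ols(BP, BW, CW)[1]    # biased for the direct effect: CW is a collider for U
```

Without measuring U, no amount of adjustment on the observed variables rescues the direct effect here, which is precisely the point about Figure 7.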

In conclusion, it cannot be overemphasized that although Simpson's and related paradoxes reveal the perils of using statistical criteria to guide causal analysis, they hold neither the explanations of the phenomena they depict nor pointers on how to avoid them. The explanations and solutions lie in causal reasoning, which relies on background knowledge, not statistical criteria. It is high time we stopped treating misinterpreted signs and symptoms ('paradoxes') and got on with the business of handling the disease ('causality'). We should rightly turn our attention to the perennial problem of covariate selection for causal analysis using non-experimental data.

## Declarations

### Acknowledgements

This work was supported by a Rubicon fellowship (grant number 825.06.026) awarded by the Board of the Council for Earth and Life Sciences (ALW) at the Netherlands Organisation for Scientific Research (NWO). The author thanks Timothy Hallett and ETE's editorial board and associate editors for their insightful comments. This paper represents the author's own opinions, not those of ETE or other relevant affiliations.

## References

- Tu Y-K, Gunnell DJ, Gilthorpe MS: Simpson's paradox, Lord's paradox, and suppression effects are the same phenomenon – the reversal paradox. Emerg Themes Epidemiol. 2008, 5: 2. 10.1186/1742-7622-5-2
- Oxford University Press: Oxford Dictionary, Thesaurus, and Wordpower Guide. Oxford: Oxford University Press; 2001.
- Pearl J: Causal diagrams for empirical research. Biometrika. 1995, 82: 669-710. 10.1093/biomet/82.4.669
- Greenland S, Pearl J, Robins JM: Causal diagrams for epidemiologic research. Epidemiol. 1999, 10 (1): 37-48. 10.1097/00001648-199901000-00008
- Pearl J: Causality: Models, Reasoning and Inference. Cambridge: Cambridge University Press; 2000.
- Pearl J: Causal inference in health sciences. Health Serv Outcomes Res Methodol. 2001, 2: 189-220. 10.1023/A:1020315127304
- Robins JM: Data, design, and background knowledge in etiologic inference. Epidemiol. 2001, 12 (3): 313-320. 10.1097/00001648-200105000-00011
- Hernan MA, Hernandez-Diaz S, Robins JM: A structural approach to selection bias. Epidemiol. 2004, 15 (5): 615-625. 10.1097/01.ede.0000135174.63482.43
- Rothman KJ, Greenland S, Lash TL: Modern Epidemiology. 3rd edition. Philadelphia: Lippincott; 2008.
- Altman DG: Practical Statistics for Medical Research. Boca Raton, FL: Chapman & Hall; 1991.
- Robins JM, Greenland S: Identifiability and exchangeability for direct and indirect effects. Epidemiol. 1992, 3 (2): 143-155. 10.1097/00001648-199203000-00013
- Pearl J: Direct and indirect effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann; 2001, 411-420.
- Cole SR, Hernan MA: Fallibility in estimating direct effects. Int J Epidemiol. 2002, 31: 163-165. 10.1093/ije/31.1.163
- Petersen ML, Sinisi SE, van der Laan MJ: Estimation of direct causal effects. Epidemiol. 2006, 17: 276-284. 10.1097/01.ede.0000208475.99429.2d
- Holland PW, Rubin DB: On Lord's paradox. In: Principles of Modern Psychological Measurement. Edited by: Wainer H, Messick S. Hillsdale, NJ: Lawrence Erlbaum Associates; 1982, 3-25.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.