A Primer on Causality in Data Science
Journal de la société française de statistique, Causality, Tome 161 (2020) no. 1, pp. 67-90.

Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. Even studies that are seemingly non-causal, such as those with the goal of prediction or prevalence estimation, have causal elements, including differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in those data. In this work, we review the “Causal Roadmap” of Petersen and van der Laan (2014) to provide an introduction to some key concepts in causal inference. Similar to other causal frameworks, the steps of the Roadmap include clearly stating the scientific question, defining the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to express the causal parameter as a statistical estimand, implementing statistical estimators, including parametric and semi-parametric methods, and interpreting our findings. We believe that using such a framework in Data Science will help to ensure that our statistical analyses are guided by the scientific question driving our research, while avoiding over-interpreting our results. We focus on the effect of an exposure occurring at a single time point and highlight the use of targeted maximum likelihood estimation (TMLE) with Super Learner.
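
To make the distinction between an observed association and a causal effect concrete, the following is a minimal sketch, not the authors' analysis. It simulates a single binary confounder W, a binary exposure A, and a binary outcome Y, then compares the unadjusted difference in outcome means with a simple substitution estimator based on the G-computation formula. All numeric values in the data-generating process are assumptions chosen for illustration; by construction the true average treatment effect is 0.2, while the unadjusted contrast is about 0.44 in expectation. The stratified estimator shown here stands in for, and is much simpler than, the TMLE with Super Learner that the article highlights.

```python
import random


def simulate(n, seed=0):
    """Draw n observations (w, a, y) from an assumed, illustrative model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # W: binary confounder that affects both the exposure and the outcome.
        w = 1 if rng.random() < 0.5 else 0
        # A: exposure is more likely when W = 1 (not randomized).
        a = 1 if rng.random() < 0.2 + 0.6 * w else 0
        # Y: outcome; the exposure raises the risk by 0.2 in every stratum of W.
        y = 1 if rng.random() < 0.1 + 0.2 * a + 0.4 * w else 0
        rows.append((w, a, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def unadjusted_difference(rows):
    """Association: E[Y | A=1] - E[Y | A=0], ignoring W."""
    treated = mean(y for w, a, y in rows if a == 1)
    untreated = mean(y for w, a, y in rows if a == 0)
    return treated - untreated


def g_computation(rows):
    """Substitution estimator: sum over w of
    (E[Y | A=1, W=w] - E[Y | A=0, W=w]) * P(W=w)."""
    n = len(rows)
    estimate = 0.0
    for w_value in (0, 1):
        stratum = [(a, y) for w, a, y in rows if w == w_value]
        mean_treated = mean(y for a, y in stratum if a == 1)
        mean_untreated = mean(y for a, y in stratum if a == 0)
        estimate += (mean_treated - mean_untreated) * len(stratum) / n
    return estimate


rows = simulate(200_000)
naive = unadjusted_difference(rows)
adjusted = g_computation(rows)
print(f"unadjusted difference: {naive:.3f}")
print(f"G-computation estimate: {adjusted:.3f}")
```

The adjusted estimate recovers the simulated effect only because W is the sole confounder and is measured, which is an assumption of this toy example; in real data that assumption would need to be argued from the causal model rather than taken for granted.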

Classification: 62-01, 62-07, 62A01, 62P10
Keywords: Causal inference, Directed acyclic graphs (DAGs), Observational studies, Structural causal models, Targeted learning, Targeted maximum likelihood estimation (TMLE)
Saddiki, Hachem 1 ; Balzer, Laura B. 1

1 Department of Biostatistics & Epidemiology, University of Massachusetts-Amherst, 715 North Pleasant St. Amherst, MA 01003-9304.
@article{JSFS_2020__161_1_67_0,
     author = {Saddiki, Hachem and Balzer, Laura B.},
     title = {A {Primer} on {Causality} in {Data} {Science}},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {67--90},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {161},
     number = {1},
     year = {2020},
     mrnumber = {4125249},
     zbl = {1445.62022},
     language = {en},
     url = {http://www.numdam.org/item/JSFS_2020__161_1_67_0/}
}
Saddiki, Hachem; Balzer, Laura B. A Primer on Causality in Data Science. Journal de la société française de statistique, Causality, Tome 161 (2020) no. 1, pp. 67-90. http://www.numdam.org/item/JSFS_2020__161_1_67_0/

[1] Ahern, J. Start With the "C-Word," Follow the Roadmap for Causal Inference, American Journal of Public Health, Volume 108 (2018) no. 5, p. 621 | DOI

[2] Balzer, L.B. “All generalizations are dangerous, even this one.” - Alexandre Dumas [Commentary], Epidemiology, Volume 28 (2017) no. 4, pp. 562-566 | DOI

[3] Benkeser, D.; Carone, M.; van der Laan, M.J.; Gilbert, P.B. Doubly robust nonparametric inference on the average treatment effect, Biometrika, Volume 104 (2017) no. 4, pp. 863-880 | DOI | MR | Zbl

[4] Bodnar, L.M.; Davidian, M.; Siega-Riz, A.M.; Tsiatis, A.A. Marginal Structural Models for Analyzing Causal Effects of Time-dependent Treatments: An Application in Perinatal Epidemiology, American Journal of Epidemiology, Volume 159 (2004) no. 10, pp. 926-934 | DOI

[5] Bareinboim, E.; Pearl, J. A general algorithm for deciding transportability of experimental results, Journal of Causal Inference, Volume 1 (2013) no. 1, pp. 107-134 | DOI | MR

[6] Balzer, L.; Petersen, M.; van der Laan, M.J. Tutorial for Causal Inference, Handbook of Big Data (Buhlmann, P.; Drineas, P.; Kane, M.; van der Laan, M., eds.), Chapman & Hall/CRC, 2016 | DOI | MR

[7] Bang, H.; Robins, J.M. Doubly robust estimation in missing data and causal inference models, Biometrics, Volume 61 (2005), pp. 962-972 | DOI | MR | Zbl

[8] Breiman, L. Stacked regressions, Machine Learning, Volume 24 (1996), pp. 49-64 | DOI | Zbl

[9] Balzer, L.B.; Schwab, J.; van der Laan, M.J.; Petersen, M.L. Evaluation of Progress Towards the UNAIDS 90-90-90 HIV Care Cascade: A Description of Statistical Methods Used in an Interim Analysis of the Intervention Communities in the SEARCH Study (2017) no. 357 http://biostats.bepress.com/ucbbiostat/paper357/ ( Technical report )

[10] Buchanan, A.L.; Vermund, S.H.; Friedman, S.R.; Spiegelman, D. Assessing Individual and Disseminated Effects in Network-Randomized Studies, Am J Epidemiol, Volume 187 (2018) no. 11, pp. 2449-2459

[11] Balzer, L.B.; Zheng, W.; van der Laan, M.J.; Petersen, M.L.; the SEARCH Collaboration A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure, Stat Meth Med Res, Volume OnlineFirst (2018) | MR

[12] Cole, S.R.; Hernán, M.A. Constructing Inverse Probability Weights for Marginal Structural Models, American Journal of Epidemiology, Volume 168 (2008) no. 6, pp. 656-664 | DOI

[13] Cole, S.R.; Hudgens, M.G.; Tien, P.C.; Anastos, K.; Kingsley, L.; Chmiel, J.S.; Jacobson, L.P. Marginal structural models for case-cohort study designs to estimate the association of antiretroviral therapy initiation with incident AIDS or death, Am J Epidemiol, Volume 175 (2012) no. 5, pp. 381-390 | DOI

[14] Cain, L.E.; Robins, J.M.; Lanoy, E.; Logan, R.; Costagliola, D.; Hernán, M.A. When to start treatment? A systematic approach to the comparison of dynamic regimes using observational data, The International Journal of Biostatistics, Volume 6 (2010) no. 2, p. Article 18 | MR

[15] Cole, S.R.; Stuart, E.A. Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 Trial, American Journal of Epidemiology, Volume 172 (2010) no. 1, pp. 107-115 | DOI

[16] Dawid, A.P. Causal inference without counterfactuals, Journal of the American Statistical Association, Volume 95 (2000) no. 450, pp. 407-424 | DOI | MR | Zbl

[17] Descartes, R. Discours de la Méthode Pour bien conduire sa raison, et chercher la vérité dans les sciences, Leiden, Netherlands, 1637

[18] Decker, A.L.; Hubbard, A.; Crespi, C.M.; Seto, E.Y.W.; Wang, M.C. Semiparametric Estimation of the Impacts of Longitudinal Interventions on Adolescent Obesity using Targeted Maximum-Likelihood: Accessible Estimation with the ltmle Package, Journal of Causal Inference, Volume 2 (2014) no. 1, pp. 95-108 | DOI | MR

[19] Daniel, R.M.; Kenward, M.G.; Cousens, S.N.; De Stavola, B.L. Using causal diagrams to guide analysis in missing data problems, Stat Meth Med Res, Volume 21 (2012) no. 3, pp. 243-256 | DOI | MR | Zbl

[20] Danaei, G.; Pan, A.; Hu, F.B.; Hernán, M.A. Hypothetical midlife interventions in women and risk of type 2 diabetes, Epidemiol, Volume 24 (2013) no. 1, pp. 122-128 | DOI

[21] Duncan, O. Introduction to Structural Equation Models, Academic Press, New York, 1975 | MR

[22] Díaz, I.; van der Laan, M. Population Intervention Causal Effects Based on Stochastic Interventions, Biometrics, Volume 68 (2012) no. 2, pp. 541-549 | DOI | MR | Zbl

[23] Díaz, I.; van der Laan, M. Assessing the Causal Effect of Policies: An Example Using Stochastic Interventions, Int J Biostat, Volume 9 (2013) no. 2, pp. 161-174 | DOI | MR

[24] Díaz, I.; van der Laan, M. Sensitivity Analysis for Causal Inference Under Unmeasured Confounding and Measurement Error Problems, Int J Biostat, Volume 9 (2013), pp. 149-160 | DOI | MR

[25] Goldberger, A. Structural equation models in the social sciences, Econometrica: Journal of the Econometric Society, Volume 40 (1972), pp. 979-1001 | DOI | MR

[26] Gruber, S.; van der Laan, M.J. tmle: An R Package for Targeted Maximum Likelihood Estimation, Journal of Statistical Software, Volume 51 (2012) no. 13, pp. 1-35 | DOI

[27] Gruber, S.; van der Laan, M.J. Consistent causal effect estimation under dual misspecification and implications for confounder selection procedures, Stat Methods Med Res, Volume 24 (2015) no. 6, pp. 1003-1008 (PMID: 22368176) | DOI | MR

[28] Hernán, M.A.; Alonso, A.; Logan, R.; Grodstein, F.; Michels, K.B.; Willett, W.C.; Manson, J.E.; Robins, J.M. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease, Epidemiology, Volume 19 (2008), pp. 766-779 | DOI

[29] Hernán, M.A. Invited commentary: hypothetical interventions to define causal effects–afterthought or prerequisite?, Am J Epidemiol, Volume 162 (2005) no. 7, pp. 618-620 | DOI

[30] Hernán, M.A. The C-Word: Scientific Euphemisms Do Not Improve Causal Inference From Observational Data, American Journal of Public Health, Volume 108 (2018) no. 5, pp. 616-619 | DOI

[31] Hernán, M.A.; Hernández-Díaz, S.; Robins, J.M. A structural approach to selection bias, Epidemiology, Volume 15 (2004) no. 5, pp. 615-625 | DOI

[32] Hernán, M.A.; Hsu, J.; Healy, B. Data science is science’s second chance to get causal inference right: A classification of data science tasks (2018) (https://arxiv.org/abs/1804.10846) ( Technical report ) | arXiv

[33] Hong, J.L.; Jonsson Funk, M.; LoCasale, R.; Dempster, S.E.; Cole, S.R.; Webster-Clark, M.; Edwards, J.K.; Sturmer, T. Generalizing Randomized Clinical Trial Results: Implementation and Challenges Related to Missing Data in the Target Population, Am J Epidemiol, Volume 184 (2018) no. 4, pp. 817-827 | DOI

[34] Hernán, M.A.; Lanoy, E.; Costagliola, D.; Robins, J.M. Comparison of dynamic treatment regimes via inverse probability weighting, Basic & Clinical Pharmacology & Toxicology, Volume 98 (2006) no. 3, pp. 237-242 | DOI

[35] Holland, P.W. Statistics and Causal Inference, Journal of the American Statistical Association, Volume 81 (1986) no. 396, pp. 945-960 | DOI | MR

[36] Hernán, M.A.; Robins, J.M. Comment on: Early versus deferred antiretroviral therapy for HIV on survival, New England Journal of Medicine, Volume 361 (2009) no. 8, pp. 823-824

[37] Hernán, M.A.; Robins, J.M. Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available, American Journal of Epidemiology, Volume 183 (2016) no. 8, pp. 758-764 | DOI

[38] Halloran, M.E.; Struchiner, C.J. Study designs for dependent happenings, Epidemiology, Volume 2 (1991), pp. 331-338 | DOI

[39] Halloran, M.E.; Struchiner, C.J. Causal inference in infectious diseases, Epidemiology, Volume 6 (1995) no. 2, pp. 142-151 | DOI

[40] Hernández-Díaz, S.; Schisterman, E.F.; Hernán, M.A. The Birth Weight “Paradox” Uncovered?, Am J Epidemiol, Volume 164 (2006) no. 11, pp. 1115-1120 | DOI

[41] Horvitz, D.G.; Thompson, D.J. A generalization of sampling without replacement from a finite universe, Journal of the American Statistical Association, Volume 47 (1952), pp. 663-685 | DOI | MR | Zbl

[42] Heckman, J.J.; Vytlacil, E.J. Econometric evaluation of social programs, part I: causal models, structural models and econometric policy evaluation, Handbook of Econometrics (2007), pp. 4779-4874 | DOI

[43] Hernán, M.A.; VanderWeele, T.J. Compound treatments and transportability of causal inference, Epidemiology, Volume 22 (2011), pp. 368-377 | DOI

[44] Imai, K.; Keele, L.; Yamamoto, T. Identification, inference, and sensitivity analysis for causal mediation effects, Statistical Science, Volume 25 (2010), pp. 51-71 | DOI | MR | Zbl

[45] Joint United Nations Programme on HIV/AIDS (UNAIDS) The gap report (2014)

[46] Kennedy, E.H. Semiparametric Theory (2017) (https://arxiv.org/abs/1709.06418v1) ( Technical report ) | arXiv

[47] Kitahata, M.M.; Gange, S.J.; Abraham, A.G.; Merriman, B.; Saag, M.S.; Justice, A.C. Effect of early versus deferred antiretroviral therapy for HIV on survival, New England Journal of Medicine, Volume 360 (2009) no. 18, pp. 1815-1826 | DOI

[48] Korb, K.; Hope, L.; Nicholson, A.; Axnick, K. Varieties of causal intervention, PRICAI 2004: Trends in Artificial Intelligence, volume 3157 of Lecture Notes in Computer Science (Zhang, C.W.; Guesgen, H.; Yeap, W.K., eds.), Springer, Heidelberg, Germany, 2004, pp. 322-331

[49] Kreif, N.; Tran, L.; Grieve, R.; De Stavola, B.; Tasker, R.C.; Petersen, M. Estimating the Comparative Effectiveness of Feeding Interventions in the Pediatric Intensive Care Unit: A Demonstration of Longitudinal Targeted Maximum Likelihood Estimation, American Journal of Epidemiology, Volume 186 (2017) no. 12, pp. 1370-1379 | DOI

[50] Lesko, C.R.; Buchanan, A.L.; Westreich, D.; Edwards, J.K.; Hudgens, M.G.; Cole, S.R. Generalizing study results: a potential outcomes perspective, Epidemiology, Volume 28 (2017) no. 4, pp. 553-561 | DOI

[51] Luque-Fernandez, M.A.; Belot, A.; Valeri, L.; Cerulli, G.; Maringe, C.; Rachet, B. Data-Adaptive Estimation for Double-Robust Methods in Population-Based Cancer Epidemiology: Risk Differences for Lung Cancer Mortality by Emergency Presentation, American Journal of Epidemiology, Volume 187 (2018) no. 4, pp. 871-878 | DOI

[52] Little, R.J.; Rubin, D.B. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches, Annual Review of Public Health, Volume 21 (2000), pp. 121-145 | DOI

[53] Lendle, S.D.; Schwab, J.; Petersen, M.L.; van der Laan, M.J. ltmle: An R Package Implementing Targeted Minimum Loss-based Estimation for Longitudinal Data, Journal of Statistical Software, Volume 81 (2017) no. 1, pp. 1-21 | DOI

[54] Morozova, O.; Cohen, T.; Crawford, F.W. Risk ratios for contagious outcomes, J. R. Soc. Interface, Volume 15 (2018) no. 20170696

[55] Marcus, G.; Davis, E. Eight (No, Nine!) Problems With Big Data, The New York Times (2014) http://www.nytimes.com/2014/04/07/opinion/eight-no-nine-problems-with-big-data.html

[56] Messer, L.C.; Oakes, J.M.; Mason, S. Effects of socioeconomic and racial residential segregation on preterm birth: a cautionary tale of structural confounding, American Journal of Epidemiology, Volume 171 (2010), pp. 664-673 | DOI

[57] Mohan, K.; Pearl, J.; Tian, J. Graphical Models for Inference with Missing Data, Advances in Neural Information Processing Systems 26 (Burges, C. J. C.; Bottou, L.; Welling, M.; Ghahramani, Z.; Weinberger, K. Q., eds.), Curran Associates, Inc., 2013, pp. 1277-1285 http://papers.nips.cc/paper/4899-graphical-models-for-inference-with-missing-data.pdf

[58] Murphy, S.A. Optimal dynamic treatment regimes, J R Stat Soc Ser B, Volume 65 (2003) no. 2, pp. 331-355 | DOI | MR

[59] Naimi, A.I.; Balzer, L.B. Stacked generalization: An introduction to Super Learning, European Journal of Epidemiology (2018), pp. 459-464 | DOI

[60] Neyman, J. Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes (In Polish). English translation by D.M. Dabrowska and T.P. Speed (1990), Statistical Science, Volume 5 (1923), pp. 465-480

[61] Naimi, A.I.; Schnitzer, M.E.; Moodie, E.E.; Bodnar, L.M. Mediation Analysis for Health Disparities Research, Am J Epidemiol, Volume 184 (2016) no. 4, pp. 315-324 | DOI

[62] Neugebauer, R.; van der Laan, M. J. Nonparametric causal effects based on marginal structural models, Journal of Statistical Planning and Inference, Volume 137 (2007) no. 2, pp. 419-434 | DOI | MR | Zbl

[63] Oakes, J.M. The (mis)estimation of neighborhood effects: causal inference for a practicable social epidemiology (with discussion), Soc Sci Med, Volume 58 (2004) no. 10, pp. 1929-1952 (PMID: 15020009) | DOI

[64] Petersen, M.L.; Balzer, L.B. Introduction to Causal Inference. UC Berkeley (2014) (www.ucbbiostat.com/labs)

[65] Petersen, M.; Balzer, L.; Kwarsiima, D.; Sang, N. Association of implementation of a universal testing and treatment intervention with HIV diagnosis, receipt of antiretroviral therapy, and viral suppression among adults in East Africa, JAMA, Volume 317 (2017) no. 21, pp. 2196-2206 | DOI

[66] Pearl, J. Causality: Models, Reasoning and Inference, Cambridge University Press, New York, 2000 (Second ed., 2009) | MR

[67] Pearl, J. Direct and indirect effects, Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Francisco (2001), pp. 411-420

[68] Pearl, J. An Introduction to Causal Inference, The International Journal of Biostatistics, Volume 6 (2010) no. 2, p. Article 7 | DOI | MR

[69] Pearl, J. Generalizing Experimental Findings, Journal of Causal Inference, Volume 3 (2015) no. 2, pp. 259-266 | DOI | MR

[70] Pearl, J. The Seven Tools of Causal Inference with Reflections on Machine Learning (2018) no. R-481 ( Technical report )

[71] Pearl, J. Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Mateo, CA, 1988 | MR

[72] Pearl, J. Causal diagrams for empirical research, Biometrika, Volume 82 (1995), pp. 669-710 | DOI | MR | Zbl

[73] Petersen, M.L. Compound treatments, transportability, and the structural causal model: the power and simplicity of causal graphs, Epidemiology, Volume 22 (2011), pp. 378-381 | DOI

[74] Pearl, J.; Glymour, M.; Jewell, N.P. Causal inference in statistics: a primer, John Wiley and Sons Ltd, Chichester, West Sussex, UK, 2016 | MR

[75] Polley, Eric; LeDell, Erin; Kennedy, Chris; van der Laan, Mark SuperLearner: Super Learner Prediction (2018) https://CRAN.R-project.org/package=SuperLearner (R package version 2.0-24)

[76] Petersen, M.L.; LeDell, E.; Schwab, J.; Sarovar, V.; Gross, R.; Reynolds, N. Super Learner Analysis of Electronic Adherence Data Improves Viral Prediction and May Provide Strategies for Selective HIV RNA Monitoring, J Acquir Immune Defic Syndr, Volume 69 (2015) no. 1, pp. 109-118 | DOI

[77] Petersen, M.L.; Porter, K.E.; Gruber, S.; Wang, Y.; van der Laan, M.J. Diagnosing and responding to violations in the positivity assumption, Statistical Methods in Medical Research, Volume 21 (2012) no. 1, pp. 31-54 | DOI | MR

[78] Polley, E.C.; Rose, S.; van der Laan, M.J. Super Learner, Targeted Learning: Causal Inference for Observational and Experimental Data (van der Laan, M.J.; Rose, S., eds.), Springer, New York Dordrecht Heidelberg London, 2011

[79] Petersen, M.L.; Schwab, J.; Gruber, S.; Blaser, N.; Schomaker, M.; van der Laan, M.J. Targeted Maximum Likelihood Estimation for Dynamic and Static Longitudinal Marginal Structural Working Models, Journal of Causal Inference, Volume 2 (2014) no. 2 | DOI | MR

[80] Petersen, M.L.; Sinisi, S.E.; van der Laan, M.J. Estimation of direct causal effects, Epidemiology, Volume 17 (2006) no. 3, pp. 276-284 | DOI

[81] Petersen, M.L.; van der Laan, M.J. Case Study: Longitudinal HIV Cohort Data, Targeted Learning: Causal Inference for Observational and Experimental Data (van der Laan, M.J.; Rose, S., eds.), Springer, New York Dordrecht Heidelberg London, 2011 | DOI | MR

[82] Petersen, M.L.; van der Laan, M.J. Causal Models and Learning from Data: Integrating Causal Modeling and Statistical Estimation, Epidemiology, Volume 25 (2014) no. 3, pp. 418-426 | DOI

[83] Prague, M.; Wang, R.; Stephens, A.; Tchetgen Tchetgen, E.; De Gruttola, V. Accounting for Interactions and Complex Inter-Subject Dependency in Estimating Treatment Effect in Cluster-Randomized Trials with Missing Outcomes, Biometrics, Volume 72 (2016) no. 4, pp. 1066-1077 | DOI | MR | Zbl

[84] Rudolph, K.E.; Goin, D.E.; Paksarian, D.; Crowder, R.; Merikangas, K.R.; Stuart, E.A. Causal Mediation Analysis With Observational Data: Considerations and Illustration Examining Mechanisms Linking Neighborhood Poverty to Adolescent Substance Use, Am J Epidemiol, Volume Epub Ahead of Print (2018)

[85] Robins, J.M.; Hernán, M.A. Chapter 23, Longitudinal Data Analysis (Fitzmaurice, G.; Davidian, M.; Verbeke, G.; Molenberghs, G., eds.), Chapman & Hall/CRC, Boca Raton, FL, 2009 | MR

[86] Robins, J.M.; Hernán, M.A.; Brumback, B. Marginal structural models and causal inference in epidemiology, Epidemiology, Volume 11 (2000) no. 5, pp. 550-560 | DOI

[87] Robins, J.M. A new approach to causal inference in mortality studies with sustained exposure periods–application to control of the healthy worker survivor effect, Mathematical Modelling, Volume 7 (1986), pp. 1393-1512 | DOI | MR | Zbl

[88] Robins, J.M. Association, Causation, and Marginal Structural Models, Synthese, Volume 121 (1999) no. 1-2, pp. 151-179 | DOI | MR | Zbl

[89] Rose, S. Big data and the future, Significance, Volume 9 (2012) no. 4, pp. 47-48 | DOI

[90] Richardson, T.S.; Robins, J.M. Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality (2013) ( Working Paper Number 128 )

[91] Rosenbaum, P.; Rubin, D. The central role of the propensity score in observational studies, Biometrika, Volume 70 (1983), pp. 41-55 | DOI | MR | Zbl

[92] Robins, J.M.; Rotnitzky, A. Recovery of information and adjustment for dependent censoring using surrogate markers, AIDS Epidemiology - Methodological Issues (Jewell, N.; Dietz, K.; Farewell, V., eds.), Birkhäuser, Boston (1992) | DOI

[93] Robins, J.; Rotnitzky, A.; Scharfstein, D. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models, Statistical Models in Epidemiology: The Environment and Clinical Trials (Halloran, M.; Berry, D., eds.), Springer, New York, 1999 | MR

[94] Robins, J.M.; Rotnitzky, A.; Zhao, L.P. Estimation of regression coefficients when some regressors are not always observed, Journal of the American Statistical Association, Volume 89 (1994), pp. 846-866 | DOI | MR | Zbl

[95] Rudolph, K.E.; Sofrygin, O.; Zheng, W.; van der Laan, M.J. Robust and flexible estimation of stochastic mediation effects: a proposed method and example in a randomized trial setting, Epidemiologic Methods, Volume 7 (2017)

[96] Rubin, D.B. Estimating causal effects of treatments in randomized and nonrandomized studies., Journal of Educational Psychology, Volume 66 (1974) no. 5, pp. 688-701 | DOI

[97] Rubin, Donald B. Bayesian Inference for causal effects: the role of randomization, Ann Stat, Volume 6 (1978), pp. 34-58 | MR | Zbl

[98] Rubin, Donald B. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies, Statistical Science, Volume 5 (1990) no. 4, pp. 472-480 | MR | Zbl

[99] Rudolph, K.E.; van der Laan, M.J. Robust estimation of encouragement-design intervention effects transported across sites, J R Stat Soc Ser B, Volume 79 (2017) no. 5, pp. 1509-1525 | DOI | MR | Zbl

[100] Stuart, E.A.; Cole, S.R.; Bradshaw, C.P.; Leaf, P.J. The use of propensity scores to assess the generalizability of results from randomized trials, Journal of the Royal Statistical Society: Series A, Volume 174 (2011) no. Part 2, pp. 369-386 | DOI | MR

[101] Spirtes, P.; Glymour, C.; Scheines, R. Causation, Prediction and Search. Number 81 in Lecture Notes in Statistics, Springer-Verlag, New York/Berlin, 1993 | DOI | MR

[102] Schuler, M.S.; Rose, S. Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies, American Journal of Epidemiology, Volume 185 (2017) no. 1, pp. 65-73 | DOI

[103] Scharfstein, D.O.; Rotnitzky, A.; Robins, J.M. Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models (with Rejoinder), Journal of the American Statistical Association, Volume 94 (1999) no. 448, pp. 1096-1120 (1135-1146) | DOI | MR | Zbl

[104] Schnitzer, M.E.; van der Laan, M.J.; Moodie, E.E.; Platt, R.W. Effect of breastfeeding on gastrointestinal infection in infants: a targeted maximum likelihood approach for clustered longitudinal data, Annals of Applied Statistics, Volume 8 (2014) no. 2, pp. 703-725 | DOI | MR | Zbl

[105] Taubman, S.L.; Robins, J.M.; Mittleman, M.A.; Hernán, M.A. Intervening on risk factors for coronary heart disease: an application of the parametric G-formula, International Journal of Epidemiology, Volume 38 (2009) no. 6, pp. 1599-1611 | DOI

[106] Tchetgen Tchetgen, E.J.; VanderWeele, T.J. On causal inference in the presence of interference, Stat Meth Med Res, Volume 21 (2012) no. 1, pp. 55-75 | DOI | MR

[107] Tran, L.; Yiannoutsos, C.T.; Musick, B.S.; Wools-Kaloustian, K.K.; Siika, A.; Kimaiyo, S.; van der Laan, M.J.; Petersen, M. Evaluating the Impact of a HIV Low-Risk Express Care Task-Shifting Program: A Case Study of the Targeted Learning Roadmap, Epidemiologic Methods, Volume 5 (2016) no. 1, pp. 69-91 | DOI | Zbl

[108] VanderWeele, T.J.; Arah, O.A. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders, Epidemiology, Volume 22 (2011), pp. 42-52 | DOI

[109] VanderWeele, T.J. Marginal structural models for the estimation of direct and indirect effects, Epidemiology, Volume 20 (2009) no. 1, pp. 18-26 | DOI

[110] van der Laan, M.J. Causal Inference for a Population of Causally Connected Units, Journal of Causal Inference, Volume 0 (2014) no. 0, pp. 1-62 | DOI | MR

[111] van der Laan, M.J.; Gruber, S. Collaborative double robust targeted maximum likelihood estimation, The International Journal of Biostatistics, Volume 6 (2010) no. 1 | DOI | MR

[112] van der Laan, M.J.; Gruber, S. Targeted minimum loss based estimation of causal effects of multiple time point interventions, The International Journal of Biostatistics, Volume 8 (2012) no. 1 | DOI | MR

[113] van der Laan, M.J.; Petersen, M.L. Causal Effect Models for Realistic Individualized Treatment and Intention to Treat Rules, The International Journal of Biostatistics, Volume 3 (2007) no. 1, p. Article 3 | DOI | MR | Zbl

[114] van der Laan, M.J.; Petersen, M.L. Direct effect models, The International Journal of Biostatistics, Volume 4 (2008) no. 1, p. Article 23 | DOI | MR

[115] van der Laan, M.J.; Polley, E.C.; Hubbard, A.E. Super Learner, Statistical Applications in Genetics and Molecular Biology, Volume 6 (2007) no. 1, p. 25 | DOI | MR | Zbl

[116] van der Laan, M.J.; Rubin, D.B. Targeted Maximum Likelihood Learning, The International Journal of Biostatistics, Volume 2 (2006) no. 1, p. Article 11 | DOI | MR

[117] van der Laan, M.; Rose, S. Targeted Learning: Causal Inference for Observational and Experimental Data, Springer, New York Dordrecht Heidelberg London, 2011 | DOI | MR

[118] van der Laan, M.J.; Haight, T.J.; Tager, I.B. van der Laan et al. Respond to “Hypothetical interventions to define causal effects”, Am J Epidemiol, Volume 162 (2005) no. 7, pp. 621-622 | DOI

[119] Westreich, D.; Cole, S.R.; Young, J.G.; Palella, F.; Tien, P.; Kingsley, L.; Gange, S.J.; Hernán, M.A. The parametric g-formula to estimate the effect of highly active antiretroviral therapy on incident AIDS or death, Statistics in Medicine, Volume 31 (2012) no. 18, pp. 2000-2009 | DOI | MR

[120] Wolpert, D. H. Stacked Generalization, Neural Networks, Volume 5 (1992), pp. 241-259 | DOI

[121] Young, J.G.; Cain, L.E.; Robins, J.M.; O’Reilly, E.J.; Hernán, M.A. Comparative Effectiveness of Dynamic Treatment Regimes: An Application of the Parametric G-Formula, Stat Biosci, Volume 3 (2011), pp. 119-143 | DOI

[122] Zheng, W.; Petersen, M.; van der Laan, M.J. Doubly Robust and Efficient Estimation of Marginal Structural Models for the Hazard Function, Int J Biostat, Volume 12 (2016) no. 1, pp. 233-252 | DOI | MR

[123] Zheng, W.; van der Laan, M.J. Targeted maximum likelihood estimation for natural direct effects, The International Journal of Biostatistics, Volume 8 (2012) no. 1, pp. 1-40 | DOI | MR

[124] Zhang, Y.; Young, J.G.; Thamer, M.; Hernán, M.A. Comparing the Effectiveness of Dynamic Treatment Strategies Using Electronic Health Records: An Application of the Parametric g-Formula to Anemia Management Strategies, Health Serv Res, Volume 53 (2018) no. 3, pp. 1900-1918 | DOI