Nima Hejazi

Assistant Professor of Biostatistics

Harvard T.H. Chan School of Public Health


I am an assistant professor in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. Prior to this, I was an NSF Mathematical Sciences Postdoctoral Research Fellow, working on causal inference and machine learning applied to problems with complex study designs, especially vaccine efficacy trials. I obtained my PhD in biostatistics from UC Berkeley, where I worked on non-/semi-parametric estimation and causal inference for continuous-valued exposures and on mediation analysis. In that time, I was on the founding core development team of the tlverse project, an open-source software ecosystem for targeted learning, and I was lucky to enjoy diverse scientific collaborations with the Fred Hutchinson Cancer Center, the Bill and Melinda Gates Foundation, and Netflix Research.

My research interests lie primarily in unifying statistical methodology for causal inference and machine learning, with the central aim of developing efficient, robust, and assumption-lean inferential techniques tailored to the applied sciences. Broadly speaking, I am often motivated by methodological topics from non- and semi-parametric inference (that is, from an assumption-lean or model-agnostic perspective), high-dimensional inference, applications of (targeted or minimum) loss-based estimation, corrections for biased sampling procedures, and the design of adaptive experiments. While my applied science interests are diverse, I have recently been captivated by problems that commonly arise in the study of infectious diseases and their epidemiology, including in infectious disease clinical trials. I am also deeply interested in high-performance statistical computing and open-source software development to promote reproducibility, transparency, and data analysis “hygiene” in applied statistics and statistical data science.

Interests


  • causal machine learning and model-free causal inference
  • non/semi-parametric inference and assumption-lean methods
  • high-dimensional inference and bias-correction techniques
  • nonparametric estimation and statistical machine learning
  • statistical computing and reproducible data science

Education


  • PhD in Biostatistics (designated emphasis in Computational & Genomic Biology), 2021

    University of California, Berkeley

  • MA in Biostatistics, 2017

    University of California, Berkeley

  • BA with a triple major in Molecular & Cell Biology (emphasis in Neurobiology), Psychology, and Public Health, 2015

    University of California, Berkeley

Recent Publications

(see CV for a full list)


current courses

I won’t be teaching during the 2022-2023 academic year. I’ll resume in 2023-2024.

past courses

upcoming workshops

recent workshops

Carpentries workshops

I am a member of Software Carpentry and Data Carpentry, through which I work on curriculum development, maintenance of lesson materials, and workshop delivery.


Collected collateral damage from doing statistics research, hopefully useful to others.

Targeted Learning in the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. I have made significant contributions to several of the tlverse packages.

Causal Machine Learning

A significant focus of my research program centers on the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for a range of problems: causal mediation analysis, evaluating the effects of stochastic interventions under two-phase sampling, conditional density estimation, causal segment discovery and offline policy evaluation, and survival analysis.

  • sherlock: An R package for employing causal machine learning and non/semi-parametric estimation to discover population segments (or subgroups) based on treatment effect heterogeneity. Flexible techniques for defining segment-specific treatment rules and efficient estimators of the causal effects of these dynamic treatment regimes are implemented. Joint work with Wenjing Zheng as part of an internship at Netflix Research.
    [Docs] | [GitHub]

  • medshift: An R package for estimating population intervention (in)direct effects based on stochastic interventions. Classical and efficient estimators are supported for the effects of incremental propensity score interventions and modified treatment policies. Joint work with Iván Díaz.
    [Docs] | [GitHub]

  • medoutcon: An R package for efficient estimation of interventional (in)direct effects subject to intermediate confounding, including one-step and targeted minimum loss estimators. Joint work with Iván Díaz and Kara Rudolph.
    [Docs] | [GitHub] | [Paper]

  • txshift: An R package for efficient estimation of and inference on causal effects of stochastic interventions on continuous-valued exposures. Robust estimation and efficient inference under two-phase sampling are supported. Joint work with David Benkeser.
    [Docs] | [GitHub] | [CRAN] | [Paper]

  • haldensify: An R package for nonparametric conditional density estimation based on the highly adaptive lasso, designed for estimating the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
    [Docs] | [GitHub] | [CRAN]

  • survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in right-censored survival settings with and without competing risks, including estimation procedures that respect bounds. Joint work with David Benkeser.
    [Docs] | [GitHub] | [CRAN]

High-Dimensional Biology

A parallel thread of my research concerns the development of novel statistical methodology for application in high-dimensional and computational biology. I have (co-)developed several R packages extending the Bioconductor Project.

Other Assorted Adventures