Nima Hejazi

PhD Candidate in Biostatistics

University of California, Berkeley


I am a PhD candidate in Biostatistics, working jointly with Mark van der Laan and Alan Hubbard. I am a founding core developer of the tlverse project, the software ecosystem for Targeted Learning. At UC Berkeley, I am affiliated with the Center for Computational Biology and the NIH Biomedical Big Data initiative. During my time in graduate school, I have also enjoyed scientific and statistical collaborations with the Bill & Melinda Gates Foundation, the Kaiser Permanente Division of Research, the Fred Hutchinson Cancer Research Center, Netflix, and SiriusXM+Pandora.

My research interests sit primarily at the intersection of causal inference and machine learning, with a particular focus on developing efficient and robust statistical procedures for evaluating complex target estimands in observational studies and randomized trials. Broadly, my work draws on ideas from non/semi-parametric estimation in large, flexible statistical models; high-dimensional inference; targeted loss-based estimation; statistical computing; computational biology; and statistical epidemiology. Of late, my methodological work has touched on causal mediation analysis, stochastic treatment regimes, robust inference in two-phase designs, and efficient estimation with sieve-type methods. I am also keenly interested in designing open source statistical software to promote computational reproducibility in applied scientific practice.


  • Causal Inference and Censored Data Models
  • Nonparametric Estimation and Machine Learning
  • Semiparametric Theory and Robust Statistics
  • High-Dimensional and Computational Biology
  • Statistical Computing and Reproducible Research


  • PhD in Biostatistics (designated emphasis in Computational and Genomic Biology), 2017-2021 (expected)

    University of California, Berkeley

  • MA in Biostatistics, 2017

    University of California, Berkeley

  • BA with a triple major in Molecular and Cell Biology (emphasis in Neurobiology), Psychology, and Public Health, 2015

    University of California, Berkeley

Recent & Upcoming Talks

Evaluating the Causal Impacts of Vaccine-induced Immune Responses in Two-phase Vaccine Efficacy Trials
Efficient Estimation of Stochastic Intervention Effects in Causal Mediation Analysis
Nonparametric Causal Mediation Analysis for Stochastic Interventions
Generalized Variance Moderation for Locally Efficient Estimation in High-Dimensional Biology


current courses

  • Public Health 240B & Statistics 245B: Survival Analysis and Causality (Fall 2020), as graduate student instructor with Prof. Mark van der Laan

upcoming courses

  • Public Health 290: Biomedical Big Data Capstone Seminar (Spring 2021), as graduate student instructor with Prof. Mark van der Laan

past courses

recent workshops

Carpentries workshops

I am a member of Software Carpentry and Data Carpentry, through which I work on curriculum development, maintenance of lesson materials, and workshop delivery.


Collected collateral damage from doing statistics research, hopefully useful to others.

Targeted Learning with the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. I’ve made significant contributions to several of the tlverse packages.

Causal Inference meets Machine Learning

A significant focus of my research program centers on the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for a range of problems: causal mediation analysis, evaluating stochastic interventions under two-phase sampling, conditional density estimation, and survival analysis.

  • medshift: An R package for estimating the population intervention (in)direct effects based on stochastic interventions. Classical and efficient estimators are supported for the effects of incremental propensity score interventions and modified treatment policies. Joint work with Iván Díaz.
    [Docs] | [GitHub]

  • medoutcon: An R package for efficient estimation of interventional (in)direct effects subject to intermediate confounding, including one-step and targeted minimum loss estimators. Joint work with Iván Díaz and Kara Rudolph.
    [Docs] | [GitHub]

  • txshift: An R package for efficient estimation of and inference on causal effects of stochastic interventions on continuous-valued exposures. Robust estimation and efficient inference under two-phase sampling are supported. Joint work with David Benkeser.
    [Docs] | [GitHub]

  • haldensify: An R package for nonparametric conditional density estimation based on the highly adaptive lasso, designed for estimating the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
    [Docs] | [GitHub] | [CRAN]

  • survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in right-censored survival settings with and without competing risks, including estimation procedures that respect bounds. Joint work with David Benkeser.
    [Docs] | [GitHub] | [CRAN]

Computational Biology and Bioconductor

A parallel thread of my research concerns the development of novel statistical methodologies for application in high-dimensional and computational biology settings. Consequently, I have (co-)developed several R packages extending the Bioconductor Project.