Nima Hejazi

PhD Candidate in Biostatistics

University of California, Berkeley


I am a PhD candidate in Biostatistics, working jointly with Mark van der Laan and Alan Hubbard. I am a founding core developer of the tlverse project, the software ecosystem for Targeted Learning. At UC Berkeley, I am affiliated with the Center for Computational Biology and the NIH Biomedical Big Data initiative. I have also enjoyed serving in scientific/statistical collaborations with the Bill & Melinda Gates Foundation, the Kaiser Permanente Division of Research, and the Fred Hutchinson Cancer Research Center.

My research interests sit primarily at the intersection of causal inference and machine learning, with a particular focus on developing efficient and robust statistical procedures for evaluating complex target estimands in observational studies and randomized trials. Broadly, my work draws on ideas from non/semi-parametric estimation in large, flexible statistical models; high-dimensional inference; targeted loss-based estimation; statistical computing; computational biology; and statistical epidemiology. Of late, my methodological work has touched on causal mediation analysis, stochastic treatment regimes, robust inference in two-phase designs, and efficient estimation with sieve-type methods. I am also keenly interested in designing open source statistical software to promote computational reproducibility in applied scientific practice.


  • Causal Inference and Censored Data Models
  • Nonparametric Estimation and Machine Learning
  • Semiparametric Theory and Robust Statistics
  • High-Dimensional and Computational Biology
  • Statistical Computing and Reproducible Research


  • PhD in Biostatistics (designated emphasis in Computational and Genomic Biology), 2017-present

    University of California, Berkeley

  • MA in Biostatistics, 2017

    University of California, Berkeley

  • BA with a triple major in Molecular and Cell Biology (emphasis in Neurobiology), Psychology, and Public Health, 2015

    University of California, Berkeley

Recent Publications

(see CV for a full list)

(2020). Targeted Learning: robust statistics for reproducible research.


current courses

  • None for now. Check back later.

past courses

recent workshops

Carpentries workshops

I am a member of Software Carpentry and Data Carpentry, through which I work on curriculum development, maintenance of lesson materials, and workshop delivery.


Collected collateral damage from doing statistics research, hopefully useful to others.

Targeted Learning and the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. A few of the tlverse packages to which I’ve made significant contributions include

Causal Inference with Machine Learning

A significant focus of my research program centers on the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for a range of problems: causal mediation analysis, evaluating stochastic interventions under two-phase sampling, conditional density estimation, and survival analysis.

  • medshift: An R package for estimating the population intervention (in)direct effects based on stochastic interventions. Classical and efficient estimators are supported for the effects of incremental propensity score interventions and modified treatment policies. Joint work with Iván Díaz.
    [Docs] | [GitHub]

  • medoutcon: An R package for efficient estimation of interventional (in)direct effects subject to intermediate confounding, including one-step and targeted minimum loss estimators. Joint work with Iván Díaz and Kara Rudolph.
    [Docs] | [GitHub]

  • txshift: An R package for efficient estimation of and inference on causal effects of stochastic interventions on continuous-valued exposures. Robust estimation and efficient inference under two-phase sampling are supported. Joint work with David Benkeser.
    [Docs] | [GitHub]

  • haldensify: An R package for nonparametric conditional density estimation based on the highly adaptive lasso, designed for estimating the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
    [Docs] | [GitHub] | [CRAN]

  • survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in right-censored survival settings with and without competing risks, including estimation procedures that respect bounds. Joint work with David Benkeser.
    [Docs] | [GitHub] | [CRAN]

Computational Biology and Bioconductor

A parallel thread of my research concerns the development of novel statistical methodologies for application in high-dimensional and computational biology settings. Consequently, I have (co-)developed several R packages extending the Bioconductor Project.