Mediation Analysis in Causal Inference

Investigations in estimating causal effects defined in the presence of mediation (e.g., natural direct effect), based on stochastic treatment regimes.

Causal Inference with Stochastic Interventions

Investigations in causal inference with stochastic treatment regimes, a flexible formalism more realistic than dynamic or deterministic treatment rules.


How I learned everything that I know.


The things that keep me from working.

R Packages

Software packages developed to extend the R programming language.


Brief adverts on recent teaching.

The PhD Years

Assorted notes on graduate school…


Records of recent professional travel.

Variance Shrinkage for Locally Efficient Estimators

Investigations in applying methods for variance moderation to stabilize locally efficient estimators for data analytic use in high-dimensional biology.

Data-Adaptive Identification of Differential Methylation

Investigations in the use of causal inference and ensemble machine learning to identify differentially methylated positions and regions.

Selected Publications

Mediation analysis in causal inference has traditionally focused on binary treatment regimes and deterministic interventions, as well as a decomposition of the average treatment effect in terms of direct and indirect effects. In this paper we present an analogous decomposition of the population intervention effect, defined through stochastic interventions. Population intervention effects provide a generalized framework in which a variety of interesting causal contrasts can be defined, including effects for continuous and categorical exposures. We show that identification of direct and indirect effects for the population intervention effect requires weaker assumptions than its average treatment effect counterpart. In particular, identification of direct effects is guaranteed in experiments that randomize the treatment and the mediator. We discuss various estimators of the direct and indirect effects, including substitution, re-weighted, and efficient estimators based on flexible regression techniques. Our efficient estimator is asymptotically linear under a condition requiring $n^{\frac{1}{4}}$-consistency of certain regression functions. We perform a simulation study in which we assess the finite-sample properties of our proposed estimators. We present the results of an illustrative study where we assess the effect of participation in a sports team on BMI among children, using mediators such as exercise habits, daily consumption of snacks, and overweight status.
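As a rough illustration of the kind of decomposition described above, one way to write it is sketched below. The notation is assumed here and is not taken from this page: $A$ is the observed exposure, $A_\delta$ a draw from the stochastically intervened exposure distribution, $Z(\cdot)$ the counterfactual mediator, and $Y(\cdot, \cdot)$ the counterfactual outcome. The identity itself follows by adding and subtracting the middle term.

$$
\underbrace{\mathbb{E}\{Y(A_\delta, Z(A_\delta)) - Y(A, Z(A))\}}_{\text{population intervention effect}}
= \underbrace{\mathbb{E}\{Y(A_\delta, Z(A_\delta)) - Y(A_\delta, Z(A))\}}_{\text{indirect effect}}
+ \underbrace{\mathbb{E}\{Y(A_\delta, Z(A)) - Y(A, Z(A))\}}_{\text{direct effect}}
$$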

The advent and subsequent widespread availability of preventive vaccines have altered the course of public health in the twentieth century. In spite of this overall success, vaccines are still lacking for many high-burden diseases, including HIV. An important step in the process of developing effective vaccines is identifying immune responses that are indicative of protective efficacy. In this work, we use a causal inference framework to propose a new approach to studying immune responses in the context of vaccines. We focus on causal quantities defined by stochastic interventions, which may be more relevant than alternative approaches for describing the effects of immune responses on risk of infection or disease. We propose methodology for efficiently estimating these quantities using data generated by preventive vaccine trials with two-phase sampling of immune responses. We propose and evaluate two estimation strategies: an inverse probability weighting-based technique and an augmented approach. The latter approach is shown to be nonparametric efficient and multiply robust to misspecification of nuisance estimators. We also provide techniques for constructing confidence intervals and hypothesis tests, and provide an open-source software implementation of the proposed methodology. We illustrate the techniques using data from a recent preventive HIV vaccine trial.
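To make the idea of an inverse probability weighted estimator under a stochastic (shift) intervention concrete, here is a minimal sketch on simulated data. It is not the paper's method or software: it ignores two-phase sampling entirely, assumes the exposure density is known rather than estimated, and all names (`ipw_shift_estimate`, `simulate`, `normal_pdf`) are hypothetical. The estimator reweights each outcome by the ratio of the shifted exposure density to the observed exposure density.

```python
import math
import random


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution, written out to avoid dependencies."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def ipw_shift_estimate(w, a, y, delta, density):
    """IPW estimate of the mean outcome if exposure were shifted by delta.

    density(a, w) is the conditional density of exposure A given covariate W.
    Under the shift A -> A + delta, the intervened density at a is
    density(a - delta, w), so each weight is density(a - delta, w) / density(a, w).
    """
    total = 0.0
    for w_i, a_i, y_i in zip(w, a, y):
        weight = density(a_i - delta, w_i) / density(a_i, w_i)
        total += weight * y_i
    return total / len(y)


def simulate(n, seed=0):
    """Toy data (an assumption for illustration): W ~ N(0,1), A ~ N(W,1), Y = A + W + noise."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    a = [w_i + rng.gauss(0.0, 1.0) for w_i in w]
    y = [a_i + w_i + rng.gauss(0.0, 1.0) for a_i, w_i in zip(a, w)]
    return w, a, y


def exposure_density(a_val, w_val):
    """True exposure density in the toy simulation: A | W ~ N(W, 1)."""
    return normal_pdf(a_val, mean=w_val)


w, a, y = simulate(100_000)
estimate = ipw_shift_estimate(w, a, y, delta=0.5, density=exposure_density)
print(f"IPW estimate under a shift of 0.5: {estimate:.3f}")
```

In this toy setup the outcome is linear in the exposure with unit slope and mean zero, so the true shifted mean equals the shift, 0.5. In practice the exposure density would have to be estimated, which is where augmented estimators of the kind the abstract describes become relevant.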

We focus on variable importance analysis in high-dimensional biological data sets with modest sample sizes, using semiparametric statistical models. We present a method that is robust in small samples yet does not rely on arbitrary parametric assumptions, in the context of studies of gene expression and environmental exposures. Such analyses face not only issues of multiple testing but also the problem of teasing out the associations of biological expression measures with exposure amid numerous confounders such as age, race, and smoking. Specifically, we propose the use of targeted minimum loss-based estimation, coupled with generalizations of moderated empirical Bayes statistics, to obtain estimates of variable importance measures. The result is a data-adaptive approach that can estimate individual associations in high-dimensional data, even in the presence of relatively small samples.
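The variance moderation idea can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation: it uses the standard empirical Bayes shrinkage formula for moderated statistics, and it assumes the prior variance and prior degrees of freedom are supplied by hand rather than estimated from the data across all features. The function names and the example numbers are hypothetical.

```python
import math


def moderated_variances(sample_vars, df_resid, prior_var, prior_df):
    """Shrink each per-feature variance toward a common prior variance.

    Each moderated variance is a degrees-of-freedom-weighted average of the
    prior variance and the feature's own sample variance.
    """
    return [
        (prior_df * prior_var + df_resid * s2) / (prior_df + df_resid)
        for s2 in sample_vars
    ]


def moderated_t(estimates, sample_vars, n, df_resid, prior_var, prior_df):
    """Test statistics using the moderated variance in the standard error."""
    shrunk = moderated_variances(sample_vars, df_resid, prior_var, prior_df)
    return [est / math.sqrt(s2 / n) for est, s2 in zip(estimates, shrunk)]


# Hypothetical inputs: three features, n = 10 samples, assumed prior values.
estimates = [0.5, 0.5, 0.5]
sample_vars = [0.5, 1.0, 4.0]
shrunk = moderated_variances(sample_vars, df_resid=9, prior_var=1.0, prior_df=3)
stats = moderated_t(estimates, sample_vars, n=10, df_resid=9, prior_var=1.0, prior_df=3)
print(shrunk)
print(stats)
```

Small sample variances are pulled up and large ones pulled down toward the prior, which stabilizes the test statistics when each feature has few observations.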

Recent Talks

(see CV for a full list)


Towards the Realistic, Robust, and Efficient Assessment of Causal Effects with Stochastic Shift Interventions
Fri, Sep 14, 2018 10:00 AM
Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths
Thu, Aug 2, 2018 8:00 AM
Data-Adaptive Estimation and Inference for Differential Methylation Analysis
Fri, Jul 27, 2018 11:15 AM
Robust Nonparametric Inference for Stochastic Interventions Under Multi-Stage Sampling
Mon, Apr 2, 2018 4:00 PM
Efficient Estimation of Survival Prognosis Under Immortal Time Bias
Mon, Mar 12, 2018 2:15 PM


  • @nhejazi on Keybase.
  • live:nima.hejazi7
  • 2121 Berkeley Way, University of California, Berkeley, CA 94704
  • By appointment only.