Contrastive and Sparse Dimension Reduction

Development of dimensionality reduction methods using contrastivity and sparsification.

Causal Mediation Analysis

Defining novel mediation effects and extensions using stochastic interventions.

Causal Effects of Stochastic Interventions

Extensions and applications of causal inference based on stochastic interventions in complex settings.

Nonparametric Variance Moderation

Moderated variance estimators for use with semiparametric data-adaptive estimators in high-dimensional biology.

Data-Adaptive Differential Methylation Analysis

Identification of differentially methylated positions and regions based on targeted learning.


The things that keep me from working.

The PhD Years

Assorted notes on graduate school.

Recent Publications

(see CV for a full list)


Motivation: Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad …

scPCA is a toolbox for sparse contrastive principal component analysis of high-dimensional biological data. scPCA combines the …

Mediation analysis in causal inference has traditionally focused on binary treatment regimes and deterministic interventions, as well …

Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects …

The widespread availability of high-dimensional biological sequencing data has made the simultaneous screening of numerous biological …

Recent & Upcoming Talks

Exploratory analysis of high-dimensional biological data has received much attention since the explosion of high-throughput …

Much of the focus of statistical causal inference has been devoted to assessing the effects of static interventions, which specify a …

We consider nonparametrically estimating a parameter of interest under the constraint that a functional of the parameter is bounded. We …

DNA methylation is amongst the best studied of epigenetic mechanisms impacting gene expression. While much attention has been paid to …


current courses

  • None. Check back later.

past courses

recent workshops

Carpentries workshops

I am an active member of Software Carpentry and Data Carpentry, through which I engage in curriculum development, maintenance of lesson materials, and workshop delivery.


Collected collateral damage from doing statistics research, hopefully useful to others.

Targeted Learning and the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. A few of the tlverse packages to which I have made significant contributions include

  • sl3: An R package providing a modern implementation of the Super Learner ensemble modeling algorithm, together with a grammar for composing arbitrary machine learning pipelines. Joint work with Jeremy Coyle, Ivana Malenica, and Oleg Sofrygin.
    [Docs] | [GitHub]

  • origami: An R package exposing a generalized framework for applying a variety of cross-validation schemes to arbitrary functions, making cross-validation easier to extend to, and use in, a wide range of applications. Joint work with Jeremy Coyle.
    [Docs] | [GitHub] | [CRAN]

  • hal9001: An R package providing an efficient implementation of the Highly Adaptive Lasso (HAL), a nonparametric regression estimator with fast convergence guarantees under mild assumptions. Joint work with Jeremy Coyle and Mark van der Laan.
    [Docs] | [GitHub] | [CRAN]

  • tmle3shift: An R package providing targeted maximum likelihood estimators for the causal effects of modified treatment policies on continuous-valued exposures. Incorporates nonparametric working marginal structural models for summarization of effect estimates. Joint work with Jeremy Coyle and Mark van der Laan.
    [Docs] | [GitHub]

Causal Inference with Machine Learning

A significant focus of my research program lies at the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for settings ranging from causal mediation analysis and the assessment of stochastic intervention effects with two-phase sampling to nonparametric conditional density estimation and survival analysis.

  • medshift: An R package for estimating the population intervention (in)direct effects based on stochastic interventions. Classical and efficient estimators are supported for the effects of incremental propensity score interventions and modified treatment policies. Joint work with Iván Díaz.
    [Docs] | [GitHub]

  • txshift: An R package for efficient estimation of and inference on causal effects of stochastic interventions on continuous-valued exposures. Robust estimation and efficient inference under two-phase sampling are supported. Joint work with David Benkeser.
    [Docs] | [GitHub]

  • haldensify: An R package for nonparametric conditional density estimation using techniques based on the highly adaptive lasso, designed primarily for estimation of the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
    [Docs] | [GitHub] | [CRAN]

  • survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in survival settings with and without competing risks, including estimation procedures that respect bounds. Joint work with David Benkeser.
    [Docs] | [GitHub] | [CRAN]

Computational Biology and Bioconductor

A parallel thread of my research concerns the development of novel statistical methodologies for application in high-dimensional and computational biology settings. Consequently, I have (co-)developed several R packages extending the Bioconductor Project.