current courses

  • None for now. Check back later.

past courses

recent workshops

Carpentries workshops

I am an active member of Software Carpentry and Data Carpentry, through which I work on curriculum development, maintenance of lesson materials, and workshop delivery.


Collected collateral damage from doing statistics research, hopefully useful to others.

Targeted Learning and the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. The tlverse packages to which I have made significant contributions include:

  • sl3: An R package providing a modern implementation of the Super Learner ensemble modeling algorithm, which also exposes a grammar for composing arbitrary machine learning pipelines. Joint work with Jeremy Coyle, Ivana Malenica, and Oleg Sofrygin.
    [Docs] | [GitHub]

  • origami: An R package exposing a generalized framework for applying a variety of cross-validation schemes to arbitrary functions, which makes it easier to extend cross-validation to, and use it in, a wide range of applications. Joint work with Jeremy Coyle.
    [Docs] | [GitHub] | [CRAN]

  • hal9001: An R package providing an efficient implementation of the Highly Adaptive Lasso (HAL), a nonparametric regression estimator with fast convergence guarantees under mild assumptions. Joint work with Jeremy Coyle and Mark van der Laan.
    [Docs] | [GitHub] | [CRAN]

  • tmle3shift: An R package providing targeted maximum likelihood estimators of the causal effects of modified treatment policies on continuous-valued exposures. It incorporates nonparametric working marginal structural models for summarizing effect estimates. Joint work with Jeremy Coyle and Mark van der Laan.
    [Docs] | [GitHub]

Causal Inference with Machine Learning

A significant focus of my research program lies at the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for settings ranging from causal mediation analysis and the assessment of stochastic intervention effects with two-phase sampling to nonparametric conditional density estimation and survival analysis.

  • medshift: An R package for estimating population intervention (in)direct effects based on stochastic interventions. Classical and efficient estimators are supported for the effects of both incremental propensity score interventions and modified treatment policies. Joint work with Iván Díaz.
    [Docs] | [GitHub]

  • medoutcon: An R package for the efficient estimation of stochastic interventional (in)direct effects identifiable under intermediate confounding, including one-step and targeted minimum loss estimators. Joint work with Iván Díaz and Kara Rudolph.
    [Docs] | [GitHub]

  • txshift: An R package for efficient estimation of, and inference on, the causal effects of stochastic interventions on continuous-valued exposures. Robust estimation and efficient inference under two-phase sampling are supported. Joint work with David Benkeser.
    [Docs] | [GitHub]

  • haldensify: An R package for nonparametric conditional density estimation using techniques based on the Highly Adaptive Lasso, designed primarily for estimating the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
    [Docs] | [GitHub] | [CRAN]

  • survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in survival settings with and without competing risks, including estimation procedures that respect bounds. Joint work with David Benkeser.
    [Docs] | [GitHub] | [CRAN]

Computational Biology and Bioconductor

A parallel thread of my research concerns the development of novel statistical methodology for high-dimensional and computational biology settings. To this end, I have (co-)developed several R packages extending the Bioconductor Project.




Efficient estimation of functional target parameters based on the highly adaptive lasso minimum loss estimator (HAL-MLE).

Cross-validated selection; robust, sparsified estimation; and dimension reduction for high-dimensional (contrastive) covariance matrices.

Defining novel, more flexible causal effects for mediation analysis, primarily using the formalism of stochastic interventions.

Estimating the causal effects of stochastic treatment regimes, including conditional density estimation and two-phase sampling corrections.

Introducing empirical Bayes variance moderation for data-adaptive variable importance in high-dimensional biology applications.

The things that keep me from working.

Assorted notes on graduate school.