Projects

Contrastive and Sparse Dimension Reduction

Development of dimensionality reduction methods using contrastivity and sparsification.

Causal Mediation Analysis

Defining novel mediation effects and extensions using stochastic interventions.

Causal Effects of Stochastic Interventions

Extensions and applications of causal inference based on stochastic interventions in complex settings.

Nonparametric Variance Moderation

Moderated variance estimators for use with semiparametric data-adaptive estimators in high-dimensional biology.

Data-Adaptive Differential Methylation Analysis

Identification of differentially methylated positions and regions based on targeted learning.

Distractions

The things that keep me from working.

The PhD Years

Assorted notes on graduate school.

Recent Publications

(see CV for a full list)

Motivation: Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad …

The widespread availability of high-dimensional biological sequencing data has made the simultaneous screening of numerous biological …

Mediation analysis in causal inference has traditionally focused on binary treatment regimes and deterministic interventions, as well …

Arsenic exposure is a worldwide health concern associated with an increased risk of skin, lung, and bladder cancer but arsenic trioxide …

Recent & Upcoming Talks

Exploratory analysis of high-dimensional biological data has received much attention since the explosion of high-throughput …

Much of the focus of statistical causal inference has been devoted to assessing the effects of static interventions, which specify a …

We consider nonparametrically estimating a parameter of interest under the constraint that a functional of the parameter is bounded. We …

DNA methylation is amongst the best studied of epigenetic mechanisms impacting gene expression. While much attention has been paid to …

Teaching

current courses

  • Public Health 242C & Statistics 247C: Longitudinal Data Analysis (Fall 2019), as graduate student instructor with Prof. Alan Hubbard

past courses

recent workshops

Carpentries workshops

I am an active member of Software Carpentry and Data Carpentry, through which I engage in curriculum development, maintenance of lesson materials, and workshop delivery.

Software

Collected collateral damage from doing statistics research, hopefully useful to others.

Targeted Learning and the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. A few of the tlverse packages to which I have made significant contributions include:

  • sl3: An R package providing a modern implementation of the Super Learner ensemble modeling algorithm that simultaneously exposes a grammar for composing arbitrary machine learning pipelines. Joint work with Jeremy Coyle, Ivana Malenica, and Oleg Sofrygin.
    [Docs] | [GitHub]

  • origami: An R package providing a general framework for the application of various cross-validation schemes to arbitrary functions, facilitating the extension of cross-validation to a diversity of applications. Joint work with Jeremy Coyle.
    [Docs] | [GitHub] | [CRAN]

  • hal9001: An R package providing an efficient implementation of the Highly Adaptive Lasso (HAL), a nonparametric regression estimator with optimality guarantees useful in semiparametric inference. Joint work with Jeremy Coyle and Mark van der Laan.
    [Docs] | [GitHub]

  • tmle3shift: An R package providing a targeted maximum likelihood estimator of effects of stochastic interventions, with summarization of effects via working marginal structural models. Joint work with Jeremy Coyle and Mark van der Laan.
    [Docs] | [GitHub]

Causal Inference with Machine Learning

A significant focus of my research program lies at the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for settings ranging from causal mediation analysis and the assessment of stochastic intervention effects with two-phase sampling to nonparametric conditional density estimation and survival analysis.

  • medshift: An R package for estimating population intervention (in)direct effects based on stochastic interventions, including both classical and efficient estimators for incremental propensity score intervention (causal) effects. Joint work with Iván Díaz.
    [Docs] | [GitHub]

  • txshift: An R package for efficient estimation of and inference on the causal effects of stochastic interventions, accommodating robust estimation and semiparametric-efficient inference in the presence of two-phase sampling. Joint work with David Benkeser.
    [Docs] | [GitHub]

  • haldensify: An R package for nonparametric conditional density estimation using techniques based on the Highly Adaptive Lasso (HAL), designed specifically for estimation of the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
    [Docs] | [GitHub]

  • survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in survival settings with and without competing risks, including estimators that respect bounds. Joint work with David Benkeser.
    [Docs] | [GitHub] | [CRAN]

Computational Biology and Bioconductor

A parallel thread of my research concerns the development of novel statistical methodologies for application in high-dimensional and computational biology settings. Consequently, I have (co-)developed several R packages extending the Bioconductor Project.

Miscellaneous

Contact