# Projects

#### Contrastive and Sparse Dimension Reduction

Development of dimensionality reduction methods using contrastivity and sparsification.
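A minimal sketch of the contrastive step, assuming the standard formulation of contrastive PCA: the low-dimensional directions are the leading eigenvectors of C_target - alpha * C_background, so they capture variation that is strong in the target data but weak in a background dataset. The function name `contrastive_pca`, the synthetic data, and the choice of `alpha` are illustrative assumptions. The sparsification step, and any procedure for choosing `alpha`, are not shown.

```python
import numpy as np

def contrastive_pca(target, background, alpha=1.0, n_components=2):
    """Sketch of contrastive PCA: top eigenvectors of C_target - alpha * C_background.

    target, background: arrays of shape (n_samples, n_features) with shared feature columns.
    alpha: contrast strength; alpha = 0 reduces to ordinary PCA on the target data.
    """
    # np.cov centers each dataset internally; rowvar=False treats columns as features
    c_target = np.cov(target, rowvar=False)
    c_background = np.cov(background, rowvar=False)
    contrast = c_target - alpha * c_background
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(contrast)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]
    # Project the centered target data onto the contrastive directions
    scores = (target - target.mean(axis=0)) @ loadings
    return loadings, scores

# Hypothetical example: feature 0 varies strongly in both datasets,
# feature 1 varies more only in the target dataset
rng = np.random.default_rng(0)
background = rng.normal(size=(2000, 3)) * np.array([5.0, 1.0, 1.0])
target = rng.normal(size=(2000, 3)) * np.array([5.0, 3.0, 1.0])
loadings, scores = contrastive_pca(target, background, alpha=1.0, n_components=1)
```

With `alpha = 0` the same function reduces to ordinary PCA on the target data, which in this example picks out the high-variance feature shared with the background rather than the target-specific one.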

#### Causal Mediation Analysis

Defining novel mediation effects and extensions using stochastic interventions.

#### Causal Effects of Stochastic Interventions

Extensions and applications of causal inference based on stochastic interventions in complex settings.

#### Nonparametric Variance Moderation

Moderated variance estimators for use with semiparametric data-adaptive estimators in high-dimensional biology.

Identification of differentially methylated positions and regions based on targeted learning.
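The variance moderation described above can be sketched by assuming a limma-style moderated variance, a weighted average of each feature's own variance estimate and a shared prior variance: s_tilde^2 = (d0 * s0^2 + d * s^2) / (d0 + d). In this sketch the feature-specific variance is assumed to come from estimated influence-function values, as is common for semiparametric estimators. The function name `moderated_t` is hypothetical, and the prior `d0` and `s0_sq` are taken as given rather than estimated by empirical Bayes across features as limma does.

```python
import math
from statistics import variance

def moderated_t(estimates, if_values, d0, s0_sq):
    """Sketch of limma-style variance moderation for per-feature estimates.

    estimates: per-feature point estimates (e.g., from a data-adaptive estimator).
    if_values: per-feature lists of estimated influence-function values, one per unit.
    d0, s0_sq: prior degrees of freedom and prior variance (assumed known here).
    """
    results = []
    for est, ic in zip(estimates, if_values):
        n = len(ic)
        d = n - 1                 # degrees of freedom of the feature-specific variance
        s_sq = variance(ic)       # sample variance of the influence-function values
        # Shrink the feature-specific variance toward the shared prior variance
        s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
        # Standard error of the estimate is based on the moderated variance
        results.append(est / math.sqrt(s_tilde_sq / n))
    return results
```

Moderation keeps the statistic finite even for a feature whose raw variance estimate is zero, since the prior variance still contributes to the denominator.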

#### Distractions

The things that keep me from working.

# Recent Publications

(see CV for a full list)


### Non-parametric efficient causal mediation with intermediate confounders

Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects …

### Exploring high-dimensional biological data with sparse contrastive principal component analysis

Motivation: Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad …

### A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology

The widespread availability of high-dimensional biological sequencing data has made the simultaneous screening of numerous biological …

### Causal mediation analysis for stochastic interventions

Mediation analysis in causal inference has traditionally focused on binary treatment regimes and deterministic interventions, as well …

### Efficient nonparametric inference on the effects of stochastic interventions under two-phase sampling, with applications to vaccine efficacy trials

The advent and subsequent widespread availability of preventive vaccines has altered the course of public health in the twentieth …

# Recent & Upcoming Talks

### Generalized Variance Moderation for Locally Efficient Estimation in High-Dimensional Biology

Exploratory analysis of high-dimensional biological data has received much attention since the explosion of high-throughput …

### Robust Inference on the Causal Effects of Stochastic Interventions Under Two-Phase Sampling, with Applications in Vaccine Efficacy Trials

Much of the focus of statistical causal inference has been devoted to assessing the effects of static interventions, which specify a …

### Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths

We consider nonparametrically estimating a parameter of interest under the constraint that a functional of the parameter is bounded. We …

### Data-Adaptive Estimation and Inference for Differential Methylation Analysis

DNA methylation is amongst the best studied of epigenetic mechanisms impacting gene expression. While much attention has been paid to …

# Teaching

## Current courses

• None. Check back later.

## Carpentries workshops

I am an active member of Software Carpentry and Data Carpentry, through which I engage in curriculum development, maintenance of lesson materials, and workshop delivery.

# Software

Collected collateral damage from doing statistics research, hopefully useful to others.

## Targeted Learning and the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. A few of the tlverse packages to which I have made significant contributions include:

• sl3: An R package providing a modern implementation of the Super Learner ensemble modeling algorithm that simultaneously exposes a grammar for composing arbitrary machine learning pipelines. Joint work with Jeremy Coyle, Ivana Malenica, and Oleg Sofrygin.
[Docs] | [GitHub]

• origami: An R package providing a general framework for the application of various cross-validation schemes to arbitrary functions, facilitating the extension of cross-validation to a wide range of applications. Joint work with Jeremy Coyle.
[Docs] | [GitHub] | [CRAN]

• hal9001: An R package providing an efficient implementation of the Highly Adaptive Lasso (HAL), a nonparametric regression estimator with optimality guarantees useful in semiparametric inference. Joint work with Jeremy Coyle and Mark van der Laan.
[Docs] | [GitHub]

• tmle3shift: An R package providing a targeted maximum likelihood estimator of effects of stochastic interventions, with summarization of effects via working marginal structural models. Joint work with Jeremy Coyle and Mark van der Laan.
[Docs] | [GitHub]
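sl3 and origami are R packages; the Python sketch below only illustrates the general ideas they build on, not their APIs. It applies V-fold cross-validation to an arbitrary fold-level function and then uses the cross-validated risks to choose among candidate learners, which is the "discrete" Super Learner (cross-validation selector). The full Super Learner goes further by learning a weighted combination of the candidates from their cross-validated predictions. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean

def make_folds(n, v=5, seed=1):
    """Randomly partition indices 0..n-1 into v validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::v] for k in range(v)]

def cross_validate(fun, data, folds):
    """Apply an arbitrary function fun(train, valid) to every fold and collect results."""
    results = []
    for valid_idx in folds:
        valid_set = set(valid_idx)
        train = [row for i, row in enumerate(data) if i not in valid_set]
        valid = [data[i] for i in valid_idx]
        results.append(fun(train, valid))
    return results

# Two toy candidate learners for predicting y from (x, y) pairs
def fit_mean(train):
    m = mean(y for _, y in train)
    return lambda x: m

def fit_line(train):
    xbar = mean(x for x, _ in train)
    ybar = mean(y for _, y in train)
    sxx = sum((x - xbar) ** 2 for x, _ in train)
    slope = sum((x - xbar) * (y - ybar) for x, y in train) / sxx
    return lambda x: ybar + slope * (x - xbar)

def cv_risk(fit):
    """Wrap a learner into a fold function returning the validation mean squared error."""
    def fold_fun(train, valid):
        predict = fit(train)
        return mean((y - predict(x)) ** 2 for x, y in valid)
    return fold_fun

# Hypothetical data: y is linear in x plus an alternating +/-1 perturbation
data = [(x, 2.0 * x + (1 if x % 2 else -1)) for x in range(50)]
folds = make_folds(len(data), v=5)
learners = {"mean": fit_mean, "line": fit_line}
risks = {name: mean(cross_validate(cv_risk(f), data, folds)) for name, f in learners.items()}
best = min(risks, key=risks.get)  # discrete Super Learner: lowest cross-validated risk
```

Because `cross_validate` accepts any function of a training and validation split, the same fold machinery can compute risks, fit nuisance parameters, or produce cross-fitted estimates.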

## Causal Inference with Machine Learning

A significant focus of my research program lies at the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for settings ranging from causal mediation analysis and the assessment of stochastic intervention effects with two-phase sampling to nonparametric conditional density estimation and survival analysis.

• medshift: An R package for estimating population intervention (in)direct effects based on stochastic interventions, including both classical and efficient estimators for the causal effects of incremental propensity score interventions. Joint work with Iván Díaz.
[Docs] | [GitHub]

• txshift: An R package for efficient estimation of and inference on the causal effects of stochastic interventions, accommodating robust estimation and semiparametric-efficient inference in the presence of two-phase sampling. Joint work with David Benkeser.
[Docs] | [GitHub]

• haldensify: An R package for nonparametric conditional density estimation using techniques based on the Highly Adaptive Lasso, designed specifically for estimation of the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
[Docs] | [GitHub]

• survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in survival settings with and without competing risks, including estimators that respect bounds. Joint work with David Benkeser.
[Docs] | [GitHub] | [CRAN]
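Several of the packages above concern stochastic interventions, for example a shift that adds delta to each unit's observed exposure A. A minimal sketch of the target quantity, assuming such an additive shift: the mean outcome can be written either as a plug-in average of the outcome regression Q(a + delta, w) or as a weighted average of observed outcomes with weights g(a - delta | w) / g(a | w), where g is the generalized propensity score (the conditional exposure density). Doubly robust estimators of this parameter generally combine both ingredients; this sketch shows neither a TMLE nor a one-step correction, and it plugs in the true nuisance functions, which are known only because the data are simulated. All names and the simulation design are hypothetical.

```python
import math
import random
from statistics import mean

def normal_pdf(x, mu, sd=1.0):
    """Normal density, used here as a known conditional exposure density."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def plugin_shift(data, q_fun, delta):
    """Substitution estimator: average predicted outcome at the shifted exposure a + delta."""
    return mean(q_fun(a + delta, w) for w, a, _ in data)

def ipw_shift(data, g_density, delta):
    """Weighting estimator: reweight outcomes by g(a - delta | w) / g(a | w)."""
    return mean(g_density(a - delta, w) / g_density(a, w) * y for w, a, y in data)

# Hypothetical simulated data: W ~ N(0, 1), A | W ~ N(W, 1), Y = 1 + 2A + W + noise
rng = random.Random(2024)
data = []
for _ in range(20000):
    w = rng.gauss(0, 1)
    a = rng.gauss(w, 1)
    y = 1 + 2 * a + w + rng.gauss(0, 1)
    data.append((w, a, y))

def q_true(a, w):
    return 1 + 2 * a + w       # true outcome regression E[Y | A = a, W = w]

def g_true(a, w):
    return normal_pdf(a, w)    # true conditional density of A given W = w

delta = 0.5
# Under this design the true shifted mean outcome is 1 + 2 * delta = 2.0
estimates = (plugin_shift(data, q_true, delta), ipw_shift(data, g_true, delta))
```

In practice both Q and g must be estimated from data; conditional density estimators such as the one in haldensify target the g component.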

## Computational Biology and Bioconductor

A parallel thread of my research concerns the development of novel statistical methodologies for application in high-dimensional and computational biology settings. To that end, I have (co-)developed several R packages extending the Bioconductor Project.

## Miscellaneous

• nima: An R package housing my personal R toolbox, written to support statistical computing for research.
[Docs] | [GitHub] | [CRAN]