# Projects

#### Contrastive and Sparse Dimension Reduction

Development of dimensionality reduction methods using contrastivity and sparsification.

#### Causal Mediation Analysis

Defining novel mediation effects and extensions using stochastic interventions.

#### Causal Effects of Stochastic Interventions

Extensions and applications of causal inference based on stochastic interventions in complex settings.

#### Nonparametric Variance Moderation

Moderated variance estimators for use with semiparametric data-adaptive estimators in high-dimensional biology.

Identification of differentially methylated positions and regions based on targeted learning.

#### Distractions

The things that keep me from working.

# Recent Publications

(see CV for a full list)


### Exploring high-dimensional biological data with sparse contrastive principal component analysis

Motivation: Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad …

### scPCA: A toolbox for sparse contrastive principal component analysis in R

scPCA is a toolbox for sparse contrastive principal component analysis of high-dimensional biological data. scPCA combines the …

### Causal mediation analysis for stochastic interventions

Mediation analysis in causal inference has traditionally focused on binary treatment regimes and deterministic interventions, as well …

### Non-parametric efficient causal mediation with intermediate confounders

Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects …

### A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology

The widespread availability of high-dimensional biological sequencing data has made the simultaneous screening of numerous biological …

# Recent & Upcoming Talks

### Generalized Variance Moderation for Locally Efficient Estimation in High-Dimensional Biology

Exploratory analysis of high-dimensional biological data has received much attention since the explosion of high-throughput …

### Robust Inference on the Causal Effects of Stochastic Interventions Under Two-Phase Sampling, with Applications in Vaccine Efficacy Trials

Much of the focus of statistical causal inference has been devoted to assessing the effects of static interventions, which specify a …

### Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths

We consider nonparametrically estimating a parameter of interest under the constraint that a functional of the parameter is bounded. We …

### Data-Adaptive Estimation and Inference for Differential Methylation Analysis

DNA methylation is amongst the best studied of epigenetic mechanisms impacting gene expression. While much attention has been paid to …

# Teaching

## Current courses

• None. Check back later.

## Carpentries workshops

I am an active member of Software Carpentry and Data Carpentry, through which I engage in curriculum development, maintenance of lesson materials, and workshop delivery.

# Software

Collected collateral damage from doing statistics research, hopefully useful to others.

## Targeted Learning and the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. A few of the tlverse packages to which I have made significant contributions include

• sl3: An R package providing a modern implementation of the Super Learner ensemble modeling algorithm that simultaneously exposes a grammar for composing arbitrary pipelines for machine learning. Joint work with Jeremy Coyle, Ivana Malenica, and Oleg Sofrygin.
[Docs] | [GitHub]

• origami: An R package exposing a generalized framework for applying a variety of cross-validation schemes to arbitrary functions, making cross-validation easier to extend to, and use in, a wide range of applications. Joint work with Jeremy Coyle.
[Docs] | [GitHub] | [CRAN]

• hal9001: An R package providing an efficient implementation of the Highly Adaptive Lasso (HAL), a nonparametric regression estimator with fast convergence guarantees under mild assumptions. Joint work with Jeremy Coyle and Mark van der Laan.
[Docs] | [GitHub] | [CRAN]

• tmle3shift: An R package providing targeted maximum likelihood estimators for the causal effects of modified treatment policies on continuous-valued exposures. Incorporates nonparametric working marginal structural models for summarization of effect estimates. Joint work with Jeremy Coyle and Mark van der Laan.
[Docs] | [GitHub]

## Causal Inference with Machine Learning

A significant focus of my research program lies at the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for settings ranging from causal mediation analysis and the assessment of stochastic intervention effects with two-phase sampling to nonparametric conditional density estimation and survival analysis.

• medshift: An R package for estimating the population intervention (in)direct effects based on stochastic interventions. Classical and efficient estimators are supported for the effects of incremental propensity score interventions and modified treatment policies. Joint work with Iván Díaz.
[Docs] | [GitHub]

• txshift: An R package for efficient estimation of and inference on causal effects of stochastic interventions on continuous-valued exposures. Robust estimation and efficient inference under two-phase sampling are supported. Joint work with David Benkeser.
[Docs] | [GitHub]

• haldensify: An R package for nonparametric conditional density estimation using techniques based on the highly adaptive lasso, designed primarily for estimation of the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
[Docs] | [GitHub] | [CRAN]

• survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in survival settings with and without competing risks, including estimation procedures that respect bounds. Joint work with David Benkeser.
[Docs] | [GitHub] | [CRAN]

## Computational Biology and Bioconductor

A parallel thread of my research concerns the development of novel statistical methodologies for application in high-dimensional and computational biology settings. Consequently, I have (co-)developed several R packages extending the Bioconductor Project.

• biotmle: An R package for the model-free discovery of biomarkers from biological expression data, introducing a generalization of moderated statistics for variance stabilization of semiparametric estimators. Joint work with Alan Hubbard and Mark van der Laan.
[Docs] | [GitHub] | [Bioconductor]

• scPCA: An R package for sparse contrastive principal component analysis, facilitating the recovery of stable and low-dimensional patterns from high-dimensional biological data while removing technical artifacts by making use of control samples. Joint work with Philippe Boileau and Sandrine Dudoit.
[GitHub] | [Bioconductor]

• methyvim: An R package for genome-wide assessment of differential methylation based on estimation of variable importance measures at the level of CpG sites. Joint work with Mark van der Laan.
[Docs] | [GitHub] | [Bioconductor]

• adaptest: An R package for multiple hypothesis testing with data-adaptive target parameters in high-dimensional settings using Targeted Learning. Joint work with Weixin Cai and Alan Hubbard.
[GitHub] | [Bioconductor]

## Miscellaneous

• nima: An R package housing my personal R toolbox, written to support statistical computing for research.
[Docs] | [GitHub] | [CRAN]