# Biography

I am an NSF postdoctoral research fellow in biostatistics at Weill Cornell Medicine, working with Iván Díaz and collaborating with David Benkeser. I just completed my PhD in biostatistics at UC Berkeley, under the guidance of Mark van der Laan and Alan Hubbard. During my graduate studies, I served as a founding core developer of the tlverse project, the software ecosystem for targeted learning, and enjoyed collaborations with the Bill & Melinda Gates Foundation, Fred Hutchinson Cancer Research Center, Kaiser Permanente Division of Research, Pandora, and Netflix.

My research interests sit at the intersection of nonparametric causal inference and machine learning, particularly in the development of statistical procedures tailored for efficient estimation and robust inference in flexible statistical models. Broadly, I am motivated by methodological issues arising from high-dimensional inference, targeted loss-based estimation, and non/semi-parametric theory, usually inspired by applications in vaccine clinical trials, epidemiology, and computational biology. I am also interested in high-performance statistical/numerical computing, research software engineering, and open source software for reproducible data science.

### Interests

• Causal Inference and Censored Data Models
• Nonparametric Estimation and Machine Learning
• Semiparametric Theory and Robust Statistics
• Statistical Computing and Research Software Engineering
• High-Dimensional and Computational Biology

### Education

• PhD in Biostatistics (designated emphasis in Computational & Genomic Biology), 2021

University of California, Berkeley

• MA in Biostatistics, 2017

University of California, Berkeley

• BA with a triple major in Molecular & Cell Biology (emphasis in Neurobiology), Psychology, and Public Health, 2015

University of California, Berkeley

# Recent Publications

(see CV for a full list)

(2022). Nonparametric causal mediation analysis for stochastic interventional (in)direct effects. In Biostatistics.

(2022). medoutcon: Nonparametric efficient causal mediation analysis with machine learning in R. In Journal of Open Source Software.

(2021). Semiparametric statistical methods for causal inference with stochastic treatment regimes. PhD dissertation, Graduate Division, University of California, Berkeley.

(2021). cvCovEst: Cross-validated covariance matrix estimator selection and evaluation in R. In Journal of Open Source Software.

# Recent & Upcoming Talks

Efficient Estimation of Modified Treatment Policy Effects Based on the Generalized Propensity Score
A Framework for Causal Segmentation Analysis with Machine Learning in Large-Scale Digital Experiments
Nonparametric Estimation of the Generalized Propensity Score Based on the Highly Adaptive Lasso
Leveraging the Causal Effects of Stochastic Interventions to Evaluate Vaccine Efficacy in Two-phase Trials

# Teaching

## Current courses

Nothing on tap for the 2021-2022 academic year. Maybe next year…

## Carpentries workshops

I am a member of Software Carpentry and Data Carpentry, through which I work on curriculum development, maintenance of lesson materials, and workshop delivery.

# Software

Collected collateral damage from doing statistics research, hopefully useful to others.

## Targeted Learning in the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. A few of the tlverse packages to which I’ve made significant contributions include

## Causal Machine Learning

A significant focus of my research program centers on the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for a range of problems: causal mediation analysis, evaluating the effects of stochastic interventions under two-phase sampling, conditional density estimation, causal segment discovery and offline policy evaluation, and survival analysis.

• sherlock: An R package for employing causal machine learning and non/semi-parametric estimation to discover population segments (or subgroups) based on treatment effect heterogeneity. Flexible techniques for defining segment-specific treatment rules and efficient estimators of the causal effects of these dynamic treatment regimes are implemented. Joint work with Wenjing Zheng as part of an internship at Netflix Research.
[Docs] | [GitHub]

• medshift: An R package for estimating the population intervention (in)direct effects based on stochastic interventions. Classical and efficient estimators are supported for the effects of incremental propensity score interventions and modified treatment policies. Joint work with Iván Díaz.
[Docs] | [GitHub]

• medoutcon: An R package for efficient estimation of interventional (in)direct effects subject to intermediate confounding, including one-step and targeted minimum loss estimators. Joint work with Iván Díaz and Kara Rudolph.
[Docs] | [GitHub]

• txshift: An R package for efficient estimation of and inference on causal effects of stochastic interventions on continuous-valued exposures. Robust estimation and efficient inference under two-phase sampling are supported. Joint work with David Benkeser.
[Docs] | [GitHub] | [CRAN] | [Paper]

• haldensify: An R package for nonparametric conditional density estimation based on the highly adaptive lasso, designed for estimating the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
[Docs] | [GitHub] | [CRAN]

• survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in right-censored survival settings with and without competing risks, including estimation procedures that respect bounds. Joint work with David Benkeser.
[Docs] | [GitHub] | [CRAN]

## High-Dimensional Biology

A parallel thread of my research concerns the development of novel statistical methodology for application in high-dimensional and computational biology. I have (co-)developed several R packages extending the Bioconductor Project.