# Biography

I will soon start as an NSF postdoctoral research fellow at Weill Cornell Medicine, working with Iván Díaz and collaborating with David Benkeser. I am completing my PhD in biostatistics at UC Berkeley, under the guidance of Mark van der Laan and Alan Hubbard. During my graduate studies, I served as a founding core developer of the tlverse project, the software ecosystem for targeted learning, and enjoyed collaborations with the Bill & Melinda Gates Foundation, Fred Hutchinson Cancer Research Center, Kaiser Permanente Division of Research, Pandora, and Netflix.

My research interests sit at the intersection of nonparametric causal inference and machine learning, particularly the development of statistical procedures tailored for efficient estimation and robust inference in flexible statistical models. Broadly, I am motivated by methodological issues arising from high-dimensional inference, loss-based estimation, semiparametric theory, and complex study designs, usually inspired by applications in computational biology, epidemiology, and vaccine trials. I have a complementary interest in high-performance statistical computing, research software engineering, and open source software for applied statistics.

### Interests

• Causal Inference and Censored Data Models
• Nonparametric Estimation and Machine Learning
• Semiparametric Theory and Robust Statistics
• Statistical Computing and Research Software Engineering
• High-Dimensional and Computational Biology

### Education

• PhD in Biostatistics (designated emphasis in Computational & Genomic Biology), 2021

University of California, Berkeley

• MA in Biostatistics, 2017

University of California, Berkeley

• BA with a triple major in Molecular & Cell Biology (emphasis in Neurobiology), Psychology, and Public Health, 2015

University of California, Berkeley

# Recent Publications

(see CV for a full list)

(2021). cvCovEst: Cross-validated covariance matrix estimation in R. In Journal of Open Source Software.

(2021). Cross-validated loss-based covariance matrix estimator selection in high dimensions.

(2020). Non-parametric efficient causal mediation with intermediate confounders. In Biometrika.

(2020). txshift: Efficient estimation of the causal effects of stochastic interventions in R. In Journal of Open Source Software.

(2020). hal9001: Highly adaptive lasso regression in R. In Journal of Open Source Software.

# Recent & Upcoming Talks

Leveraging the Causal Effects of Stochastic Interventions to Evaluate Vaccine Efficacy in Two-phase Trials
Evaluating the Causal Impacts of Vaccine-induced Immune Responses in Two-phase Vaccine Efficacy Trials
Efficient Estimation of Stochastic Intervention Effects in Causal Mediation Analysis

# Teaching

## current courses

I will not be teaching during the 2021-2022 academic year. Check back later.

## past courses

• Public Health 290: Biomedical Big Data Capstone Seminar (Targeted Learning in Practice), as graduate student instructor with Prof. Mark van der Laan; Spring 2021 at the University of California, Berkeley.

• Public Health 240B / Statistics 245B: Survival Analysis and Causality, as graduate student instructor with Prof. Mark van der Laan; Fall 2020 at the University of California, Berkeley.

• Public Health 290: Biomedical Big Data Capstone Seminar, as graduate student instructor with Prof. Alan Hubbard; Spring 2020 at the University of California, Berkeley.

• Public Health 242C / Statistics 247C: Longitudinal Data Analysis, as graduate student instructor with Prof. Alan Hubbard; Fall 2019 at the University of California, Berkeley.

• Public Health 290: Targeted Learning in Biomedical Big Data, as graduate student instructor with Prof. Mark van der Laan; Spring 2018 at the University of California, Berkeley.

## upcoming workshops

Nothing on tap, for now. Check back later.

## Carpentries workshops

I am a member of Software Carpentry and Data Carpentry, through which I work on curriculum development, maintenance of lesson materials, and workshop delivery.

# Software

Collected collateral damage from doing statistics research, hopefully useful to others.

## Targeted Learning with the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. I’ve made significant contributions to several of the tlverse packages.

## Causal Inference Meets Machine Learning

A significant focus of my research program centers on the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for a range of problems: causal mediation analysis, evaluating stochastic interventions under two-phase sampling, conditional density estimation, and survival analysis.

• medshift: An R package for estimating the population intervention (in)direct effects based on stochastic interventions. Classical and efficient estimators are supported for the effects of incremental propensity score interventions and modified treatment policies. Joint work with Iván Díaz.
[Docs] | [GitHub]

• medoutcon: An R package for efficient estimation of interventional (in)direct effects subject to intermediate confounding, including one-step and targeted minimum loss estimators. Joint work with Iván Díaz and Kara Rudolph.
[Docs] | [GitHub]

• txshift: An R package for efficient estimation of and inference on causal effects of stochastic interventions on continuous-valued exposures. Robust estimation and efficient inference under two-phase sampling are supported. Joint work with David Benkeser.
[Docs] | [GitHub] | [CRAN] | [Paper]

• haldensify: An R package for nonparametric conditional density estimation based on the highly adaptive lasso, designed for estimating the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
[Docs] | [GitHub] | [CRAN]

• survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in right-censored survival settings with and without competing risks, including estimation procedures that respect bounds. Joint work with David Benkeser.
[Docs] | [GitHub] | [CRAN]

## Computational Biology and Bioconductor

A parallel thread of my research concerns the development of novel statistical methodologies for application in high-dimensional and computational biology settings. Consequently, I have (co-)developed several R packages extending the Bioconductor Project.