# Biography

I am a PhD candidate in Biostatistics, working jointly with Mark van der Laan and Alan Hubbard. I am a founding core developer of the tlverse project, the software ecosystem for Targeted Learning. At UC Berkeley, I am affiliated with the Center for Computational Biology and the NIH Biomedical Big Data initiative. During my time in graduate school, I have also enjoyed scientific and statistical collaborations with the Bill & Melinda Gates Foundation, the Kaiser Permanente Division of Research, the Fred Hutchinson Cancer Research Center, Netflix, and SiriusXM+Pandora.

My research interests sit primarily at the intersection of causal inference and machine learning, with a particular focus on developing efficient and robust statistical procedures for evaluating complex target estimands in observational studies and randomized trials. Broadly, my work draws on ideas from non/semi-parametric estimation in large, flexible statistical models; high-dimensional inference; targeted loss-based estimation; statistical computing; computational biology; and statistical epidemiology. Of late, my methodological work has touched on causal mediation analysis, stochastic treatment regimes, robust inference in two-phase designs, and efficient estimation with sieve-type methods. I am also keenly interested in designing open source statistical software to promote computational reproducibility in applied scientific practice.

### Interests

• Causal Inference and Censored Data Models
• Nonparametric Estimation and Machine Learning
• Semiparametric Theory and Robust Statistics
• High-Dimensional and Computational Biology
• Statistical Computing and Reproducible Research

### Education

• PhD in Biostatistics (designated emphasis in Computational and Genomic Biology), 2017-2021 (expected)

University of California, Berkeley

• MA in Biostatistics, 2017

University of California, Berkeley

• BA with a triple major in Molecular and Cell Biology (emphasis in Neurobiology), Psychology, and Public Health, 2015

University of California, Berkeley

# Recent Publications

(see CV for a full list)

(2020). Non-parametric efficient causal mediation with intermediate confounders. In Biometrika.

(2020). txshift: Efficient estimation of the causal effects of stochastic interventions in R. In Journal of Open Source Software.

(2020). hal9001: Highly adaptive lasso regression in R. In Journal of Open Source Software.

# Recent & Upcoming Talks

Leveraging the Causal Effects of Stochastic Interventions to Evaluate Vaccine Efficacy in Two-phase Trials
Evaluating the Causal Impacts of Vaccine-induced Immune Responses in Two-phase Vaccine Efficacy Trials
Efficient Estimation of Stochastic Intervention Effects in Causal Mediation Analysis

# Teaching

## Current courses

• Public Health 290: Biomedical Big Data Capstone Seminar (Targeted Learning in Practice), as graduate student instructor with Prof. Mark van der Laan; Spring 2021 at University of California, Berkeley.

## Past courses

• Public Health 240B & Statistics 245B: Survival Analysis and Causality, as graduate student instructor with Prof. Mark van der Laan; Fall 2020 at University of California, Berkeley.

• Public Health 290: Biomedical Big Data Capstone Seminar, as graduate student instructor with Prof. Alan Hubbard; Spring 2020 at University of California, Berkeley.

• Public Health 242C & Statistics 247C: Longitudinal Data Analysis, as graduate student instructor with Prof. Alan Hubbard; Fall 2019 at University of California, Berkeley.

• Public Health 290: Targeted Learning in Biomedical Big Data, as graduate student instructor with Prof. Mark van der Laan; Spring 2018 at University of California, Berkeley.
Course materials here | GitHub repositories here

## Carpentries workshops

I am a member of Software Carpentry and Data Carpentry, through which I work on curriculum development, maintenance of lesson materials, and workshop delivery.

# Software

Collected collateral damage from doing statistics research, hopefully useful to others.

## Targeted Learning with the tlverse

The tlverse is an ecosystem of R packages for Targeted Learning, of which I am a co-founder and core developer. I have made significant contributions to several of the tlverse packages.

## Causal Inference meets Machine Learning

A significant focus of my research program centers on the intersection of causal inference and statistical machine learning. I’ve (co-)developed R packages for a range of problems: causal mediation analysis, evaluating stochastic interventions under two-phase sampling, conditional density estimation, and survival analysis.

• medshift: An R package for estimating the population intervention (in)direct effects based on stochastic interventions. Classical and efficient estimators are supported for the effects of incremental propensity score interventions and modified treatment policies. Joint work with Iván Díaz.
[Docs] | [GitHub]

• medoutcon: An R package for efficient estimation of interventional (in)direct effects subject to intermediate confounding, including one-step and targeted minimum loss estimators. Joint work with Iván Díaz and Kara Rudolph.
[Docs] | [GitHub]

• txshift: An R package for efficient estimation of and inference on causal effects of stochastic interventions on continuous-valued exposures. Robust estimation and efficient inference under two-phase sampling are supported. Joint work with David Benkeser.
[Docs] | [GitHub] | [CRAN] | [Paper]

• haldensify: An R package for nonparametric conditional density estimation based on the highly adaptive lasso, designed for estimating the generalized propensity score. Joint work with David Benkeser and Mark van der Laan.
[Docs] | [GitHub] | [CRAN]

• survtmle: An R package for the construction of targeted maximum likelihood estimates of marginal cumulative incidence in right-censored survival settings with and without competing risks, including estimation procedures that respect bounds. Joint work with David Benkeser.
[Docs] | [GitHub] | [CRAN]

## Computational Biology and Bioconductor

A parallel thread of my research concerns the development of novel statistical methodologies for application in high-dimensional and computational biology settings. Consequently, I have (co-)developed several R packages extending the Bioconductor Project.