I am a PhD candidate in biostatistics, with a designated emphasis in computational and genomic biology, working with Mark van der Laan and Alan Hubbard. I am a founding core developer of the tlverse, the software ecosystem for targeted learning, and a workshop instructor with Software Carpentry. At UC Berkeley, I am affiliated with the Center for Computational Biology and the NIH Biomedical Big Data initiative. I have also served in biostatistical collaborations with the Bill & Melinda Gates Foundation and the Kaiser Permanente Division of Research.
My research primarily concerns the development of robust and efficient statistical methodologies at the intersection of causal inference and statistical machine learning, with the aim of facilitating flexible estimation and inference for complex data from observational studies or randomized trials. My interests further span nonparametric estimation, high-dimensional inference, targeted learning, statistical computing, survival analysis, and computational biology. Recent methodological interests include stochastic treatment regimes, causal mediation analysis, robust inference with two-phase sampling, variance moderation of semiparametric estimators, and nonparametric conditional density estimation. I am also interested in the design of open source software and the use of automated testing practices to promote reproducible applied statistics and replicable science.
PhD in Biostatistics, with a designated emphasis in Computational and Genomic Biology, 2016-present
University of California, Berkeley
MA in Biostatistics, 2017
University of California, Berkeley
BA with a triple major in Molecular and Cell Biology (emphasis in Neurobiology), Psychology, and Public Health, 2015
University of California, Berkeley
Extending effects defined through mediation (e.g., direct effects) to stochastic interventions and intermediate confounders.
Estimation and inference for causal effects based on stochastic treatment regimes, under two-phase sampling, in mediation settings, and for variable importance analysis.
The things that keep me from working.
Software packages developed to extend the R programming language.
Assorted notes on graduate school.
Extending variance moderation for the stabilization of data-adaptive, efficient semiparametric estimators in high-dimensional biology.
Identification of differentially methylated positions and regions using techniques from causal inference and statistical machine learning.
(see CV for a full list)
adaptest is an R package for performing multiple hypothesis testing in problem settings commonly encountered in high-dimensional …
I am an active member of Software Carpentry and Data Carpentry, through which I engage in curriculum development, maintenance of lesson materials, and workshop delivery.
Software Carpentry: Shell, Git, and R at the Berkeley Institute for Data Science; 2019 Jan. 17-18; co-taught with S. Peterson, N. Varoquaux.
Course materials here | GitHub repository here
Software Carpentry: Shell, Git, and Python at the Berkeley Institute for Data Science; 2018 Jul. 16-17; co-taught with K. Marwaha.
Course materials here | GitHub repository here
Data Carpentry: Genomics at Lawrence Berkeley National Laboratory; 2018 May 3-4; co-taught with A. Orr.
Course materials here | GitHub repository here