I am a PhD candidate in biostatistics, with a designated emphasis in computational and genomic biology, working with Mark van der Laan and Alan Hubbard. I am a co-founder of the tlverse software ecosystem and a workshop instructor with Software Carpentry. At UC Berkeley, I am affiliated with the Center for Computational Biology and am a former trainee of the NIH Biomedical Big Data initiative. Currently, I serve as a biostatistician on several collaborations with the Bill & Melinda Gates Foundation and the Kaiser Permanente Division of Research.
My research interests span causal inference, nonparametric inference and machine learning, targeted loss-based (or likelihood) estimation, survival analysis and censored data models, statistical computing, and reproducible research. My work is chiefly driven by the nonparametric estimation of quantities defined in causal models – motivated by and applied to pressing scientific and policy problems – with substantive applications most recently including vaccine efficacy trials, precision medicine, high-dimensional biology, epidemiology, and algorithmic fairness. My recent methodological interests have included stochastic treatment regimes, causal mediation analysis, robust inference under two-phase sampling, variance shrinkage for locally efficient estimators, and data-adaptive conditional density estimation. I am also passionate about software development for applied statistics, including automated testing, free and open-source software, and computational reproducibility.
PhD in Biostatistics, with a designated emphasis in Computational & Genomic Biology, 2016-present
University of California, Berkeley
MA in Biostatistics, 2017
University of California, Berkeley
BA with a triple major in Molecular & Cell Biology, Psychology, and Public Health, 2015
University of California, Berkeley
Investigations in estimating causal effects defined in the presence of mediation (e.g., natural direct effect), based on stochastic treatment regimes.
Investigations in causal inference with stochastic treatment regimes, a flexible formalism that is more realistic than dynamic or deterministic treatment rules.
How I learned everything that I know.
The things that keep me from working.
Software packages developed to extend the R programming language.
Brief adverts on recent teaching.
Assorted notes on graduate school…
Records of recent professional travel.
Investigations in applying variance moderation methods to stabilize locally efficient estimators for data-analytic use in high-dimensional biology.
Investigations in the use of causal inference and ensemble machine learning to identify differentially methylated positions and regions.