Robust Inference on the Causal Effects of Stochastic Interventions Under Two-Phase Sampling, with Applications in Vaccine Efficacy Trials

Abstract

Much of statistical causal inference has focused on assessing the effects of static interventions, which specify a fixed contrast of counterfactual intervention values to evaluate a given causal effect. Under violations of the positivity assumption, the evaluation of such interventions faces a host of problems, chief among them non-identification and inefficiency. Stochastic interventions offer a promising solution to these fundamental issues by allowing the counterfactual intervention distribution to be defined as a function of its natural (observed) distribution. Even so, real data analyses are often further complicated by economic constraints, such as when the primary variable of interest is far more expensive to collect than auxiliary covariates. Two-phase sampling schemes offer a practical remedy for such constraints; unfortunately, their use produces side effects that require further adjustment when inference remains the principal goal of a study. We present a novel approach for use in such settings: an augmented targeted minimum loss-based estimator for the causal effects of stochastic interventions, with guarantees of consistency, efficiency, and multiple robustness even in the presence of two-phase sampling. We illustrate the utility of our proposed nonparametric estimator via a simulation study, demonstrating that it attains fast convergence rates even when incorporating flexible machine learning estimators; moreover, we introduce two recent open source software implementations of the methodology, the txshift and tmle3shift R packages. Using data from a recent HIV vaccine efficacy trial, we show that the proposed methodology obtains efficient inference on a parameter defined as the overall risk of HIV infection in the vaccine arm of an efficacy trial, under arbitrary posited shifts of the distribution of an immune response marker away from its observed distribution in the efficacy trial.
The resultant technique provides a highly interpretable variable importance measure for ranking multiple immune responses based on their utility as immunogenicity study endpoints in future HIV-1 vaccine trials that evaluate putatively improved versions of the vaccine.
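To make the setting concrete, the following is a minimal Python sketch of the two ideas the abstract combines: shifting an exposure away from its observed value, and reweighting for two-phase sampling. It is not the augmented targeted minimum loss-based estimator described above, and it does not use the txshift or tmle3shift APIs. Everything in it is an assumption made for illustration: the notation (baseline covariate W, immune marker A, infection indicator Y, phase-two indicator C with known sampling probability pi), the additive shift delta, the simulated data-generating process, and the use of the true outcome regression in place of an estimated one.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def true_risk(a, w):
    """Hypothetical outcome regression E[Y | A = a, W = w] (illustration only)."""
    return expit(-2.0 - 0.5 * a + w)


def simulate(n, seed=0):
    """Simulate a two-phase sample: (W, Y) for everyone, A used only when C = 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = rng.random()                    # baseline covariate
        a = rng.gauss(w, 1.0)               # immune marker (expensive to measure)
        y = 1 if rng.random() < true_risk(a, w) else 0
        pi = 1.0 if y == 1 else 0.25        # known phase-two sampling probability
        c = 1 if rng.random() < pi else 0   # phase-two inclusion indicator
        rows.append({"w": w, "a": a, "y": y, "pi": pi, "c": c})
    return rows


def ipcw_shift_estimate(rows, delta):
    """Inverse-probability-weighted plug-in mean of the risk under A + delta.

    Only phase-two units (C = 1) contribute, each weighted by 1 / pi.
    """
    total = 0.0
    for r in rows:
        if r["c"] == 1:
            total += true_risk(r["a"] + delta, r["w"]) / r["pi"]
    return total / len(rows)


def full_data_shift_mean(rows, delta):
    """Benchmark that uses A for every unit (unavailable in a real two-phase design)."""
    return sum(true_risk(r["a"] + delta, r["w"]) for r in rows) / len(rows)
```

Calling `ipcw_shift_estimate(simulate(20000, seed=1), delta)` over a grid of `delta` values traces how the simulated infection risk changes as the marker distribution is shifted. In a real analysis the outcome regression and the sampling mechanism would have to be estimated, which is where the augmented estimator and its robustness guarantees come in.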

Date
Event
Berkeley Statistics Annual Research Symposium, Department of Statistics, University of California, Berkeley
Location
Berkeley, California, United States
Nima Hejazi
PhD Candidate in Biostatistics

My research interests lie at the intersection of causal inference and machine learning, especially as applied to the statistical analysis of complex data from observational studies and experiments in the biomedical and health sciences.