Efficient nonparametric inference on the effects of stochastic interventions under two-phase sampling, with applications to vaccine efficacy trials


The advent and subsequent widespread availability of preventive vaccines altered the course of public health in the twentieth century. Despite this success, efforts to develop improved vaccines against many high-burden diseases, including HIV, still require both the identification of immune responses indicative of protective efficacy and the elucidation of the mechanisms by which vaccine-induced immune responses act. As in many epidemiological studies and clinical trials, measuring immune responses via biological sequencing assays remains prohibitively costly to perform on all trial participants. In such trials, a prominent solution is the use of two-phase sampling designs. Inexpensive covariates and the outcome of interest are recorded for all units in the first phase. Key biological markers are then measured in the second phase on a subset of participants, selected on the basis of this first-phase information. Focusing on population-level causal quantities defined by stochastic interventions, we propose methodology for efficiently estimating such effects using data generated by preventive vaccine trials with two-phase sampling of immune responses. We evaluate two strategies for estimating these quantities: an inverse probability re-weighting technique and an augmented approach. We provide the first demonstration that the latter, augmented approach is multiply robust to misspecification of several nuisance functions, and we give conditions under which this class of estimators achieves the efficiency bound in the nonparametric model. Techniques for constructing confidence intervals and hypothesis tests are presented, and an open source software implementation of the proposed methodology, the txshift R package, is introduced. We illustrate the proposed techniques using data from a recent preventive HIV vaccine trial.
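To make the inverse probability re-weighting strategy concrete, the following is a minimal Python sketch. It is written under several stated assumptions and does not reproduce the txshift package or the authors' estimators. The first assumption is that the stochastic intervention is an additive shift of a continuous immune response A by a fixed amount delta. The abstract only says "stochastic interventions", so the shift form is chosen for illustration. Under standard identification assumptions, psi = E[Y(A + delta)] equals E[ g(A - delta | W) / g(A | W) * Y ], where g(a | w) is the conditional density of the marker given covariates W. Under two-phase sampling, A is observed only for phase-two units (C = 1), and each such unit is additionally weighted by 1 / pi, its known sampling probability. The data-generating process, the sampling rates, and the helpers `simulate`, `two_phase_sample`, `ipw_shift`, and `true_psi` are all hypothetical. Both nuisance functions (g and pi) are treated as known ("oracle"). The augmented, efficient estimator described in the abstract is not implemented here.

```python
import numpy as np


def expit(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def normal_pdf(x, mean, sd=1.0):
    """Gaussian density; stands in for the (assumed known) density g(a | w)."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))


def simulate(n, rng):
    """Hypothetical full data: covariate W, continuous marker A, binary outcome Y."""
    w = rng.normal(size=n)
    a = rng.normal(loc=w, scale=1.0, size=n)          # g(a | w) = N(w, 1)
    y = rng.binomial(1, expit(-1.0 + 0.5 * a + 0.3 * w))
    return w, a, y


def two_phase_sample(y, rng, case_rate=0.8, control_rate=0.2):
    """Phase two selects units based on the phase-one outcome Y (rates are hypothetical)."""
    pi = np.where(y == 1, case_rate, control_rate)     # known sampling probabilities
    c = rng.binomial(1, pi)                            # C = 1: A is measured
    return c, pi


def ipw_shift(w, a, y, c, pi, delta):
    """IPW estimate and 95% Wald CI for psi = E[Y(A + delta)] with oracle nuisances."""
    obs = c == 1
    terms = np.zeros(len(y))                           # units outside phase two contribute 0
    ratio = normal_pdf(a[obs] - delta, w[obs]) / normal_pdf(a[obs], w[obs])
    terms[obs] = ratio * y[obs] / pi[obs]              # density ratio times 1 / pi
    psi = terms.mean()
    se = terms.std(ddof=1) / np.sqrt(len(terms))
    return psi, (psi - 1.96 * se, psi + 1.96 * se)


def true_psi(delta, rng, n=2_000_000):
    """Monte Carlo approximation of E[Qbar(A + delta, W)] under the same simulated law."""
    w = rng.normal(size=n)
    a = rng.normal(loc=w, scale=1.0, size=n)
    return expit(-1.0 + 0.5 * (a + delta) + 0.3 * w).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(2020)
    delta = 0.5
    w, a, y = simulate(5_000, rng)
    c, pi = two_phase_sample(y, rng)
    a = np.where(c == 1, a, np.nan)                    # A is unobserved when C = 0
    psi_hat, (lo, hi) = ipw_shift(w, a, y, c, pi, delta)
    print(f"IPW estimate: {psi_hat:.3f} (95% CI {lo:.3f} to {hi:.3f})")
    print(f"Monte Carlo truth: {true_psi(delta, rng):.3f}")
```

Because g and pi are known in this sketch, the Wald interval built from the empirical variance of the weighted terms is valid. With a binary outcome, sampled controls contribute zero to the IPW sum, so the weighting acts only through the sampled cases. In practice both nuisance functions must be estimated, which is where the augmented approach and its multiple robustness, as described in the abstract, become relevant.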