Finite-Sample Inference and Moderated Statistics for Asymptotically Linear Parameters

Abstract

Two important questions have received less attention than they deserve in the analysis of high-dimensional biological data: (1) how can estimates of individual, independent associations be derived in the presence of many competing causes while avoiding model misspecification, and (2) how can accurate small-sample inference be obtained when data-adaptive techniques are employed in such settings? We focus on variable importance analysis in high-dimensional biological data sets with modest sample sizes, using semi-parametric statistical models. In the context of studies of gene expression and environmental exposures, we present a method that is robust in small samples yet does not rely on arbitrary parametric assumptions. Such analyses face not only the problem of multiple testing but also the challenge of teasing out the associations of biological expression measures with exposure from those of confounders such as age, race, and smoking. Specifically, we propose the use of targeted minimum loss-based estimation together with a generalization of the moderated empirical Bayes statistics of Smyth, relying on the influence curve representation of a statistical target parameter to obtain estimates of variable importance measures. The result is a data-adaptive approach that can estimate individual associations in high-dimensional data, even with relatively small sample sizes.
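
As a rough illustration of how a moderated statistic can be built from an influence curve representation, the sketch below shrinks each biomarker's influence-curve variance toward a common prior variance and forms a moderated t-statistic. It is a minimal sketch under stated assumptions, not the implementation presented in the talk: the function name `moderated_t`, the inputs `psi` and `ic`, and the hyperparameters `d0` and `s0_sq` are hypothetical, and the prior hyperparameters are taken as given rather than estimated from the data as an empirical Bayes procedure would do.

```python
import numpy as np

def moderated_t(psi, ic, d0, s0_sq):
    """Moderated t-statistics from influence-curve values (illustrative only).

    psi   : (G,) array of parameter estimates, one per biomarker
    ic    : (n, G) array of estimated influence-curve values
    d0    : prior degrees of freedom (assumed supplied, not estimated here)
    s0_sq : prior variance (assumed supplied, not estimated here)
    """
    psi = np.asarray(psi, dtype=float)
    ic = np.asarray(ic, dtype=float)
    n = ic.shape[0]
    d = n - 1                                   # residual degrees of freedom
    s_sq = ic.var(axis=0, ddof=1)               # per-biomarker sample variance of the IC
    # Shrink each variance toward the prior: a weighted average of s0_sq and s_sq.
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    # Standard error of an asymptotically linear estimator is sqrt(var(IC) / n).
    t_mod = psi / np.sqrt(s_tilde_sq / n)
    return t_mod, s_tilde_sq, d0 + d            # statistic, moderated variance, total df

# Usage with simulated values (purely synthetic, for illustration).
rng = np.random.default_rng(0)
ic_values = rng.normal(size=(20, 5))
estimates = rng.normal(size=5)
t_mod, s_tilde_sq, df_total = moderated_t(estimates, ic_values, d0=4.0, s0_sq=1.0)
```

Because the moderated variance is a weighted average of the prior variance and the per-biomarker variance, biomarkers with unusually small sample variances no longer produce inflated test statistics, which is the motivation for moderation when sample sizes are modest.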

Date
Mon, Mar 20, 2017 4:00 PM
Event
Biostatistics Seminar Series, Division of Biostatistics, University of California, Berkeley
Location
Berkeley, California, United States
Nima Hejazi
Assistant Professor of Biostatistics

My research lies at the intersection of causal inference and machine learning, developing flexible methodology for statistical inference tailored to modern experiments and observational studies in the biomedical and public health sciences.