Published

23 June 2025

We develop and share open-source software for statistics, causal inference, and machine learning. Most of this software is hosted on the lab’s GitHub organization page at https://github.com/nshlab. Read more about the role of open-source software in the lab’s research program here.

Consistent with our commitment to open science, we exclusively work in open-source programming languages. Most often, these include

open-source software archive

What follows is a likely non-exhaustive list of open-source software packages whose development members of the lab have led, co-led, or otherwise contributed to significantly.

CausalTables.jl

Tools for storing, manipulating, and simulating tabular data for statistical causal inference, including a versatile Tables.jl-compliant tabular data structure associated with an underlying directed acyclic graph (DAG) and an interface to encode structural causal models (SCMs).

Website Repo Paper

haldensify

Non-parametric conditional density estimation based on the highly adaptive lasso (HAL) algorithm. Though broadly applicable, this tool was designed for estimation of the generalized propensity score, a nuisance parameter that arises in causal inference settings with continuous exposures.

Website Repo Package Paper

hal9001

An efficient implementation of the highly adaptive lasso (HAL) algorithm, a non-parametric regression estimator capable of achieving favorable convergence rates under global structural assumptions on the class of functions being estimated. Part of the TLverse project.

Website Repo Package Paper
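To make the idea behind HAL concrete, here is a minimal Python sketch of the zero-order indicator basis that the algorithm is usually described as building before an L1-penalized regression is fit over it. It is not the hal9001 API: the function name `hal_basis` and the toy data are hypothetical, and the lasso step is omitted.

```python
from itertools import combinations


def hal_basis(X):
    """Build a zero-order, HAL-style indicator basis (illustrative only).

    X is a list of d-dimensional rows. For every non-empty subset s of
    coordinates and every observed row (used as a knot), the basis function is
    1 if x[j] >= knot[j] for all j in s, and 0 otherwise.
    """
    d = len(X[0])
    # All non-empty subsets of the d coordinates.
    subsets = [s for k in range(1, d + 1) for s in combinations(range(d), k)]
    design = []
    for x in X:
        row = []
        for s in subsets:
            for knot in X:  # knots are placed at the observed data points
                row.append(int(all(x[j] >= knot[j] for j in s)))
        design.append(row)
    return design


# Hypothetical toy data: 3 observations, 2 covariates.
X = [(0.1, 0.9), (0.5, 0.2), (0.8, 0.6)]
Phi = hal_basis(X)  # 3 rows, 3 * (2**2 - 1) = 9 columns
```

The design matrix has n(2^d - 1) columns, which is why an efficient implementation and L1 penalization matter in practice.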

medoutcon

Efficient cross-fitted estimation of natural direct and indirect effects and interventional direct and indirect effects in settings possibly subject to intermediate confounding. Both one-step and targeted maximum likelihood estimators are provided.

Website Repo Paper

txshift

Efficient estimation of the causal effects of additive modified treatment policies for continuous exposures, including corrections for efficient inference under outcome-dependent, two-phase sampling designs. Both one-step and targeted maximum likelihood estimators are provided.

Website Repo Package Paper
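As a rough illustration of the estimand, the Python sketch below applies an additive shift to each observed exposure and averages a fitted outcome regression over the shifted values. This is a simple plug-in estimator, not the one-step or targeted maximum likelihood estimators that txshift provides, and it is not the package's API. The names `shift`, `plug_in_shift_mean`, and `q_hat`, the support bound `upper`, and the toy data are all assumptions made for illustration.

```python
def shift(a, delta, upper):
    """Additive modified treatment policy: move the exposure by delta,
    but leave it unchanged if the shifted value would exceed the
    (assumed known) upper bound of the exposure's support."""
    return a + delta if a + delta <= upper else a


def plug_in_shift_mean(A, W, outcome_regression, delta, upper):
    """Average the outcome regression evaluated at the shifted exposure."""
    total = sum(outcome_regression(shift(a, delta, upper), w) for a, w in zip(A, W))
    return total / len(A)


def q_hat(a, w):
    """Hypothetical fitted outcome regression E[Y | A = a, W = w]."""
    return 2.0 * a + w


# Hypothetical toy data: exposures A and a single baseline covariate W.
A = [1.0, 2.0, 3.0]
W = [0.0, 1.0, 0.0]
psi = plug_in_shift_mean(A, W, q_hat, delta=0.5, upper=3.0)
```

The third unit keeps its observed exposure because shifting it would leave the assumed support. Handling the support this way is one common convention rather than a requirement.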

sl3

An R6-based implementation of a flexible grammar for composing arbitrary pipelines of prediction functions for standard machine learning tasks. Emphasizes the super learner ensemble modeling (or model stacking) framework. Part of the TLverse project.

Website Repo
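The super learner idea can be sketched in a few lines of Python: obtain cross-validated predictions from each candidate learner, then choose the convex combination that minimizes cross-validated risk. The sketch below is not sl3's R6 interface. It assumes two hypothetical learners (`fit_mean`, `fit_line`), leave-one-out cross-validation, squared-error loss, and a grid search over a single weight.

```python
def fit_mean(xs, ys):
    """Candidate learner 1: predict the training mean everywhere."""
    mu = sum(ys) / len(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Candidate learner 2: simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_predictions(learners, xs, ys):
    """Leave-one-out predictions: one row per observation, one column per learner."""
    preds = []
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        preds.append([fit(train_x, train_y)(xs[i]) for fit in learners])
    return preds


def stack_weights(preds, ys, grid=101):
    """Grid-search the weight w on learner 1 (1 - w on learner 2) that
    minimizes the cross-validated mean squared error."""
    best_w, best_risk = None, None
    for k in range(grid):
        w = k / (grid - 1)
        risk = sum((w * p[0] + (1 - w) * p[1] - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
        if best_risk is None or risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk


# Hypothetical, nearly linear toy data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]
preds = cv_predictions([fit_mean, fit_line], xs, ys)
weight_on_mean, cv_risk = stack_weights(preds, ys)
```

With more than two learners, the grid search would typically be replaced by a constrained optimizer over the weight simplex.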

origami

A generalized framework for applying a great variety of cross-validation schemes to arbitrary estimation functions. Part of the TLverse project.

Website Repo Package Paper
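The core pattern, separating the definition of folds from the function evaluated on each fold, can be sketched in Python as follows. This is an illustration of the general idea only, not origami's R interface. The helpers `make_folds`, `cross_validate`, and `mean_mse` are hypothetical names, and V-fold splitting is just one of many possible schemes.

```python
import random


def make_folds(n, v, seed=0):
    """Split indices 0..n-1 into v (training, validation) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = []
    for k in range(v):
        valid = sorted(idx[k::v])
        valid_set = set(valid)
        train = [i for i in range(n) if i not in valid_set]
        folds.append((train, valid))
    return folds


def cross_validate(fn, data, folds):
    """Apply an arbitrary function fn(train_data, valid_data) to every fold."""
    return [
        fn([data[i] for i in train], [data[i] for i in valid])
        for train, valid in folds
    ]


def mean_mse(train, valid):
    """Example estimation function: validation MSE of the training mean."""
    mu = sum(train) / len(train)
    return sum((y - mu) ** 2 for y in valid) / len(valid)


# Hypothetical toy data.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
folds = make_folds(len(data), v=3)
fold_errors = cross_validate(mean_mse, data, folds)
```

Because `cross_validate` only sees index pairs, swapping in a different scheme (for example, rolling windows for time series) would only require a different fold constructor.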

tmle3shift

Targeted maximum likelihood estimation of the causal effects of modified treatment policies for continuous exposures, including marginal structural models to provide working summaries of the effects of several independent hypothetical interventions. Part of the TLverse project.

Website Repo

sherlock

Causal machine learning and semi-parametric estimation for population segment discovery based on treatment effect heterogeneity. Flexible techniques for defining segment-specific treatment rules and efficient estimators of the causal effects of such dynamic treatment regimes are provided. Supported by Netflix Research.

Website Repo Paper

cvCovEst

Asymptotically optimal, cross-validated, loss-based selection of covariance matrix estimators, tailored for use in high-dimensional settings.

Website Repo Package Paper

scPCA

Sparse contrastive principal component analysis, facilitating the recovery of stable and low-dimensional patterns from high-dimensional biological data while removing technical artifacts through the use of negative controls.

Repo Package Paper

biotmle

Model-agnostic discovery of biomarkers from biological sequencing and expression data, introducing a hypothesis testing strategy that applies standard variance moderation to stabilize semi-parametric estimators in small-sample settings.

Website Repo Package Paper

medshift

Estimation of population intervention direct and indirect effects based on stochastic interventions. Classical and asymptotically efficient estimators for the effects of incremental propensity score interventions are supported.

Website Repo

survtmle

Targeted maximum likelihood estimation of marginal cumulative incidence in right-censored survival settings with and without competing risks, including estimation procedures that respect bounds.

Repo

LtAtStructuR

Restructuring of a collection of time-stamped measurements (e.g., electronic health record data) into a standard long-format analytic dataset suitable for evaluating the causal effects of multiple time-point interventions in the presence of time-dependent confounding or selection bias.

Repo