CEM: Coarsened Exact Matching Software

Authors: Stefano Iacus, Gary King, Giuseppe Porro

This program is designed to improve the estimation of causal effects via an extremely powerful method of matching that is widely applicable and exceptionally easy to understand and use (if you understand how to draw a histogram, you will understand this method). The program implements the Coarsened Exact Matching (CEM) algorithm described in:

"Causal Inference Without Balance Checking: Coarsened Exact Matching" (Political Analysis, 2012); "Multivariate Matching Methods That Are Monotonic Imbalance Bounding" (JASA, 2011); "CEM: Coarsened Exact Matching in Stata" (Stata Journal, 2009, with Matthew Blackwell); "CEM: Software for Coarsened Exact Matching" (Journal of Statistical Software, 2009); and "A Theory of Statistical Inference for Matching Methods in Causal Research" (2017). See also "An Explanation of CEM Weights."
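The coarsen-then-match idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used by the CEM software: each covariate is coarsened into bins using user-chosen cutpoints (the same choice one makes when drawing a histogram), units are grouped into strata by their vector of bin indices, and only strata containing at least one treated and one control unit are kept. The function names `cem_strata` and `cem_match` and the `cutpoints` dictionary are hypothetical.

```python
import bisect
from collections import defaultdict


def cem_strata(units, cutpoints):
    """Hypothetical helper: group units into strata by coarsened covariates.

    units: list of dicts, each with a boolean "treated" key plus covariates.
    cutpoints: dict mapping covariate name -> sorted list of bin boundaries
               (assumed to be chosen by the user, like histogram breaks).
    Returns a dict mapping stratum signature -> list of unit indices.
    """
    strata = defaultdict(list)
    for i, unit in enumerate(units):
        # Bin index for each covariate; a value equal to a cutpoint
        # falls into the upper bin.
        signature = tuple(
            bisect.bisect_right(cuts, unit[name])
            for name, cuts in sorted(cutpoints.items())
        )
        strata[signature].append(i)
    return strata


def cem_match(units, cutpoints):
    """Hypothetical helper: indices of units in strata that contain
    at least one treated and at least one control unit."""
    matched = []
    for members in cem_strata(units, cutpoints).values():
        has_treated = any(units[i]["treated"] for i in members)
        has_control = any(not units[i]["treated"] for i in members)
        if has_treated and has_control:
            matched.extend(members)
    return sorted(matched)


if __name__ == "__main__":
    units = [
        {"treated": True, "age": 23, "educ": 12},
        {"treated": False, "age": 27, "educ": 12},
        {"treated": True, "age": 45, "educ": 16},
        {"treated": False, "age": 62, "educ": 16},
    ]
    print(cem_match(units, {"age": [30, 50], "educ": [13]}))  # → [0, 1]
```

Here units 2 and 3 land in strata with no counterpart and are pruned. Finer cutpoints tighten the bound on imbalance but prune more units; coarser cutpoints keep more data at the cost of looser balance within strata.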

Matching is a nonparametric method of preprocessing data to control for some or all of the potentially confounding influence of pretreatment control variables by reducing imbalance between the treated and control groups. After preprocessing in this way, any method of analysis that would have been used without matching can be applied to estimate causal effects, although some methods will have even better properties.

CEM is a Monotonic Imbalance Bounding (MIB) matching method. This means that the user chooses the maximum imbalance between the treated and control groups ex ante, rather than discovering it after the fact through the usual laborious process of checking balance and repeatedly reestimating; it also means that adjusting the imbalance on one variable has no effect on the maximum imbalance of any other. Through the same ex ante choice, CEM strictly bounds both the degree of model dependence and the average treatment effect estimation error. It also eliminates the need for a separate procedure to restrict data to common empirical support, meets the congruence principle, is robust to measurement error, works well with multiple imputation methods for missing data, can be completely automated, and is extremely fast computationally even with very large data sets.

After preprocessing data with CEM, the analyst may then use a simple difference in means or whatever statistical model they would have applied without matching. CEM also works well for multicategory treatments, determining blocks in experimental designs, and evaluating extreme counterfactuals.
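To make the "simple difference in means" step concrete, the sketch below computes weights and a weighted difference in means on matched data. It assumes the weighting scheme described in the CEM literature: a matched treated unit gets weight 1, a matched control unit in stratum s gets (m_C / m_T)(m_T^s / m_C^s), where m_T and m_C are the total numbers of matched treated and control units and m_T^s and m_C^s are the counts within s, and an unmatched unit gets weight 0. The function names `cem_weights` and `weighted_diff_in_means` are hypothetical, and stratum labels are assumed to come from a prior coarsening step. This is an illustration, not the CEM software's implementation.

```python
from collections import Counter


def cem_weights(treated, strata):
    """Hypothetical helper: CEM-style weights, one per unit.

    treated: list of bools; strata: list of stratum labels, one per unit.
    """
    # Count treated and control units in each stratum.
    n_t = Counter(s for s, t in zip(strata, treated) if t)
    n_c = Counter(s for s, t in zip(strata, treated) if not t)
    # A stratum is matched only if it has both treated and control units.
    matched = {s for s in n_t if s in n_c}
    m_t = sum(n_t[s] for s in matched)  # total matched treated units
    m_c = sum(n_c[s] for s in matched)  # total matched control units
    weights = []
    for s, t in zip(strata, treated):
        if s not in matched:
            weights.append(0.0)  # pruned: no counterpart in its stratum
        elif t:
            weights.append(1.0)
        else:
            weights.append((m_c / m_t) * (n_t[s] / n_c[s]))
    return weights


def weighted_diff_in_means(y, treated, weights):
    """Weighted mean outcome of treated units minus that of control units.

    Assumes at least one matched stratum (otherwise the weights sum to 0).
    """
    def wmean(pairs):
        total = sum(w for _, w in pairs)
        return sum(v * w for v, w in pairs) / total

    treated_mean = wmean([(v, w) for v, t, w in zip(y, treated, weights) if t])
    control_mean = wmean([(v, w) for v, t, w in zip(y, treated, weights) if not t])
    return treated_mean - control_mean


if __name__ == "__main__":
    treated = [True, False, False, True, False, True]
    strata = ["a", "a", "a", "b", "b", "c"]
    y = [10.0, 7.0, 9.0, 12.0, 11.0, 5.0]
    w = cem_weights(treated, strata)
    print(w)  # → [1.0, 0.75, 0.75, 1.0, 1.5, 0.0]
    print(weighted_diff_in_means(y, treated, w))  # → 1.5
```

Stratum "c" has no control unit, so its treated unit is pruned. The result, 1.5, equals the average of the within-stratum differences (2 in "a", 1 in "b"), weighted by the number of treated units in each stratum, and the control weights sum to m_C (3.0 here).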

CEM has officially been "Qualified for Scientific Use" by the U.S. Food and Drug Administration.