Authors: Stefano Iacus, Gary King, Giuseppe Porro
This program is designed to improve the estimation of causal effects via an extremely powerful method of matching that is widely applicable and exceptionally easy to understand and use (if you understand how to draw a histogram, you will understand this method). The program implements the Coarsened Exact Matching (CEM) algorithm described in:
"Causal Inference Without Balance Checking: Coarsened Exact Matching" (Political Analysis, 2012); "Multivariate Matching Methods That Are Monotonic Imbalance Bounding" (JASA, 2011); "CEM: Coarsened Exact Matching in Stata" (Stata Journal, 2009, with Matthew Blackwell); "CEM: Software for Coarsened Exact Matching" (Journal of Statistical Software, 2009); and "A Theory of Statistical Inference for Matching Methods in Causal Research" (2017). See also An Explanation of CEM Weights.
Matching is a nonparametric method of preprocessing data to control for some or all of the potentially confounding influence of pretreatment control variables by reducing imbalance between the treated and control groups. After preprocessing in this way, any method of analysis that would have been used without matching can be applied to estimate causal effects, although some methods will have even better properties. CEM is a Monotonic Imbalance Bounding (MIB) matching method. This means that the balance between the treated and control groups is chosen by the user ex ante, rather than discovered through the usual laborious process of checking after the fact and repeatedly reestimating, and that adjusting the imbalance on one variable has no effect on the maximum imbalance of any other. Through ex ante user choice, CEM also strictly bounds both the degree of model dependence and the average treatment effect estimation error. In addition, it eliminates the need for a separate procedure to restrict data to common empirical support, meets the congruence principle, is robust to measurement error, works well with multiple imputation methods for missing data, can be completely automated, and is extremely fast computationally even with very large data sets. After preprocessing data with CEM, the analyst may then use a simple difference in means or whatever statistical model they would have applied without matching. CEM also works well for multicategory treatments, determining blocks in experimental designs, and evaluating extreme counterfactuals.
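The idea described above can be illustrated with a minimal sketch in plain Python. It is not the code of any of the packages listed on this page. The function names (coarsen, cem_weights, weighted_mean_difference), the toy data, and the cutpoints are all hypothetical and chosen only for illustration. The sketch assumes the commonly stated CEM weighting: treated units get weight 1, matched controls in stratum s get (m_C / m_T) * (m_T^s / m_C^s), and unmatched units get weight 0.

```python
from bisect import bisect_right
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into (cutpoints must be sorted)."""
    return bisect_right(cutpoints, value)


def cem_weights(rows, treat_key, cutpoints):
    """Coarsen covariates, exact-match on the coarsened values, and return one weight per row.

    rows      : list of dicts, one per unit
    treat_key : name of the 0/1 treatment indicator
    cutpoints : {covariate name: sorted list of user-chosen cutpoints} (the ex ante choice)
    """
    names = sorted(cutpoints)
    strata = defaultdict(lambda: {"t": [], "c": []})
    for i, row in enumerate(rows):
        # The tuple of bin indices identifies the stratum a unit belongs to.
        signature = tuple(coarsen(row[name], cutpoints[name]) for name in names)
        strata[signature]["t" if row[treat_key] == 1 else "c"].append(i)

    # Keep only strata that contain at least one treated and one control unit.
    matched = [s for s in strata.values() if s["t"] and s["c"]]
    m_t = sum(len(s["t"]) for s in matched)
    m_c = sum(len(s["c"]) for s in matched)

    weights = [0.0] * len(rows)  # unmatched units keep weight 0
    for s in matched:
        control_weight = (m_c / m_t) * (len(s["t"]) / len(s["c"]))
        for i in s["t"]:
            weights[i] = 1.0
        for i in s["c"]:
            weights[i] = control_weight
    return weights


def weighted_mean_difference(rows, weights, treat_key, outcome_key):
    """Weighted difference in mean outcomes between treated and control units."""
    sums = {0: 0.0, 1: 0.0}
    totals = {0: 0.0, 1: 0.0}
    for row, w in zip(rows, weights):
        group = 1 if row[treat_key] == 1 else 0
        sums[group] += w * row[outcome_key]
        totals[group] += w
    if totals[0] == 0 or totals[1] == 0:
        raise ValueError("no matched units in at least one group")
    return sums[1] / totals[1] - sums[0] / totals[0]


# Toy data (made up for this example).
rows = [
    {"treated": 1, "age": 23, "educ": 12, "y": 10.0},
    {"treated": 1, "age": 27, "educ": 11, "y": 12.0},
    {"treated": 0, "age": 25, "educ": 12, "y": 8.0},
    {"treated": 0, "age": 38, "educ": 16, "y": 9.0},
    {"treated": 0, "age": 36, "educ": 15, "y": 7.0},
    {"treated": 1, "age": 35, "educ": 16, "y": 11.0},
    {"treated": 1, "age": 55, "educ": 9, "y": 20.0},   # no control in its stratum
    {"treated": 0, "age": 62, "educ": 18, "y": 3.0},   # no treated unit in its stratum
]
cutpoints = {"age": [30, 50], "educ": [13]}  # hypothetical coarsening

weights = cem_weights(rows, "treated", cutpoints)
estimate = weighted_mean_difference(rows, weights, "treated", "y")
print(weights)
print(estimate)
```

Choosing coarser cutpoints keeps more units but tolerates more imbalance within each stratum; finer cutpoints do the reverse. In this sketch the last two rows are pruned because their strata lack a counterpart in the other group, which is how restriction to common support falls out of the matching step itself.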
CEM has officially been "Qualified for Scientific Use" by the U.S. Food and Drug Administration.
-
Reporting Bugs and Issues: Please use our GitHub Issue form.
-
Questions and feature requests: Discuss the software on our Discussions page.
-
CEM Package for R:
- Can be installed from CRAN: install.packages("cem")
-
To install from GitHub, from R:
library(devtools)  # run install.packages("devtools") first if necessary
install_github("https://github.com/IQSS/cem.git")
-
For documentation, from R, type library(cem), and then ?cem (or see the published Journal of Statistical Software version)
-
GitHub repository: https://github.com/IQSS/cem
-
CEM in MatchIt for R: Most of the features of CEM are also available through the R Package MatchIt: Nonparametric Preprocessing for Parametric Causal Inference.
-
CEM for SAS, by Stefano Verzillo, Paolo Berta, and Matteo Bossi
Download the SAS CEM Macro (Version: 2/2017, Questions: stefano.verzillo@ec.europa.eu)
See also JSCS article: "%CEM: A SAS macro to perform coarsened exact matching"
-
CEM for Stata (version 10 or later):
-
To install, type:
net from https://www.mattblackwell.org/files/stata
net install cem
-
You can also install from the SSC:
ssc install cem
-
For documentation, type help cem or download the PDF (or see the published version in The Stata Journal: PDF).
-
CEM for SPSS: Website
-
CEM for SQL (works with billions of observations): ZaliQL
-
CEM for Python: on GitHub