We address a major discrepancy in matching methods for causal inference in observational data. Since these data are typically plentiful, the goal of matching is to reduce bias and only secondarily to keep variance low. However, most matching methods seem designed for the opposite goal, guaranteeing sample size ex ante but limiting bias by controlling for covariates through reductions in the imbalance between treated and control groups only ex post and only sometimes. (The resulting practical difficulty may explain why so many published applications do not check whether imbalance was reduced and so may not even be decreasing bias.) We introduce ``Coarsened Exact Matching'' (CEM), which, unlike most existing approaches, bounds through ex ante user choice the degree of maximal imbalance, model dependence, and causal effect estimation error; eliminates the need for a separate procedure to restrict data to common support; meets the congruence principle; is approximately invariant to measurement error; works well with multicategory treatment variables and with modern methods of imputation for missing data; is computationally efficient even with massive data sets; and is easy to understand and use. CEM can improve causal inferences in a wide range of applications, and may be preferred for simplicity of use even when it is possible to design superior methods for particular problems. We also make available open-source software for R and Stata that implements all our suggestions.
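As a rough illustration of the idea behind the name (coarsen each covariate, match exactly on the coarsened values, and discard strata that lack either treated or control units), a minimal Python sketch might look like the following. The function names `coarsen` and `cem_match`, the fixed-width binning, and the toy data are all assumptions made for illustration; this is not the authors' R or Stata software, and it omits the weighting step needed for estimation.

```python
from collections import defaultdict
from math import floor


def coarsen(value, width):
    """Return the index of the fixed-width bin containing `value` (hypothetical helper)."""
    return floor(value / width)


def cem_match(units, widths):
    """Group units by coarsened covariates and keep only strata that
    contain at least one treated and at least one control unit.

    units  : list of dicts with a boolean 'treated' key plus covariates
    widths : dict mapping covariate name -> bin width chosen ex ante
    Returns a dict mapping stratum key -> list of matched units.
    """
    strata = defaultdict(list)
    for unit in units:
        # Stratum key: the tuple of bin indices, one per covariate.
        key = tuple(coarsen(unit[name], w) for name, w in widths.items())
        strata[key].append(unit)

    matched = {}
    for key, group in strata.items():
        has_treated = any(u["treated"] for u in group)
        has_control = any(not u["treated"] for u in group)
        if has_treated and has_control:
            # Strata with only one group are pruned (off common support).
            matched[key] = group
    return matched


# Toy data, invented for this example only.
units = [
    {"treated": True, "age": 23, "income": 31000},
    {"treated": False, "age": 27, "income": 38000},
    {"treated": True, "age": 45, "income": 90000},
    {"treated": False, "age": 61, "income": 52000},
]
widths = {"age": 10, "income": 10000}

matched = cem_match(units, widths)
for key, group in matched.items():
    print(key, [(u["treated"], u["age"], u["income"]) for u in group])
```

In this sketch the bin widths are fixed before matching, so any two units in the same stratum differ by less than the chosen width on each covariate. That is one simple way to see how an ex ante choice can bound the remaining imbalance, while the number of retained units is only known after matching.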