We develop an approach to conducting large-scale randomized public policy experiments intended to be more robust to the political interventions that have ruined some or all parts of many similar previous efforts. Our proposed design is insulated from selection bias in some circumstances even if we lose observations; our inferences can still be unbiased even if politics disrupts any two of the three steps in our analytical procedures; and other empirical checks are available to validate the overall design. We illustrate with a design and empirical validation of an evaluation of the Mexican Seguro Popular de Salud (Universal Health Insurance) program we are conducting. Seguro Popular, which is intended to grow to provide medical care, drugs, preventative services, and financial health protection to the 50 million Mexicans without health insurance, is one of the largest health reforms of any country in the last two decades. The evaluation is also large scale, constituting one of the largest policy experiments to date and what may be the largest randomized health policy experiment ever.
A basic feature of many field experiments is that investigators are only able to randomize clusters of individuals (such as households, communities, firms, medical practices, schools, or classrooms) even when the individual is the unit of interest. To recoup the resulting efficiency loss, some studies pair similar clusters and randomize treatment within pairs. However, many other studies avoid pairing, in part because of claims in the literature, echoed by clinical trials standards organizations, that this matched-pair, cluster-randomization design has serious problems. We argue that all such claims are unfounded. We also prove that the estimator recommended for this design in the literature is unbiased only in situations when matching is unnecessary, and that its standard error is also invalid. To overcome this problem without modeling assumptions, we develop a simple design-based estimator with much improved statistical properties. We also propose a model-based approach that includes some of the benefits of our design-based estimator as well as the estimator in the literature. Our methods also address individual-level noncompliance, which is common in applications but not allowed for in most existing methods. We show that from the perspective of bias, efficiency, power, robustness, or research costs, and in large or small samples, pairing should be used in cluster-randomized experiments whenever feasible, and that failing to do so is equivalent to discarding a considerable fraction of one’s data. We develop these techniques in the context of a randomized evaluation we are conducting of the Mexican Universal Health Insurance Program.
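To make the matched-pair, cluster-randomized design concrete, the following is a minimal sketch of one simple design-based point estimate: a weighted average of within-pair differences in cluster means. The weighting by pair size, the `Pair` container, and the function name are illustrative assumptions; the abstract does not specify the estimator, so this should not be read as the authors' exact method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pair:
    """One matched pair of clusters (hypothetical container for illustration)."""
    treated: List[float]  # individual outcomes in the treated cluster
    control: List[float]  # individual outcomes in the control cluster


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def paired_cluster_estimate(pairs: List[Pair]) -> float:
    """Weighted average of within-pair differences in cluster means.

    Assumption: each pair is weighted by its share of all individuals
    in the sample; other weighting schemes are possible.
    """
    total = sum(len(p.treated) + len(p.control) for p in pairs)
    estimate = 0.0
    for p in pairs:
        weight = (len(p.treated) + len(p.control)) / total
        estimate += weight * (mean(p.treated) - mean(p.control))
    return estimate


# Toy data, invented purely for illustration.
example_pairs = [
    Pair(treated=[1, 1, 0, 0], control=[0, 0, 0, 0]),
    Pair(treated=[1, 1], control=[0, 0]),
]
print(paired_cluster_estimate(example_pairs))
```

Because each comparison is made within a pair of similar clusters, between-pair variation drops out of the differences, which is the intuition behind the efficiency gain from pairing.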
Background: We assessed aspects of Seguro Popular, a programme intended to deliver health insurance, regular and preventive medical care, medicines, and health facilities to 50 million uninsured Mexicans. Methods: We randomly assigned treatment within 74 matched pairs of health clusters (i.e., health facility catchment areas) representing 118,569 households in seven Mexican states, and measured outcomes in a baseline survey (August to September 2005) and a follow-up survey 10 months later (July to August 2006) in 50 pairs (n=32 515). The treatment consisted of encouragement to enrol in a health-insurance programme and upgraded medical facilities. Participant states also received funds to improve health facilities and to provide medications for services in treated clusters. We estimated intention-to-treat and complier average causal effects non-parametrically. Findings: Intention-to-treat estimates indicated a 23% reduction from baseline in catastrophic expenditures (1·9 percentage points; 95% CI 0·14-3·66). The effect in poor households was 3·0 percentage points (0·46-5·54) and in experimental compliers was 6·5 percentage points (1·65-11·28), 30% and 59% reductions, respectively. The intention-to-treat effect on health spending in poor households was 426 pesos (39-812), and the complier average causal effect was 915 pesos (147-1684). Contrary to expectations and previous observational research, we found no effects on medication spending, health outcomes, or utilisation. Interpretation: Programme resources reached the poor. However, the programme did not show some other effects, possibly due to the short duration of treatment (10 months). Although Seguro Popular seems to be successful at this early stage, further experiments and follow-up studies, with longer assessment periods, are needed to ascertain the long-term effects of the programme.
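The Findings report each effect on catastrophic expenditures both as an absolute change in percentage points and as a relative reduction from baseline. As a back-of-envelope reading aid, the sketch below divides one by the other to recover the baseline rate each pair of figures implies. These implied baselines are derived arithmetic, not values reported in the abstract, and the function name is hypothetical.

```python
def implied_baseline(effect_points: float, relative_reduction: float) -> float:
    """Baseline rate (in %) implied by an absolute effect and a relative reduction.

    Assumption: relative_reduction = effect_points / baseline, so
    baseline = effect_points / relative_reduction.
    """
    return effect_points / relative_reduction


# (label, absolute effect in percentage points, relative reduction),
# using the figures quoted in the Findings.
reported = [
    ("all households, intention to treat", 1.9, 0.23),
    ("poor households, intention to treat", 3.0, 0.30),
    ("experimental compliers", 6.5, 0.59),
]

for label, points, relative in reported:
    baseline = implied_baseline(points, relative)
    print(f"{label}: implied baseline of about {baseline:.1f}%")
```

Since the reported percentages are rounded, the implied baselines are approximate.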