Ecological inference is the process of learning about individual behavior from aggregate data. We study a partially identified linear contextual effects model for ecological inference and describe how to estimate the district-level parameter, averaging over many precincts, in the presence of the non-identified parameter of the contextual effect. This may be regarded as a first attempt in this venerable literature to limit the scope of the key form of non-identifiability in ecological inference. To study the operating characteristics of our methodology, we have amassed the largest collection of data with known ground truth ever applied to evaluate solutions to the ecological inference problem. We collect and study 459 datasets from a variety of fields including public health, political science and sociology. The datasets contain a total of 2,370,854 geographic units (e.g., precincts), with an average of 5,165 geographic units per dataset. Our replication data are publicly available via the Harvard Dataverse (Jiang et al. 2018) and may serve as a useful resource for future researchers. For all real datasets in our collection that fit our proposed rules, our methodology reduces the width of the Duncan and Davis (1953) deterministic bound, on average, by about 45%, while still capturing the true district-level parameter more than 97% of the time.
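For readers unfamiliar with the deterministic bound used as the baseline above, the following is a minimal sketch of the standard Duncan and Davis (1953) calculation for a 2x2 table. It is not the authors' methodology, only the bound whose width their method is reported to narrow. The function name and the inputs `x` (group share of each unit), `t` (observed outcome rate of each unit), and `n` (unit population) are hypothetical names chosen for illustration.

```python
import numpy as np

def duncan_davis_bounds(x, t, n):
    """Deterministic bounds on a group's outcome rate (illustrative sketch).

    x : share of each unit's population that belongs to the group
    t : observed outcome rate in each unit (e.g., turnout)
    n : population of each unit
    These argument names are assumptions, not taken from the source.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    n = np.asarray(n, dtype=float)

    # Accounting identity per unit: t = beta * x + beta_other * (1 - x).
    # Since beta_other must lie in [0, 1], beta is bounded as follows.
    with np.errstate(divide="ignore", invalid="ignore"):
        lower = np.where(x > 0, np.maximum(0.0, (t - (1.0 - x)) / x), 0.0)
        upper = np.where(x > 0, np.minimum(1.0, t / x), 1.0)

    # District-level bounds weight each unit by its number of group members.
    w = n * x
    district = (np.sum(w * lower) / np.sum(w), np.sum(w * upper) / np.sum(w))
    return lower, upper, district

# Made-up example with three precincts.
lower, upper, district = duncan_davis_bounds(
    x=[0.2, 0.5, 0.9], t=[0.3, 0.4, 0.85], n=[1000, 2000, 1500]
)
width = district[1] - district[0]  # width of the district-level bound
```

The width of the district-level interval is the quantity that the abstract reports being reduced by about 45% on average.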
The enormous Nazi voting literature rarely builds on modern statistical or economic research. By adding these approaches, we find that the most widely accepted existing theories of this era cannot distinguish the Weimar elections from almost any others in any country. Via a retrospective voting account, we show that voters most hurt by the depression, and most likely to oppose the government, fall into separate groups with divergent interests. This explains why some turned to the Nazis and others turned away. The consequences of Hitler's election were extraordinary, but the voting behavior that led to it was not.
Although not widely known until much later, Al Gore received 202 more votes than George W. Bush on election day in Florida. George W. Bush is president because he overcame his election day deficit with overseas absentee ballots that arrived and were counted after election day. In the final official tally, Bush received 537 more votes than Gore. These numbers are taken from the official results released by the Florida Secretary of State's office and so do not reflect overvotes, undervotes, unsuccessful litigation, butterfly ballot problems, recounts that might have been allowed but were not, or any other hypothetical divergence between voter preferences and counted votes. After the election, the New York Times conducted a six-month-long investigation and found that 680 of the overseas absentee ballots were illegally counted, and no partisan, pundit, or academic has publicly disagreed with its assessment. In this paper, we describe the statistical procedures we developed and implemented for the Times to ascertain whether disqualifying these 680 ballots would have changed the outcome of the election. The methods involve adding formal Bayesian model averaging procedures to King's (1997) ecological inference model. Formal Bayesian model averaging has not been used in political science but is especially useful when substantive conclusions depend heavily on apparently minor but indefensible model choices, when model generalization is not feasible, and when potential critics are more partisan than academic. We show how we derived the results for the Times so that other scholars can use these methods to make ecological inferences for other purposes. We also present a variety of new empirical results that delineate the precise conditions under which Al Gore would have been elected president, and offer new evidence of the striking effectiveness of the Republican effort to convince local election officials to count invalid ballots in Bush counties and not count them in Gore counties.
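As a rough illustration of the general idea behind Bayesian model averaging mentioned in the abstract above, the sketch below combines estimates from several candidate models using posterior model probabilities. It is a generic textbook formulation, not the procedure developed for the Times or King's (1997) model itself. The function name and all inputs (per-model estimates, variances, and log marginal likelihoods) are hypothetical.

```python
import numpy as np

def bayesian_model_average(estimates, variances, log_marginal_likelihoods,
                           prior_probs=None):
    """Average a quantity of interest over models (illustrative sketch).

    Each model k supplies a point estimate, its variance, and a log
    marginal likelihood. All inputs are assumed to be given; how they
    are obtained depends on the models and is not shown here.
    """
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    log_ml = np.asarray(log_marginal_likelihoods, dtype=float)
    k = est.size

    # Default to a uniform prior over models (an assumption).
    if prior_probs is None:
        prior = np.full(k, 1.0 / k)
    else:
        prior = np.asarray(prior_probs, dtype=float)

    # Posterior model probabilities, computed stably on the log scale.
    log_post = log_ml + np.log(prior)
    log_post -= log_post.max()
    weights = np.exp(log_post)
    weights /= weights.sum()

    # Model-averaged mean, and total variance including the spread
    # of the estimates across models.
    mean = np.sum(weights * est)
    total_var = np.sum(weights * (var + (est - mean) ** 2))
    return weights, mean, total_var
```

The between-model term in the variance is what lets the averaged result reflect sensitivity to model choice, rather than conditioning on a single specification.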