In this book, I present a solution to the ecological inference problem: a method of inferring individual behavior from aggregate data that works in practice. Ecological inference is the process of using aggregate (i.e., ``ecological'') data to infer discrete individual-level relationships of interest when individual-level data are not available. Existing methods of ecological inference generate very inaccurate conclusions about the empirical world, and this inaccuracy is what gives rise to the ecological inference problem. Most scholars who analyze aggregate data routinely encounter some form of this problem.
The ecological inference problem has been among the longest-standing,
hitherto unsolved problems in quantitative social science. It was
originally raised over seventy-five years ago as the first statistical problem
in the nascent discipline of political science, and it has held back
research agendas in most of its empirical subfields. Ecological
inferences are required in political science research when
individual-level surveys are unavailable (for example, local or comparative
electoral politics), unreliable (racial politics), insufficient
(political geography), or infeasible (political history). They are
also required in numerous areas of major significance in public policy
(for example, for applying the Voting Rights Act) and other academic
disciplines, ranging from epidemiology and marketing to sociology and
quantitative history.
Because the ecological inference problem is caused by the lack of individual-level information, no method of ecological inference, including that introduced in this book, will produce precisely accurate results in every instance. However, potential difficulties are minimized here by models that include more available information, diagnostics to evaluate when assumptions need to be modified, and realistic uncertainty estimates for all quantities of interest. For political methodologists, many opportunities remain, and I hope the results reported here lead to continued research into and further improvements in the methods of ecological inference. But most importantly, the solution to the ecological inference problem presented here is designed so that empirical researchers can investigate substantive questions that have heretofore proved intractable. Perhaps it will also lead to new theories and empirical research in areas where analysts have feared to tread due to the lack of reliable ecological methods or individual-level data.