
Contrary to the pessimistic claims in the ecological inference literature (since Robinson, 1950), aggregate data are sometimes useful even without inferences about individuals. Studies of incumbency advantage, the political effects of redistricting plans, forecasts of macro-economic conditions, and comparisons of infant mortality rates across nations are just a few of the cases where both questions and data coincide at the aggregate level.
Nevertheless, even studies such as these that ask questions about aggregates can usually be improved with valid inferences about the individuals who make up the aggregates. And more importantly, numerous other questions exist for which only valid ecological inferences will do.
Fundamental questions in most empirical subfields of political science require ecological inferences. Researchers in many other fields of academic inquiry, as well as the real world of public policy, also routinely try to make inferences about the attributes of individual behavior from aggregate data. If a valid method of making such inferences were available, scholars could provide accurate answers to these questions with ecological data, and policymakers could base their decisions on reliable scientific techniques. Many of the ecological inferences pursued in these other fields are also of interest to political scientists, which reemphasizes the close historical connection between the ecological inference problem and political science research. The following list represents a small sample of ecological inferences that have been attempted in a variety of fields.
- In American public policy, ecological inferences are required to implement key features of federal law. For example, the U.S. Voting Rights Act of 1965 (and its extensions in 1970, 1975, and 1982) prohibited voting discrimination on the basis of race, color, or language. If discrimination is found, the courts or the U.S. Justice Department can order a state or local jurisdiction to redistrict its political boundaries, or to impose or prevent various other changes in electoral laws. Under present law, legally significant discrimination only exists when plaintiffs (or the Justice Department) can first demonstrate that members of a minority group (usually African American or Hispanic) vote both cohesively and differently from other voters.
Sometimes they must also prove that majority voters consistently prevent minorities from electing a candidate of their choice. Since survey data are rarely available in these cases, and because they are often untrustworthy in racially polarized contests, an application of the Voting Rights Act requires a valid ecological inference from electoral data and U.S. Census data.
Voting Rights Act assessments of minority and majority voting begin with electoral returns from precincts, the smallest geographic unit for which electoral data are available. The returns give the number of votes received by each candidate in a precinct, and census data give the fraction of voters in the same precinct who are African American (or another minority) or white.
With these two sets of aggregate data, plaintiffs must make an ecological inference about how each racial group casts its ballots. That is, since the secret ballot prevents analysts from following voters into the voting booth and peering over their shoulders as they cast their ballots, the voting behavior of each racial group must be inferred using only aggregate electoral and census data. Because of the inadequacy of current methods, in some situations the wrong policies are being implemented: the wrong districts are being redrawn, and the wrong electoral laws are being changed. (Given the great importance and practicality of this problem, I will use it as a running example.)
- In one election to the German Reichstag in September 1930, Adolf Hitler's previously obscure and electorally insignificant National Socialist German Workers' party became the Weimar Republic's second largest political party. The National Socialists continued their stunning electoral successes in subsequent state, local, and presidential elections, and ultimately reached 37.3% of the vote in the last election prior to their taking power. As so many have asked, how could this have happened? Who voted for the Nazis (and the other extreme groups)? Was the Nazi constituency dominated by the downwardly mobile lower middle class or was support much more widespread? Which religious groups and worker categories supported the National Socialists? Which sectors of which political parties lost votes to the Nazis? The data available to answer these questions directly include aggregate data from some of the 1,200 Kreise (districts) for which both electoral data and various census data are available. Because survey data are not available, accurate answers to these critical questions will only be possible with a valid method of ecological inference (see Hamilton, 1982; Childers, 1983; and Falter, 1991).
- Epidemiologists and public policy makers need to know whether and to what extent residential levels of radioactive radon are a risk factor for lung cancer (Stidley and Samet, 1993; Greenland and Robins, 1994a). Radon leaks through basement floors and may pose a significant health risk. Legislators in many states are considering bills that would require homeowners to test for radon and, if high levels are found, to install one of several mechanical means of reducing future exposure.
Policymakers' decisions about such legislation obviously depend in part on the demonstrated health effects of radon. Unfortunately, collecting random samples of individual-level data would be impractical, as it would require measures of radon exposure over many years for each subject. Moreover, because only a small fraction of people with or without radon exposure get lung cancer, and because other variables like smoking are powerful covariates, reliably estimating the differences in lung cancer rates for those with different levels of radon exposure in an individual-level study would require measurements for tens of thousands of individuals. Such a study would be prohibitively expensive. It would also be ethically unacceptable to record high radon levels in subjects' homes without reducing them, yet reducing them would probably ruin the study. Researchers have tried case-control studies, which avoid the necessity of large samples but risk sample selection bias, and extreme-case analyses of coal miners, where the effects are larger but the miners' high levels of radon exposure make the results difficult to extrapolate back to residential settings. The most extensive data that remain include information such as county-level counts of lung cancer deaths from the federal Centers for Disease Control, and samples of radon concentration from each county. Ecological inferences are therefore the only hope of ascertaining the dose-response effect of radon exposure from these data. Unfortunately, without a better method of making ecological inferences, the evidence from these data will likely remain inconclusive (Lubin, 1994).
- In the academic field of marketing (and its real-world counterpart), researchers try to ascertain who has bought specific products, and where advertising is most likely to be effective in influencing consumers to buy more. In many situations, researchers do not have data on the demographic and socio-economic characteristics of individuals who buy particular products, data that would answer many of the research questions directly. Instead, they have extensive indirect data on the average characteristics of people in a geographic area, such as the zip code (or sometimes 9-digit zip code) in the United States. Researchers generally also have information from the company about how much of a product was sold in each of these areas. The question is, given the number of units of a new product sold in each geographic area and, for example, the fraction of households in each area that have children, are in the upper quartile of income, are single-parent families, or have other characteristics, how does demand for the product vary by these characteristics within each community? Only with a valid ecological inference in each geographic area can researchers learn the answers they seek. With this information, scholars will be able to study how product demand depends on these family and individual characteristics, and companies will be able to decide how to target advertising to consumers likely to be interested in their products.
- Since voter surveys are neither always possible nor necessarily reliable, candidates for political office study aggregate election returns in order to decide what policies to favor, and also to tailor campaign appeals. Understanding how support for policies varies among demographic and political groups is critical to the connections between elected officials and their constituents, and to the smooth operation of representative democracy.
- Historians are also interested in the political preferences of demographic groups, usually in time periods before modern survey research was invented. For example, only valid ecological inferences will enable these scholars to ascertain the extent to which working-class voters supported the Socialist party in depression-era America.
- An important sociological question is the relationship between unemployment and crime, especially as affected by race and as mediated by divorce and single parenthood. Unfortunately, the best available data are usually aggregated at the level of cities or counties (Blau and Blau, 1982; Messner, 1982; Byrne and Sampson, 1986). Official U.S. government data on race-specific crime rates (in the form of the Uniform Crime Report) are usually insufficient, and individual-level survey data are in very short supply and, because they are based on self-reports, are often of dubious quality (Sampson, 1987). Only better data or a valid method of ecological inference will enable scholars to determine the critical linkages between unemployment, family disruption, race, and crime.
- The ecological inference problem, and other related aggregation problems, are central to the discipline of economics, as explained by Theil in his classic study (1954: 1): "A serious gap exists between the greater part of rigorous economic theory and the pragmatic way in which economic systems are empirically analyzed. Axiomatically founded theories refer mostly to individuals, for instance the consumer or the entrepreneur. Empirical descriptions of economic actions in large communities, on the other hand, are nearly always extremely global: they are confined to the behavior of groups of individuals. The necessity of such a procedure can scarcely be questioned. But the introduction of relations pretending to describe the reactions of groups of individuals instead of single individuals raises questions of fundamental importance, which are not very well understood."
Economists have made much progress in clarifying the links between microeconomic and macroeconomic behavior in the more than forty years since these words were written (see Stoker, 1993). They also have some good survey data, and much more impressive formal theories, but a method of ecological inference would enable economists to evaluate some of their sophisticated individual-level theoretical models more directly. This would be especially important in a field where there is much reason to value individual responses to surveys less than revealed preference measures that are best gathered at the aggregate level. Economists are also interested in developing models of aggregate economic indicators that are built from and consistent with individual-level economic theories and data, even when the individual level is not of direct interest (see Section 14.3).
- A controversial issue in education policy concerns the effects of school choice voucher programs, in which states or municipalities provide vouchers to students who cannot afford to attend private schools. Private schools are then composed of students from wealthy families and students who pay with state vouchers. One of the many substantive and methodological issues in this field is determining the differential performance of students who take advantage of the voucher system to attend private schools, compared to those who would be there even without the program. Data typically exist only on aggregate school-level variables, such as the dropout rate or the percent who attend college, and on the proportion of each private school's students who paid with a voucher. Because of privacy concerns, researchers must make ecological inferences in order to learn about the fraction of voucher students who attend college, or the fraction of non-voucher students who drop out.
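To make the running Voting Rights Act example concrete, the short sketch below shows what the two aggregate numbers from a single precinct do and do not reveal. It is an illustration under simplifying assumptions, not a method from this chapter: it assumes only two groups (Black and white voters), that each voter either votes for the candidate of interest or does not, and it uses hypothetical names (`x`, `t`, `beta_b`, `beta_w`, `beta_b_bounds`, `beta_w_given`). The only fact it relies on is the accounting identity that the candidate's overall vote share is a weighted average of the two groups' unknown vote shares.

```python
"""Hypothetical illustration of one precinct in the Voting Rights Act example.

Assumptions (not from the text): two groups, Black and white voters; each
voter either votes for the candidate of interest or does not.
  x      = fraction of the precinct's voters who are Black (census data)
  t      = fraction of votes cast for the candidate (electoral returns)
  beta_b = unknown fraction of Black voters who vote for the candidate
  beta_w = unknown fraction of white voters who vote for the candidate
Accounting identity: t = beta_b * x + beta_w * (1 - x).
"""


def beta_b_bounds(x: float, t: float) -> tuple[float, float]:
    """Range of beta_b consistent with x, t, and 0 <= beta_w <= 1."""
    if not (0.0 < x < 1.0 and 0.0 <= t <= 1.0):
        raise ValueError("need 0 < x < 1 and 0 <= t <= 1")
    # Solving the identity for beta_b: beta_b = (t - beta_w * (1 - x)) / x,
    # which falls as beta_w rises. beta_w = 1 gives the smallest value and
    # beta_w = 0 the largest; both are then clipped to [0, 1].
    lower = max(0.0, (t - (1.0 - x)) / x)
    upper = min(1.0, t / x)
    return lower, upper


def beta_w_given(beta_b: float, x: float, t: float) -> float:
    """The beta_w implied by the accounting identity for a chosen beta_b."""
    return (t - beta_b * x) / (1.0 - x)


if __name__ == "__main__":
    # A mixed precinct: the observed numbers leave beta_b unrestricted.
    print(beta_b_bounds(0.4, 0.5))
    # A heavily Black precinct with a lopsided vote: beta_b is confined
    # to a narrow range near 1, and beta_w to the matching range.
    lo, hi = beta_b_bounds(0.8, 0.9)
    print(round(lo, 3), round(hi, 3))
    print(round(beta_w_given(lo, 0.8, 0.9), 3), round(beta_w_given(hi, 0.8, 0.9), 3))
```

In the first precinct every value of `beta_b` from 0 to 1 is consistent with the data, which is why the behavior of each group cannot simply be read off the aggregate returns. In the second, the same arithmetic narrows `beta_b` to roughly 0.875 through 1. A useful method of ecological inference has to combine such partial, precinct-by-precinct information.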
The point of this list is to provide a general sense of the diversity of questions that have been addressed by (necessarily) inadequate methods of ecological inference. No tiny sample of ecological inferences such as this could do justice to the vast array of important scholarly and practical questions about individual attributes for which only aggregate data are available.

Gary King
Mon Jan 27 13:02:30 EST 1997