Contrary to the pessimistic claims in the ecological inference literature
(since Robinson, 1950), aggregate data are sometimes useful even
without inferences about individuals. Studies of incumbency
advantage, the political effects of redistricting plans, forecasts of
macro-economic conditions, and comparisons of infant mortality rates
across nations are just a few of the cases where both questions and
data coincide at the aggregate level.
Nevertheless, even studies
such as these that ask questions about aggregates can usually be
improved with valid inferences about the individuals who make up the
aggregates. And more importantly, numerous other questions exist for
which only valid ecological inferences will do.
Fundamental questions in most empirical subfields of political science require
ecological inferences. Researchers in many other fields of academic inquiry,
as well as the real world of public policy, also routinely try to make
inferences about the attributes of individual behavior from aggregate data. If
a valid method of making such inferences were available, scholars could provide
accurate answers to these questions with ecological data, and policymakers
could base their decisions on reliable scientific techniques. Many of the
ecological inferences pursued in these other fields are also of interest to
political scientists, which reemphasizes the close historical connection
between the ecological inference problem and political science research. The
following list represents a small sample of ecological inferences that have
been attempted in a variety of fields.
- In American public policy, ecological inferences are required to
implement key features of federal law. For example, the U.S. Voting Rights
Act of 1965 (and its extensions in 1970, 1975, and 1982) prohibited voting
discrimination on the basis of race, color, or language. If discrimination is
found, the courts or the U.S. Justice Department can order a state or local
jurisdiction to redistrict its political boundaries, or to impose or prevent
various other changes in electoral laws. Under present law, legally significant
discrimination only exists when plaintiffs (or the Justice Department) can
first demonstrate that members of a minority group (usually African American or
Hispanic) vote both cohesively and differently from other voters.
Sometimes they must also prove that majority voters consistently prevent
minorities from electing a candidate of their choice. Since survey data are
rarely available in these cases, and because they are not often trustworthy in
racially polarized contests, an application of the Voting Rights Act requires a
valid ecological inference from electoral data and U.S. Census data.
Voting Rights Act assessments of minority and majority voting begin with
electoral returns from precincts, the smallest geographic unit for which
electoral data are available. In addition to the numbers of votes received by
each candidate in a precinct, census data also give the fraction of voters in
the same precinct who are African American (or other minority) or
white. With these two
sets of aggregate data, plaintiffs must make an ecological inference about how
each racial group casts its ballots. That is, since the secret ballot prevents
analysts from following voters into the voting booth and peering over their
shoulders as they cast their ballots, the voting behavior of each racial group
must be inferred using only aggregate electoral and census data. Because of
the inadequacy of current methods, in some situations the wrong policies are
being implemented: the wrong districts are being redrawn, and the wrong
electoral laws are being changed. (Given the great importance and practicality
of this problem, I will use it as a running example.)
- In one election to the German Reichstag in September 1930, Adolf Hitler's
previously obscure and electorally insignificant National Socialist German
Workers' Party became the Weimar Republic's second largest political party.
The National Socialists continued their stunning electoral successes in
subsequent state, local, and presidential elections, and ultimately reached
37.3% of the vote in the last election prior to their taking power. As so
many have asked, how could this have happened? Who voted for the Nazis (and
the other extreme groups)? Was the Nazi constituency dominated by the
downwardly mobile lower middle class or was support much more widespread?
Which religious groups and worker categories supported the National Socialists?
Which sectors of which political parties lost votes to the Nazis? The data
available to answer these questions directly include aggregate data from some
of the 1,200 Kreise (districts) for which both electoral data and various
census data are available. Because survey data are not available, accurate
answers to these critical questions will only be possible with a valid method
of ecological inference (see Hamilton, 1982; Childers, 1983; and Falter, 1991).
- Epidemiologists and public policy makers need to know whether and to what
extent residential levels of radioactive radon are a risk factor for
lung cancer
(Stidley and Samet, 1993; Greenland and Robins, 1994a). Radon leaks through
basement floors and may pose a significant health risk. Legislators in many
states are considering bills that would require homeowners to test for radon
and, if high levels are found, to install one of several mechanical means of
reducing future exposure.
Policymakers' decisions about such legislation obviously depend in
part on the demonstrated health effects of radon. Unfortunately,
collecting random samples of individual-level data would be
impractical, as it would require measures of radon exposure over many
years for each subject. Moreover, because only a small fraction of
people with or without radon exposure get lung cancer, and because
other variables like smoking are powerful covariates, reliably
estimating the differences in lung cancer rates for those with
different levels of radon exposure in an individual-level study would
require measurements for tens of thousands of individuals. This would
be both prohibitively expensive and ethically unacceptable without
altering the radon levels for individuals in a way that would probably
also ruin the study. Researchers have tried case-control studies,
which avoid the necessity of large samples but risk sample selection
bias, and extreme-case analyses of coal miners, where the effects are
larger but their high levels of radon exposure make the results
difficult to extrapolate back to residential settings. The most
extensive data that remain include information such as county-level
counts of lung cancer deaths from the federal Centers for Disease
Control, and samples of radon concentration from each county.
Ecological inferences are therefore the only hope of ascertaining the
dose-response effect of radon exposure from these data.
Unfortunately, without a better method of making ecological
inferences, the evidence from these data will likely remain
inconclusive (Lubin, 1994).
- In the
academic field of marketing (and its real-world counterpart),
researchers try to ascertain who has bought specific products, and
where advertising is most likely to be effective in influencing
consumers to buy more. In many situations, researchers do not have
data on the demographic and socio-economic characteristics of
individuals who buy particular products, data that would effectively
answer many of the research questions directly. Instead, they have
extensive indirect data on the average characteristics of people in a
geographic area, such as at the level of the zip code (or sometimes
9-digit zip code) in the United States. Researchers generally also
have information from the company about how much of a product was sold
in each of these areas. The question is, given the number of new
products sold in each geographic area and, for example, the fraction
of households in each area that have children, are in the upper
quartile of income, are in single-parent families, or have other
characteristics, how does demand for the product vary by these
characteristics within each community? Only with a valid ecological
inference in each geographic area can researchers learn the answers
they seek. With this information, scholars will be able to study how
product demand depends on these family and individual characteristics,
and companies will be able to decide how to target advertising to
consumers likely to be interested in their products.
- Since voter
surveys are neither always possible nor necessarily reliable,
candidates for political office study aggregate election returns in
order to decide what policies to favor, and also to tailor campaign
appeals. Understanding how the support for policies varies among
demographic and political groups is critical to the connections
between elected officials and their constituents, and for the smooth
operation of representative democracy.
- Historians are also
interested in the political preferences of demographic groups, and
usually for time periods before modern survey research had even
been invented. For example, only valid ecological inferences will
enable these scholars to ascertain the extent to which working-class
voters supported the Socialist party in depression-era America.
- An
important sociological question is the relationship between
unemployment and crime, especially as affected by race and as mediated
by divorce and single parenthood. Unfortunately, the best available
data are usually aggregated at the level of cities or counties (Blau
and Blau, 1982; Messner, 1982; Byrne and Sampson, 1986). Official
U.S. government data on race-specific crime rates (in the form of the
Uniform Crime Report) are usually insufficient, and individual-level
survey data are in very short supply and, because they are based on
self-reports, are often of dubious quality (Sampson, 1987). Only
better data or a valid method of ecological inference will enable
scholars to determine the critical linkages between unemployment,
family disruption, race, and crime.
- The ecological inference
problem, and other related aggregation problems, are central to the
discipline of economics, as explained by Theil in his classic study
(1954: 1): "A serious gap exists between the greater part of rigorous
economic theory and the pragmatic way in which economic systems are
empirically analyzed. Axiomatically founded theories refer mostly to
individuals, for instance the consumer or the entrepreneur. Empirical
descriptions of economic actions in large communities, on the other
hand, are nearly always extremely global: they are confined to the
behavior of groups of individuals. The necessity of such a procedure
can scarcely be questioned. ... But the introduction
of relations pretending to describe the reactions of groups of
individuals instead of single individuals raises questions of
fundamental importance, which are not very well understood."
Economists have made much progress in clarifying the links between
microeconomic and macroeconomic behavior in the more than forty years
since these words were written (see Stoker, 1993). They also have
some good survey data, and much more impressive formal theories, but a
method of ecological inference would enable economists to evaluate
some of their sophisticated individual-level theoretical models more
directly. This would be especially important in a field where there
is much reason to value individual responses to surveys less than
revealed preference measures that are best gathered at the aggregate
level. Economists are also interested in developing models of
aggregate economic indicators that are built from and consistent with
individual-level economic theories and data, even when the individual
level is not of direct interest (see Section 14.3).
- A
controversial issue in education policy is the effects of school
choice voucher programs, where states or municipalities provide
vouchers to students who cannot afford to attend private schools.
Private schools are then composed of students from wealthy families
and from those who pay with state vouchers. One of the many
substantive and methodological issues in this field is determining the
differential performance of students who take advantage of the voucher
system to attend private schools, compared to those who would be there
even without the program. Thus, data exist on aggregate school-level
variables such as the dropout rate or the percent who attend college,
as well as on the proportion of each private school's students who
paid with a voucher. Because of privacy concerns, researchers must
make ecological inferences in order to learn about the fraction of
voucher students who attend college, or the fraction of non-voucher
students who drop out.
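To make the structure shared by these examples concrete, the voting-rights case in the first item can be sketched in a few lines of Python. This is only an illustration of the accounting identity that links aggregate and group-level quantities, not the method developed in this book; the function names and the precinct figures are hypothetical. If x is the fraction of a precinct's voters who belong to the minority group and t is the fraction of votes a candidate received, then t = b_minority * x + b_other * (1 - x), where the two group-level support rates are unknown and each lies between 0 and 1.

```python
def aggregate_share(x, b_minority, b_other):
    """Precinct-level vote share implied by group-level support rates."""
    return b_minority * x + b_other * (1.0 - x)


def minority_support_bounds(x, t):
    """Range of minority support rates consistent with the aggregates.

    x: fraction of the precinct's voters in the minority group (0..1)
    t: fraction of the precinct's votes won by the candidate (0..1)
    """
    if x == 0.0:
        # No minority voters: the aggregates carry no information.
        return (0.0, 1.0)
    # Lowest value occurs when every other voter supports the candidate,
    # highest when none of them do; both are clipped to [0, 1].
    lower = max(0.0, (t - (1.0 - x)) / x)
    upper = min(1.0, t / x)
    return (lower, upper)


# Two opposite patterns of group voting yield the same precinct total,
# so the total alone cannot distinguish between them.
polarized = aggregate_share(0.5, 0.9, 0.1)
reversed_pattern = aggregate_share(0.5, 0.1, 0.9)
print(polarized, reversed_pattern)

# Hypothetical precincts: (minority fraction, candidate's vote share).
precincts = [(0.2, 0.5), (0.5, 0.5), (0.9, 0.6)]
for x, t in precincts:
    low, high = minority_support_bounds(x, t)
    print(f"x={x:.2f} t={t:.2f} -> minority support in [{low:.2f}, {high:.2f}]")
```

In this sketch the bounds are narrow only where a precinct is nearly homogeneous; in mixed precincts almost any level of minority support is consistent with the observed totals, which is why the aggregates alone do not settle the question.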
The point of this list is to provide a general sense of the diversity
of questions that have been addressed by (necessarily) inadequate
methods of ecological inference. No tiny sample of ecological
inferences such as this could do justice to the vast array of
important scholarly and practical questions about individual
attributes for which only aggregate data are available.
Gary King
Mon Jan 27 13:02:30 EST 1997