Next: The Method Up: Chapter 1: Qualitative Overview Previous: The Solution

The Evidence

As a preview of Part IV, which reports extensive evaluations of the model from a variety of data sets, this section gives just two applications, one to demonstrate the accuracy of the method and the other to portray how much more information it reveals about the problem under study. The first application provides 3,262 evaluations of the ecological inference model presented in this book--67 times as many comparisons between estimates from an aggregate model and the truth as exist in the entire history of ecological inference research. The second is a brief geographic analysis in another application that serves to emphasize how much more information about individual behavior this method provides than even the (unrealized) goal of previous methods.

The data for the first application come from the state of Louisiana, which records by precinct the number of blacks who vote and the number of whites who vote (among those registered). These data make it possible to evaluate the ecological inference model described in this book as follows. For each of Louisiana's 3,262 precincts, the procedure uses only aggregate data: the fraction of those registered who are black and the fraction of registered people turning out to vote for the 1990 elections (as well as the number registered). These aggregate, precinct-level data are then used to estimate the fraction of blacks who vote in each precinct. Finally, I validate the model by comparing these estimates to the true fractions of blacks who turn out to vote. (That is, the true fractions of black and white turnout are not used in the estimation procedure.)
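The comparison step of this validation can be sketched in a few lines of Python. This is only an illustration of how estimates might be scored against the truth, not the estimation procedure itself; the function name `weighted_mean_abs_error` and the three toy precincts are hypothetical and are not taken from the Louisiana data. Weighting by the number of blacks in each precinct is an assumption made here so that the summary reflects individual voters rather than precincts.

```python
def weighted_mean_abs_error(estimated, true, weights):
    """Weighted mean absolute gap between estimated and true fractions.

    estimated, true: per-precinct fractions in [0, 1]
    weights: per-precinct counts (here, the number of blacks registered)
    """
    if not (len(estimated) == len(true) == len(weights)):
        raise ValueError("inputs must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    gaps = (w * abs(e - t) for e, t, w in zip(estimated, true, weights))
    return sum(gaps) / total


# Hypothetical toy values (NOT real data): two large precincts estimated
# well, and one tiny precinct estimated badly.
estimated = [0.60, 0.50, 0.90]
true = [0.62, 0.45, 0.50]
n_black = [1000, 800, 5]

weighted = weighted_mean_abs_error(estimated, true, n_black)
unweighted = weighted_mean_abs_error(estimated, true, [1, 1, 1])
print(f"weighted error:   {weighted:.4f}")
print(f"unweighted error: {unweighted:.4f}")
```

In this toy example the badly estimated precinct dominates the unweighted error but barely moves the weighted one, because it contains very few people.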

One brief summary of the results of this analysis appears in Figure 1.1. This figure plots the estimated fraction of blacks turning out to vote in 1990 (horizontally) by the true fraction of blacks voting in that year (vertically). Each precinct is represented in the figure by a circle with area proportional to the number of blacks in the precinct. If the model estimates were exactly correct in every precinct, each circle would be centered exactly on the 45° line. In fact, almost all of the 3,262 precincts fall on or near this diagonal line, demonstrating the success of this method of making inferences about individual behavior using only aggregate data. The few precincts that are farther from the line have tiny numbers of African Americans, so the vast majority of individual voters are correctly estimated.

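[Figure 1.1: Estimated (horizontal) versus true (vertical) fraction of blacks turning out to vote in 1990, by Louisiana precinct.]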

The results are compelling. If Figure 1.1 were merely a plot of the observed values of a variable by the fitted values of the same variable used during the estimation procedure, any empirical researcher should be pleased: the fit is extremely good. If instead the figure were based on the harder problem of making out-of-sample predictions, where past realizations were used to calibrate the prediction, the result would be even more impressive. But the result here is more dramatic still, since the estimates in the figure were computed from only aggregate data. The true fraction of blacks turning out to vote (the vertical dimension in the figure) was not part of the estimation procedure. Moreover, no past realizations of the truth being estimated were used.

Part IV provides many more model evaluations, of many types. These evaluations include data sets for which existing methods do reasonably well at estimating the statewide average, in which case the method offered here also gives reasonable statewide results and in addition much more information in the form of correct confidence intervals and accurate results for each precinct in the state. Part IV also gives examples of data sets where existing methods are hopelessly biased, but the method offered here gives highly accurate estimates. For example, the best existing method indicates that 20% fewer males in South Carolina fall below the poverty level than there are males in that state (see Table 11.2 on page 220). In contrast, the method offered here gives accurate answers for this statewide aggregate (see Figure 11.2 on page 222) as well as for the fraction of males in poverty in each of the 3,187 precinct-sized geographic units (see Figure 11.3 on page 223).

The book also includes situations in which almost all information was aggregated away and standard methods give even more ridiculous results; in those cases, the method described here gives reasonable results with wider confidence intervals, reflecting accurately the degree of uncertainty in the ecological inference (see Chapter 12). The method usually even gives accurate estimates when all the conditions for "aggregation bias" are met, when the process of aggregation eliminates most of the variation in one of the aggregate variables, and when extrapolations far from the range of observed data are necessary. In all these difficult examples, the method offered here gives accurate answers with correct confidence intervals. The method will not always work: since information is lost during aggregation, no method of ecological inference could work in all data sets. However, the procedures introduced here come with diagnostics that researchers can use to evaluate the risks and avoid the problems in most cases.

Finally, I give a brief report of an analysis of 1990 turnout by race in New Jersey's 567 minor civil divisions (mostly cities and towns). These data cannot be used to verify ecological inferences since the true individual-level answers are not known, but they can be used to demonstrate how much more information the method offered here provides to users. The most popular existing method (Goodman's regression) gives only two numbers of relevance: the statewide fractions of blacks who vote and whites who vote (the latter estimate, incidentally, is five standard deviations above its maximum possible value given by the method of bounds). In contrast, the solution to the ecological inference problem offered here gives reliable estimates of these two numbers for the statewide average as well as for each of the 567 cities and towns.
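To make the contrast between Goodman's regression and the method of bounds concrete, the following Python sketch may help. It assumes the standard accounting identity t = beta_b * x + beta_w * (1 - x), where x is the fraction of a unit's population that is black, t is the fraction turning out, and beta_b and beta_w are the unknown black and white turnout fractions; this notation, the function names `bounds` and `goodman`, and the three toy observations are introduced here for illustration and are not the New Jersey data. Goodman's regression is taken to be an unweighted least-squares fit of t on x and (1 - x) with no intercept.

```python
def bounds(x, t):
    """Deterministic bounds on (beta_b, beta_w) for one unit.

    x: fraction black (0 < x < 1); t: fraction turning out (0 <= t <= 1).
    Follows from t = beta_b*x + beta_w*(1-x) with both betas in [0, 1].
    """
    if not (0.0 < x < 1.0 and 0.0 <= t <= 1.0):
        raise ValueError("need 0 < x < 1 and 0 <= t <= 1")
    black = (max(0.0, (t - (1.0 - x)) / x), min(1.0, t / x))
    white = (max(0.0, (t - x) / (1.0 - x)), min(1.0, t / (1.0 - x)))
    return black, white


def goodman(xs, ts):
    """Least-squares fit of t = beta_b*x + beta_w*(1-x), no intercept."""
    sxx = sum(x * x for x in xs)
    sxz = sum(x * (1.0 - x) for x in xs)
    szz = sum((1.0 - x) ** 2 for x in xs)
    sxt = sum(x * t for x, t in zip(xs, ts))
    szt = sum((1.0 - x) * t for x, t in zip(xs, ts))
    det = sxx * szz - sxz * sxz  # determinant of the 2x2 normal equations
    if det == 0:
        raise ValueError("x has no variation; coefficients not identified")
    beta_b = (sxt * szz - sxz * szt) / det
    beta_w = (sxx * szt - sxz * sxt) / det
    return beta_b, beta_w


# Hypothetical toy units (NOT real data): turnout falls steeply as the
# fraction black rises.
xs = [0.1, 0.3, 0.5]
ts = [0.9, 0.6, 0.3]

beta_b, beta_w = goodman(xs, ts)
print(f"Goodman: beta_b = {beta_b:.2f}, beta_w = {beta_w:.2f}")
for x, t in zip(xs, ts):
    print(x, t, bounds(x, t))
```

In this toy example the regression returns a black turnout fraction of about -0.45 and a white turnout fraction of about 1.05, both logically impossible, while the per-unit bounds always stay within [0, 1] by construction.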

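[Figure 1.2: Map of estimated non-minority voter turnout in New Jersey's minor civil divisions; darker shades indicate higher estimated turnout.]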

In order to emphasize the rich information this method unearths, Figure 1.2 maps the estimated degree of voter turnout among non-minorities. In this map, minor civil divisions in New Jersey are given darker shades when the estimated degree of non-minority voter turnout is higher. A few landmarks are labeled to give readers some bearing. The vast increase in information the method provides is represented by the interesting geographic variation in this map (and an additional complete map for minority turnout). For example, Figure 1.2 shows that non-minority turnout is substantially higher in the city of Newark than in the neighboring city of Elizabeth. Is this because of a racial threat posed by Newark's larger minority population? Is the white mobilization in the wealthy towns of Bergen County near Englewood Cliffs a result of the state government's attempt to integrate schools by regionalizing its school districts? By providing reliable individual-level geographic-based information, the solution to the ecological inference problem can be used to raise numerous questions such as these. The method also provides opportunities for answering such questions by using the estimates provided as dependent variables in second-stage analyses (using, in this case, explanatory variables such as fraction minority population, or state attempts at integration).



Gary King
Mon Jan 27 13:02:30 EST 1997