Up: Chapter 1: Qualitative Overview Previous: The Evidence

The Method

This section gives a brief non-mathematical sketch of the basic model introduced in this book. Although several approaches are discussed in the methodological literature, the only method of ecological inference widely used in practice is Goodman's model, which is based on a straightforward linear regression and effectively assumes that the quantities of interest (such as the proportions of blacks and of whites who vote) are constant over all precincts (see Section 3.1). Allowing these quantities to vary over the precincts and estimating them all, as is done in this book, provides far more detailed information about the individual-level relationships, and moderately improves the overall results.
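As a rough illustration of the Goodman approach described above, the following Python sketch regresses precinct turnout on the black and white population fractions with no intercept, so that the two coefficients play the role of the constant black and white turnout rates. The function name, variable names, and the four-precinct data set are hypothetical and invented for illustration; none of them come from the book.

```python
def goodman_regression(x, t):
    """Least-squares fit of t_i = beta_b * x_i + beta_w * (1 - x_i), no intercept.

    x: fraction of each precinct's population that is black
    t: fraction of each precinct's population that turned out
    Returns (beta_b, beta_w), each assumed constant over all precincts.
    """
    # Sums that make up the 2x2 normal equations for regressors x and (1 - x).
    s_xx = sum(xi * xi for xi in x)
    s_xw = sum(xi * (1 - xi) for xi in x)
    s_ww = sum((1 - xi) ** 2 for xi in x)
    s_xt = sum(xi * ti for xi, ti in zip(x, t))
    s_wt = sum((1 - xi) * ti for xi, ti in zip(x, t))

    det = s_xx * s_ww - s_xw ** 2
    if det == 0:
        raise ValueError("x must take at least two distinct values")

    # Solve the normal equations by inverting the 2x2 matrix directly.
    beta_b = (s_xt * s_ww - s_wt * s_xw) / det
    beta_w = (s_wt * s_xx - s_xt * s_xw) / det
    return beta_b, beta_w


# Hypothetical precinct data (invented for illustration only).
x = [0.1, 0.2, 0.3, 0.4]   # fraction black
t = [0.3, 0.4, 0.5, 0.6]   # fraction turning out

beta_b, beta_w = goodman_regression(x, t)
print(beta_b, beta_w)
```

With these made-up numbers every precinct's data are perfectly feasible, yet the fitted black turnout rate is about 1.2, which is impossible for a proportion. Nothing in the regression itself keeps the estimates inside [0,1].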

Applying the deterministic information from the method of bounds to each and every precinct-level quantity of interest provides very substantial improvements and makes inferences especially robust to aggregation bias. Goodman's regression does not restrict the quantities of interest (which are proportions) even to the [0,1] interval. Many have suggested modifying Goodman's regression by restricting these aggregate quantities of interest to this interval, but this results in implausible corner solutions and, more importantly, imposes no restrictions on any of the individual precinct quantities. In contrast, the method offered here uses the bounds on the quantities of interest in every precinct, most of which turn out in practice to be much narrower than [0,1]. Because these bounds are also known with certainty, this procedure adds a surprising amount of information to the statistical model.
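The precinct-level bounds mentioned above follow from simple accounting, and a minimal Python sketch may make them concrete. It assumes the standard two-group setup in which a precinct's turnout fraction t equals beta_b * x + beta_w * (1 - x), where x is the black fraction of the population and beta_b and beta_w are the unknown black and white turnout rates. The function name, argument names, and example numbers are hypothetical, chosen only for illustration.

```python
def precinct_bounds(x, t):
    """Deterministic bounds on the turnout rates in a single precinct.

    x: fraction of the precinct's population that is black (0 < x < 1)
    t: fraction of the precinct's population that turned out (0 <= t <= 1)
    Returns ((lo_b, hi_b), (lo_w, hi_w)) for the black and white rates.
    """
    if not 0 < x < 1:
        raise ValueError("x must lie strictly between 0 and 1")
    if not 0 <= t <= 1:
        raise ValueError("t must lie in [0, 1]")

    # Black rate is lowest when every white votes, highest when none do.
    lo_b = max(0.0, (t - (1 - x)) / x)
    hi_b = min(1.0, t / x)

    # The same argument with the roles of the two groups exchanged.
    lo_w = max(0.0, (t - x) / (1 - x))
    hi_w = min(1.0, t / (1 - x))
    return (lo_b, hi_b), (lo_w, hi_w)


# Hypothetical precinct: 80% black, 30% turnout (invented for illustration).
black_bounds, white_bounds = precinct_bounds(0.8, 0.3)
print(black_bounds, white_bounds)
```

In this made-up precinct the black turnout rate must lie between roughly 0.125 and 0.375, far narrower than [0,1], while the white rate is not restricted at all. How informative the bounds are depends on each precinct's composition and turnout.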

This combination of the precinct-level deterministic bounds with a statistical model unifies the two primary competing parts of the ecological inference literature. First, by treating each precinct in isolation, the method uses all available information to give a range of possible values for its precinct-level quantities of interest. Then, in order to close in further on the right answer, the statistical model "borrows strength" from all the other precincts in the data set to give the probable location of each true quantity of interest within its known deterministic bounds.

The method introduced also includes a model of variability that matches the patterns in real aggregate data and that remains internally consistent even when the areal units are modified. This and other features provide another significant boost in the performance of the model. Extensions of the model allow for the model assumptions to be evaluated, modified, or dropped, and for several types of external information to be included. A fully nonparametric version is also provided.

Some features of the model are related in part to variable parameter models in econometrics (e.g., Swamy, 1971); empirical Bayesian models in statistics and biostatistics (Efron and Morris, 1973; Rubin, 1980; Breslow, 1990); Manski's (1995) approach to identification via parameter bounds; models of multiple imputation for missing values in surveys (Rubin, 1987) and for coarse data problems (Heitjan, 1989; Heitjan and Rubin, 1990); hierarchical linear models in education research (Bryk and Raudenbush, 1992); and "inverse problems" in tomographic imaging (Vardi et al., 1985; Johnstone and Silverman, 1990). The solution to the ecological inference problem offered here is also related to some statistical models for the aggregation of individual-level continuous variables developed in econometrics (Stoker, 1993), as described in Section 14.3.



Gary King
Mon Jan 27 13:02:30 EST 1997