Applying the deterministic information from the method of bounds to
every precinct-level quantity of interest provides substantial
improvements and makes inferences especially robust to
aggregation bias. Goodman's regression does not restrict the
quantities of interest (which are proportions) even to the [0,1]
interval. Many have suggested modifying Goodman's regression by
restricting these aggregate quantities of interest to this interval,
but this results in implausible corner solutions and, more
importantly, imposes no restrictions on any of the individual precinct
quantities. In contrast, the method offered here uses the bounds on
the quantities of interest in every precinct, most of which turn out
in practice to be much narrower than [0,1]. Because these bounds
are also known with certainty, this procedure adds a surprising
amount of information to the statistical model.
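
To make the precinct-level bounds concrete, the following is a minimal sketch of the standard method-of-bounds calculation for a single precinct in the two-group case. The notation is an assumption introduced here for illustration and does not appear in this section: `x` is the fraction of the precinct belonging to the first group, `t` is the observed aggregate outcome fraction, and `beta_b` and `beta_w` are the two unknown group-level proportions, linked by the accounting identity `t = x*beta_b + (1 - x)*beta_w`. The function name `precinct_bounds` is hypothetical.

```python
def precinct_bounds(x, t):
    """Deterministic bounds on the two group-level proportions in one precinct.

    Assumed notation (illustrative, not taken from the text):
      x : fraction of the precinct in the first group, 0 < x < 1
      t : observed aggregate outcome fraction, 0 <= t <= 1
    Accounting identity: t = x * beta_b + (1 - x) * beta_w,
    with both beta_b and beta_w required to lie in [0, 1].
    """
    if not (0.0 < x < 1.0):
        raise ValueError("x must lie strictly between 0 and 1")
    if not (0.0 <= t <= 1.0):
        raise ValueError("t must lie in [0, 1]")

    # Bounds on beta_b: solve the identity for beta_b and let
    # beta_w range over [0, 1].
    lower_b = max(0.0, (t - (1.0 - x)) / x)
    upper_b = min(1.0, t / x)

    # Bounds on beta_w: the symmetric calculation with the roles
    # of the two groups exchanged.
    lower_w = max(0.0, (t - x) / (1.0 - x))
    upper_w = min(1.0, t / (1.0 - x))

    return (lower_b, upper_b), (lower_w, upper_w)


if __name__ == "__main__":
    # A hypothetical precinct: half the population is in the first
    # group and a quarter of the population shows the outcome.
    bounds_b, bounds_w = precinct_bounds(x=0.5, t=0.25)
    print("beta_b bounds:", bounds_b)
    print("beta_w bounds:", bounds_w)
```

In this hypothetical precinct both intervals are half as wide as [0,1], even though no statistical assumption has been made. Precincts dominated by one group tend to give tight bounds for that group and nearly uninformative bounds for the other.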
This combination of the precinct-level deterministic bounds with a statistical model unifies the two primary competing parts of the ecological inference literature. First, by treating each precinct in isolation, the method uses all available information to give a range of possible values for its precinct-level quantities of interest. Then, in order to close in further on the right answer, the statistical model ``borrows strength'' from all the other precincts in the data set to give the probable location of each true quantity of interest within its known deterministic bounds.
The method introduced here also includes a model of variability that matches the patterns in real aggregate data and that remains internally consistent even when the areal units are modifiable. This and other features provide another significant boost in the performance of the model. Extensions of the model allow its assumptions to be evaluated, modified, or dropped, and allow several types of external information to be included. A fully nonparametric version is also provided.
Some features of the model are related in part to variable parameter models in econometrics (e.g., Swamy, 1971); empirical Bayesian models in statistics and biostatistics (Efron and Morris, 1973; Rubin, 1980; Breslow, 1990); Manski's (1995) approach to identification via parameter bounds; models of multiple imputation for missing values in surveys (Rubin, 1987) and for coarse data problems (Heitjan, 1989; Heitjan and Rubin, 1990); hierarchical linear models in education research (Bryk and Raudenbush, 1992); and ``inverse problems'' in tomographic imaging (Vardi et al., 1985; Johnstone and Silverman, 1990). The solution to the ecological inference problem offered here is also related to some statistical models for the aggregation of individual-level continuous variables developed in econometrics (Stoker, 1993), as described in Section 14.3.