
How does EI relate to Goodman's regression and the method of bounds?

The only commonly used methods before EI were Goodman's regression and the method of bounds. Goodman's regression worked when the assumptions held but, as Leo Goodman made clear, it did not work when the assumptions were wrong. Within the Goodman framework, the data alone provided no information about whether the assumptions were right or wrong. The method of bounds always gave correct ranges into which the quantities of interest fell, but the ranges were often wider than was desirable (only in part because the wrong method of computing them was frequently used).
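As a minimal sketch of the two older methods (not code from the book or from the EI software), suppose each precinct reports x, the fraction of its population in group b, and t, the overall fraction with the outcome of interest, so that the accounting identity t = beta_b * x + beta_w * (1 - x) holds. The notation and the function names `bounds_beta_b` and `goodman` are assumptions made here for illustration.

```python
def bounds_beta_b(x, t):
    """Deterministic bounds on beta_b for one precinct.

    Derived from t = beta_b * x + beta_w * (1 - x) with both
    beta_b and beta_w restricted to the interval [0, 1].
    """
    if x == 0:
        # Nobody from group b lives here, so the data say nothing about beta_b.
        return (0.0, 1.0)
    lower = max(0.0, (t - (1.0 - x)) / x)
    upper = min(1.0, t / x)
    return (lower, upper)


def goodman(xs, ts):
    """Goodman's regression: least squares of t on x.

    If beta_b and beta_w are constant across precincts, then
    t = beta_w + (beta_b - beta_w) * x, so the intercept estimates
    beta_w and intercept + slope estimates beta_b.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    slope = sxt / sxx
    intercept = mean_t - slope * mean_x
    return (intercept + slope, intercept)  # (beta_b, beta_w)


# Hypothetical data generated with constant beta_b = 0.7 and beta_w = 0.3.
xs = [0.1, 0.5, 0.9]
ts = [0.7 * x + 0.3 * (1 - x) for x in xs]
print(goodman(xs, ts))
print([bounds_beta_b(x, t) for x, t in zip(xs, ts)])
```

The bounds need no assumptions beyond the accounting identity, which is why they are always correct but can be wide. The regression recovers the parameters only when they really are constant across precincts (or at least unrelated to x).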

EI combines the two methods (hence resolving most controversies between adherents of these two popular approaches) and adds some additional features. Instead of there being two situations, as under Goodman's approach (i.e., the assumptions apply and the method works, or they don't and it doesn't), we now have five, only the last of which is a problem for EI:

  1. Under EI, if the assumptions are correct, you get the right answer. For an example, see Chapter 10 or the Monte Carlos in Chapter 9.

  2. If the assumptions are wrong, EI still does "well" (in the sense of small MSE or squared bias) when the bounds (and other information in the tomography lines) are sufficiently informative. An important point is that the degree to which the bounds are informative can easily be assessed from the aggregate data, and so the risks of making ecological inferences are largely known. As one example, see Chapter 11.

  3. If the assumptions are wrong and the bounds are not sufficiently informative, but the diagnostics are sufficiently informative, then the assumptions can easily be changed, and EI will do well. Examples are the analyses reported in Figure 9.5 (p. 179) and the left graph in Figure 13.2 (p. 238) for aggregation bias, and in Figures 9.7 and 9.9 (pp. 187, 195) for distributional violations. Violations of the third assumption, no spatial autocorrelation, seem to have minor effects.

  4. If the assumptions are wrong and the bounds and the diagnostics are not sufficiently informative, but the researcher has additional qualitative knowledge of the problem, then appropriate assumptions can be chosen. In this case, either EI will do well, or the formal measures of uncertainty produced by EI (standard errors, confidence intervals, etc., which are based only upon the data and model) can be supplemented and expanded accordingly. Since the ecological inference problem is about information that has been aggregated away, only by adding some information is it possible to make reliable inferences in general. Qualitative information is of course subject to more interpretation and hence more uncertainty, but reliable inferences permit no option other than to add assumptions or other information. The book discusses many ways to bring in qualitative information (see also Gary King, Robert Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton University Press).

  5. If the assumptions are wrong and the bounds and the diagnostics are not sufficiently informative, and the researcher has no time or resources to collect additional qualitative information, then EI will perform poorly. An example of data like this appears in Figure 9.2 (p. 163). Even in this worst case scenario, and the others, EI will be more robust than Goodman's regression. By this I mean that the maximum amount of bias from EI is capped at a fixed and knowable level, in contrast to Goodman's approach. The dotted line (corresponding to $\tau=0$ for the default model) in Figure 9.6 (p. 180) shows that bias in EI estimates increases with the degree of aggregation bias for small levels of aggregation bias; at some point, however, the bias reaches its maximum and increases no further. The point at which the error reaches this maximum depends on the data. Under Goodman's approach, the error increases linearly without limit as aggregation bias increases.

The likelihood of the first four cases coming up relative to the fifth (as compared to the likelihood of the assumptions applying vs. not applying under Goodman's approach) summarizes the advantage of EI. In essence, EI chips off pieces of Goodman's worst case (the assumptions not applying). The benefits of EI will therefore depend on the area and application and on how much effort is put into collecting qualitative information.
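The contrast in case 5 can be illustrated with a small simulation. This is a sketch under stated assumptions, not the EI model itself and not a reproduction of Figure 9.6. It assumes equal-sized precincts, a constant beta_w, and aggregation bias of the hypothetical form beta_b = a + c * x, where c controls how strongly the parameter depends on x. The function names and parameter values are made up for illustration.

```python
def unit_bounds(x, t):
    """Deterministic bounds on beta_b in one precinct (method of bounds)."""
    if x == 0:
        return (0.0, 1.0)
    return (max(0.0, (t - (1.0 - x)) / x), min(1.0, t / x))


def goodman_beta_b(xs, ts):
    """Goodman's regression estimate of beta_b (intercept + slope of t on x)."""
    n = len(xs)
    mean_x, mean_t = sum(xs) / n, sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    slope = sxt / sxx
    return (mean_t - slope * mean_x) + slope


def simulate(c, a=0.2, beta_w=0.3):
    """Build aggregate data in which beta_b depends on x with strength c."""
    xs = [i / 10 for i in range(1, 10)]      # x = 0.1, 0.2, ..., 0.9
    beta_bs = [a + c * x for x in xs]        # aggregation bias when c != 0
    ts = [bb * x + beta_w * (1 - x) for bb, x in zip(beta_bs, xs)]
    total_x = sum(xs)
    # Aggregate quantity of interest: beta_b averaged over group-b members.
    truth = sum(bb * x for bb, x in zip(beta_bs, xs)) / total_x
    bounds = [unit_bounds(x, t) for x, t in zip(xs, ts)]
    lower = sum(lo * x for (lo, _), x in zip(bounds, xs)) / total_x
    upper = sum(hi * x for (_, hi), x in zip(bounds, xs)) / total_x
    return {"truth": truth, "goodman": goodman_beta_b(xs, ts),
            "lower": lower, "upper": upper}


for c in (0.0, 0.2, 0.4, 0.8):
    r = simulate(c)
    print(f"c={c:.1f}  Goodman error={r['goodman'] - r['truth']:+.3f}  "
          f"bounds=[{r['lower']:.3f}, {r['upper']:.3f}]  truth={r['truth']:.3f}")
```

In this setup the error of Goodman's estimate is proportional to c, while the true aggregate value always lies inside the aggregate bounds. Any estimator constrained to respect the bounds therefore cannot be off by more than their width, which is one way to see why a cap on the error can exist and why its size depends on the data.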



Gary King 2006-09-13