What do I do if my model doesn't fit the data?

There are several approaches, depending on the problem. Don't miss the first item.
  1. The most important technical cause of lack of fit is numerical imprecision in the lncdfbvn function used in the likelihood function (the volume above the unit square under the bivariate normal, Equation 7.2, p.134). This imprecision can induce artificial local maxima in the likelihood function, leading to convergence at the wrong parameter values. It can also create artificial maxima higher than the correct global maximum. These problems occur most often when the maximization routine is searching far from the optimal values. A good way to fix both problems is to give the program starting values in the region of the right answer (set _Estval), and to constrain the search to a region that includes these values (set _Ebounds).
    One way to find better starting values is to pick the parameter values by looking at a tomography plot, as if we were on the truncated scale. That is, identify the region from which the "emissions" are coming (roughly, where the tomography lines cross) and record the coordinates for betaB and betaW, the width of the emissions at the central point, and a likely correlation (or 0). Then transform these onto the scale of estimation with Equations 7.4 (p.136), or use eireparinv to do this automatically.
    Then set _Ebounds to regions around those starting values: not too narrow, because then you determine the answer (a concern if the parameters turn out to be maximized at boundary values), and not too wide, because you may run into numerical problems. If this does not work, it will be helpful to narrow _Ebounds. The new grid search procedure is especially helpful here (set _Estval to 0 or to a value $>5$). If it is an especially difficult problem, you may need to change the tolerance of the lncdfbvn function with _EcdfTol.

  2. If you have very small values of $T$, see the FAQ question below.

  3. One common problem is coding errors, or small precincts, for which $T_i$ is very close to 0 or 1 (look at the corners of eigraph's tomog for a count of these). If these values are outliers, they can have a disproportionate effect on the likelihood results, even though in many applications $T$ only reaches the corners when there are data errors. To delete them from the estimation stage but include them in the simulation stage, you could set
    _Eselect=(t.>0.001).and(t.<0.999);
    or perhaps an even narrower range would be wise. You could also delete them from the data set to skip both stages.

  4. Do you suspect extensive aggregation bias? Perhaps you should try the globals _Eeta=3; _EalphaB=0~0.1; and _EalphaW=0~0.1; to start (see Chapter 9).

  5. Does eigraph's tomog suggest multiple modes? Consider specifying Zb or Zw coded to pick up the modes.

  6. Is eiread's resamp much larger than 20? If so, you might try using a $t$ distribution as the first approximation for importance sampling by setting _EisT to 3 or higher, or adjusting _EisFac (usually downwards, or set to $-1$, especially if tomog fits but tomogP does not) or _Eisn (always upwards); e.g., you might try _EisFac=1 or _EisFac=-1.

  7. If eigraph's estsims does not look approximately like tomogP, or if the graphs in post are bimodal, some adjustment is needed. You may try a different method of computing the variance matrix. You could also narrow the variance of the priors on $\sigma_b$, $\sigma_w$, and $\rho$ by setting _Esigma and _Erho. Or, more simply, you could use the maximum likelihood solution and set _EisFac=-2.

  8. If the relationship between $X_i$ and $\beta_i^b$ or $\beta_i^w$ does not correspond to your substantive knowledge of the problem, consider setting _Eeta=3 and adding a prior on $\alpha^b$ and $\alpha^w$ (with _EalphaB and _EalphaW).

  9. If you have additional information in the form of survey or qualitative evidence, you could change the priors, add covariates in Zb or Zw, or divide the data set.
See Chapters 9 and 16 for more detailed suggestions.
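
The selection rule in item 3 is an element-wise test on each precinct's value of T. As a rough illustration only, the Python sketch below builds the same kind of 0/1 flag vector outside GAUSS, so you can count how many precincts a given cutoff would drop before changing _Eselect. The function name `select_interior` and the sample values are hypothetical and are not part of EI.

```python
def select_interior(t, lower=0.001, upper=0.999):
    """Return one 0/1 flag per precinct: 1 if lower < t_i < upper, else 0.

    Mirrors the logic of the element-wise comparison in item 3;
    this helper is a hypothetical illustration, not an EI function.
    """
    return [1 if lower < ti < upper else 0 for ti in t]


# Hypothetical precinct-level values of T_i, including corner cases.
t = [0.0, 0.0004, 0.25, 0.61, 0.9995, 1.0]

mask = select_interior(t)
kept = [ti for ti, keep in zip(t, mask) if keep]
dropped = len(t) - len(kept)

print("flags:  ", mask)
print("kept:   ", kept)
print("dropped:", dropped)
```

Tightening `lower` and `upper` (for example to 0.01 and 0.99) shows how quickly the count of excluded precincts grows, which may help in judging whether the corner values are isolated outliers.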
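
Item 6 suggests a $t$ distribution as the first approximation for importance sampling. The Python sketch below is a generic, one-dimensional illustration of why a heavier-tailed approximation can help; it is not EI's implementation. It assumes a hypothetical heavy-tailed target (a Student-t kernel with 4 degrees of freedom) and compares the effective sample size of the importance weights under a standard normal approximation and under a $t$ approximation with 3 degrees of freedom. All names here are illustrative.

```python
import math
import random


def log_t_kernel(x, df):
    """Log of the unnormalized Student-t density with df degrees of freedom."""
    return -0.5 * (df + 1) * math.log1p(x * x / df)


def log_normal_kernel(x):
    """Log of the unnormalized standard normal density."""
    return -0.5 * x * x


def draw_t(rng, df):
    """Draw from Student-t as normal / sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(df/2, scale 2)
    return z / math.sqrt(chi2 / df)


def effective_sample_size(log_weights):
    """ESS = (sum w)^2 / sum w^2, computed stably from log weights."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(wi * wi for wi in w)


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000
target_df = 4           # hypothetical heavy-tailed target

# Normal first approximation: the weights are unbounded in the tails.
xs_norm = [rng.gauss(0.0, 1.0) for _ in range(n)]
lw_norm = [log_t_kernel(x, target_df) - log_normal_kernel(x) for x in xs_norm]

# t first approximation with 3 degrees of freedom: the weights stay bounded.
xs_t = [draw_t(rng, 3) for _ in range(n)]
lw_t = [log_t_kernel(x, target_df) - log_t_kernel(x, 3) for x in xs_t]

ess_norm = effective_sample_size(lw_norm)
ess_t = effective_sample_size(lw_t)
print("ESS with normal approximation:", round(ess_norm))
print("ESS with t approximation:     ", round(ess_t))
```

When the approximation has thinner tails than the target, a few draws carry very large weights and the effective sample size falls, so more resampling is needed. A heavier-tailed approximation keeps the weights bounded in this example.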



Gary King 2006-09-13