
Globals:

_EalphaB
(cols(Zb)$ \times 2$) matrix of means (in the first column) and standard deviations (in the second) of an independent normal prior distribution on elements of $ \alpha^b$. If you specify Zb, you should probably specify a prior, at least with mean zero and some variance (default={.}; which indicates no prior). (See Equation 9.2, page 170, to interpret $ \alpha^b$). (If you are using EzI, and have trouble setting something to 0, try 0.000001 or some such; this gets around an error in a Gauss proc and should give essentially the same answer empirically.)

_EalphaW
(cols(Zw)$ \times 2$) matrix of means (in the first column) and standard deviations (in the second) of an independent normal prior distribution on elements of $ \alpha^w$. If you specify Zw, you should probably specify a prior, at least with mean zero and some variance (default={.}; which indicates no prior). (See Equation 9.2, page 170, to interpret $ \alpha^w$). (If you are using EzI, and have trouble setting something to 0, try 0.000001 or some such; this gets around an error in a Gauss proc and should give essentially the same answer empirically.)

_Ebeta
Standard deviation of the ``flat normal'' prior on $ \breve{\mathfrak{B}}^b$ and $ \breve{\mathfrak{B}}^w$. The flat normal prior is uniform within the unit square and falls off outside the square according to the normal distribution. Set to zero for no prior (default). Positive values probabilistically keep the estimated mode within the unit square; 0.25 is a reasonable value to experiment with at first.
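The shape of the flat normal prior can be illustrated with a short Python sketch. This is not EI's GAUSS code: the function name `flat_normal_log_prior` is hypothetical, and the exact functional form (a Gaussian falloff in the distance by which each coordinate lies outside the unit interval) is an assumption made for illustration.

```python
def flat_normal_log_prior(bb, bw, sd):
    """Log of an unnormalized 'flat normal' prior (assumed, illustrative form).

    bb, bw : candidate values of the two untruncated means
    sd     : plays the role of _Ebeta; zero or less means no prior
    """
    if sd <= 0:
        return 0.0  # no prior: contributes nothing to the log posterior

    def outside(v):
        # distance by which v falls outside the interval [0, 1]
        return max(0.0 - v, 0.0, v - 1.0)

    # flat (zero penalty) inside the unit square, normal falloff outside it
    d2 = outside(bb) ** 2 + outside(bw) ** 2
    return -0.5 * d2 / sd ** 2
```

With `sd=0.25`, any point inside the unit square receives no penalty, while a point half a unit outside it receives a log-prior of -2 under this assumed form.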

_Ebounds
1 to set CML bounds on parameters automatically unless z's are included; 0 to use no bounds; or a $ k\times 2$ (where $ k$ is the number of starting values) or $ 1\times 2$ matrix to indicate upper$ \sim$lower bounds. (Do not confuse the bounds referred to here with the bounds on the quantities of interest.) Default=1.

_Ecdfbvn
Determines which procedure to use for computing the area of the bivariate normal distribution above the unit square: 1 based on the Gauss function CDFBVN; 2 Martin van der Ende's method (based on D.R. Divgi, ``Calculation of the univariate and bivariate Normal integral,'' Annals of Statistics, 1979, 903-910, with additional options available for this method in the proc cdfbvn_div); 3 Integration of log of the unit square; 4 Direct integration on unit square; 5, fairly accurate and fast, based on direct integration on the unit square from a new Gauss internal procedure (DEFAULT); 6, most accurate but slow, based on a cdfbvn procedure by Alan Genz (using results from Drezner, Z. and G.O. Wesolowsky, 1989. ``On the computation of the bivariate normal integral,'' Journal of Statist. Comput. Simul. 35: 101-107). See Appendix F.

Option 5 (the default) appears to be the best tradeoff between speed and accuracy currently available (and so this global should not be changed to anything other than 6, which is more accurate but much slower, unless you have a good reason to do so). However, fundamental progress remains to be made on methods of integrating the bivariate normal, as all currently available methods are inaccurate and jump discontinuously for very small values. Because of this, small values are truncated at the global _EcdfTol, which you may wish to adjust.

_EcdfTol
Tolerance for the lncdfbvn function (when _Ecdfbvn=5, its default), with smaller calculated values truncated at the value of this global (DEFAULT=2.220446e-11). This can be any positive number, although lncdfbvn gets imprecise for small values. Only set to smaller values if you think you need the precision, such as if most of your values of $ T_i$ or $ X_i$ are very small.

_Echeck
1 check inputs and globals and give nice error messages if problems (default); 0 don't check, which saves some time. There is little reason to choose 0 unless you are running a large number of estimations and you are certain all the inputs are correctly specified. (Inessential global: not stored in dbuf.)

_EdirTol
direction tolerance for CML convergence. Default=0.0001. Set to smaller values if most of your values of $ T_i$ or $ X_i$ are very small.

_EdoML
1 do maximum likelihood (default); 0 don't do maximum likelihood, using instead the values of $ \phi$ stored in _EdoML_phi and vcphi in _EdoML_vcphi.

_EdoML_phi
if _EdoML$ =0$, this should include a vector of values of $ \phi$ and will be used instead of the output of the likelihood maximization. (This global is ignored unless _EdoML=0.)

_EdoML_vcphi
if _EdoML=0, this should include the estimated variance matrix $ V(\phi)$ and will be used instead of the output of the likelihood maximization procedure. (This global is ignored unless _EdoML=0.)

_EdoSim
1 do simulations (default); 0 don't do simulations; $ -1$ don't do simulations or compute the maxlik variance (use this option for computing conditional log-likelihood of eta's).

_Eeta
Automatically includes $ X_i$ in the inputs Zb and/or Zw. The actual inputs Zb and Zw must be set to 1 if the default is changed. Using this global is better than explicitly including $ X_i$ in the inputs, because eiread and eigraph will be ``aware'' of the contents of Zb and Zw. If you set this global, it is generally best to also set the priors _EalphaB and _EalphaW. See Chapter 9, and the parameterization in Equation 9.2 (page 170). Options include:

_EI_vc
$ M\times 2$ matrix ($ M\geq 1$), each row of which represents instructions for one attempt to compute an estimated positive definite variance matrix of $ \phi$. The procedure exits after the first positive definite hessian is found. Options to include in various rows are: {1 0} the usual numerical hessian computation (using Gauss's hessp.src proc); {1 $ d$} use usual hessian procedure and then adjust eigenvalues together so they are greater than $ d$; {2 $ f$} use wide step lengths at $ f$ fraction falloff in the likelihood function; {3 $ f$} use quadratic approximation with falloff in likelihood function set at $ f$; {4 0} use a generalized inverse (to deal with singularity) and a generalized cholesky (to deal with non-positive definiteness) based on work in progress by Jeff Gill and Gary King; {5 0} use wide step lengths but check that the gradients for each are correct (and if necessary search for better ones); {-1 0} avoid the computation of the variance covariance matrix in case of non-positive definiteness and use the singular value decomposition for the multinomial normal sampling (i.e. _EisT has to be set to 0). In order to use this option, also make sure to define relatively narrow upper and lower bounds of the parameters by using _Ebounds. DEFAULT={1 0, 4 0, 2 0.1, 2 0.05, 3 0.1, 1 0.1, 1 0.2}. The variance computation only very rarely gets beyond the second try.

When the likelihood surface is normal (i.e., quadratic), which is true asymptotically, all options produce identical results. In practice, this procedure is useful for ensuring that a positive definite variance matrix can be found when the difficulties are numerical, rather than theoretical or empirical, as can happen when the mode of the truncated normal is far from the unit square due to imprecision in the function that computes the bivariate normal CDF. (Another, sometimes better, way to fix these numerical problems is to reduce the variances of the priors in _Erho and _Esigma.) Because importance sampling is used after this procedure, different values of the variance matrix can produce identical estimates of the quantities of interest. Be sure to verify that the simulations are being appropriately drawn from the estimated contours (compare the right two figures in eigraph's tomogS).

_EIgraph_bvsmth
smoothing parameter for nonparametric estimation; used only if _Enonpar=1. Default=0.08. (The same parameter controls the nonparametric bivariate density estimation for diagnostic purposes in eigraph.) See Section 9.3.2.

_EisChk
0 to do nothing (default); 1 change lnir from the scalar mean importance ratio to a (_Esims*_Eisn)$ \times$(rows($ \phi$)+1) matrix containing the log of the importance ratio as the first column and normal simulations of $ \tilde{\phi}$ as the remaining columns. Also changes PhiSims from the mean and standard deviation of the posterior phi's to a _Esims$ \times$rows($ \phi$) matrix of normal simulations of phi.

_EiLlikS
1 to save the (_Esims$ \times 1$) vector of log-likelihoods evaluated for each simulation; 0 to save only the mean of these likelihoods (default). These can be used for computing the marginal likelihood.

_EisFac
factor to multiply by estimated variance matrix in the normal approximation for use in importance sampling, or set to $ -1$ to use normal approximation only or $ -2$ to condition on the maximum posterior estimates. Adjust this, _Eisn, or _EisT if eiread's resamp is larger than 15 or 20. If this is set too low, estimation variability will not be sufficient and your confidence intervals may be too narrow; when used as a factor it must be greater than zero and should probably be at least one. See Section 7.5. (Default=4).

_Eisn
factor to multiply by _Esims to compute the number of normals to draw before resampling. This is used to try to get _Esims samples from the exact posterior. Increase this or change _EisFac or _EisT if resamp is larger than 15 or 20. Default=10. See Section 7.5.

_EisT
0 (default) to use multivariate normal density to draw random numbers for initial approximation for importance sampling; or if greater than 2, use the multivariate Student $ t$ density, with degrees of freedom _EisT. Adjust this, _EisFac, or _Eisn if resamp is larger than 15 or 20. See Section 7.5.
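How the three importance-sampling globals interact can be sketched with a one-dimensional toy in Python. This is only an illustration under stated assumptions, not EI's algorithm: the function `sir_sample` and its arguments are hypothetical, the proposal is a univariate normal whose variance is inflated by a factor playing the role of _EisFac, and resampling is done in a single pass with replacement.

```python
import math
import random

def sir_sample(log_target, mode, sd, sims=100, isn=10, isfac=4.0, seed=0):
    """Toy sampling-importance-resampling in one dimension.

    sims  : number of draws wanted (role of _Esims)
    isn   : multiplier giving the number of normals drawn (role of _Eisn)
    isfac : factor applied to the approximating variance (role of _EisFac)
    """
    rng = random.Random(seed)
    prop_sd = sd * math.sqrt(isfac)   # inflate the variance, not the sd
    n = sims * isn                    # normals to draw before resampling
    draws = [rng.gauss(mode, prop_sd) for _ in range(n)]
    # log importance ratio = log target - log proposal (constants cancel)
    log_w = [log_target(x) + 0.5 * ((x - mode) / prop_sd) ** 2 for x in draws]
    top = max(log_w)                  # subtract the max for numerical stability
    weights = [math.exp(lw - top) for lw in log_w]
    # resample with probability proportional to the importance ratio
    return rng.choices(draws, weights=weights, k=sims)
```

For example, `sir_sample(lambda x: -0.5 * x * x, mode=0.0, sd=1.0)` returns 100 resampled draws whose distribution approximates a standard normal. Raising `isn` gives the resampling step more candidates to choose from; raising `isfac` widens the proposal so that the tails of the target are covered.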

_EmaxIter
Maximum number of iterations for CML. Default=500.

_EnonEval
Number of nonparametric density evaluations for each tomography line (default=11). Only used if _EnonPar=1.

_EnonNumInt
Number of points to evaluate for numerical integration in computing the denominator for the bivariate kernel density (default=50). Only used if _EnonPar=1.

_EnonPar
0 do not run nonparametric model (default); 1 run nonparametric model. (When choosing nonparametric estimation, only relevant options will be available under eigraph and eiread.) See Section 9.3.2.

_EnumTol
Numerical tolerance. A homogeneous precinct is one for which $ X_i<$_EnumTol or $ X_i>$(1-_EnumTol). Default is 0.0001. Set to smaller values if most of your values of $ T_i$ or $ X_i$ are very small.
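The homogeneity rule above amounts to a one-line check. The Python sketch below restates it; the function name `is_homogeneous` is hypothetical and the code is not taken from EI.

```python
def is_homogeneous(x, num_tol=1e-4):
    """True if x is within num_tol of 0 or of 1 (num_tol plays the role of _EnumTol)."""
    return x < num_tol or x > 1.0 - num_tol
```

With the default tolerance, a precinct with $ X_i=0.00005$ is treated as homogeneous, while one with $ X_i=0.001$ is not. Lowering the tolerance to 0.00001 would make the first precinct heterogeneous as well.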

_Eprt
0 print nothing; 1 print only final output from each stage; 2 also prints friendly iteration numbers etc (default); 3 also prints all sorts of checks along the way. Use eiread and eigraph instead of this global to see output. (Inessential global: not stored in dbuf.)

_Eres
If items are vput into _Eres before running ei, they are passed through into dbuf. For example, identifiers for each aggregate unit would be useful in interpreting the results, or using them in subsequent analyses (try: _Eres=vput(_Eres,caseid,"caseid") before calling EI). If a title is vput and given the name titl, the title is printed in convenient places. See eiread for further information. Do not use the name of any globals to this procedure or options listed under eiread(), or your variable will be lost.

_Erho
The first element is the standard deviation of normal prior on $ \phi_5$ for the correlation; set to 0 to fix $ \phi_5$ to a second element, _Erho[2]; set to $ -1$ to estimate without a prior. Default=0.5. _Erho should be a scalar unless the first element is 0, in which case it should be a $ 2\times 1$ vector, where the second element is the value at which the $ \phi_5$ is fixed (and not estimated). See Section 7.4.

_Eselect
Controls which observations are included in the estimation stage, including both likelihood maximization and importance sampling. All observations are included in the simulation stage unless you delete them from the data set before starting $ {\mathfrak{E}I}$. This allows users to base the truncated bivariate normal contours on a subset of observations that might be more representative (such as those for which $ T_i$ is not 0 or 1). Set to a $ p\times 1$ vector of 1's to include and 0's to exclude individual observations.

_EselRnd
Set to scalar 1 to include all observations not already deleted by _Eselect (default), or a scalar greater than 0 and less than 1 to randomly select this fraction of observations in the estimation stage. This global is especially useful for speeding up estimation in very large datasets, since thousands of observations are not always needed for estimating $ \phi$. Since all observations will still be included in the simulation stage, precinct-level estimates of all quantities of interest will still be available. (If used with EI2, each iteration of EI includes a different randomly selected set of observations.)
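The combined effect of _Eselect and _EselRnd on the estimation stage can be sketched in Python as follows. This is an illustration only: the function `estimation_subset` is hypothetical, and drawing an exact fraction of the retained observations (rather than, say, an independent coin flip per observation) is an assumption, since the text does not say how the random selection is performed.

```python
import random

def estimation_subset(select, sel_rnd=1.0, seed=0):
    """Indices of observations used in the estimation stage.

    select  : list of 1's (include) and 0's (exclude), role of _Eselect
    sel_rnd : 1 to keep all retained observations, or a fraction in (0, 1),
              role of _EselRnd
    """
    kept = [i for i, s in enumerate(select) if s == 1]
    if sel_rnd >= 1.0 or not kept:
        return kept
    rng = random.Random(seed)
    # assumed rule: keep a fixed fraction of the retained observations
    k = min(len(kept), max(1, round(sel_rnd * len(kept))))
    return sorted(rng.sample(kept, k))
```

Every observation, selected or not, would still enter the simulation stage; only the estimation of $ \phi$ uses the subset.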

_Esigma
Standard deviation of an underlying normal distribution, from which a half normal is constructed as a prior for both $ \breve{\sigma}_b$ and $ \breve{\sigma}_w$. Note: the expected value under this prior is _Esigma$ \times\sqrt{2/\pi} \approx 0.8\times$_Esigma. Set to zero or negative for no prior. Default = 0.5. See Section 7.4.
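The expected value quoted above is the standard mean of a half normal distribution and can be checked numerically. The Python sketch below compares the closed form with a simulation; both function names are hypothetical and the code is independent of EI.

```python
import math
import random

def half_normal_mean(esigma):
    """Closed-form mean of |Z| for Z ~ N(0, esigma^2)."""
    return esigma * math.sqrt(2.0 / math.pi)

def simulated_half_normal_mean(esigma, n=200_000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    return sum(abs(rng.gauss(0.0, esigma)) for _ in range(n)) / n
```

At the default value of 0.5, the closed form gives a prior mean of about 0.399 for each standard deviation parameter.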

_Esims
Number of simulations. Default is 100.

_Estval
For gradient methods: Scalar 1, use best guess starting values (default); or set to $ k\times 1$ vector of starting values. If _Eeta[1]=0 (its default), $ k=5$ with elements guesses of $ \phi$, that is on the scale of estimation. If you have starting values on the untruncated normal scale, $ \breve{\psi}=\{{\mathfrak{B}}^b,{\mathfrak{B}}^w,\sigma_b,\sigma_w,\rho\}$, you can reparameterize as in this example: _Estval=eireparinv(.5|.5|.2|.2|-.1). If _Eeta[1]=1, 2, 4, or 5, $ k=6$; if _Eeta=3, $ k=7$; and if covariates are used and rows(_Eeta)=4, then $ k$ is 5 plus the number of covariates included, with Zb coming before Zw.

For a grid search: Set _Estval to scalar 0 (for 5 divisions per zoom), or to a scalar integer greater than or equal to 3 for a grid search with this number of divisions per zoom. (That is, the grid search procedure divides the parameter space into a number of divisions, evaluates the likelihood for every combination of values on all the parameters, chooses the region of highest likelihood, zooms in, and repeats the procedure on the narrower parameter region. This continues until successive parameter values differ by less than the global _EdirTol.)
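The zooming grid search described above can be sketched in Python. This is a minimal illustration, not EI's procedure: the function `zoom_grid_search` is hypothetical, each axis is split into equal cells whose centers are evaluated, and the search stops when the cell width falls below `tol`, which stands in for _EdirTol.

```python
import itertools

def zoom_grid_search(f, lower, upper, divisions=5, tol=1e-4):
    """Maximize f over a box by repeatedly zooming into the best grid cell."""
    lower, upper = list(lower), list(upper)
    while True:
        # width of one cell along each parameter
        widths = [(hi - lo) / divisions for lo, hi in zip(lower, upper)]
        # cell centers along each parameter
        centers = [
            [lo + w * (j + 0.5) for j in range(divisions)]
            for lo, w in zip(lower, widths)
        ]
        # evaluate every combination of values on all the parameters
        best = max(itertools.product(*centers), key=f)
        if max(widths) < tol:
            return list(best)
        # zoom in: the best cell becomes the new search region
        lower = [b - w / 2 for b, w in zip(best, widths)]
        upper = [b + w / 2 for b, w in zip(best, widths)]
```

For instance, maximizing `lambda p: -((p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2)` over the unit square returns a point close to (0.3, 0.7). The number of function evaluations per zoom is `divisions` raised to the number of parameters, which is why a small number of divisions is used.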

_EvTol
Numerical tolerance for the conditional variance calculation. Must be greater than 0; Default is 1e-322.



Gary King 2006-09-13