
How can I use Clarify to analyze compositional data?



The procedure involves four basic steps:

  1. Run tlogit to transform the vote shares (or other compositional data) into log ratios
  2. Run estsimp sureg to estimate a seemingly unrelated regression and simulate the parameters
  3. Run setx to choose real or hypothetical values for the explanatory variables ($X$'s)
  4. Run simqi with the tfunc() option to simulate the distribution of votes, conditional on the simulated parameters and chosen $X$'s.

Suppose that we are studying a political system with 500 electoral districts. Each observation or row in the dataset pertains to one of those districts. In this example, we have three political parties that each garner a percentage of the vote. Their vote shares, collected in variables v1, v2, and v3, sum to 100 percent.

First, select party 3 as the reference party and transform the vote shares of the other two parties into log ratios with respect to party 3. Thus, $y1 = \ln(v1/v3)$ and $y2 = \ln(v2/v3)$. The appropriate syntax in Clarify is tlogit v1 y1 v2 y2, base(v3) percent, which creates two new variables, y1 and y2: the log ratios of v1 and v2 with respect to the base variable v3.
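
To make the transformation concrete, here is a minimal Python sketch of the log-ratio formulas given above, applied to a single district. It is not Clarify code and makes no claim about how tlogit is implemented; the function name log_ratios, its percent argument, and the example shares are all hypothetical.

```python
import math

def log_ratios(v1, v2, v3, percent=True):
    """Return (y1, y2) = (ln(v1/v3), ln(v2/v3)) for one district.

    Hypothetical helper: `percent` says whether the shares sum to 100
    (percentages) or to 1 (proportions).
    """
    shares = [v1, v2, v3]
    total = 100.0 if percent else 1.0
    # The log ratio is undefined when any share is zero or negative.
    if any(s <= 0 for s in shares):
        raise ValueError("every share must be strictly positive")
    if not math.isclose(sum(shares), total, abs_tol=1e-6):
        raise ValueError("shares must sum to %g" % total)
    return math.log(v1 / v3), math.log(v2 / v3)

# Hypothetical district in which the parties win 50, 30, and 20 percent.
y1, y2 = log_ratios(50.0, 30.0, 20.0)
# y1 equals ln(2.5) and y2 equals ln(1.5).
```

Because only ratios to the base party enter the formulas, the same log ratios result whether the shares are recorded as percentages or as proportions.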

Second, use the estsimp command to run a seemingly unrelated regression with the log ratios y1 and y2 as the dependent variables. The syntax is estsimp sureg (y1 x1 x2) (y2 x3 x4). Each equation is enclosed in parentheses. The first equation states that the log ratio y1 is a linear function of the explanatory variables x1 and x2; the program automatically adds a constant term unless the user asks that it be suppressed. Likewise, the second equation states that y2 is a linear function of x3, x4, and a constant. The estsimp command estimates the model and simulates the parameters. By default, it draws 1000 values for each parameter. In this example, the program would draw 1000 sets of betas (each set has six elements: three betas for equation 1 and three for equation 2) and 1000 simulations of $\Sigma$, a $2 \times 2$ matrix that governs the relationship between the errors of the two equations. Clarify stores these simulations in memory for subsequent use.
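
As a rough illustration of what "1000 sets of betas" means, the Python sketch below draws six-element parameter vectors from a multivariate normal distribution centered at a set of point estimates. This is an assumption made for illustration, not a description of estsimp's internals: the point estimates, the covariance matrix, and the function names are hypothetical, and the simulation of $\Sigma$ is omitted entirely.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L * L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def draw_parameters(estimates, covariance, n_draws=1000, seed=0):
    """Draw n_draws vectors from N(estimates, covariance)."""
    rng = random.Random(seed)
    L = cholesky(covariance)
    k = len(estimates)
    draws = []
    for _ in range(n_draws):
        # Independent standard normals, then correlate them: estimates + L z.
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        draws.append([estimates[i] + sum(L[i][j] * z[j] for j in range(i + 1))
                      for i in range(k)])
    return draws

# Hypothetical point estimates: (x1, x2, constant) for equation 1,
# then (x3, x4, constant) for equation 2.
beta_hat = [0.5, -0.2, 1.0, 0.3, 0.1, -0.4]
# Hypothetical covariance matrix, diagonal only to keep the example short.
cov = [[0.01 if i == j else 0.0 for j in range(6)] for i in range(6)]

draws = draw_parameters(beta_hat, cov)  # 1000 rows, six betas per row
```

Each row of draws plays the role of one simulated set of betas; a real covariance matrix would generally have nonzero off-diagonal entries, which the Cholesky step handles.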

Third, use the setx command to choose some hypothetical or real values for our explanatory variables. For instance, type setx (x1 x2) mean x3 15 x4 p20 to set variables x1 and x2 at their respective means, x3 equal to the number 15, and x4 equal to its twentieth percentile.

Finally, use the simqi command to simulate quantities of interest, such as the predicted distribution of votes. The command is simqi, pv tfunc(logiti), where tfunc(logiti) tells the program to apply the inverse logistic function to transform the log ratios into shares of the total vote.
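
The back-transformation can be sketched in Python by inverting the log-ratio definitions $y1 = \ln(v1/v3)$ and $y2 = \ln(v2/v3)$ under the constraint that the three shares sum to the total. This is an algebraic illustration only, not Clarify's code; the function name shares_from_log_ratios and its percent argument are hypothetical.

```python
import math

def shares_from_log_ratios(y1, y2, percent=True):
    """Map log ratios (relative to party 3) back to the three vote shares.

    From v1/v3 = exp(y1), v2/v3 = exp(y2), and v1 + v2 + v3 = total:
        v3 = total / (1 + exp(y1) + exp(y2)),  vj = v3 * exp(yj).
    """
    e1, e2 = math.exp(y1), math.exp(y2)
    denom = 1.0 + e1 + e2
    scale = 100.0 if percent else 1.0
    return scale * e1 / denom, scale * e2 / denom, scale / denom

# Hypothetical simulated log ratios, ln(2.5) and ln(1.5), for one draw.
v1, v2, v3 = shares_from_log_ratios(math.log(2.5), math.log(1.5))
# The shares come back as approximately 50, 30, and 20 percent.
```

Applying such a function to every simulated pair of log ratios yields a simulated distribution of vote shares in which each draw is positive and sums to the total by construction.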



Gary King 2006-01-04