Case Control and Rare Events Bias Corrections
Hidden Region 1
We offer the first independent scholarly evaluation of the claims, forecasts, and causal inferences of the State Failure Task Force and their efforts to forecast when states will fail. State failure refers to the collapse of the authority of the central government to impose order, as in civil wars, revolutionary wars, genocides, politicides, and adverse or disruptive regime transitions. This task force, set up at the behest of Vice President Gore in 1994, has been led by a group of distinguished academics working as consultants to the U.S. Central Intelligence Agency. State Failure Task Force reports and publications have received attention in the media, in academia, and from public policy decision-makers. In this article, we identify several methodological errors in the task force work that cause their reported forecast probabilities of conflict to be too large, their causal inferences to be biased in unpredictable directions, and their claims of forecasting performance to be exaggerated. However, we also find that the task force has amassed the best and most carefully collected data on state failure in existence, and the required corrections which we provide, although very large in effect, are easy to implement. We also reanalyze their data with better statistical procedures and demonstrate how to improve forecasting performance to levels significantly greater than even corrected versions of their models. Although still a highly uncertain endeavor, we are as a consequence able to offer the first accurate forecasts of state failure, along with procedures and results that may be of practical use in informing foreign policy decision making. We also describe a number of strong empirical regularities that may help in ascertaining the causes of state failure.
Estimating Base Probabilities
Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort such as the number of controls in each full risk set is available. Most scholars who have considered the issue recommend reporting more than just the relative risks and rates, but auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of or only partially knowledgeable about relevant auxiliary population information.