Evaluating U.S. Social Security Administration Forecasts

The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, for government programs that together comprise more than 50% of all federal expenditures, for industry decision making, and for the evidence base of many scholarly articles. Forecasts are also essential for scoring policy proposals put forward by both political parties. Because SSA makes public little replication information, and uses ad hoc, qualitative, and antiquated statistical forecasting methods, no one in or out of government has been able to produce fully independent alternative forecasts or policy scorings. Yet no systematic evaluation of SSA forecasts has ever been published by SSA or anyone else. We show that SSA's forecasting errors were approximately unbiased until about 2000 but then began to grow quickly, with increasingly overconfident uncertainty intervals. Moreover, the errors all turn out to be in the same potentially dangerous direction, each making the Social Security Trust Funds look healthier than they actually are. We also trace the cause of these findings, using evidence from a large number of interviews we conducted with participants at every level of the forecasting and policy processes. We show that SSA's forecasting procedures meet all the conditions that the modern social-psychology and statistical literatures show make bias likely. When those conditions mixed with potent new political forces trying to change Social Security and influence the forecasts, SSA's actuaries hunkered down, trying hard to insulate themselves from the intense political pressures. Unfortunately, this otherwise laudable resistance to undue influence, along with their ad hoc qualitative forecasting models, also led them to miss important changes in the input data, such as retirees living longer lives, and drawing more benefits, than predicted by simple extrapolations.
We explain that solving this problem involves (a) removing human judgment where possible, by using formal statistical methods -- via the revolution in data science and big data; (b) instituting formal structural procedures when human judgment is required -- via the revolution in social-psychological research; and (c) requiring transparency and data sharing to catch errors that slip through -- via the revolution in data sharing and replication.

An article at Barron's about our work.

Articles and Presentations

Explaining Systematic Bias and Nontransparency in US Social Security Administration Forecasts
Konstantin Kashin, Gary King, and Samir Soneji. 2015. “Explaining Systematic Bias and Nontransparency in US Social Security Administration Forecasts.” Political Analysis 23 (3): 336-362.

The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, other government programs, industry decision making, and the evidence base of many scholarly articles. Because SSA makes public little replication information and uses qualitative and antiquated statistical forecasting methods, fully independent alternative forecasts (and the ability to score policy proposals to change the system) are nonexistent. Yet, no systematic evaluation of SSA forecasts has ever been published by SSA or anyone else --- until a companion paper to this one (King, Kashin, and Soneji, 2015a). We show that SSA's forecasting errors were approximately unbiased until about 2000, but then began to grow quickly, with increasingly overconfident uncertainty intervals. Moreover, the errors are all in the same potentially dangerous direction, making the Social Security Trust Funds look healthier than they actually are. We extend and then attempt to explain these findings with evidence from a large number of interviews we conducted with participants at every level of the forecasting and policy processes. We show that SSA's forecasting procedures meet all the conditions the modern social-psychology and statistical literatures demonstrate make bias likely. When those conditions mixed with potent new political forces trying to change Social Security, SSA's actuaries hunkered down trying hard to insulate their forecasts from strong political pressures. Unfortunately, this otherwise laudable resistance to undue influence, along with their ad hoc qualitative forecasting models, led the actuaries to miss important changes in the input data. Retirees began living longer lives and drawing benefits longer than predicted by simple extrapolations. 
We also show that the solution to this problem involves SSA or Congress implementing in government two of the central projects of political science over the last quarter century: [1] promoting transparency in data and methods and [2] replacing with formal statistical models large numbers of qualitative decisions too complex for unaided humans to make optimally.

Systematic Bias and Nontransparency in US Social Security Administration Forecasts
Konstantin Kashin, Gary King, and Samir Soneji. 2015. “Systematic Bias and Nontransparency in US Social Security Administration Forecasts.” Journal of Economic Perspectives 29 (2): 239-258.

The financial stability of four of the five largest U.S. federal entitlement programs, strategic decision making in several industries, and many academic publications all depend on the accuracy of demographic and financial forecasts made by the Social Security Administration (SSA). Although the SSA has performed these forecasts since 1942, no systematic and comprehensive evaluation of their accuracy has ever been published by SSA or anyone else. The absence of a systematic evaluation of forecasts is a concern because the SSA relies on informal procedures that are potentially subject to inadvertent biases and does not share with the public, the scientific community, or other parts of SSA sufficient data or information necessary to replicate or improve its forecasts. These issues result in SSA holding a monopoly position in policy debates as the sole supplier of fully independent forecasts and evaluations of proposals to change Social Security. To assist with the forecasting evaluation problem, we collect all SSA forecasts for years that have passed and discover error patterns that could have been---and could now be---used to improve future forecasts. Specifically, we find that after 2000, SSA forecasting errors grew considerably larger and most of these errors made the Social Security Trust Funds look more financially secure than they actually were. In addition, SSA's reported uncertainty intervals are overconfident and increasingly so after 2000. We discuss the implications of these systematic forecasting biases for public policy.

Frequently Asked Questions

You write that no other institution makes fully independent forecasts. What about the Congressional Budget Office?

The Congressional Budget Office (CBO) uses the SSA’s fertility forecast as an input to its forecasting model. Before 2013, CBO also used SSA's mortality forecasts as inputs to its model (here is why they changed).  

CBO explains on page 103 of The 2014 Long-Term Budget Outlook: "CBO used projected values from the Social Security trustees for fertility rates but produced its own projections for immigration and mortality rates. Together, those projections imply a total U.S. population of 395 million in 2039, compared with 324 million today. CBO also produced its own projection of the rate at which people will qualify for Social Security’s Disability Insurance program in coming decades."

How many ultimate rates of mortality decline does the Social Security Administration choose?

The number of ultimate rates of mortality decline has changed over time. Between 1982 and 2011, the number chosen was 210 (5 broad age groups x 2 sexes x 7 causes of death x 3 cost scenarios). Since 2012, SSA has reduced the number of causes of death from 7 to 5, applied uniform ultimate rates of decline for males and females, and uniformly scaled the ultimate rates of decline for the low- and high-cost scenarios to ½ and 5/3, respectively, of the intermediate-cost rates of decline.
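The arithmetic behind these counts can be sketched in a few lines (the counts and scaling factors come from the text above; the intermediate-cost rate used below is a hypothetical value, not an SSA figure):

```python
# Number of ultimate rates of mortality decline chosen by SSA, 1982-2011.
age_groups, sexes, causes_of_death, cost_scenarios = 5, 2, 7, 3
pre_2012_count = age_groups * sexes * causes_of_death * cost_scenarios
print(pre_2012_count)  # 210

# Since 2012, the low- and high-cost ultimate rates are fixed multiples
# of the intermediate-cost rate (the 0.8% figure here is hypothetical).
intermediate_rate = 0.8                    # % annual decline, illustrative only
low_cost_rate = intermediate_rate * 1 / 2  # half the intermediate rate
high_cost_rate = intermediate_rate * 5 / 3 # five-thirds of the intermediate rate
```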

How do you measure uncertainty of SSA policy scores?

As an analogy, we can think of a policy score as the coefficient (an intended causal effect) in a regression of a policy output (such as the balance or cost rate) on the treatment variable (whether or not the proposed policy is adopted), plus an error term. SSA offers no uncertainty estimates for this estimated causal effect, although of course some causal effects are likely to be better estimated, or better known, than others. Sometimes, based on ex ante assumptions, we may regard an effect as known with a high degree of certainty. However, causal effects are never observed in the real world; only the policy outputs are ever observed.

To empirically estimate what will happen in the real world if a policy is adopted, or to evaluate a claim about a causal effect's size or its uncertainty in a way that makes oneself vulnerable to being proven wrong, we must rely on forecasts under present law and forecasts under the counterfactual condition of the policy being adopted. It is the uncertainty of the forecast under present law that our papers show how to estimate using the observed forecast errors. In this evaluation, we find that most of what could be observed from the impact of the causal effects is swamped by these uncertainty estimates. For example, the most recent SSA evaluation of a policy proposal includes a graph (Figure 1) plotting the point estimate of the Trust Fund Ratio for each future year under both present law and the proposed law under consideration; each of these lines carries uncertainty at least as large as we estimate in our paper. There is also additional uncertainty, over and above forecast errors, because we do not know exactly what would happen if the policy were actually changed, and how workers, beneficiaries, government officials, and others would respond under the new regime.

When is it acceptable for the Social Security Administration to bias today’s forecast towards yesterday’s forecast, producing artificially smooth forecasts over time?

Smoothing in this way can be statistically advantageous: it reduces variance, and possibly mean squared error if there is no systematic bias. Unfortunately, SSA forecasts are systematically biased, so smoothing does not help here. Another possible rationale is to protect the public from worrying about the future of Social Security. Whether this paternalistic position is appropriate is, of course, a normative choice. Our own view is that, whenever possible, the government should give accurate forecasts and tell the public the truth as soon as it is known. The government can and should accompany point estimates with accurate uncertainty estimates. If public officials or the public do not understand these uncertainty estimates, then it is incumbent upon government officials, and those of us who pay attention to what they do, to be good teachers. Politicians and the public may not often have time for the details, but in our experience it is not difficult to convey important points like these.
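The tradeoff behind this answer can be made concrete with the standard decomposition MSE = bias² + variance. In this sketch (all numbers hypothetical), last year's forecast is treated as a fixed anchor, which is a simplification:

```python
# Smoothing today's forecast toward yesterday's is a weighted average
#   smoothed = w * anchor + (1 - w) * forecast
# with the anchor treated as fixed. Then
#   MSE(smoothed) = (w * anchor_error)^2 + (1 - w)^2 * var_forecast.

var_forecast = 0.09   # variance of this year's unsmoothed forecast error
w = 0.5               # weight placed on last year's forecast

def mse_smoothed(anchor_error):
    return (w * anchor_error) ** 2 + (1 - w) ** 2 * var_forecast

mse_raw = var_forecast             # 0.09 with no smoothing
mse_no_bias = mse_smoothed(0.0)    # 0.0225: with an unbiased anchor, smoothing helps
mse_with_bias = mse_smoothed(0.8)  # 0.1825: with a systematically biased anchor, it hurts
```

The same weight w that cuts MSE to a quarter of its raw value when the anchor is unbiased makes MSE twice as bad as no smoothing once the anchor carries a large systematic error.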

How soon could SSA become aware of errors in their forecasts?

For all the financial indicators, the error in last year’s one-year-ahead forecast is known before this year’s forecast is issued. However, SSA receives mortality data from the National Center for Health Statistics with a 2 to 4 year lag.

Who did you interview and how did you select them?

We interviewed a sample of participants in the forecasting process, including those who try to influence the process, use the forecasts, make proposals to change Social Security, and comment publicly or privately on the process. Our sample included current and former high- and low-profile public officials in Congress, the White House, and the Social Security Administration, including Democrats, Republicans, liberals, conservatives, and members of various advisory boards. We also included some in academia and the private sector. Our design was a stratified sequential quota sample, with strata defined by role in the process. The sequential part involved sampling and conducting interviews within each stratum until we heard the same stories and the same points sufficiently often that we could reliably predict what the next person would say when prompted with the same question. We tested this hypothesis, making ourselves vulnerable to being proven wrong, by making predictions and seeing what the next person would say. Of course, each person added more color, detail, and information, but at some point the information we gathered about our essential questions reached well past the point of diminishing returns, and so we stopped. We found individuals by enumeration and snowball sampling; we were able to reach all but a few of the people we sought, and almost everyone we asked freely gave of their time to speak with us. Part of the reason for this success in reaching people is that we promised confidentiality to each respondent, whether or not they asked for it.

Related Materials

Statistical Security for Social Security
Samir Soneji and Gary King. 2012. “Statistical Security for Social Security.” Demography 49 (3): 1037-1060.

The financial viability of Social Security, the single largest U.S. Government program, depends on accurate forecasts of the solvency of its intergenerational trust fund. We begin by detailing information necessary for replicating the Social Security Administration’s (SSA’s) forecasting procedures, which until now has been unavailable in the public domain. We then offer a way to improve the quality of these procedures via age- and sex-specific mortality forecasts. The most recent SSA mortality forecasts were based on the best available technology at the time, which was a combination of linear extrapolation and qualitative judgments. Unfortunately, linear extrapolation excludes known risk factors and is inconsistent with long-standing demographic patterns such as the smoothness of age profiles. Modern statistical methods typically outperform even the best qualitative judgments in these contexts. We show how to use such methods here, enabling researchers to forecast using far more information, such as the known risk factors of smoking and obesity and known demographic patterns. Including this extra information makes a substantial difference: for example, by only improving mortality forecasting methods, we predict three fewer years of net surplus, $730 billion less in Social Security trust funds, and program costs that are greater by 0.66% of projected taxable payroll, compared to SSA projections, by 2031. More important than specific numerical estimates are the advantages of transparency, replicability, reduction of uncertainty, and what may be the resulting lower vulnerability to the politicization of program forecasts. In addition, by offering with this paper software and detailed replication information, we hope to marshal the efforts of the research community to include ever more informative inputs and to continue to reduce the uncertainties in Social Security forecasts.

This work builds on our article that provides forecasts of US Mortality rates (see King and Soneji, The Future of Death in America), a book developing improved methods for forecasting mortality (Girosi and King, Demographic Forecasting), all data we used (King and Soneji, replication data sets), and open source software that implements the methods (Girosi and King, YourCast).  Also available is a New York Times Op-Ed based on this work (King and Soneji, Social Security: It’s Worse Than You Think), and a replication data set for the Op-Ed (King and Soneji, replication data set).

The Future of Death in America
Gary King and Samir Soneji. 2011. “The Future of Death in America.” Demographic Research 25 (1): 1-38.

Population mortality forecasts are widely used for allocating public health expenditures, setting research priorities, and evaluating the viability of public pensions, private pensions, and health care financing systems. In part because existing methods seem to forecast worse when based on more information, most forecasts are still based on simple linear extrapolations that ignore known biological risk factors and other prior information. We adapt a Bayesian hierarchical forecasting model capable of including more known health and demographic information than has previously been possible. This leads to the first age- and sex-specific forecasts of American mortality that simultaneously incorporate, in a formal statistical model, the effects of the recent rapid increase in obesity, the steady decline in tobacco consumption, and the well known patterns of smooth mortality age profiles and time trends. Formally including new information in forecasts can matter a great deal. For example, we estimate an increase in male life expectancy at birth from 76.2 years in 2010 to 79.9 years in 2030, which is 1.8 years greater than the U.S. Social Security Administration projection and 1.5 years more than U.S. Census projection. For females, we estimate more modest gains in life expectancy at birth over the next twenty years from 80.5 years to 81.9 years, which is virtually identical to the Social Security Administration projection and 2.0 years less than U.S. Census projections. We show that these patterns are also likely to greatly affect the aging American population structure. We offer an easy-to-use approach so that researchers can include other sources of information and potentially improve on our forecasts too.

Demographic Forecasting
Federico Girosi and Gary King. 2008. Demographic Forecasting. Princeton: Princeton University Press.

We introduce a new framework for forecasting age-sex-country-cause-specific mortality rates that incorporates considerably more information, and thus has the potential to forecast much better, than any existing approach. Mortality forecasts are used in a wide variety of academic fields, and for global and national health policy making, medical and pharmaceutical research, and social security and retirement planning.

As it turns out, the tools we developed in pursuit of this goal also have broader statistical implications, in addition to their use for forecasting mortality or other variables with similar statistical properties. First, our methods make it possible to include different explanatory variables in a time series regression for each cross-section, while still borrowing strength from one regression to improve the estimation of all. Second, we show that many existing Bayesian (hierarchical and spatial) models with explanatory variables use prior densities that incorrectly formalize prior knowledge. Many demographers and public health researchers have fortuitously avoided this problem so prevalent in other fields by using prior knowledge only as an ex post check on empirical results, but this approach excludes considerable information from their models. We show how to incorporate this demographic knowledge into a model in a statistically appropriate way. Finally, we develop a set of tools useful for developing models with Bayesian priors in the presence of partial prior ignorance. This approach also provides many of the attractive features claimed by the empirical Bayes approach, but fully within the standard Bayesian theory of inference.