Social science data are an unusual part of the past, present, and future of digital preservation. They are both an unqualified success, due to long-lived and sustainable archival organizations, and in need of further development because not all digital content is being preserved. This article is about the Data Preservation Alliance for Social Sciences (Data-PASS), a project supported by the National Digital Information Infrastructure and Preservation Program (NDIIPP), which is a partnership of five major U.S. social science data archives. Broadly speaking, Data-PASS has the goal of ensuring that at-risk social science data are identified, acquired, and preserved, and that we have a future-oriented organization that could collaborate on those preservation tasks for the future. Throughout the life of the Data-PASS project we have worked to identify digital materials that have never been systematically archived, and to appraise and acquire them. As the project has progressed, however, it has increasingly turned its attention from identifying and acquiring legacy and at-risk social science data to identifying on going and future research projects that will produce data. This article is about the project's history, with an emphasis on the issues that underlay the transition from looking backward to looking forward.
An essential aspect of science is a community of scholars cooperating and competing in the pursuit of common goals. A critical component of this community is the common language of and the universal standards for scholarly citation, credit attribution, and the location and retrieval of articles and books. We propose a similar universal standard for citing quantitative data that retains the advantages of print citations, adds other components made possible by, and needed due to, the digital form and systematic nature of quantitative data sets, and is consistent with most existing subfield-specific approaches. Although the digital library field includes numerous creative ideas, we limit ourselves to only those elements that appear ready for easy practical use by scientists, journal editors, publishers, librarians, and archivists.
In this paper, we present an overview of the Virtual Data Center (VDC) software, an open-source digital library system for the management and dissemination of distributed collections of quantitative data. (see http://TheData.org). The VDC functionality provides everything necessary to maintain and disseminate an individual collection of research studies, including facilities for the storage, archiving, cataloging, translation, and on-line analysis of a particular collection. Moreover, the system provides extensive support for distributed and federated collections including: location-independent naming of objects, distributed authentication and access control, federated metadata harvesting, remote repository caching, and distributed "virtual" collections of remote objects.
The Virtual Data Center (VDC) software is an open-source, digital library system for quantitative data. We discuss what the software does, and how it provides an infrastructure for the management and dissemination of disturbed collections of quantitative data, and the replication of results derived from this data.