Informatics and Data Sharing

Replication Standards

New standards, protocols, and software for citing, sharing, analyzing, archiving, preserving, distributing, cataloging, translating, disseminating, naming, verifying, and replicating scholarly research data and analyses. Also includes proposals to improve the norms of data sharing and replication in science.
Replication, Replication
"The replication standard holds that sufficient information exists with which to understand, evaluate, and build upon a prior work if a third party can replicate the results without any additional information from the author." This, and the data sharing to support it, was proposed for political science, along with policy suggestions in
King, Gary. 1995. “Replication, Replication.” PS: Political Science and Politics 28: 443–499.
Political science is a community enterprise and the community of empirical political scientists needs access to the body of data necessary to replicate existing studies to understand, evaluate, and especially build on this work. Unfortunately, the norms we have in place now do not encourage, or in some cases even permit, this aim. Following are suggestions that would facilitate replication and are easy to implement – by teachers, students, dissertation writers, graduate programs, authors, reviewers, funding agencies, and journal and book editors.
A Revised Proposal, Proposal
Comments from nineteen authors and a response to the above:
King, Gary. 1995. “A Revised Proposal, Proposal.” PS: Political Science and Politics 28: 494–499.
Publication, Publication
King, Gary. 2006. “Publication, Publication.” PS: Political Science and Politics 39: 119–125.
I show herein how to write a publishable paper by beginning with the replication of a published article. This strategy seems to work well for class projects in producing papers that ultimately get published, helping to professionalize students into the discipline, and teaching them the scientific norms of the free exchange of academic information. I begin by briefly revisiting the prominent debate on replication our discipline had a decade ago and some of the progress made in data sharing since.

The Dataverse Network Project

The Dataverse Network Project: a major ongoing project to write web applications, standards, protocols, and software for automating the process of citing, archiving, preserving, distributing, cataloging, translating, disseminating, naming, verifying, and replicating data and associated analyses (Website: TheData.Org). See also:
An Introduction to the Dataverse Network as an Infrastructure for Data Sharing
King, Gary. 2007. “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing.” Sociological Methods and Research 36: 173–199.
We introduce a set of integrated developments in web application software, networking, data citation standards, and statistical methods designed to put some of the universe of data and data sharing practices on somewhat firmer ground. We have focused on social science data, but aspects of what we have developed may apply more widely. The idea is to facilitate the public distribution of persistent, authorized, and verifiable data, with powerful but easy-to-use technology, even when the data are confidential or proprietary. We intend to solve some of the sociological problems of data sharing via technological means, with the result intended to benefit both the scientific community and the sometimes apparently contradictory goals of individual researchers.
From Preserving the Past to Preserving the Future: The Data-PASS Project and the Challenges of Preserving Digital Social Science Data
Gutmann, Myron P, Mark Abrahamson, Margaret O Adams, Micah Altman, Caroline Arms, Kenneth Bollen, Michael Carlson, et al. 2009. “From Preserving the Past to Preserving the Future: The Data-PASS Project and the Challenges of Preserving Digital Social Science Data.” Library Trends 57: 315–337.
Social science data are an unusual part of the past, present, and future of digital preservation. They are both an unqualified success, due to long-lived and sustainable archival organizations, and in need of further development because not all digital content is being preserved. This article is about the Data Preservation Alliance for Social Sciences (Data-PASS), a project supported by the National Digital Information Infrastructure and Preservation Program (NDIIPP), which is a partnership of five major U.S. social science data archives. Broadly speaking, Data-PASS has the goal of ensuring that at-risk social science data are identified, acquired, and preserved, and that we have a future-oriented organization that could collaborate on those preservation tasks for the future. Throughout the life of the Data-PASS project we have worked to identify digital materials that have never been systematically archived, and to appraise and acquire them. As the project has progressed, however, it has increasingly turned its attention from identifying and acquiring legacy and at-risk social science data to identifying ongoing and future research projects that will produce data. This article is about the project's history, with an emphasis on the issues that underlay the transition from looking backward to looking forward.

A symposium on replication, edited by Nils Petter Gleditsch and Claire Metelits, with several articles including mine,
King, Gary. 2003. “The Future of Replication.” International Studies Perspectives 4: 443–499.
Since the replication standard was proposed for political science research, more journals have required or encouraged authors to make data available, and more authors have shared their data. The calls for continuing this trend are more persistent than ever, and the agreement among journal editors in this Symposium continues this trend. In this article, I offer a vision of a possible future of the replication movement. The plan is to implement this vision via the Virtual Data Center project, which – by automating the process of finding, sharing, archiving, subsetting, converting, analyzing, and distributing data – may greatly facilitate adherence to the replication standard.

The Virtual Data Center

The Virtual Data Center, the predecessor to the Dataverse Network. See:
A Digital Library for the Dissemination and Replication of Quantitative Social Science Research
Altman, Micah, Leonid Andreev, Mark Diggory, Gary King, Daniel L Kiskis, Elizabeth Kolster, Michael Krot, and Sidney Verba. 2001. “A Digital Library for the Dissemination and Replication of Quantitative Social Science Research.” Social Science Computer Review 19: 458–470.
The Virtual Data Center (VDC) software is an open-source, digital library system for quantitative data. We discuss what the software does, and how it provides an infrastructure for the management and dissemination of distributed collections of quantitative data, and the replication of results derived from this data.

See Also

A Proposed Standard for the Scholarly Citation of Quantitative Data
Altman, Micah, and Gary King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib Magazine 13.
An essential aspect of science is a community of scholars cooperating and competing in the pursuit of common goals. A critical component of this community is the common language of and the universal standards for scholarly citation, credit attribution, and the location and retrieval of articles and books. We propose a similar universal standard for citing quantitative data that retains the advantages of print citations, adds other components made possible by, and needed due to, the digital form and systematic nature of quantitative data sets, and is consistent with most existing subfield-specific approaches. Although the digital library field includes numerous creative ideas, we limit ourselves to only those elements that appear ready for easy practical use by scientists, journal editors, publishers, librarians, and archivists.

Related Papers on New Forms of Data

Preserving Quantitative Research-Elicited Data for Longitudinal Analysis. New Developments in Archiving Survey Data in the U.S.
Abrahamson, Mark, Kenneth A Bollen, Myron P Gutmann, Gary King, and Amy Pienta. 2009. “Preserving Quantitative Research-Elicited Data for Longitudinal Analysis. New Developments in Archiving Survey Data in the U.S.” Historical Social Research 34 (3): 51-59.
Social science data collected in the United States, both historically and at present, have often not been placed in any public archive -- even when the data collection was supported by government grants. The availability of the data for future use is, therefore, in jeopardy. Enforcing archiving norms may be the only way to increase data preservation and availability in the future.
Computational Social Science
Lazer, David, Alex Pentland, Lada Adamic, Sinan Aral, Albert-Laszlo Barabasi, Devon Brewer, Nicholas Christakis, et al. 2009. “Computational Social Science.” Science 323: 721-723.
A field is emerging that leverages the capacity to collect and analyze data at a scale that may reveal patterns of individual and group behaviors.
Ensuring the Data Rich Future of the Social Sciences
King, Gary. 2011. “Ensuring the Data Rich Future of the Social Sciences.” Science 331 (11 February): 719-721.
Massive increases in the availability of informative social science data are making dramatic progress possible in analyzing, understanding, and addressing many major societal problems. Yet the same forces pose severe challenges to the scientific infrastructure supporting data sharing, data management, informatics, statistical methodology, and research ethics and policy, and these are collectively holding back progress. I address these changes and challenges and suggest what can be done.
The Changing Evidence Base of Social Science Research
King, Gary. 2009. “The Changing Evidence Base of Social Science Research.” In The Future of Political Science: 100 Perspectives, edited by Gary King, Kay Schlozman, and Norman Nie. New York: Routledge Press.
This (two-page) article argues that the evidence base of political science and the related social sciences is beginning an underappreciated but historic change.