ReadMe: Software for Automated Content Analysis

Authors: Daniel Hopkins, Gary King, Matthew Knowles, Steven Melendez

The ReadMe software package for R takes three inputs: a set of text documents (such as speeches, blog posts, newspaper articles, judicial opinions, or movie reviews); a categorization scheme chosen by the user (e.g., ordered positive-to-negative sentiment ratings, unordered policy topics, or any other mutually exclusive and exhaustive set of categories); and a small subset of the text documents hand-classified into the given categories.

If used properly, ReadMe will report, normally within sampling error of the truth, the proportion of documents in each of the given categories among those not hand coded. ReadMe computes quantities of interest to the scientific community from the distribution of documents across categories, and it does so without the more error-prone intermediate step of classifying individual documents. Other procedures are also included to make processing text easy.
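
The idea of estimating category proportions without classifying individual documents can be illustrated with a small sketch. This is not the ReadMe package's code or API. It is a simplified two-category example in Python, written under stated assumptions: documents are represented as sets of words, single-word presence rates stand in for the richer word-stem profiles used in the published method, and the function names `feature_rates` and `estimate_proportion` are hypothetical. The sketch assumes that word usage within each category is the same in the hand-coded and uncoded documents, so that for each word w, P(w) = P(w | A) * p + P(w | B) * (1 - p), and it solves for p by least squares across words.

```python
# Simplified two-category illustration; hypothetical names, not ReadMe's code.

def feature_rates(docs, vocab):
    """Return, for each word in vocab, the fraction of docs containing it."""
    n = len(docs)
    return [sum(1 for d in docs if w in d) / n for w in vocab]


def estimate_proportion(labeled_a, labeled_b, unlabeled, vocab):
    """Estimate the share of category A among the unlabeled documents.

    Solves P(w) = P(w|A) * p + P(w|B) * (1 - p) for p by least squares
    across all words, without classifying any single document.
    """
    a = feature_rates(labeled_a, vocab)   # P(w | A) from hand-coded docs
    b = feature_rates(labeled_b, vocab)   # P(w | B) from hand-coded docs
    s = feature_rates(unlabeled, vocab)   # P(w) among uncoded docs
    num = sum((ai - bi) * (si - bi) for ai, bi, si in zip(a, b, s))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if den == 0:
        raise ValueError("word rates do not differ between categories")
    # Clip to [0, 1] because sampling noise can push the estimate outside.
    return min(1.0, max(0.0, num / den))


# Toy data: each document is a set of words.
vocab = ["great", "fun", "dull", "slow", "plot"]
labeled_a = [{"great", "fun"}, {"great", "plot"}, {"fun", "plot"}, {"great"}]
labeled_b = [{"dull", "plot"}, {"dull"}, {"slow", "dull"}, {"slow", "plot"}]
# Uncoded set built so that three quarters of it behaves like category A.
unlabeled = labeled_a * 3 + labeled_b

p = estimate_proportion(labeled_a, labeled_b, unlabeled, vocab)
print(round(p, 3))  # 0.75
```

Note that the hand-coded set here is split evenly between the two categories while the uncoded set is not. The estimate still recovers the uncoded proportion, because only the within-category word rates are taken from the hand-coded documents.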

ReadMe implements methods described in Daniel Hopkins and Gary King, "A Method of Automated Nonparametric Content Analysis for Social Science," American Journal of Political Science 54, no. 1 (January 2010): 229-247.

Related software Readme2 is available here.

  • Reporting bugs and issues: Please use our GitHub Issue form.
  • Questions and feature requests: Discuss the software on our Discussions page.
  • Documentation: readme.pdf explains how to install and use the package.
  • ReadMe for R:
    • To install the package make sure that you have the devtools package installed and then run in R:
    • Source code is available at:
    • Note: on Windows you will need to ensure that Python is installed before installing ReadMe. To install Python see:
  • License: Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License, for academic use only. A commercial (and industrial-strength) version has been built by, licensed to, and offered by Crimson Hexagon.