The ReadMe software package for R takes as input a set of text documents (such as speeches, blog posts, newspaper articles, judicial opinions, or movie reviews), a categorization scheme chosen by the user (e.g., ordered positive to negative sentiment ratings, unordered policy topics, or any other mutually exclusive and exhaustive set of categories), and a small subset of text documents hand-classified into the given categories.
If used properly, ReadMe will report, normally within sampling error of the truth, the proportion of documents within each of the given categories among those not hand-coded. ReadMe computes quantities of interest to the scientific community based on the distribution within categories, but it does so by skipping the more error-prone intermediate step of classifying individual documents. Other procedures are also included to make processing text easy.
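The idea of estimating category proportions directly, without classifying any individual document, can be illustrated with a toy sketch in base R. It assumes the accounting identity P(S) = P(S | D) P(D) described by Hopkins and King (2010), where S is a word-stem profile and D is a document category. Everything in the block is hypothetical: the two categories, the four profiles, the probabilities, and the variable names are made up for illustration, and the plain least-squares solve with clipping is a simplification, not the ReadMe implementation or its API.

```r
# Toy illustration only: this is NOT the ReadMe implementation or its API.
# Two categories and four hypothetical word-stem profiles (made-up numbers).

# P(S | D): rows are word-stem profiles, columns are categories.
# In a real application this would be tabulated from the hand-coded documents.
p_s_given_d <- matrix(
  c(0.50, 0.10,
    0.30, 0.20,
    0.15, 0.30,
    0.05, 0.40),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("profile", 1:4), c("positive", "negative"))
)

# Category proportions in the unlabeled set; unknown in a real application.
true_p_d <- c(positive = 0.7, negative = 0.3)

# P(S): profile frequencies in the unlabeled set. Here they are generated
# exactly from the toy model, so there is no sampling error.
p_s <- drop(p_s_given_d %*% true_p_d)

# Solve P(S) = P(S | D) P(D) for P(D) by least squares.
est <- qr.solve(p_s_given_d, p_s)

# Crude handling of the simplex constraint: clip at zero and renormalize.
est <- pmax(est, 0)
est <- est / sum(est)

print(round(est, 3))
```

Because the toy profile frequencies are generated without noise, the estimate recovers the proportions used to build them. With real text, both P(S | D) and P(S) are estimated from finite samples, so the recovered proportions would be subject to sampling error.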
- Documentation: readme.pdf
- All questions, bugs, and requests: ReadMe Mailing List, [Un]Subscribe, or Browse/Search Archives
- ReadMe implements methods described in Daniel Hopkins and Gary King, "A Method of Automated Nonparametric Content Analysis for Social Science," American Journal of Political Science, 54, 1 (January 2010): 229-247. (Paper: Article | Abstract: HTML)
- To install on Linux, from R: install.packages("ReadMe",repos="http://r.iq.harvard.edu")
- To install on Windows: download and install this file, or run this code:
License: Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License, for academic use only. A commercial (and industrial-strength) version has been built by, licensed to, and offered by Crimson Hexagon.