System for Estimating a Distribution of Message Content Categories in Source Data
Daniel Hopkins, Gary King, Ying Lu. 2012.
"System for Estimating a Distribution of Message Content Categories in Source Data".
United States of America 8,180,717.

Abstract
A method of computerized content analysis that gives “approximately unbiased and statistically consistent estimates” of the distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing the distribution of a small set of individually classified elements across a plurality of categories and then using the information determined from that analysis to extrapolate the distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be equal to the distribution of the labeled elements, and without constraining the content distribution of elements in the labeled set (e.g., the distribution of words used by elements in the labeled set) to be equal to the content distribution of elements in the unlabeled set. Because it is not constrained in these ways, the estimation technique described here offers distinct advantages over conventional aggregation techniques.
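The abstract does not spell out the estimator itself. The sketch below is a minimal illustration that assumes the approach of the related 2010 paper, "A Method of Automated Nonparametric Content Analysis for Social Science". It is not a statement of the patent's claims. Under that approach, the conditional word distribution P(S | D) is learned from the hand-coded set and assumed to carry over to the unlabeled set. The identity P(S) = P(S | D) P(D) is then solved for the category distribution P(D), and no individual document is classified. Several simplifications are assumptions of this sketch. It uses single-word presence indicators rather than the word-stem profiles the paper works with. It solves the least-squares problem by projected gradient descent onto the probability simplex. The function names `feature_rates`, `project_to_simplex`, and `estimate_category_distribution` are hypothetical.

```python
def feature_rates(docs, vocab):
    """Share of documents in `docs` that contain each word in `vocab`."""
    n = len(docs)
    return [sum(word in doc for doc in docs) / n for word in vocab]


def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) == 1}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_category_distribution(labeled, unlabeled, vocab, steps=2000):
    """Estimate P(D) in `unlabeled` without classifying any single document.

    labeled:   list of (set_of_words, category) pairs (the hand-coded set)
    unlabeled: list of set_of_words (the population of interest)
    """
    categories = sorted({cat for _, cat in labeled})
    # A[j][k] = P(word k present | category j), learned from the labeled set only.
    A = [feature_rates([d for d, c in labeled if c == cat], vocab) for cat in categories]
    # b[k] = P(word k present) in the unlabeled set; it is NOT assumed to match the labeled set.
    b = feature_rates(unlabeled, vocab)
    J, K = len(categories), len(vocab)
    # Step size 1/||A||_F^2 is at most 1/L (L = largest eigenvalue of A A^T), so descent is stable.
    lr = 1.0 / sum(a * a for row in A for a in row)
    p = [1.0 / J] * J  # start from a uniform category distribution
    for _ in range(steps):
        # Residual of the identity P(S) = sum_j P(S | D=j) P(D=j).
        r = [sum(A[j][k] * p[j] for j in range(J)) - b[k] for k in range(K)]
        # Gradient of 0.5 * ||residual||^2 with respect to p.
        grad = [sum(A[j][k] * r[k] for k in range(K)) for j in range(J)]
        p = project_to_simplex([p[j] - lr * grad[j] for j in range(J)])
    return dict(zip(categories, p))


if __name__ == "__main__":
    labeled = [({"good", "great"}, "pos"), ({"good"}, "pos"),
               ({"bad", "awful"}, "neg"), ({"bad"}, "neg")]  # 50% pos
    unlabeled = [{"good", "great"}, {"good"}, {"good"}, {"bad"}]  # 75% pos
    print(estimate_category_distribution(labeled, unlabeled,
                                         ["good", "bad", "great", "awful"]))
```

With these toy inputs the estimate comes out at approximately `{'neg': 0.25, 'pos': 0.75}`, although the labeled set is split 50/50. The unlabeled category mix is recovered rather than copied from the labeled sample, which is the property the abstract emphasizes. The word rates `b` are measured on the unlabeled set itself, so its content distribution is also free to differ from the labeled set's.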
See Also
- [Paper] A Method of Automated Nonparametric Content Analysis for Social Science (2010)
- [Paper] An Automated Information Extraction Tool For International Conflict Data With Performance As Good As Human Coders: A Rare Events Evaluation Design (2003)
- [Paper] An Improved Method of Automated Nonparametric Content Analysis for Social Science (2022)
- [Paper] Computer-Assisted Keyword and Document Set Discovery from Unstructured Text (2017)
- [Paper] General Purpose Computer-Assisted Clustering and Conceptualization (2011)
- [Paper] How Censorship in China Allows Government Criticism But Silences Collective Expression (2013)
- [Patent] Method and Apparatus for Selecting Clusterings to Classify A Predetermined Data Set (2013)
- [Patent] Participant Grouping for Enhanced Interactive Experience (2014)