System for Estimating a Distribution of Message Content Categories in Source Data

Daniel Hopkins, Gary King, Ying Lu. 2012. "System for Estimating a Distribution of Message Content Categories in Source Data". U.S. Patent 8,180,717.

Patent

Abstract

A method of computerized content analysis that gives “approximately unbiased and statistically consistent estimates” of the distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing how a small set of individually classified elements is distributed across a plurality of categories and then using the information determined from that analysis to extrapolate the distribution in a larger population. This extrapolation is performed without constraining the distribution of the unlabeled elements to equal the distribution of the labeled elements, and without constraining the content distribution of the labeled elements (e.g., the distribution of words they use) to equal the content distribution of the unlabeled elements. Because it is free of these constraints, the estimation technique described herein has distinct advantages over conventional aggregation techniques.
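The abstract does not give the estimator itself. A minimal sketch of one way to perform this kind of extrapolation, assuming the identity P(S) = P(S|D) P(D), is shown below. Here S is a document's word-presence profile and D is its category. P(S|D) is estimated from the labeled elements, P(S) is estimated from the unlabeled elements, and P(D) for the unlabeled set is recovered by constrained least squares. The key assumption is only that P(S|D) is shared between the two sets. Neither P(D) nor P(S) has to match. The function names (`profile`, `project_to_simplex`, `estimate_proportions`), the projected-gradient solver, and the toy data are all hypothetical illustrations, not the patented implementation.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Profile = Tuple[int, ...]


def profile(words: Iterable[str], vocab: Sequence[str]) -> Profile:
    """Binary word-presence profile of one document over a fixed (hypothetical) vocabulary."""
    present = set(words)
    return tuple(int(w in present) for w in vocab)


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection onto {p : p_i >= 0, sum(p) = 1} using the sort-based method."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:  # keep the threshold for the largest such j
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(
    labeled: List[Tuple[List[str], Hashable]],
    unlabeled: List[List[str]],
    vocab: Sequence[str],
    iters: int = 5000,
) -> Dict[Hashable, float]:
    """Estimate P(D) in the unlabeled set, assuming P(S|D) is the same in both sets."""
    cats = sorted({c for _, c in labeled}, key=str)
    # P(S | D = d): profile counts per category, from labeled documents only
    by_cat = {c: Counter() for c in cats}
    for words, c in labeled:
        by_cat[c][profile(words, vocab)] += 1
    totals = {c: sum(by_cat[c].values()) for c in cats}
    # P(S): profile frequencies in the unlabeled documents
    unl = Counter(profile(words, vocab) for words in unlabeled)
    profiles = sorted(set(unl).union(*by_cat.values()))
    b = [unl[s] / len(unlabeled) for s in profiles]
    M = [[by_cat[c][s] / totals[c] for c in cats] for s in profiles]
    # Minimize ||M p - b||^2 over the probability simplex by projected gradient.
    # Each column of M sums to 1, so lambda_max(M^T M) <= k and 1/(2k) is a safe step.
    k, n = len(cats), len(profiles)
    step = 1.0 / (2 * k)
    p = [1.0 / k] * k
    for _ in range(iters):
        resid = [sum(M[i][j] * p[j] for j in range(k)) - b[i] for i in range(n)]
        grad = [2 * sum(M[i][j] * resid[i] for i in range(n)) for j in range(k)]
        p = project_to_simplex([p[j] - step * grad[j] for j in range(k)])
    return dict(zip(cats, p))


# Toy example: the labeled set is 50% "pos", but the unlabeled set is not.
vocab = ["good", "bad"]
labeled = (
    [(["good"], "pos")] * 3 + [(["good", "bad"], "pos")]
    + [(["bad"], "neg")] * 3 + [([], "neg")]
)
unlabeled = [["good"]] * 3 + [["good", "bad"]] + [["bad"]] * 9 + [[]] * 3
est = estimate_proportions(labeled, unlabeled, vocab)
```

On this toy data, half the labeled documents are "pos" and the unlabeled word distribution differs from the labeled one. The estimate still comes out at about 0.25 "pos" and 0.75 "neg", which is the mixture that reproduces the unlabeled profile frequencies. Projected gradient descent is used only to keep the sketch free of dependencies. Note also that enumerating full profiles grows as 2^|vocab|, so this toy form does not scale to realistic vocabularies.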
