%0 Patent %D 2015 %T System for Estimating a Distribution of Message Content Categories in Source Data (2nd) %A Gary King %A Daniel Hopkins %A Ying Lu %X A method of computerized content analysis that gives "approximately unbiased and statistically consistent estimates" of a distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing a distribution of a small set of individually-classified elements in a plurality of categories and then using the information determined from the analysis to extrapolate a distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be equal to the distribution of labeled elements, and without constraining a content distribution of elements in the labeled set (e.g., a distribution of words used by elements in the labeled set) to be equal to a content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques. %7 United States of America %V US 9,189,538 B2 %G eng %N U.S. Patent and Trademark Office %& US