Some Statistical Methods for Evaluating Information Extraction Systems
Will Lowe, Gary King. 2003.
"Some Statistical Methods for Evaluating Information Extraction Systems".
Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, Pp. 19-26.

Abstract
We present new statistical methods for evaluating information extraction systems. The methods were developed to evaluate a system used by political scientists to extract event information from news leads about international politics. The nature of this data presents two problems for evaluators: 1) The frequency distribution of event types in international event data is strongly skewed, so a random sample of news leads will typically fail to contain any low frequency events. 2) The manual information extraction necessary to create evaluation sets is costly, and most effort is wasted coding high frequency categories. We present an evaluation scheme that overcomes these problems with considerably less manual effort than traditional methods, and also allows us to interpret an information extraction system as an estimator (in the statistical sense) and to estimate its bias.
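The first problem named in the abstract can be illustrated with a short calculation. The sketch below is not the authors' evaluation scheme; it only shows why a simple random sample tends to miss low frequency event types. The function name, the category shares, and the sample size are hypothetical values chosen for illustration, and sampling is assumed to be independent (with replacement).

```python
def prob_category_missing(category_share: float, sample_size: int) -> float:
    """Probability that a simple random sample contains no item from a category.

    Assumes independent draws (sampling with replacement), so each draw
    misses the category with probability 1 - category_share.
    """
    if not 0.0 <= category_share <= 1.0:
        raise ValueError("category_share must lie in [0, 1]")
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    return (1.0 - category_share) ** sample_size


# Hypothetical, strongly skewed event-type shares (not taken from the paper).
hypothetical_shares = {"common": 0.60, "uncommon": 0.05, "rare": 0.001}
hypothetical_sample_size = 500

for name, share in hypothetical_shares.items():
    p_missing = prob_category_missing(share, hypothetical_sample_size)
    print(f"{name}: P(no examples in sample) = {p_missing:.3f}")
```

Under these assumed numbers, an event type with a 0.1% share is absent from a 500-lead random sample with probability of roughly 0.61, while nearly all of the manual coding effort goes to the common category.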