Publications by Author: Patrick Lam

Systems and Methods for Keyword Determination and Document Classification from Unstructured Text
Gary King, Margaret Roberts, and Patrick Lam. 4/30/2019. “Systems and Methods for Keyword Determination and Document Classification from Unstructured Text.” United States of America US 10,275,516 B2 (U.S. Patent and Trademark Office).
In various embodiments, documents are searched and retrieved via receipt of a search query, electronically identifying a reference set of relevant documents, providing a search set of documents, creating a database comprising at least some of the documents of the search set and the reference set, computationally classifying the documents in the database, extracting keywords from the search set and one or more classified sets, optionally filtering the extracted keywords, and electronically identifying at least some of the documents from the database that contain one or more of the extracted keywords.
Computer-Assisted Keyword and Document Set Discovery from Unstructured Text
Gary King, Patrick Lam, and Margaret Roberts. 2017. “Computer-Assisted Keyword and Document Set Discovery from Unstructured Text.” American Journal of Political Science, 61, 4, Pp. 971-988.

The (unheralded) first step in many applications of automated text analysis involves selecting keywords to choose documents from a large text corpus for further study. Although all substantive results depend on this choice, researchers usually pick keywords in ad hoc ways that are far from optimal and usually biased. Paradoxically, this often means that the validity of the most sophisticated text analysis methods depends in practice on the inadequate keyword counting or matching methods they are designed to replace. Improved methods of keyword selection would also be valuable in many other areas, such as following conversations that rapidly innovate language to evade authorities, seek political advantage, or express creativity; generic web searching; eDiscovery; look-alike modeling; intelligence analysis; and sentiment and topic analysis. We develop a computer-assisted (as opposed to fully automated) statistical approach that suggests keywords from available text without needing structured data as inputs. This framing poses the statistical problem in a new way, which leads to a widely applicable algorithm. Our specific approach is based on training classifiers, extracting information from (rather than correcting) their mistakes, and summarizing results with Boolean search strings. We illustrate how the technique works with analyses of English texts about the Boston Marathon Bombings and Chinese social media posts designed to evade censorship, among others.
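
The general idea in this abstract (train a classifier, mine its mistakes, summarize with a Boolean string) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy documents, the naive Bayes classifier, the leave-one-out scheme, the document-frequency score, and all function names (`train_nb`, `predict_nb`, `suggest_keywords`, `to_boolean_query`) are assumptions made for illustration only. The reference set stands for documents already retrieved by a query such as "bombing", and the search set for documents that the query missed.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def train_nb(docs, labels):
    """Fit a two-class multinomial naive Bayes model (word counts per class)."""
    counts = {0: Counter(), 1: Counter()}
    for doc, label in zip(docs, labels):
        counts[label].update(tokenize(doc))
    return counts, Counter(labels), set(counts[0]) | set(counts[1])

def predict_nb(model, doc):
    """Return 1 if the document looks more like the reference set, else 0."""
    counts, n_docs, vocab = model
    scores = {}
    for label in (0, 1):
        denom = sum(counts[label].values()) + len(vocab)  # add-one smoothing
        score = math.log(n_docs[label] / sum(n_docs.values()))
        for tok in tokenize(doc):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((counts[label][tok] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0

def suggest_keywords(reference, search, top_k=2):
    """Suggest keywords from search-set documents misclassified as reference.

    Assumes a non-empty reference set and at least two search documents.
    """
    target, rest = [], []
    for i, doc in enumerate(search):
        # Leave-one-out: train on the reference set plus all other search docs.
        others = search[:i] + search[i + 1:]
        labels = [1] * len(reference) + [0] * len(others)
        model = train_nb(reference + others, labels)
        # A search document predicted as "reference" is a useful mistake.
        (target if predict_nb(model, doc) == 1 else rest).append(doc)
    # Document frequency of each word in the two classified sets.
    df_target = Counter(w for d in target for w in set(tokenize(d)))
    df_rest = Counter(w for d in rest for w in set(tokenize(d)))

    def score(word):
        # Stand-in ranking: share of target docs minus share of remaining docs.
        return (df_target[word] / max(len(target), 1)
                - df_rest[word] / max(len(rest), 1))

    ranked = sorted(df_target, key=lambda w: (-score(w), w))
    return ranked[:top_k], target

def to_boolean_query(keywords):
    """Join suggested keywords into a simple OR search string."""
    return " OR ".join(keywords)

# Hypothetical toy corpus, invented for this sketch.
reference = [
    "boston marathon bombing suspects",
    "bombing near marathon finish line",
    "boston bombing victims",
]
search = [
    "stock market closes higher",
    "recipe for apple pie",
    "boston marathon suspects arrested",
    "suspects captured near boston",
    "weather forecast sunny weekend",
]

keywords, target = suggest_keywords(reference, search, top_k=2)
query = to_boolean_query(keywords)
print(query)  # boston OR suspects
```

In this sketch the two search documents about the suspects are "mistaken" for reference documents even though neither contains the original query word, and the words that set them apart become the suggested keywords. In a computer-assisted workflow, a human would then review and refine the suggested string before using it for retrieval.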