The Blog Conversation Project

Recent years have seen an explosion of a new form of public expression: web logs, or blogs. In early 2005, the Perseus Development Corporation estimated that 31.6 million blogs were active. By early 2006, estimates reached 100 million. Over 7% of American adults report having their own blog. By all accounts, blogging is growing very fast. Blogs have become an influential way for ordinary citizens to share views with large audiences, views that they would previously have shared only with a few friends, relatives, or coworkers.

Although conversations among people interested in national issues have always occurred, the new technology makes these previously local conversations global and also creates a record of them that may be useful for researchers. The Blog Conversation Project is a research study developing methods of automatically reading and coding all blogs on specific topics so that we can measure the views expressed in this national or international conversation. We have developed methods that make it possible to conduct automated public opinion polls of people who hold opinions, coded directly from their blogs. For some issues, this may be more informative than the usual strategy of polling randomly selected citizens, many of whom have not thought about the question they are being asked. This project might also help us better understand the process of agenda-setting, the expressed thoughts of activists and others who have thought about issues, public views about commercial products, or any other subject of interest.

The project is led by Gary King, David Florence Professor of Government and Director of the Institute for Quantitative Social Science at Harvard, and Ph.D. Candidate Daniel Hopkins, a Graduate Student Affiliate of the Institute. Our research team includes Katie Colton, Nicholas Christian Hayes, Matthew Knowles, Steven Michael Melendez, Andrew William Prokop, and Keneshia De'Shuan Washington, assisted by the staff of the Harvard-MIT Data Center and executive assistant Beverly Macmillen.

Procedures and Privacy

Our main interest is developing aggregate summaries across millions of blogs, rather than providing information about any individual blogger: we are studying the conversation, not any individual participant in that conversation. Since we wish to hear only those who wish to be heard, any blogger who prefers to be excluded from our automated analyses (and from our reports on the national conversation) can use the standard web protocols for excluding robots, such as a robots.txt file, which our software follows. We view the comprehensiveness of our information as an important priority, and so, as is standard practice among search engines, we exclude information only at the "request" (through these automated means) of the blogger or webmaster responsible for the blog.
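As an illustration of how the robot-exclusion mechanism works, the sketch below uses Python's standard-library `urllib.robotparser` to check whether a crawler may fetch a page under a given robots.txt. It is not the project's own code: the crawler name `ExampleBlogResearchBot`, the helper `allowed_to_fetch`, and the example URLs are hypothetical, chosen only to show how a blogger's opt-out is honored.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string; the project's actual crawler name is not given.
USER_AGENT = "ExampleBlogResearchBot"


def allowed_to_fetch(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A blogger opting out of all automated crawlers could publish this robots.txt:
OPT_OUT_ALL = """\
User-agent: *
Disallow: /
"""

# A blog that allows crawling everywhere except one directory:
PARTIAL = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(allowed_to_fetch(OPT_OUT_ALL, "https://blog.example.com/2006/01/post.html"))  # False
    print(allowed_to_fetch(PARTIAL, "https://blog.example.com/2006/01/post.html"))      # True
    print(allowed_to_fetch(PARTIAL, "https://blog.example.com/private/notes.html"))     # False
```

A crawler that runs this check before every request skips any page the blogger has disallowed, which is the behavior the paragraph above describes.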

We welcome your questions or comments.