Call for Applications: History as a Data Science

Free Two-Week Research Workshop at Columbia’s History Lab


Digital history is thriving, and there are many resources available for using it in the classroom. But the increasing volume of digitized and “born digital” collections of books, articles, and archives also presents tremendous new opportunities for original research. This is especially true in fields like international history, where the declassification of official papers in online databases has created some of the largest collections of historical documents in the public domain. But for those with little or no training in data science, the only way to explore these resources has been through keyword searching.

The goal of this workshop is to offer a very practical introduction to the many other methods of using large digitized archives that are only possible with direct access to the data. Participants will learn how to organize and analyze textual data and get an overview of advances in natural language processing and machine learning. Hands-on training will use textual data from History Lab, an NSF-funded project that has aggregated the largest database of declassified government documents in the world. Participants will also learn how to get their own data by “scraping” websites and downloading from online databases. More specifically, we will examine how to bring textual data into Python and R, how to use Python for web scraping, and how to explore textual data using string functions. These methods make it possible to grapple with old research problems with new rigor, and to launch entirely new kinds of inquiries.
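To give a flavor of what "exploring textual data using string functions" can look like, here is a minimal Python sketch. It is only an illustration, not workshop material: the sample memo text and the helper names `tokenize` and `keyword_in_context` are hypothetical, and a real session would load documents from files or a database rather than from a string in the script.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for a declassified document.
SAMPLE_TEXT = """MEMORANDUM FOR THE SECRETARY
Subject: Meeting with the Ambassador
The Ambassador raised the question of trade. The Secretary agreed to review the trade proposal."""


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_in_context(text, keyword, width=30):
    """Return a snippet of up to `width` characters on each side of every match."""
    snippets = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        snippets.append(text[start:end].replace("\n", " "))
    return snippets


# Count how often each word appears, ignoring case.
word_counts = Counter(tokenize(SAMPLE_TEXT))

# Pull out header lines using ordinary string methods.
subject_lines = [
    line for line in SAMPLE_TEXT.splitlines() if line.startswith("Subject:")
]

# Go one step beyond keyword search: see each hit in its surrounding context.
trade_snippets = keyword_in_context(SAMPLE_TEXT, "trade")

print(word_counts.most_common(5))
print(subject_lines)
print(trade_snippets)
```

The same functions can be applied to thousands of documents in a loop, which is where direct access to the data starts to offer more than a search box.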

The workshop is timed to follow immediately after the 2020 American Historical Association meeting in New York City. Those interested in applying are encouraged to attend the AHA and participate in the digital history panels and workshops. Thanks to support from Columbia University and a grant from the American Council of Learned Societies, we have funds to subsidize the travel and lodging costs of out-of-town participants. But we also ask you to apply for your own funds so we can make this opportunity available to others.

When: January 6, 2020 – January 17, 2020. Sessions will run from 9am to 1pm each weekday, with individual and small-group meetings to follow. If you would like to participate but the timing poses a problem, tell us which dates and times you can commit to attending.

Where: Columbia University in NYC, at the Institute for Social and Economic Research and Policy.

Eligibility: This workshop is open to all ranks: first-year Ph.D. students through established scholars are encouraged to apply. Priority will be given to historians. Others will be eligible to participate on a space-available basis.