The Paperless Archive
Seth Denbo, November 2014
In 2007, the Los Angeles Times reported on the apparent disappearance of thousands of e-mails sent by the Bush White House. This loss of crucial government records came to light only because Congress went looking for them as part of an investigation. The e-mails were eventually recovered, but the case made national news and highlighted a serious issue that should be a cause for concern among historians. Just as investigations into wrongdoing require access to evidence, historical research necessitates the survival of documents. The loss of documentary evidence puts at risk the historian’s enterprise.
Since before the turn of the 21st century, much of the historical record has been born digital. Government departments, media outlets, social movements, and many individuals no longer limit themselves to paper records that allow historians to trace the decisions, turning points, exchanges, discoveries, and mind-sets through which we build a narrative and create an understanding of the past. Even such venerable sources as newspapers are no longer solely or even primarily published in nondigital formats.
A small but growing number of historians are thinking about questions related to the shift from paper to digital records and the impact of these changes on historical scholarship. One of those at the forefront of this work is Matthew Connelly, professor of history at Columbia University. Connelly’s research focuses on “the history—and future—of world politics.” But, as he writes on his website, “the historical record is vast and a large but indeterminate part of it remains classified.”
To determine the shape and scope of the body of records still out of reach, Connelly works with both digitized and born-digital documents. He has assembled, according to his website, a team of “computer scientists and statisticians to try to uncover the scope and nature of official secrecy, and perhaps even venture predictions about what a fuller accounting might reveal” (www.matthewconnelly.net). In an effort to gain a better picture of the subject, Connelly and his collaborators are building what they call the Declassification Engine. This tool kit will allow the management and analysis of both digitized and born-digital materials at a scale that begins to approach the needs of doing history when vast amounts of data are the norm. This work is by necessity collaborative and challenges us to rethink research practices.
Photo: Jason Scott (www.flickr.com/photos/textfiles) CC BY 2.0
Servers that hold vast amounts of data are already an important part of the paperless archive.
All historians can easily recognize that government records are valuable sources, even if they are not paper on shelves, but instead were created electronically and stored as bits and bytes on hard drives and servers. New types of sources that are products of digital technology and have no predigital analogues can also be very useful evidence for historians. While these may not have the same status as State Department records, immense opportunities exist in the blog posts, YouTube videos, and tweets related, for example, to the Egyptian Revolution in 2011 that were created in the weeks before and after the resignation of Hosni Mubarak.
The value of these sources for historians comes with challenges. The quantity of digital records is staggering: thousands of tweets per second, millions of e-mails from an administration, thousands of blog posts. These digital records are vast and hold far more information than anyone could read or comprehend, so software is the key to navigation, and even analysis. Archiving and preservation also present problems, which archivists and librarians at national, state, and local archives have been addressing for decades. Several projects are archiving born-digital records of government administrations and political figures. The Kaine Email Project at the Library of Virginia collects, processes, and allows access to over one million e-mails from the administration of former Virginia Governor Tim Kaine. The Robert C. Byrd Center for Legislative Studies has a publicly accessible database of over one million pieces of legislative e-mail correspondence. Projects like these are attempting to prevent the loss of these materials and give scholars and the general public access to these vital records for contemporary political history.
Daniel Chudnov, director of scholarly technology at George Washington University’s Gelman Library, is working on an application that addresses another aspect of the problem. The Social Feed Manager, which recently received funding from the National Archives and Records Administration, was first developed in response to social scientists who wanted to track how media organizations use Twitter. Political scientists are using it to archive the feeds of congressional representatives. The Social Feed Manager will also benefit historians of the recent past, who will increasingly need to know more about social media and web archives.
One great place to start is the AHA’s annual meeting in January. A session organized by Meg Phillips of the National Archives and Records Administration will feature Connelly and several other historians and archivists in a discussion about the relationship between historians and archivists and the new kinds of work that will be required to create and use digital archives.
Seth Denbo is the AHA’s director of scholarly communication and digital initiatives.
The Arab Spring, Chronicled Tweet by Tweet: aje.me/1oQJixe
Are We Losing History? Capturing Archival Records for a New Era of Research, a session at the upcoming AHA annual meeting: bit.ly/1oQIRmC
The Declassification Engine: www.declassification-engine.org
Kaine Email Project @ LVA: www.virginiamemory.com/collections/kaine
Matthew Connelly’s website: www.matthewconnelly.net
Robert C. Byrd Center for Legislative Studies: www.byrdcenter.org
Social Feed Manager at George Washington University Library: bit.ly/1oQJ9Kc
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Attribution must provide author name, article title, Perspectives on History, date of publication, and a link to this page. This license applies only to the article, not to text or images used here by permission.