Applying Machine Learning to Diplomatic Cable Review
US federal agencies must evaluate classified records for continuing risk to national security before they can be made available at the National Archives. The rapid proliferation of electronic records is outpacing capacity in the declassification programs. The resulting challenge, what has been described as a “digital tsunami,” will directly affect historical researchers’ access to primary sources.
Currently, the US Department of State’s declassification reviewers can process a maximum of eight million paper and electronic pages annually, or just over one terabyte of data (one terabyte is approximately six and a half million pages). The Clinton Presidential Library includes roughly 4 terabytes of digital records, many requiring declassification review by the Department of State. The George W. Bush Library holds 80 terabytes, and the Obama Presidential Library holds 250 terabytes. That pattern of rapid growth is reflected across the executive branch’s various collections of classified records.
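To make the scale concrete, the figures above can be turned into a rough back-of-the-envelope calculation. This is only a sketch: it uses the article's own conversion (about six and a half million pages per terabyte) and stated capacity (eight million pages per year), and it assumes, purely for illustration, that every record in each library would need Department of State review, which overstates the real workload.

```python
# Figures quoted in the article. Treating every record as reviewable by the
# Department of State is an illustrative assumption, not a claim from the source.
PAGES_PER_TERABYTE = 6_500_000      # "approximately six and a half million pages"
ANNUAL_CAPACITY_PAGES = 8_000_000   # stated maximum annual review capacity

library_terabytes = {
    "Clinton": 4,
    "George W. Bush": 80,
    "Obama": 250,
}


def years_to_review(terabytes: float) -> float:
    """Years needed at current capacity if every page required review."""
    pages = terabytes * PAGES_PER_TERABYTE
    return pages / ANNUAL_CAPACITY_PAGES


for name, tb in library_terabytes.items():
    pages = tb * PAGES_PER_TERABYTE
    print(f"{name}: {tb} TB, about {pages:,} pages, "
          f"{years_to_review(tb):.1f} years at current capacity")
```

Even as an upper bound, the jump from a few years of work to several decades and then centuries of work illustrates why added staffing alone cannot keep pace.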
Human labor alone cannot be expanded to address this challenge. The approaching wave of digital records threatens to overwhelm declassification programs and thus public access to declassified records. The Public Interest Declassification Board, the Information Security Oversight Office, the Department of State’s Advisory Committee on Historical Diplomatic Documentation (a.k.a. the Historical Advisory Committee, or HAC, which includes an AHA representative), and others have drawn attention to this potential barrier to historical research and government accountability. The Department of State regularly updates the HAC on its efforts to improve the efficiency of declassification review. Those updates have described waiving authority to review certain types of information prior to public release; adopting risk-based methods that more than doubled the productivity of paper records review; building a declassification review module into the department’s state-of-the-art tool for managing digital records; and developing complex Boolean logic searches within that tool to simplify human analysis.
Such incremental gains could not scale up to address the approaching challenge. But they did establish a risk-tolerant, pro-technology culture for something that might. In March 2023, the department began combining human expertise with the power of machine learning for the declassification review of its 1998 diplomatic cable collection. That first use of machine learning in US declassification programs is consistent with Secretary of State Antony J. Blinken’s commitment to modernizing diplomacy and accepting intelligent risks.
Section 3.3 of Executive Order 13526 requires federal agencies to review their permanent classified records for possible exemption from automatic declassification after 25 years. The Systematic Review Program (SRP), within the Office of Information Programs and Services, conducts that review for the Department of State. SRP’s annual review already covers more than 100,000 diplomatic cables, and by 2030 it must exceed 650,000 cables annually.
Those diplomatic cables will be the first element of the digital tsunami to strike SRP. They document important developments in international relations, placing them among the most requested records. Their format and underlying metadata have remained relatively consistent over the years, in contrast to email and its attachments, which are far less structured and therefore more challenging. As born-digital records, diplomatic cables do not require the expensive and problematic scanning of paper records. And the department has access to years of previously reviewed cables for use in training machine learning software. All of this makes the classified cable collection ideal as a proof of concept for machine-assisted declassification review.
The Machine Learning Declassification Pilot (MLDP) began in October 2022 as a joint effort by the Center for Analytics (CfA), responsible for the department’s data management and analysis capabilities; the Bureau of Information Resource Management (IRM), responsible for computer systems and software; and SRP personnel. MLDP would initially address only diplomatic cables, not the more complex challenges of other record formats. Results, positive or negative, would inform eventual efforts to address those records as part of a calculated crawl-walk-run strategy.
The pilot evaluated the application of discriminative machine learning to declassification review. Unlike generative machine learning, which can actively predict missing data, discriminative machine learning categorizes available data. In this case, CfA data scientists sought to sort cables into three categories: those containing no information of continuing concern to the department, those with information the department must protect, and those the software could not accurately categorize. They did this by training the machine on the results of human reviews, completed in 2020 and 2021, of the 1995 and 1996 diplomatic cables. The algorithm does not actually understand the information it is categorizing. Once CfA had adjusted the software to mimic the human review results for 1995–96, the trained algorithm was tested against 2022’s human review of cables from 1997.
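The three-way sorting described above can be sketched in a few lines. The article does not describe the program's model, scores, or cut-offs, so everything named here is hypothetical: the sketch assumes the trained classifier emits a probability-like score that a cable needs protection, and the two thresholds are invented placeholders.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the article does not publish the program's thresholds.
RELEASE_AT_OR_BELOW = 0.05   # low score: recommend declassification
PROTECT_AT_OR_ABOVE = 0.95   # high score: recommend continued protection


@dataclass
class Cable:
    cable_id: str
    protect_score: float  # assumed model output: estimated chance the cable needs protection


def triage(cable: Cable) -> str:
    """Sort a scored cable into one of the three categories described above."""
    if cable.protect_score <= RELEASE_AT_OR_BELOW:
        return "declassify"
    if cable.protect_score >= PROTECT_AT_OR_ABOVE:
        return "protect"
    return "undecided"  # no machine recommendation; left to human reviewers


sample = [Cable("A", 0.01), Cable("B", 0.50), Cable("C", 0.99)]
results = {cable.cable_id: triage(cable) for cable in sample}
print(results)
```

Under these assumptions, widening the gap between the two thresholds sends more cables to the undecided pile, trading machine coverage for certainty.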
Those test results were available in mid-January 2023. The size and reliability of the sample dataset and the ability of the adjusted algorithm to mimic human review were the critical factors. Both proved more than adequate, and the project, no longer a pilot, was redesignated the Machine Learning Declassification Program. The CfA, IRM, and SRP personnel are now using the machine-assisted review procedures developed in the pilot phase to complete this year’s declassification analysis of diplomatic cables.
From the beginning, MLDP recognized the limitations inherent in machine learning. Historians understand that records must be considered in both their original and current contexts. Both the 1998 cables’ historical context and the present or foreseeable national security concerns of 2023 are different from what the model had been trained to evaluate previously. The program had to be able to adapt to such changes. Testing on the 1997 cables confirmed its ability to partially replicate a specialized and continuously evolving type of historical analysis when guided by human experts. Inclusion of human expertise is the key aspect of the department’s machine-assisted declassification review methodology. Humans teach the machine, use it to provide initial recommendations within a specified level of certainty, and confirm the results. This is and will remain machine-assisted review, with human experts exercising final judgment.
In March 2023, SRP manually reviewed a sample of classified cables from 1998, the initial phase of this new methodology. CfA experts used the results of that human review to retrain the computer algorithm, adjusting it for changing 2023 sensitivities and topics emerging in 1998. They next used the retrained algorithm to assess all classified diplomatic cables from 1998. Inclusion of the initial sample set, with its known accurate results, provided an easy check on the software’s assessment after it replicated a year’s labor for the SRP team in just 20 minutes of computing time. The adjusted algorithm confidently identified 72,891 of the 121,536 total cables from 1998 as not requiring any Department of State exemptions from declassification. The software also identified 1,427 cables requiring continued classification. Human reviewers are now confirming those results, much as they would otherwise conduct quality control for each other. The software simply multiplies the results of initial human review, reducing labor without loss of quality and leaving humans in control of the process.
In testing against the 1997 cables, the software’s decisions to declassify were correct 99.29 percent of the time, laudable accuracy for anyone. By design, it proved overprotective: its decisions to continue classification were correct only 81.43 percent of the time. This conservatism and related procedures ensure that human reviewers both validate all decisions to delay public access and see all material that approaches the threshold for such a delay. This process provides a critical safety check against mistaken release of information damaging national security and ensures maximum transparency in the final product. As the nature of each mistaken exemption is identified, CfA scientists can evaluate possible improvements, seeking to both maximize the software’s accuracy over time and improve human performance. Yes, artificial intelligence is drawing attention to human inconsistencies and errors.
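The article reports a separate accuracy figure for each kind of machine decision but does not define how those figures were computed. One plausible reading, sketched below as an assumption, is per-category agreement: of the cables the software placed in a category, the fraction that human review placed in the same category. The function name and the toy data are invented for illustration and are not the 1997 test results.

```python
from collections import Counter


def per_category_agreement(pairs):
    """Share of machine decisions in each category that human review confirmed.

    pairs: iterable of (machine_label, human_label) for cables the machine decided.
    """
    decided = Counter()
    confirmed = Counter()
    for machine_label, human_label in pairs:
        decided[machine_label] += 1
        if machine_label == human_label:
            confirmed[machine_label] += 1
    return {label: confirmed[label] / decided[label] for label in decided}


# Toy data, invented for illustration only.
toy_pairs = (
    [("declassify", "declassify")] * 9
    + [("declassify", "protect")] * 1
    + [("protect", "protect")] * 3
    + [("protect", "declassify")] * 1
)
agreement = per_category_agreement(toy_pairs)
print(agreement)
```

Measured this way, a deliberately overprotective system would be expected to show high agreement on its declassification decisions and lower agreement on its protection decisions, which is the pattern the article describes.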
MLDP software was unable to reach a decision on 47,218 cables from 1998 given its current settings and capabilities. Assessment of similar uncertainties in the 1997 cables showed them to be reasonable in that context: ambiguous content where even human experts must consider nuances, may reach conflicting results, or might otherwise require additional information. The presence of another agency’s classified information also confused the algorithm. SRP has long used Boolean searches to help humans manually locate such concerns, which must be referred to the appropriate agency for a decision. As it continues to mature and other agencies are consulted, MLDP will further increase the efficiency of government-wide review by making such referrals more accurate.
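The cable counts reported for 1998 can be checked against one another with simple arithmetic. The sketch below uses only the figures given in the article; the variable names are illustrative.

```python
TOTAL_1998_CABLES = 121_536
counts = {
    "no exemption needed": 72_891,
    "continued classification": 1_427,
    "undecided": 47_218,
}

# The three categories should account for every cable.
assert sum(counts.values()) == TOTAL_1998_CABLES

shares = {label: 100 * n / TOTAL_1998_CABLES for label, n in counts.items()}
machine_decided_share = 100 - shares["undecided"]

for label, pct in shares.items():
    print(f"{label}: {counts[label]:,} cables ({pct:.1f}%)")
print(f"received a machine recommendation: {machine_decided_share:.1f}%")
```

The three categories sum exactly to the 121,536 total, and roughly three in five cables received a machine recommendation for human reviewers to confirm.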
MLDP has already reduced State Department labor involved in declassifying diplomatic cables by over 60 percent, even when data scientists’ time is included. Those cost reductions, and thus the department’s ability to review cables within current budget constraints, continue to mount rapidly less than a year after the pilot project’s start. Given the need to review nearly six times more cables by 2030, this is welcome news for researchers seeking information and other federal agencies confronting the rapid growth of digital records.
This initial application of machine learning capitalized on an ideal records collection and the Department of State’s established willingness to accept reasonable risk in the declassification review process. The approach it validated is scalable and can be incrementally applied to more challenging types of records, with implications for other agency declassification programs. As MLDP successfully multiplies the results of skilled human labor in declassifying diplomatic cables, CfA continues to work on further enhancements and applications, including review under the Freedom of Information Act. The new capability is a beacon of hope in the shadow of the digital tsunami.
Jeffery A. Charlston is chief of the Systematic Review Program Division at the US Department of State.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Attribution must provide author name, article title, Perspectives on History, date of publication, and a link to this page. This license applies only to the article, not to text or images used here by permission.