The Department of State’s pilot project approach to AI adoption

Senior IT leaders at State argue that small-scale pilots of AI technology can help bring a wealth of benefits to federal government, such as increased transparency.

By Samuel Stehle, Matthew Graviss and Eric Stein

August 15, 2023


With the release of ChatGPT and other large language models, generative AI has clearly caught the public’s attention. This new awareness, particularly in the public sector, of the tremendous power of artificial intelligence is a net good. However, excessive focus on chatbot-style AI capabilities risks overshadowing applications that are both innovative and practical and seek to serve the public through increased government transparency.

Within government, there are existing projects that are more mature than AI chatbots and are immediately ready to deliver more efficient government operations. Through a partnership between three offices, the Department of State is seeking to automate the cumbersome process of document declassification and prepare for the large volume of electronic records that will need to be reviewed in the next several years. The three partners are the Bureau of Administration’s Office of Global Information Services (A/GIS), the Office of Management Strategy and Solutions’ Center for Analytics (M/SS CfA), and the Bureau of Information Resource Management’s (IRM) Messaging Systems Office. Together they have piloted AI, and are now moving toward deploying it at production scale, to augment an intensive manual process that normally requires page-by-page human review of 25-year-old classified electronic records. The pilot focused mainly on cable messages, which are communications between Washington and the department’s overseas posts.

The 25-year declassification review process entails a manual review of electronic, classified records at the confidential and secret levels in the year that their protection period elapses; in many cases, that is 25 years after original classification. Manual review has historically been the only way to determine whether information can be declassified for eventual public release or must be exempted from declassification to protect information critical to our nation’s security.

However, manual review is a time-intensive process. A team of about six reviewers works year-round to review classified cables and must use a triage method to prioritize the cables most likely to require exemption from automatic declassification. In most years, they are unable to review every cable: for each of the years 1995 through 1997, between 112,000 and 133,000 electronic cables were under review. The risk of not being able to review each document for sensitive material is exacerbated by the increasing volume of documents.

This manual review strategy is quickly becoming unsustainable. Around 100,000 classified cables were created each year between 1995 and 2003. The number of cables created in 2006 that will require review exceeds 650,000, and the volume remains at that level in the following years. While emails are currently an insignificant portion of 25-year declassification reviews, the number of classified emails doubles every two years after 2001, rising to over 12 million emails in 2018. To get ahead of this challenge, we have turned to artificial intelligence.

Because AI is still a cutting-edge innovation that carries uncertainty and risk, our approach started with a pilot to test the impact of the process on a small scale. We trained a model, using human declassification decisions made in 2020 and 2021 on cables classified confidential and secret in 1995 and 1996, to recreate those decisions on cables classified in 1997. Over 300,000 classified cables were used for training and testing during the pilot. Over three months, five dedicated data scientists developed and trained a model that matches previous human declassification review decisions at a rate of over 97 percent and has the potential to reduce the existing manual workload by over 65 percent. The pilot approach allowed us to consider and plan for three AI risks: lack of human oversight of automated decision-making, the ethics of AI, and overinvestment of time and money in products that aren’t usable.
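For readers who want to picture the evaluation design, here is a minimal sketch of a year-based split: fit on cables from earlier years, then measure how often the model reproduces the human decision on a later year. It is an illustration under stated assumptions, not the department's implementation. The `Cable` record, the keyword-based stand-in model, and the toy cables are all hypothetical; the article does not say which model type or features the pilot used.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Cable:
    year: int            # year of original classification
    text: str            # cable body (toy strings in this sketch)
    human_decision: str  # "declassify" or "exempt", as recorded by a reviewer


def split_by_year(cables: Iterable[Cable], train_years: Set[int],
                  test_year: int) -> Tuple[List[Cable], List[Cable]]:
    """Hold out one classification year for testing; train on the others."""
    cables = list(cables)
    train = [c for c in cables if c.year in train_years]
    test = [c for c in cables if c.year == test_year]
    return train, test


def train_keyword_model(train: List[Cable]) -> Callable[[str], str]:
    """Hypothetical stand-in model: flag words seen only in exempt cables."""
    exempt_words: Set[str] = set()
    declassify_words: Set[str] = set()
    for cable in train:
        words = set(cable.text.lower().split())
        if cable.human_decision == "exempt":
            exempt_words |= words
        else:
            declassify_words |= words
    flagged = exempt_words - declassify_words

    def predict(text: str) -> str:
        return "exempt" if flagged & set(text.lower().split()) else "declassify"

    return predict


def agreement_rate(predict: Callable[[str], str], test: List[Cable]) -> float:
    """Share of held-out cables where the model matches the human decision."""
    if not test:
        raise ValueError("no test cables")
    matches = sum(1 for c in test if predict(c.text) == c.human_decision)
    return matches / len(test)


# Made-up cables for illustration only; no real records are represented.
TOY_CABLES = [
    Cable(1995, "routine visa statistics", "declassify"),
    Cable(1995, "source identity protected", "exempt"),
    Cable(1996, "routine travel schedule", "declassify"),
    Cable(1996, "protected source meeting", "exempt"),
    Cable(1997, "routine budget statistics", "declassify"),
    Cable(1997, "meeting with protected source", "exempt"),
    Cable(1997, "travel schedule update", "declassify"),
    Cable(1997, "negotiation strategy details", "exempt"),
]

train, test = split_by_year(TOY_CABLES, {1995, 1996}, 1997)
predict = train_keyword_model(train)
rate = agreement_rate(predict, test)
print(f"agreement with human decisions on 1997 cables: {rate:.0%}")
```

Splitting by year rather than at random mirrors how such a model would be used in practice: it always has to judge a year of records it has not yet seen.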

The new declassification tool will not replace jobs. The AI-assisted declassification review process requires human reviewers to remain part of the decision-making process. During the pilot and the subsequent weeks of work to put the model into production, reviewers were consistently consulted and their feedback integrated into the automated decision process. This combination of technological review with human review and insight is critical to the success of the model. The model cannot make a decision with confidence on every cable, so human reviewers must decide, as they normally would, on a portion of all cables. Reviewers also conduct quality control. A small, yet significant, percentage of cables with automated confident decisions are given to reviewers for confirmation. If enough of the AI-generated decisions are contradicted during the quality control check, the model can be re-trained to consider the information that it missed and integrate reviewer feedback. This feedback is critical to sustaining the model in the long term and for considering evolving geopolitical contexts. During the pilot, we determined that the Department’s Office of the Historian (FSI/OH) could help strengthen future declassification review models by providing context about world events during the years of records being reviewed.
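The human-in-the-loop workflow described above can be sketched in a few lines: low-confidence cables go to reviewers, a sample of confident automated decisions is pulled for quality control, and a high contradiction rate signals that retraining is needed. This is a sketch under assumptions, not the production logic. The function names, the 0.9 confidence threshold, the 10 percent sampling fraction, and the 25 percent contradiction limit are all hypothetical; the article does not give the actual values.

```python
import random
from typing import List


def route(confidence: float, threshold: float = 0.9) -> str:
    """Send a cable to automation only when the model is confident enough.

    The threshold value is an assumption made for this sketch.
    """
    return "automated" if confidence >= threshold else "human_review"


def sample_for_qc(automated_ids: List[str], fraction: float,
                  rng: random.Random) -> List[str]:
    """Pick a random share of automated decisions for reviewer confirmation."""
    if not automated_ids:
        return []
    k = max(1, round(len(automated_ids) * fraction))
    return rng.sample(automated_ids, k)


def needs_retraining(qc_agreements: List[bool],
                     max_contradiction_rate: float) -> bool:
    """Flag the model for retraining when reviewers contradict it too often.

    Each entry in qc_agreements is True if the reviewer confirmed the
    automated decision and False if the reviewer contradicted it.
    """
    if not qc_agreements:
        return False
    contradicted = sum(1 for agreed in qc_agreements if not agreed)
    return contradicted / len(qc_agreements) > max_contradiction_rate


# Illustrative run with made-up identifiers and confidence scores.
scores = {"cable-01": 0.98, "cable-02": 0.55, "cable-03": 0.93, "cable-04": 0.91}
automated = [cid for cid, s in scores.items() if route(s) == "automated"]
manual = [cid for cid, s in scores.items() if route(s) == "human_review"]
qc_batch = sample_for_qc(automated, fraction=0.10, rng=random.Random(0))
retrain = needs_retraining([True, True, False, False], max_contradiction_rate=0.25)
```

In this arrangement the thresholds are policy choices rather than technical ones: lowering the confidence threshold automates more cables, while raising the sampling fraction gives reviewers more chances to catch errors.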

There are ethical concerns that innovating with AI will lead to governing by algorithm. Although the descriptive AI used in our pilot does not construct narrative conversations the way large language models (LLMs) such as ChatGPT do, it is designed to make decisions by learning from previous human inputs. The approximation of human thought raises concerns about ethical government when it replaces what is considered sensitive and specialized experience. In our implementation, AI is a tool that works in concert with humans for validation, oversight, and process refinement. Incorporating AI tools into our workflows requires continually addressing the ethical dimensions of automated decision-making.

This project also saves money, potentially millions of dollars’ worth of personnel hours. Innovation for the sake of being innovative can result in overinvestment in dedicated staff and technology that cannot sustain itself or produce long-term cost savings. Because we tested our short-term pilot within the confines of existing technology, our forecast of the workload reduction across the next ten years of reviews anticipates savings of almost $8 million on labor costs. Those savings can be applied to piloting AI solutions for other governmental programs managing increased volumes of data and records with finite resources, such as information access requests for electronic records and Freedom of Information Act requests.

Rarely in government do we prioritize the time to try, and potentially fail, in the interest of innovation and efficiency. The small-scale declassification pilot allowed for a proof of concept before committing to sweeping changes. In our next phase, the Department is bringing the pilot to scale so that the AI technology is integrated with existing Department technology as part of the routine declassification process.

Federal interest in AI use cases has exploded in only the last few months, with many big and bold ideas being debated. While positive, these debates should not detract from use cases like this, which can rapidly improve government efficiency and transparency through the release of information to the public. Furthermore, the lessons learned from this use case (having clear metrics of success upfront, investing in data quality and structure, and starting with a small-scale pilot) can also be applied to future generative AI use cases. AI’s general-purpose capabilities mean that it will eventually be a part of almost all aspects of how the government operates, from budget and HR to strategy and policy making. We have an opportunity to help shape how the government modernizes its programs and services within and across federal agencies to improve services for the public in ways not previously imagined or possible.

Matthew Graviss is chief data and AI officer at the Department of State, and director of the agency’s Center for Analytics. Eric Stein is the deputy assistant secretary for the office of Global Information Services at State’s Bureau of Administration. Samuel Stehle is a data scientist within the Center for Analytics.