National Archives getting a big boost from AI to transform its search capabilities

NARA’s CTO says the agency has gone all in on the technology, with pilots on auto-filling metadata, PII redaction, FOIA processing and more.
Gulam Shakir, the chief technology officer at the National Archives & Records Administration, participates in a panel discussion at the Google Public Sector Summit on Oct. 16, 2024, in Washington, D.C. (Scoop News Group photo)

In the pre-GPT era of artificial intelligence, there wasn’t a ton for the National Archives and Records Administration — and its billions of unstructured documents — to do with the technology.

According to Gulam Shakir, NARA’s chief technology officer, those more nascent AI tools “did not really fit” with the agency’s needs — namely the processing and management of massive “free-form” datasets lacking any logical formatting or rules.  

Needless to say, things have changed considerably for NARA since OpenAI’s ChatGPT and other generative AI applications exploded onto the scene.

“When the GPT models hit the market, we knew that it was a great fit for our mission,” Shakir said during a panel discussion last week at the Scoop News Group-produced Google Public Sector Summit.

For NARA, that meant a deep dive into the emerging technology, with a particular focus on tools that could improve the agency’s search capabilities. The overarching goal, Shakir said, is having “whatever people search for … to appear on the first page” of results.

That’s no small feat for an agency that, as of October 2023, had nearly 240 million pages of digitized records in the National Archives Catalog, almost 18 million digital images in the Digital Public Library of America, and 1.7 billion views of NARA records on all Wiki platforms.

Shakir pointed specifically to an AI-based semantic search pilot at NARA that, according to the agency’s use case inventory, “aims to enhance the search functionality of its vast catalog” by leveraging technology that “goes beyond keyword matching, understanding the user’s intent and the contextual meaning behind their search terms.”

Semantic search is intended to meet NARA’s unstructured data challenges head-on by making the search process more efficient for the public as well as for researchers and historians. It should also be able to make connections between seemingly disparate records and documents housed in NARA’s catalog, providing “a more comprehensive understanding of the historical events and processes represented in the records” and leading to “new insights and discoveries,” the use case notes.

While NARA’s semantic search pilot explored tools from multiple industry partners, the agency ultimately settled on Google’s Vertex AI platform, leveraging the company’s Gemini large language model. 

“The Vertex AI has been pretty impressive in giving us the results that we want,” Shakir said.

Shakir also touted NARA’s AI-based knowledge chat interface pilot that uses Amazon Kendra to index and query search results. The tool can search through multiple content formats (the use case inventory names PDF, HTML and image files specifically) and deliver “relevant information, guidance, and support from a curated knowledge base of articles, policies, and procedures related to” records from the National Personnel Records Center and the Case Reference Guide.

Other NARA AI pilots include a Freedom of Information Act processing project, an automated metadata tool to create archival descriptions or “self-describing records,” and a system to screen and redact personally identifiable information in digitized archival records. 

As NARA continues to upskill its in-house development team, Shakir said solutions are being identified to save the workforce time. The archival descriptions project, he noted, has simplified what had been a “pretty labor-intensive process” of generating and managing metadata by providing NARA staff with a summary of a document as well as its title, scope, and other identifiers.

“We’re just trying all of it all at once, because we just want to know … what works best for our mission,” Shakir said. “And I think the team has been doing great.”
