GAO in ‘experimentation phase’ with AI model to query reports, inform its work

The agency is currently in an 'experimentation phase' with a large language model that could lend itself to a variety of use cases, GAO's chief data scientist said.

An artificial intelligence model currently being developed by the Government Accountability Office could one day be used to do things like pull information from the watchdog’s extensive catalog of reports for Congress, the agency’s top official told lawmakers this week. 

GAO Comptroller General Gene Dodaro told the House Administration Committee’s Subcommittee on Modernization at a Wednesday hearing that the agency is working to figure out how to both audit AI use in government and use the technology to its own benefit.

As part of those efforts, Dodaro said, GAO’s Innovation Lab has brought in a large language model that it’s looking to use to query its reports in a more “sophisticated way.” Once the agency is satisfied with that tool, he said, it would share the tool with Congress.

All the potential use cases of the model have yet to be defined, however, and that’s the point.


In an interview with FedScoop, GAO Chief Data Scientist and Director of the Innovation Lab Taka Ariga said the AI project is currently in an “experimentation phase” with a goal of establishing a foundational large language model (LLM) that allows the agency to build use cases on top of it.

While GAO is interested in using the tool for its work and the work of Congress, Ariga said it also has an interest in learning the mechanics of generative AI so it can figure out the appropriate uses and better perform its oversight of the government — which is increasingly adopting the new technology. 

In balancing all of those interests, Ariga said, GAO settled on building a model within its own infrastructure, which allows its staff to work with generative AI in a more secure way and experiment with their own use cases.

“We want to make sure that we are realizing the sort of potential opportunities for productivity gains, but making sure that we do so in a secure and appropriate way,” Ariga said. “And to us, the only way to do that is deploying an LLM inside GAO.”

GAO’s work on the model has been ongoing for roughly eight to 10 weeks, Ariga said, and has focused on using pre-trained models to figure out which work best for what it’s trying to achieve. “Now we’re embarking on the exercise of how do we then calibrate the parameters? How do we start thinking about training these models on GAO content?” he said.


Among GAO’s interests: Making sure that the model isn’t generating answers from things it learned from outside sources like Wikipedia and avoiding an “auto magical” type of interface that doesn’t explain its reasoning.

“Rather than making stuff up, we want to see this sort of step-by-step rational process behind the scenes so that any output that we end up seeing, we have some basis of … understanding how we arrive at that output,” Ariga said.

One use case the watchdog is considering for a generative AI tool trained on GAO’s reports is using an application programming interface (API) to “plug into information” on committee hearings and summarize what the agency has already reported on the topic, Ariga said.

Other potential generative AI use cases GAO is looking at include analyzing public comments to draw common themes, Ariga said. He noted the agency has a number of different hypotheses. “What we don’t want to end up doing is limiting what we can do based on what the Innovation Lab has thought about before,” he said.

As far as timing, Ariga said the agency is “very close” to a basic deployment of the technology itself, but there is still “some time before” it can be broadly available. The agency must go through the cybersecurity authorization process and testing, for example. 


But throughout the process, Ariga said, the agency wants to make sure it is testing with end users. That group will start small and expand over time. “Eventually, our goal is to then invite all the GAO to be able to experience what sort of a GAO-specific flavor of large language model can do,” Ariga said.


Written by Madison Alder

Madison Alder is a reporter for FedScoop in Washington, D.C., covering government technology. Her reporting has included tracking government uses of artificial intelligence and monitoring changes in federal contracting. She’s broadly interested in issues involving health, law, and data. Before joining FedScoop, Madison was a reporter at Bloomberg Law where she covered several beats, including the federal judiciary, health policy, and employee benefits. A west-coaster at heart, Madison is originally from Seattle and is a graduate of the Walter Cronkite School of Journalism and Mass Communication at Arizona State University.
