The government is working to improve data access. An AI chatbot could be part of that.
A National Science Foundation demonstration project aimed at improving data infrastructure within the U.S. government is eyeing an artificial intelligence chatbot that could help answer questions about statistical information.
The tool is being prototyped by a contractor with an expected completion date of August 2025. If it proves feasible, the hope is that someday it could answer a variety of user queries about public data.
Sharon Boivin, research director at NSF’s National Center for Science and Engineering Statistics, told FedScoop those questions might cover topics like these: How many people graduated with STEM majors in a certain state, how a zoning change impacted traffic patterns, or whether a certain neighborhood is in a flood zone.
“The idea of that chatbot is that it would go into statistical agency public websites and public information and find the gold standard objective information to answer plain language questions through natural language processing,” Boivin said.
It would also be required to provide its sources, she added.
The chatbot effort is just one small part of the National Secure Data Service Demonstration project, which is housed within NCSES. That demonstration was a requirement of the 2022 CHIPS and Science Act and is tasked with finding ways to streamline federal data access and improve research and evidence-based decision making, while also ensuring that confidential information is protected.
Currently, the process of getting to that kind of information has some hurdles.
“People don’t have a way to learn about data options based on their topics of interest,” Boivin said. “If people want to get in and need to get into the gold standard microdata, at a minimum, it takes valuable time to find the right path.”
People have to pay to access certain confidential data, and some academics may even have to pay to travel for access, she said. The process of linking data together is also burdensome.
“Anybody who has tried to link data together — whether it’s agency to agency, agency to state, state to state, or any other configuration — knows all about the governance and legal hurdles and [memorandum of understanding] writing that’s needed to overcome concerns about privacy and confidentiality,” Boivin said.
That’s where the NSDS comes in.
A data service
The idea “is there should be no wrong door into the data ecosystem,” Boivin said. “So what we’re doing is we’re starting to think about how we can coordinate with the other actors in this data ecosystem to make sure that a person who wants to learn something with data has as few clicks as possible to get to the right place.”
At present, the NSDS demonstration has 30 projects in the works to stand up the service, according to Boivin. Those range from building out its .gov website to developing a secure compute environment to support its data linkage and analysis. The demonstration has already conducted research on state-of-the-art secure compute environments within government and on privacy-protecting technologies.
Other projects include building out its “data concierge” to connect users with a real person to help them find the information they’re seeking and creating “toolkit” guides on how to use the service for user groups, such as policymakers.
Boivin said the website is expected to launch in spring to early summer of 2025, though it will provide just some of the planned resources at that time, with more to come. The demonstration project has a sunset in 2027 and expects to produce its minimum viable product by August of that year.
The idea for such a project was first introduced by the Commission on Evidence-Based Policymaking in its 2017 final report. Later, in 2022, an advisory committee on data and evidence-building that was established by the Foundations for Evidence-Based Policymaking Act recommended specific actions to establish it.
The demonstration has now been running for two years and in September issued a report to Congress, informing lawmakers about its progress, its work to remove barriers and its plans to scale the project up to a full NSDS.
Work underway
Some projects are already underway with contractors, such as the AI chatbot and secure compute testbed.
BrightQuery, Inc. won a $1.4 million contract for the AI chatbot in August, and NORC at the University of Chicago won a nearly $9 million contract to work on the secure compute environment in July.
One contract has also been awarded for efforts to improve how synthetic data generation can be used with large-scale real-world datasets. Synthetic data is artificial data that can stand in for the original dataset, producing the same statistics while protecting private information. Another contract will explore how the NSDS could provide state, local and territorial governments with tools to help inform their policymaking efforts. The synthetic data contract was awarded to Westat for nearly $1 million and the state, local and territorial government work contract was awarded to BrightQuery for roughly $1.3 million.
Many of the demonstration efforts link together. BrightQuery, for example, was awarded another contract for almost $1 million in September to look into how the NSDS might be able to provide shared resources and tools to make data more AI-ready. Boivin said those efforts are a “complement” to the chatbot work.
While the Evidence Act mandated that federal data has to be machine-readable, AI demands that data is “machine-understandable,” she said. To address that, the AI-ready data project is “producing an AI-readiness assessment that agencies can use with their own websites to see how AI-ready they are,” and while Boivin called it “a dream,” the project is also aiming to “create a prototype tool that can actually help turn federal statistical products into AI-ready data.”
The next phase for the demonstration is to identify “high-impact” use cases to test the prototypes with, Boivin said. Those efforts are currently underway. The demonstration is working with the Interagency Council on Statistical Policy’s Subcommittee on the NSDS to identify those uses, she said.
Boivin said those use cases will help test the feasibility and the value of a future NSDS while also ensuring “that we’re building for need and not just because we thought it would be a good idea.”