‘Work yet to be done’ on AI-ready data standards for federal government, Commerce CDO says

Commerce Chief Data Officer Oliver Wise said a “policy shift” is needed across the government on the standard for public data in the era of AI.

By Madison Alder

June 27, 2024

Developing standards for data so that it can be better understood by artificial intelligence models is one area where more government efforts are needed, the Department of Commerce’s top data officer said.

While the Biden administration’s executive order on the technology and the Office of Management and Budget’s corresponding memo cover a lot of ground, Oliver Wise, the department’s chief data officer, said there’s “work yet to be done” on spelling out “standards of what AI-ready data really means.”

Wise, who was speaking on a panel at Scale’s Gov AI Summit on Tuesday, highlighted efforts by a working group within Commerce to address that issue. The AI and Open Government Data Assets Working Group is soliciting input from the public on potential guidelines for AI-ready government data through a request for information, and hopes to publish its findings by the end of the year. Comments on that RFI must be received before July 16.

Currently, the standard for publishing public data is machine readability, Wise said. That standard was enshrined in federal law by the OPEN Government Data Act, which was enacted as part of the Foundations for Evidence-Based Policymaking Act. OMB has yet to issue further guidance on a portion of that legislation, which became law in 2019.

“Our contention, and really what’s at the forefront of this RFI, is that we believe that machine readability is a necessary but not sufficient condition to really meet user expectations in the AI era,” Wise said.

That means going beyond having data in a CSV format or common standard and having data that is machine-understandable, Wise said. He added that a “policy shift” is needed across federal, state and local government on “what should be that standard for public data dissemination in the AI era.”

For now, those standards are best kept voluntary, Wise said. For its part, the working group will put out a report by the end of the year, he said, but that won’t be “dispositive.” Going forward, he said, this is an area that needs “sustained investment” so that the government can research standards and evolve with industry as new innovations are developed.

But AI-ready open data comes with risks, Helena Fu, the director of the Department of Energy’s Office of Critical and Emerging Technologies and its chief AI officer, cautioned on the same panel.

Fu said the Biden administration is working to make as much data as possible available while managing the risks of doing so, especially those related to chemical, biological, radiological and nuclear defense.

“One of the things that we struggle with, and that we’re thinking through very actively, is how do we meet both of these things,” Fu said.

She said Energy has a lot of historical open data on “all things nuclear.” But at the same time, Fu said the agency is “concerned about the proliferation of a lot of data combined with the capabilities of AI models” where things that used to be disparate can be put into one place and “potentially things could be done that we don’t really want to be done.”

The question around how you bridge open to closed data is “one that I think we’re going to be actively wrestling with in the months to come,” Fu said.

Wise agreed with Fu that the decision around what’s public and what isn’t is an important one, but said that if data is already public, the government has a “longstanding commitment” to “make it as public as possible.”

Just because a lot of people will be using generative AI isn’t a reason to “pause on what data should be public or not,” he said. “It’s just that we ought to do it correctly.”