Commerce guidance aims to improve how generative AI uses its data
The Department of Commerce published guidance Thursday to ensure public data is ready for use with generative artificial intelligence tools, providing actionable steps for the department that oversees the census as well as climate and economic data.
The guidance is an effort by the department both to improve the accuracy of AI responses and to have AI tools prioritize its authoritative data over nonauthoritative sources and potentially inaccurate information, according to the document.
To make that happen, the department outlines guidelines and best practices under five major topic areas: documentation, formats of data and metadata, storage and dissemination, licensing and usage, and data integrity and quality.
The new guidance is the culmination of work that began in late 2023 with the launch of the AI and Open Government Data Assets Working Group under the department's Data Governance Board. That group, made up of data and AI experts in the department, set out to address the opportunities and challenges of AI's use of open-data assets and consulted outside sources in industry and academia to inform the report.
Among the specific guidelines in the report are adding "comprehensive variable-level metadata for machine understandability," compressing large datasets or making them easier to download, updating data websites regularly, defining usage policies for machine readability, and automating AI-ready data quality control.
In a message included in the report, Oliver Wise, the department's chief data officer, said AI is redefining how people interact with information, but that making data machine-readable alone isn't sufficient to meet the needs of those new tools.
“To ensure that public data retains its value and integrity in this new paradigm, we must embrace practices that make data not only machine-readable but machine-understandable,” Wise wrote. “This means preserving the meaning and context of the data in ways that generative AI systems can accurately interpret and utilize.”
Wise has previously cited the need for a policy shift at all levels of government when it comes to making data machine-understandable to meet the needs of the AI era.
Steps by an agency like Commerce (sometimes referred to as "America's data agency") to make its data AI-ready could have a widespread impact, since it houses some of the federal government's most prominent data resources, including the U.S. Census Bureau, the National Oceanic and Atmospheric Administration, and the U.S. Bureau of Economic Analysis.
But the report's authors also hope it will benefit more than just the department.
“This report is primarily intended to guide data stewards and publishers across the Department of Commerce as they navigate this new era,” Wise wrote. “However, we also hope it serves as a resource for data stewards everywhere — across governments, academia, and the private sector — by offering insights on how public data can be structured and disseminated to better interact with large language models (LLMs) and other generative AI systems.”
The report comes as agencies across the federal government have noted that data and data readiness are hurdles to their AI implementation journeys. Data readiness was a top issue cited by agencies in their plans for compliance with the Office of Management and Budget's AI governance memo, according to a FedScoop analysis in October of a sampling of those plans.