Heads up, developers: some census data just became easier to use

The American Community Survey's most granular data will be easier to use and to build apps from now that it has been transformed into linked data, Census Chief Marketing Officer Jeff Meisel announced Saturday during a panel at SXSW in Austin.

By Samantha Ehlinger

March 11, 2017

From left: Former Director of the Health Data Initiative Damon Davis, Beth Beck of NASA, Jeff Meisel of the U.S. Census Bureau and Brett Hurt of data.world speak at a panel on open data at the SXSW Conference in Austin on March 11, 2017.

People have to overcome a steep learning curve to use granular data from the American Community Survey — perhaps the Census Bureau’s best-known product — because of its structure and lack of metadata.

Historically available only in common tabular file formats like CSV, the dataset requires reference to a separate data dictionary document to understand it. But now, developers and data scientists will be able to more easily use the ACS data and build apps from it because it has been transformed into linked data, Census Chief Marketing Officer Jeff Meisel announced Saturday during a panel at the SXSW Conference in Austin.

The Austin-based data.world, funded by the National Science Foundation, brought on then-graduate student Jonathan Ortiz to address problems with the granular ACS data, which is known as the Public Use Microdata Sample.

“What comes to you in the microdata survey file … is essentially just: one piece is the CSV, which has coded values throughout, and you constantly have to refer back and forth to the data dictionary,” said Ortiz, who now works as a data scientist for data.world, in an interview with FedScoop. “And the data dictionary is a human-readable document, it’s not computer-readable at all.”
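The back-and-forth Ortiz describes can be illustrated with a minimal Python sketch. The column names, codes and labels below are illustrative stand-ins, not copied from the actual microdata files or their data dictionary, and the dictionary is typed in by hand because the real one is a human-readable document rather than a machine-readable file.

```python
import csv
import io

# Hypothetical excerpt of a coded CSV; column names and codes are
# illustrative, not taken from the real microdata files.
CODED_CSV = """SERIALNO,ST,SEX,AGEP
0001,48,2,34
0002,06,1,51
"""

# Hand-built stand-in for the separate, human-readable data dictionary.
DATA_DICTIONARY = {
    "ST": {"48": "Texas", "06": "California"},
    "SEX": {"1": "Male", "2": "Female"},
}


def decode_rows(csv_text, dictionary):
    """Replace coded values with labels wherever the dictionary has an entry."""
    decoded = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        decoded.append({
            column: dictionary.get(column, {}).get(value, value)
            for column, value in row.items()
        })
    return decoded


rows = decode_rows(CODED_CSV, DATA_DICTIONARY)
for row in rows:
    print(row)
```

Every analysis of the coded file needs some version of this lookup step, which is the overhead the linked data version is meant to remove.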

But semantic technology allows users to “put that metadata into the data itself so that you’re consuming both at the same time, and you’re also able to use unique identifiers for each of the data resources in that data so the computer can actually understand them, make sense of them.”

The tradeoff in getting the metadata is that “the size of the data explodes when you start incorporating all this other information.”
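A rough Python sketch shows both ideas at once: metadata carried inside the data, and the growth in size that comes with it. It assumes a hypothetical `http://example.org/acs/` namespace and illustrative codes and labels; the identifiers used in the published linked data are not described in this article and may look quite different. Only the `rdfs:label` property URI is a real, standard identifier.

```python
# Hypothetical namespace; the real identifiers would be defined by the
# publisher of the linked data, not by this sketch.
BASE = "http://example.org/acs/"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

# One coded record, as it might appear in a CSV row (illustrative values).
coded_row = {"SERIALNO": "0001", "ST": "48", "SEX": "2"}

# Labels that would otherwise live only in the data dictionary (illustrative).
LABELS = {
    ("ST", "48"): "Texas",
    ("SEX", "2"): "Female",
}


def row_to_triples(row):
    """Turn one coded row into (subject, predicate, object) triples."""
    subject = f"<{BASE}record/{row['SERIALNO']}>"
    triples = []
    for column, code in row.items():
        if column == "SERIALNO":
            continue
        value_uri = f"<{BASE}{column}/{code}>"
        # Link the record to an identified value rather than a bare code.
        triples.append((subject, f"<{BASE}property/{column}>", value_uri))
        # Attach the human-readable label to that value's identifier.
        triples.append((value_uri, RDFS_LABEL, f'"{LABELS[(column, code)]}"'))
    return triples


triples = row_to_triples(coded_row)
csv_line = ",".join(coded_row.values())
linked_text = "\n".join(" ".join(t) + " ." for t in triples)

print(linked_text)
print(len(csv_line), "characters as CSV vs", len(linked_text), "as triples")
```

Even for a single three-column record, the triple form is many times longer than the CSV line, which gives a sense of why storage becomes a concern at the scale of the full dataset.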

To address the storage issue, Amazon Web Services is making the linked data available as an AWS public dataset, so anyone can analyze it in the cloud without downloading or storing a copy. The old formats will still be available, Ortiz said, and most spreadsheet programs can easily read a CSV file.

“The intention was not to completely replace it and we don’t want to — that’s not the interest here. I think the people who are comfortable using that in its format and enjoy using it, and get value out of it, are going to continue to do so,” Ortiz said. “This is just a new way of modeling and distributing the data. And hopefully a new set of users get different use out of it.”

Ortiz said he hopes web and app developers will use the data delivered in this way to build resources for the public.

“I believe that linked data is the future,” he said, adding “that by providing this it’ll provide other people, other folks out there, semantic web enthusiasts, data engineers, developers, researchers, etc. to begin to enrich their analysis and enrich their own data by linking it to the Census.”

For a data scientist or data engineer, making use of the CSV version is “like learning a new language, it’s like learning a new programming language,” Ortiz said. Now that the data has been translated into linked data, he said, people can use real, human terms and concepts to query it. “And you can uncover things more quickly using that because you’re not learning a new language, essentially.”
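What querying by human terms might look like can be sketched in a few lines of Python. This is a toy, in-memory pattern matcher over hypothetical identifiers and labels; in practice linked data is commonly queried with a dedicated language such as SPARQL, and nothing here reflects the actual interface data.world or AWS provides.

```python
# Tiny in-memory set of triples; identifiers and labels are hypothetical.
TRIPLES = [
    ("record/0001", "state", "state/48"),
    ("record/0002", "state", "state/06"),
    ("record/0003", "state", "state/48"),
    ("state/48", "label", "Texas"),
    ("state/06", "label", "California"),
]


def match(pattern, triples):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


def records_in_state(state_name, triples):
    """Find records by a state's human-readable name instead of its code."""
    state_ids = [s for s, _, _ in match((None, "label", state_name), triples)]
    return [
        s
        for state_id in state_ids
        for s, _, _ in match((None, "state", state_id), triples)
    ]


texas_records = records_in_state("Texas", TRIPLES)
print(texas_records)
```

The caller asks for "Texas" and never needs to know that the underlying code is 48, because the label travels with the data.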
