NIST challenge targets better de-identification techniques for public data

The agency is worried about "linkage attacks," where hackers collate a bunch of datasets to glean personally identifiable information from otherwise anonymized data.

One barrier to opening up valuable government datasets is making sure that all necessary personally identifiable information (PII) is removed beforehand — a process called de-identification. It’s a balancing act intended to protect individuals’ privacy while maintaining the integrity of the data.

The National Institute of Standards and Technology (NIST) says existing de-identification techniques aren’t good enough, however, and in a new challenge, the agency is asking for ways to improve them.

“Currently popular de-identification techniques are not sufficient,” the challenge page reads. “Either PII is not sufficiently protected, or the resulting data no longer represents the original data. This competition is about creating new methods, or improving existing methods of data de-identification, in a way that makes de-identification of privacy-sensitive datasets practical.”

The “Unlinkable Data Challenge” specifically wants to stop “linkage attacks” — where multiple and possibly unrelated datasets are combined to glean personal information contained across the datasets.
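
To make the idea concrete, here is a minimal sketch of how a linkage attack can work in principle. It is not taken from the NIST challenge. Every record, field name and function below is hypothetical and made up for illustration. The sketch assumes a dataset with names stripped out still carries "quasi-identifiers" (ZIP code, birth year, sex) that also appear in a separate public dataset that includes names.

```python
from collections import defaultdict

# Hypothetical "de-identified" records: names removed, quasi-identifiers kept.
deidentified_records = [
    {"zip": "20899", "birth_year": 1975, "sex": "F", "incident": "burglary report"},
    {"zip": "20899", "birth_year": 1982, "sex": "M", "incident": "traffic stop"},
    {"zip": "78701", "birth_year": 1990, "sex": "F", "incident": "noise complaint"},
]

# Hypothetical public dataset that still carries names.
public_records = [
    {"name": "Alice Example", "zip": "20899", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "20899", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "78701", "birth_year": 1990, "sex": "F"},
    {"name": "Dana Example", "zip": "78701", "birth_year": 1990, "sex": "F"},
]

# Assumed set of fields shared by both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(deidentified, public, keys=QUASI_IDENTIFIERS):
    """Pair each de-identified record with the names that share its key values."""
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])
    return [(rec, index.get(tuple(rec[k] for k in keys), [])) for rec in deidentified]


def reidentified(linked):
    """Keep only records that match exactly one named person."""
    return {names[0]: rec["incident"] for rec, names in linked if len(names) == 1}


matches = reidentified(link_records(deidentified_records, public_records))
for name, incident in matches.items():
    print(f"{name}: {incident}")
```

In this toy data, the first two records each match a single named person and are re-identified. The third matches two people and stays ambiguous, which is the kind of outcome stronger de-identification methods aim to guarantee for every record.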


“This valid privacy concern is unfortunately limiting the use of data for research, including datasets with the Public Safety sector that might otherwise be used to improve protection of people and communities,” the challenge page reads. “Due to the sensitive nature of information contained in these types of datasets and the risk of linkage attacks, these datasets can’t easily be made available to analysts and researchers.”

The first stage in the multi-stage NIST competition seeks ideas and concepts — later phases will test the efficacy of submitted algorithms. Participants have until July 26 to submit ideas.

NIST isn’t the only agency grappling with the challenge of protecting privacy while moving toward a model where more data is shared. The Department of Health and Human Services’ chief data officer, Mona Siddiqui, spoke about this impediment at the annual South by Southwest festival in Austin this year.

Data de-identification is a hot topic in the private sector, too, as Facebook in particular has faced strong criticism about how it handles users’ information. The company recently said it halted discussions with medical institutions about sharing data for research, as re-identification became a concern.

Beyond this latest callout, NIST has launched a number of competitions in recent months. The agency is also looking for crowdsourced ideas on how virtual reality might aid the public safety community in its work and how to build drones that can stay aloft for longer.
