National AI Research Resource must balance the value of its data with privacy
The task force developing recommendations on a National Artificial Intelligence Research Resource must balance the need to provide valuable data with the increased risk it could be used to triangulate personally identifiable information, given the large number of parties expected to have access, experts say.
Task force members want to include startups and small businesses developing privacy technologies among NAIRR's users, but exactly how resources, capabilities and policies would be integrated continues to be discussed, according to co-chair Manish Parashar.
Members previously stated that U.S.-based researchers and students — primarily in academia but also with companies that have received federal grants like Small Business Innovation Research or Small Business Technology Transfer funding — are target users of the NAIRR. Privacy technologies they’re developing could help the resource protect personally identifiable information (PII).
“Yes, the task force is certainly discussing how privacy-enabling technologies could help enhance the privacy aspects of NAIRR usage,” Parashar told FedScoop. “However, the task force has also discussed how privacy requires more than just technical solutions, and we expect a full range of considerations when contemplating privacy, civil rights and civil liberties.”
Data used to train machine learning (ML) algorithms can be anonymized to a degree, but the process is never absolute, which means PII can be correlated with enough effort.
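As a rough illustration of that risk, the minimal Python sketch below shows a classic linkage attack, with every record, name and field invented purely for the example: joining a de-identified dataset to a public one on shared quasi-identifiers re-attaches names to supposedly anonymous records.

```python
# Minimal sketch of a linkage attack; all records and field names here are
# hypothetical, invented purely to illustrate the re-identification risk.

# A "de-identified" research dataset: names removed, quasi-identifiers kept.
anonymized = [
    {"zip": "20500", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "20002", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]

# A separate public dataset (e.g., a voter roll) that includes names.
public = [
    {"name": "Alice Example", "zip": "20500", "birth_year": 1985, "sex": "F"},
    {"name": "Bob Example", "zip": "20002", "birth_year": 1990, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def key(record):
    """Project a record onto its quasi-identifiers."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)

# Join the two datasets on the quasi-identifiers to re-attach identities.
lookup = {key(r): r["name"] for r in public}
for record in anonymized:
    name = lookup.get(key(record))
    if name:
        print(f"Re-identified {name}: {record['diagnosis']}")
```

With enough auxiliary data, the join succeeds even when names, addresses and other direct identifiers have been scrubbed, which is why experts describe anonymization as a matter of degree rather than a guarantee.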
Startups like integrate.ai, which advocates privacy by design, see an opportunity for the NAIRR to not only include them but use their privacy-enhancing technologies: federated learning, differential privacy, homomorphic encryption and secure multi-party computation.
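Of those techniques, secure multi-party computation is perhaps the simplest to illustrate. The minimal sketch below, with hypothetical party counts and values, uses additive secret sharing, one building block of the approach, so that several parties can compute a joint sum without any of them seeing another's raw input.

```python
import random

# Minimal sketch of additive secret sharing, one building block of secure
# multi-party computation. Party counts and values are hypothetical.

PRIME = 2**61 - 1  # arithmetic is done modulo a large prime

def share(secret, n_parties):
    """Split a secret into n random shares that sum to it modulo PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; only the sum of all shares reveals anything."""
    return sum(shares) % PRIME

# Three parties each hold a private value (e.g., a local statistic).
private_values = [42, 7, 100]

# Each party splits its value and sends one share to every other party.
all_shares = [share(v, 3) for v in private_values]

# Each party sums the shares it received; no party sees another's raw value.
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]

# Combining the partial sums reveals only the aggregate, never the inputs.
print(reconstruct(partial_sums))  # -> 149
```

Each individual share is a uniformly random number, so holding fewer than all of them reveals nothing about the underlying value, only the final combination exposes the aggregate.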
“I would love to see a privacy track, a privacy initiative that both leverages the research value of the resource but also supports the whole initiative to actually protect the privacy of that information,” said Karl Martin, senior vice president of technology at integrate.ai.
Martin envisions a cluster of researchers and companies that, in addition to advancing their own work, would have a mandate to support the NAIRR with privacy-enhancing technologies that others may use, or be required to use, to access the resource's data.
Database-style access controls, which limit what organizations can see based on data type, are the “most basic” form of privacy protection, and they would likely prove “frustrating” for NAIRR users, Martin said.
Federated learning, on the other hand, allows ML algorithms to be trained without directly accessing the underlying data, and it can be combined with additional layers of privacy, like differential privacy, to make reverse engineering the original data difficult, he added.
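A minimal sketch of that combination might look like the following, assuming a simple least-squares model and hypothetical clipping and noise parameters: each client computes an update on its own private data, clips and noises it before it leaves the client, and the server only ever averages the privatized updates.

```python
import numpy as np

# Minimal sketch of federated averaging with a differential-privacy layer:
# clients never share raw data, only clipped, noised model updates. All
# parameters (clip norm, noise scale, model size) are hypothetical.

rng = np.random.default_rng(0)

CLIP_NORM = 1.0  # bounds any single client's influence on the model
NOISE_STD = 0.1  # Gaussian noise calibrated to the clip norm

def local_update(global_model, local_data):
    """One gradient-descent step on a client's private least-squares data."""
    X, y = local_data
    grad = X.T @ (X @ global_model - y) / len(y)
    return -0.1 * grad  # proposed change to the global model

def privatize(update):
    """Clip the update and add Gaussian noise before it leaves the client."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, CLIP_NORM / max(norm, 1e-12))
    return clipped + rng.normal(0.0, NOISE_STD, size=update.shape)

# Three clients, each holding a private dataset the server never sees.
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]

model = np.zeros(5)
for _ in range(10):  # federated training rounds
    updates = [privatize(local_update(model, data)) for data in clients]
    model += np.mean(updates, axis=0)  # server averages the noised updates
```

Because the server only ever sees clipped, noised updates, even an adversary with access to the aggregation step has a hard time reconstructing any individual client's records, which is the layered effect Martin describes.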
Whatever privacy technologies the task force ultimately recommends should be based on a smart data philosophy, opting for protections tied to the data itself rather than to the systems that hold it.
“What’s the value of this data?” Martin said. “Then what are the protection mechanisms that can surround the data?”