Cloud computing offers scientists the chance to study petabytes of data more easily than ever before. So why aren't they sharing their data?
In a sentiment familiar to those who follow the federal IT community, a number of research experts say the culture around scientific discoveries needs to change if scientists are to fully harness the capabilities of cloud computing.
Work like discovering gravitational waves or research tied to the Human Genome Project produces an astounding amount of data, which needs to be shared among multiple scientific outlets if researchers are to answer the pressing problems of our time. Experts from the public, private and academic sectors all agreed Friday at a Center for Data Innovation event that silos need to be knocked down to make sense of the mountains of data associated with their projects.
“The problems we are trying to address are much more complex than ever before,” said Phil Bourne, associate director for data science at the National Institutes of Health. “We can’t solve them with one person sitting in a garage anymore.”
Angel Pizarro, who works on Amazon Web Services’ Scientific and Research Computing Team, said sharing the data tied to scientific discoveries gives a multitude of groups the ability to scale various breakthroughs to meet their unique needs.
“Not everybody out there has the money to buy supercomputers,” Pizarro said. “When you can lease or rent [computing power], that opens up science to a much broader set of researchers that would not be able to have done this before. That matters on a petabyte scale.”
The federal government has been using the cloud to manage its health-related research initiatives: the White House's Precision Medicine Initiative and the “moonshot” to cure cancer recently announced in the State of the Union.
Ben Shneiderman, a professor of computer science at the University of Maryland, said health care research is an ideal model for how data should be shared, given how research affects health care delivery.
“All of the 5000 hospitals [in the U.S.] should be using that data, finding out what treatments work, and propagating that news in a trustworthy way where people can apply those [results],” he said.
Yet Shneiderman was candid about why he believes this sharing hasn’t become the norm: Countries, companies and colleges often compete with one another when it comes to scientific breakthroughs and want to reap the rewards of their own discoveries.
“The fundamental reason there isn’t sharing is groups and countries are in competition,” he said. “They have worked hard to collect data. They are reluctant to give theirs.”
To combat data hoarding, Shneiderman said the government should change the way research grants are doled out, tying renewals to how much data has been shared or opened to the public.
“Data is not a piece of tech; it’s a social structure,” he said. “It’s about how you create the incentives for government to do the right thing. How do we create the incentives to clean, annotate and curate data?”
“The main thing is this is not a technical problem,” Shneiderman continued. “It’s all social. It’s all cultural.”
Contact the reporter on this story via email at firstname.lastname@example.org, or follow him on Twitter at @gregotto.