For big data, NASA’s head is in the cloud

Ingenuity won NASA the Intel "Head in The Clouds" challenge. Cloud processing enabled it to bring the project to fruition.

NASA’s head is in the clouds in more ways than one.

A research team led by Compton Tucker of NASA’s Goddard Space Flight Center won Intel’s 2015 “Head in The Clouds” challenge for its efforts to estimate biomass in the southern Sahara Desert using satellite imagery. To carry out the project, the agency turned to a cloud of a very different sort: cloud-based data analytics.

“These workloads are not standard enterprise — they’re big data, and they require tools and applications most people haven’t heard of,” said Tim Carroll — vice president of ecosystems at Cycle Computing, which assisted NASA with the project — during a presentation at the Amazon Web Services Government, Education, and Nonprofits Symposium last week in Washington, D.C. “The shift to the cloud is as monumental as the advent of Linux clusters or supercomputers.”


The ecological survey allowed scientists to assess carbon density in one of the most arid climates in the world, doing from space what would take decades to accomplish on land. NASA hoped to put the information toward climate change research.

The agency already had the resources to launch the project: satellites, programs for detecting greenery and a long track record of success in aerospace ventures. Without cloud computing, however, it lacked the storage space and processing power to draw meaningful conclusions from what promised to be a colossal data set.

“We’re talking about something along the lines of 8 petabytes of data at the end of this program,” said Dan Duffy, high-performance computing lead at NASA’s Center for Climate Simulation. “The processing requirements are substantial.”

The sample area comprises a broad swath of desert across Africa’s widest portion, including most of the Nigerian Sahara. NASA needed to process 3,120 “scenes,” or images containing 900 GB of visual data, dividing each scene into 100 “tiles” to make the workload more manageable.

On a single-core virtual machine, each tile took 24 minutes to process through the carbon detection program, which scanned each image for greenery and calculated relative emissions. At 100 tiles per scene, one scene would take about 40 hours, so processing all 3,120 scenes one after another would take more than 14 years.
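
The arithmetic behind that estimate can be checked with a short sketch. The scene count, tiles per scene and minutes per tile are the figures quoted above; the function and variable names are illustrative only and are not taken from NASA's or Cycle Computing's software.

```python
# Figures quoted in the article.
SCENES = 3_120
TILES_PER_SCENE = 100
MINUTES_PER_TILE = 24


def serial_estimate(scenes=SCENES, tiles=TILES_PER_SCENE, minutes=MINUTES_PER_TILE):
    """Return (hours per scene, total hours, total years) on one core.

    Assumes tiles are processed strictly one after another, with no
    overhead for data transfer or job startup.
    """
    hours_per_scene = tiles * minutes / 60
    total_hours = scenes * hours_per_scene
    total_years = total_hours / 24 / 365
    return hours_per_scene, total_hours, total_years


if __name__ == "__main__":
    per_scene, total_hours, total_years = serial_estimate()
    print(f"{per_scene:.0f} hours per scene")
    print(f"{total_hours:,.0f} core-hours in total")
    print(f"{total_years:.1f} years on a single core")
```

Under these assumptions, the total comes to 124,800 core-hours, or a little over 14 years of continuous processing.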


With the cloud, however, tiles and scenes could run in parallel across multiple virtual machines. With between 175 and 200 cloud processors, it took only about a month to assess all of the data, more than two orders of magnitude faster than the 14-year single-machine estimate.
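
A rough sketch of why that many processors bring the job down to about a month follows. It assumes every tile is independent and the work divides evenly across cores with no scheduling or data-transfer overhead, which is an idealization; the function name `parallel_days` is hypothetical.

```python
def parallel_days(cores, scenes=3_120, tiles_per_scene=100, minutes_per_tile=24):
    """Estimate wall-clock days to process every tile on `cores` processors.

    Assumes perfectly even division of work (ideal linear speedup),
    which real cloud runs only approximate.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    total_core_minutes = scenes * tiles_per_scene * minutes_per_tile
    wall_clock_minutes = total_core_minutes / cores
    return wall_clock_minutes / 60 / 24


if __name__ == "__main__":
    for cores in (1, 175, 200):
        print(f"{cores:>3} cores: {parallel_days(cores):,.1f} days")
```

With the article's figures, 200 cores give 26 days and 175 cores just under 30 days, consistent with the reported month-long run.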

The benefits of such increased efficiency are far-reaching, according to Carroll.

“The reason we’re talking about running these workloads in the cloud is because we want to get more resources to researchers who can use them,” he said. “We’re able to lower the cost of knowledge about the world we live in.”
