NSF and DOE’s Rubin Observatory will create a massive data trove. A cloud-based platform and nightly alerts will deliver it to researchers.

Images from the U.S. government’s Vera C. Rubin Observatory unveiled last week provide a level of detail previously unseen, adding to the corpus of known space objects and representing the culmination of two decades of work and investment. They’re also merely the beginning of the decade-long data stream.
The new photos were taken in just over 10 hours of observation time at the Chile-based facility jointly funded by the National Science Foundation and U.S. Department of Energy. But when the observatory completes its mission to survey the southern hemisphere sky, its telescope, equipped with the largest digital camera in the world, will have imaged each point in that sky roughly 800 times, collecting over 2 million photos.
Ultimately, it’s projected to amass a data catalog of around 500 petabytes — the same volume of information as the total amount of written content in every language throughout history, per stats provided by NSF and DOE to the press. That new data is anticipated to create myriad opportunities for scientific exploration, including studying dark matter and dark energy, inventorying our solar system, and mapping the Milky Way.
Yet getting that volume of information to researchers in an accessible way presented a challenge to the Rubin team. Through years of work, and pivots to new technologies as they emerged along the way, Rubin finally arrived at its state-of-the-art cloud-based science platform and nightly alerts system. On Monday, the cloud platform is getting its first look at real data.
“This is definitely a situation of a very large survey — that really is a data science project — [arriving] at the same time as the technology to be able to offer this kind of service,” Frossie Economou, the technical manager for data management at Rubin, told FedScoop in an interview.
When the plans for Rubin’s data infrastructure were sketched out roughly 15 years ago, the advancements in cloud computing and scalable services that it will now benefit from hadn’t yet been made. As those arrived, the Rubin team had to pivot, ultimately leading to the systems now in place.
The data science explosion, new technologies, and Rubin’s completion “have meshed together beautifully,” Economou said.

Science platform
Data from Rubin will primarily be made available through a tool called the Rubin Science Platform, a cloud-based system that acts as a virtual computer through which researchers can interact with the information.
Via that platform, registered researchers can use application programming interfaces (APIs) and Jupyter Notebooks — an existing open-source, web-based tool for writing and running code — to make calls to the data. While the notebooks and APIs run on Google Cloud, the data itself is housed at SLAC National Accelerator Laboratory, which serves as the observatory’s main data processing and archive facility.
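For a sense of what that workflow looks like in practice, here is a minimal sketch of querying such a service from a Jupyter Notebook with pyvo, a general-purpose Python library for astronomy data services; the endpoint URL, schema and column names are illustrative placeholders rather than the platform’s actual values.

```python
# A minimal sketch of calling a TAP (Table Access Protocol) service from a
# notebook. The URL, schema and column names are illustrative placeholders.
import pyvo

# Connect to a TAP endpoint (placeholder URL).
service = pyvo.dal.TAPService("https://data.example.org/api/tap")

# Ask for a handful of image records near a sky position, expressed in ADQL;
# the coordinates here are roughly those of the galaxy M49 mentioned below.
query = """
    SELECT TOP 10 *
    FROM example_schema.visit_images
    WHERE CONTAINS(POINT('ICRS', s_ra, s_dec),
                   CIRCLE('ICRS', 187.44, 8.00, 0.1)) = 1
"""
results = service.search(query)

# Inspect the results as an Astropy table inside the notebook.
print(results.to_table())
```

The point, as O’Mullane describes below, is that researchers bring their code to the data rather than pulling the data down to their own laptops.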
On Monday, a data preview, known as DP1, will hit the platform, giving researchers the first sample of what the information from the survey will look like. That will be followed by at least one more data preview, and when the survey process starts later this year, Rubin will begin producing data at regular intervals.
“The problem was always, ‘how do you allow scientists to interact with that data?’” William O’Mullane, the observatory’s data management project manager and a veteran of projects surveying and cataloging space, told FedScoop in an interview.
The old model of interacting with observatory data involved downloading all or part of it onto a laptop and processing it there, O’Mullane explained.
Some past platforms, known as “viewers,” took similar approaches, with the viewing of images and the overlay of catalog information happening on a server rather than on a researcher’s laptop. But those tools didn’t allow for more complicated algorithms or machine learning, O’Mullane said. The Rubin team wanted to change that by allowing researchers to bring their code to the data.

In a demonstration of the science platform for FedScoop, for example, O’Mullane queried for images taken May 22 of M49, a galaxy located about 60 million light-years from Earth and one of the subjects of Rubin’s first release. The query returned 3,720 images for that single target, showing the volume of information in even a seemingly narrow request.
That number is so high because during each visit, Rubin’s camera, known as the Large Synoptic Survey Telescope Camera, uses 189 sensors to capture that many individual science images. That means a single LSSTCam capture, with all of its individual images stitched together, is roughly 3.2 gigapixels in size — in other words, an image that would take about 400 ultra-high-definition TVs to display at its full size, per stats provided by NSF and DOE.
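Those figures check out with a quick back-of-the-envelope calculation, assuming each of the 189 sensors is a roughly 4,096-by-4,096-pixel CCD (the commonly cited LSSTCam sensor size) and treating an ultra-high-definition screen as 3,840 by 2,160 pixels:

```python
# Back-of-the-envelope check of the NSF/DOE figures, assuming 189 sensors of
# roughly 4,096 x 4,096 pixels each (the commonly cited LSSTCam CCD size).
sensors = 189
pixels_per_sensor = 4096 * 4096              # about 16.8 megapixels

total_pixels = sensors * pixels_per_sensor
print(f"Full capture: {total_pixels / 1e9:.1f} gigapixels")           # ~3.2

# An ultra-high-definition (4K) screen is 3,840 x 2,160 pixels.
uhd_pixels = 3840 * 2160
print(f"UHD screens to display it: {total_pixels / uhd_pixels:.0f}")  # ~380
```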
Some image processing platforms might be able to load an image of that size eventually, but it would be slow and could crash the user’s machine, O’Mullane said. That’s why the captures are treated individually.
Prior to the Monday release of real data, the platform was tested with simulated data in partnership with Google, which helped the observatory build an interim data facility on its cloud. In a 2021 blog post about that test, Nicole DeSantis, a research marketing manager for Google, said the agreement with Rubin marked “the first time a cloud-based data facility has been used for an astronomy application of this magnitude.”
Reymund Dumlao, director of state and local government and education for Google Public Sector, said the Google Cloud team was “thrilled” about the first images. “We are enthusiastic supporters of the LSST mission and hope to continue material contribution to the next decade of scientific discovery,” he said.
Alerts system
While the platform requires credentials based on university and research affiliations, a second data stream — the alerts system — will make information about certain changes in the night sky public within minutes of an image being taken.
“We have things we’re looking for. We’re trying to understand things that change over time,” Federica Bianco, deputy project scientist of observing strategy at Rubin, said of the alerts during a panel for the release of the first images.
The alerts, on the order of roughly 10 million per night, benefit from Rubin’s mission to essentially create a movie of the night sky, because new and old images can be compared to identify changes. Those alerts will first go through a series of data brokers and then out to the broader research community, quickly getting as many eyes as possible on potential events such as asteroids, pulsating stars or stars going supernova.
“That particular data product has no proprietary restriction because we need all of the scientists in the world to find their telescope to learn more about these things that we discover,” Bianco said.
There are a total of nine brokers, operated by organizations like NOIRLab, that will use various software systems to process those alerts for the public and provide tools such as filters and object identification. The data in those “alert packets” will also be available on the science platform within 24 hours.
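The alert packets themselves are serialized as Apache Avro records, so a broker or researcher receiving one could unpack it with a standard Avro library. The sketch below shows the general idea, with the file name and field names as illustrative assumptions rather than the project’s published alert schema:

```python
# Sketch of unpacking a single alert packet with the fastavro library.
# The file name and field names are illustrative assumptions; the real
# alert schema is defined and published by the Rubin project.
from fastavro import reader

with open("alert_packet.avro", "rb") as f:
    for record in reader(f):
        # An alert pairs a new detection with recent history and image cutouts;
        # here we just pull a few hypothetical fields a filter might act on.
        source = record.get("diaSource", {})
        print(source.get("ra"), source.get("dec"), source.get("midpointTai"))
```

Brokers apply this kind of filtering at scale against the live stream, which is how the roughly 10 million nightly alerts get narrowed down to the events a given research group actually wants to follow up on.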
O’Mullane told FedScoop those alerts will start coming in a few months. At first they will arrive asynchronously, then in real time for portions of the sky once the survey begins, eventually building up to full capacity.
“Alerts are fully public because they are timely,” Economou said. Use of the Rubin data on the platform itself has some data rights restrictions, which is part of the reason why users need an affiliation to log on. But as long as the Rubin team is compliant with those restrictions, Economou said “anything that we can put out, we do put out.”
Other data will also be available to the public on the community science platform Zooniverse and in classroom materials provided by the Rubin team. The annual data releases, which will be more curated collections, will eventually become public as well: after a two-year period of access limited to researchers in the U.S., Chile and other supporting institutions, that data will be available to anyone in the world.
Although Rubin isn’t the first observatory to use cloud technology or alerts to facilitate access to its data, its uniqueness lies in being a brand-new facility with huge potential that was built with those technologies in mind from the start, rather than having them added later on. As a result, its information will be more broadly available than that of past projects.
O’Mullane said the tools are in a sense “democratizing data” by providing more accessible ways to interact with the information. “You don’t need a supercomputer. I can show you [how] to do exactly the same thing as a Ph.D. student at Princeton on your laptop using exactly the same tool, and that’s really quite nice,” he said.
Economou highlighted that access as well, noting that “one person cannot exploit all this data.” Part of Rubin’s mission has been to get the information to as many researchers as possible, because that will translate into more findings. Even though she’s proud of the technology the team has built, Economou said the ease of access will be the real achievement.
“I think the real success here will be to fulfill the promise to be able to just have this data widely used,” she said. “The more people use it, the better it is.”