NSF and DOE’s Rubin Observatory will create a massive data trove. A cloud-based platform and nightly alerts will deliver it to researchers.

Images from the U.S. government’s Vera C. Rubin Observatory unveiled last week provide a level of detail previously unseen, adding to the corpus of known space objects and representing the culmination of two decades of work and investment. They’re also merely the beginning of the decade-long data stream.
The new photos were taken in just over 10 hours of observation time at the Chile-based facility jointly funded by the National Science Foundation and U.S. Department of Energy. But when the observatory completes its mission to survey the southern hemisphere sky, its telescope, equipped with the largest digital camera in the world, will have imaged each point in that sky roughly 800 times, collecting over 2 million photos.
Ultimately, it’s projected to amass a data catalog of around 500 petabytes — the same volume of information as the total amount of written content in every language throughout history, per stats provided by NSF and DOE to the press. That new data is anticipated to create myriad opportunities for scientific exploration, including studying dark matter and dark energy, inventorying our solar system, and mapping the Milky Way.
Yet getting that volume of information to researchers in an accessible way presented a challenge to the Rubin team. Through years of work, and pivots to new technologies as they emerged along the way, Rubin finally arrived at its state-of-the-art cloud-based science platform and nightly alerts system. On Monday, the cloud platform is getting its first look at real data.
“This is definitely a situation of a very large survey — that really is a data science project — [arriving] at the same time as the technology to be able to offer this kind of service,” Frossie Economou, the technical manager for data management at Rubin, told FedScoop in an interview.
When the plans for Rubin’s data infrastructure were sketched out roughly 15 years ago, the advancements in cloud computing and scalable services that it will now benefit from hadn’t yet been made. As those arrived, the Rubin team had to pivot, ultimately leading to the systems now in place.
The data science explosion, new technologies, and Rubin’s completion “have meshed together beautifully,” Economou said.

Science platform
Data from Rubin will primarily be made available through a tool called the Rubin Science Platform, a cloud-based system that acts as a virtual computer through which researchers can interact with the information.
Via that platform, registered researchers can use application programming interfaces (APIs) and Jupyter Notebooks — an existing open-source, web-based tool for writing and running code — to make calls to the data. While the notebooks and APIs run on Google Cloud, the data itself is housed at SLAC National Accelerator Laboratory, which serves as the observatory’s main data processing and archive facility.
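For a sense of what that workflow looks like in practice, here is a minimal sketch of querying such a service from a Jupyter Notebook with pyvo, a general-purpose Python library for astronomy data services; the endpoint URL, schema and column names are illustrative placeholders rather than the platform’s actual values.

```python
# A minimal sketch of calling a TAP (Table Access Protocol) service from a
# notebook. The URL, schema and column names are illustrative placeholders.
import pyvo

# Connect to a TAP endpoint (placeholder URL).
service = pyvo.dal.TAPService("https://data.example.org/api/tap")

# Ask for a handful of image records near a sky position, expressed in ADQL;
# the coordinates here are roughly those of the galaxy M49 mentioned below.
query = """
    SELECT TOP 10 *
    FROM example_schema.visit_images
    WHERE CONTAINS(POINT('ICRS', s_ra, s_dec),
                   CIRCLE('ICRS', 187.44, 8.00, 0.1)) = 1
"""
results = service.search(query)

# Inspect the results as an Astropy table inside the notebook.
print(results.to_table())
```

The point, as O’Mullane describes below, is that researchers bring their code to the data rather than pulling the data down to their own laptops.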
On Monday, a data preview, known as DP1, will hit the platform, giving researchers the first sample of what the information from the survey will look like. That will be followed by at least one more data preview, and when the survey process starts later this year, Rubin will begin producing data at regular intervals.
“The problem was always, ‘how do you allow scientists to interact with that data?’” William O’Mullane, the observatory’s data management project manager and a veteran of projects surveying and cataloging space, told FedScoop in an interview.
The old model of interacting with observatory data involved downloading all or part of it onto a laptop and processing it there, O’Mullane explained.
Some past platforms, known as “viewers,” took similar approaches, with the viewing of images and the overlay of catalog information happening on a server rather than on a researcher’s laptop. But those tools didn’t allow for more complicated algorithms or machine learning, O’Mullane said. The Rubin team wanted to change that by allowing researchers to bring their code to the data.

In a demonstration of the science platform for FedScoop, for example, O’Mullane queried for images taken May 22 of M49, a galaxy located about 60 million light-years from Earth and one of the subjects of Rubin’s first release. The query returned 3,720 images for that single target, showing the volume of information in even a seemingly narrow request.
That number is so high because during each visit, Rubin’s camera, known as the Large Synoptic Survey Telescope Camera, uses 189 sensors to capture that many individual science images. That means a single LSSTCam capture, with all of its individual images stitched together, is roughly 3.2 gigapixels in size — in other words, an image that would take about 400 ultra-high-definition TVs to display at its full size, per stats provided by NSF and DOE.
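Those figures check out with a quick back-of-the-envelope calculation, assuming each of the 189 sensors is a roughly 4,096-by-4,096-pixel CCD (the commonly cited LSSTCam sensor size) and treating an ultra-high-definition screen as 3,840 by 2,160 pixels:

```python
# Back-of-the-envelope check of the NSF/DOE figures, assuming 189 sensors of
# roughly 4,096 x 4,096 pixels each (the commonly cited LSSTCam CCD size).
sensors = 189
pixels_per_sensor = 4096 * 4096              # about 16.8 megapixels

total_pixels = sensors * pixels_per_sensor
print(f"Full capture: {total_pixels / 1e9:.1f} gigapixels")           # ~3.2

# An ultra-high-definition (4K) screen is 3,840 x 2,160 pixels.
uhd_pixels = 3840 * 2160
print(f"UHD screens to display it: {total_pixels / uhd_pixels:.0f}")  # ~380
```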
Some image processing platforms might be able to load an image of that size eventually, but it would be slow and could crash the user’s machine, O’Mullane said. That’s why the captures are treated individually.
Prior to the Monday release of real data, the platform was tested with simulated data in partnership with Google, which helped the observatory build an interim data facility on its cloud. In a 2021 blog post about that test, Nicole DeSantis, a research marketing manager for Google, said the agreement with Rubin marked “the first time a cloud-based data facility has been used for an astronomy application of this magnitude.”
Reymund Dumlao, director of state and local government and education for Google Public Sector, said the Google Cloud team was “thrilled” about the first images. “We are enthusiastic supporters of the LSST mission and hope to continue material contribution to the next decade of scientific discovery,” he said.
Alerts system
While the platform requires credentials based on university and research affiliations, a second data stream — the alerts system — will make information about certain changes in the night sky public within minutes of an image being taken.
“We have things we’re looking for. We’re trying to understand things that change over time,” Federica Bianco, deputy project scientist of observing strategy at Rubin, said of the alerts during a panel for the release of the first images.
The alerts, on the order of roughly 10 million per night, benefit from Rubin’s mission to essentially create a movie of the night sky, because new and old images can be compared to identify changes. Those alerts will first go through a series of data brokers and then out to the broader research community, quickly getting as many eyes as possible on potential events such as asteroids, pulsating stars or stars going supernova.
“That particular data product has no proprietary restriction because we need all of the scientists in the world to find their telescope to learn more about these things that we discover,” Bianco said.
There are a total of nine brokers, operated by organizations like NOIRLab, that will use various software systems to process those alerts for the public and provide tools such as filters and object identification. The data in those “alert packets” will also be available on the science platform within 24 hours.
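The alert packets themselves are serialized as Apache Avro records, so a broker or researcher receiving one could unpack it with a standard Avro library. The sketch below shows the general idea, with the file name and field names as illustrative assumptions rather than the project’s published alert schema:

```python
# Sketch of unpacking a single alert packet with the fastavro library.
# The file name and field names are illustrative assumptions; the real
# alert schema is defined and published by the Rubin project.
from fastavro import reader

with open("alert_packet.avro", "rb") as f:
    for record in reader(f):
        # An alert pairs a new detection with recent history and image cutouts;
        # here we just pull a few hypothetical fields a filter might act on.
        source = record.get("diaSource", {})
        print(source.get("ra"), source.get("dec"), source.get("midpointTai"))
```

Brokers apply this kind of filtering at scale against the live stream, which is how the roughly 10 million nightly alerts get narrowed down to the events a given research group actually wants to follow up on.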
O’Mullane told FedScoop those alerts will start coming in a few months. At first they will arrive asynchronously, then in real time for portions of the sky once the survey begins, eventually building up to full capacity.
“Alerts are fully public because they are timely,” Economou said. Use of the Rubin data on the platform itself has some data rights restrictions, which is part of the reason why users need an affiliation to log on. But as long as the Rubin team is compliant with those restrictions, Economou said “anything that we can put out, we do put out.”
Other data will also be available to the public on the community science platform Zooniverse and in classroom materials provided by the Rubin team. The annual data releases, which will be more curated collections, will eventually become public as well: after a two-year period of access limited to researchers in the U.S., Chile and other supporting institutions, that data will be available to anyone in the world.
Although Rubin isn’t the first observatory to use cloud technology or alerts to facilitate access to its data, its uniqueness lies in being a brand-new facility with huge potential that was built with those technologies in mind from the start, rather than having them added later on. As a result, its information will be more broadly available than that of past projects.
O’Mullane said the tools are in a sense “democratizing data” by providing more accessible ways to interact with the information. “You don’t need a supercomputer. I can show you [how] to do exactly the same thing as a Ph.D. student at Princeton on your laptop using exactly the same tool, and that’s really quite nice,” he said.
Economou highlighted that access as well, noting that “one person cannot exploit all this data.” Part of Rubin’s mission has been to get the information to as many researchers as possible, because that will translate into more findings. Even though she’s proud of the technology the team has built, Economou said the ease of access will be the real achievement.
“I think the real success here will be to fulfill the promise to be able to just have this data widely used,” she said. “The more people use it, the better it is.”