
NIH makes its coronavirus genomic data publicly accessible in the cloud

Researchers can now quickly access the data for free, so long as they have an NIH award.

August 18, 2020


The National Institutes of Health is making genomic data about the coronavirus publicly accessible to researchers in the cloud for the first time.

Created by the National Center for Biotechnology Information (NCBI), the Coronavirus Genome Sequence Dataset consists of researcher-submitted data, including files in normalized Sequence Read Archive (SRA) format. The SRA is a bioinformatics repository of DNA sequences.

Researchers with active NIH awards can now quickly access the dataset at no cost via the Registry of Open Data on Amazon Web Services, and the agency plans to make it available on more public data cloud platforms.

“Containing COVID-19 outbreaks and preparing for future pandemics will require a deep understanding of the SARS-CoV-2 genome in the context of other COVID-19 patients and the broader Coronaviridae family,” said Ryan Layer, assistant professor at the University of Colorado Boulder’s BioFrontiers Institute, in a statement. “The NCBI Coronavirus Genome Sequence Dataset makes over a decade of viral genome data publicly accessible for researchers, empowering anyone in the research community to participate in the pandemic response.”

The dataset contains more than 13,000 SRA runs, NIH says. The project is part of the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative. STRIDES is a collaboration between NIH and AWS to use the cloud to assist researchers with active NIH awards.

The data will help researchers understand not only COVID-19 but also other pandemic diseases. Differences in genetic sequences among infected patients help researchers determine how quickly the virus is evolving, and genetics are thought to play a role in how patients react to infection. Diagnostic testing can also be fine-tuned.

The dataset itself consists of two buckets: one containing raw and normalized files categorized by SRA accession code and another containing accession metadata that will soon be queryable within the Amazon Athena interactive query service.
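For readers who want a feel for what "categorized by SRA accession code" could mean in practice, here is a minimal Python sketch. It only builds the URLs that an unsigned Amazon S3 ListObjectsV2 request would use; it does not contact AWS. The bucket names are hypothetical placeholders, because the article does not name the real buckets, and the assumption that each object key begins with the run accession is an illustration, not a documented layout.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: the article does not give the real bucket names.
DATA_BUCKET = "example-sra-run-files"
METADATA_BUCKET = "example-sra-run-metadata"

# SRA run accessions start with SRR (NCBI), ERR (EBI) or DRR (DDBJ).
RUN_PREFIXES = {"SRR", "ERR", "DRR"}


def run_prefix(accession: str) -> str:
    """Return the key prefix for one run, ASSUMING keys start with the accession."""
    accession = accession.strip().upper()
    if accession[:3] not in RUN_PREFIXES or not accession[3:].isdigit():
        raise ValueError(f"not an SRA run accession: {accession!r}")
    return f"{accession}/"


def list_url(bucket: str, prefix: str = "") -> str:
    """Build an S3 ListObjectsV2 URL (virtual-hosted style) for a bucket and prefix."""
    query = urlencode({"list-type": "2", "prefix": prefix, "delimiter": "/"})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


if __name__ == "__main__":
    # Example with a made-up accession number.
    print(list_url(DATA_BUCKET, run_prefix("srr0000001")))
```

Fetching such a URL would return an XML listing only if the bucket permits anonymous listing. Whether and how the metadata bucket can be queried is not described in the article beyond the planned Amazon Athena support.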
