Overcoming the zettabyte: How government is making records electronic
Electronic record storage is such a massive undertaking for the government that the scale is nearly incomprehensible — even to the experts.
“I learned a new word today: zettabyte,” said Rob Carey, deputy chief information officer at the Defense Department. “I’m sure that’s something with a lot of zeros after it.”
But there’s a pressing need to comprehend and manage this gargantuan amount of data. By the end of 2016, the entire government is supposed to manage its email records electronically (some agencies still print out emails to file them away). By the end of 2019, all records must be managed electronically. And the Government Accountability Office released a report Tuesday saying the National Archives and Records Administration, the ultimate records depository, must do more to ensure its data storage facilities meet proper standards.
“We are not yet in a position to make either of those two goals happen,” said Meg Phillips, external affairs liaison for NARA. By December, though, NARA will have a plan, she said. And it will need help from industry partners to get there.
Which is why NARA hosted an industry day Tuesday at the National Archives. DOD’s Carey joined Interior Department CIO Bernard Mazer and NARA CIO Michael Wash to discuss their successes and strategies to this point, and to give industry leaders a sense of where there were opportunities to help in coming years. Agencies are working to streamline and automate their records collection processes, they said, while NARA is building the capacity to store those records.
Data is growing at a rate of 32 percent per year, Wash said, and 90 percent of all data today was created in the last two years.
“What we have today is just the tip of the iceberg of what we have coming,” he said. NARA’s current digital records take up roughly 3 exabytes, nothing compared to the 400 exabytes “in the pipeline” (that’s 400 billion gigabytes, or the hard-drive capacity of roughly 550 million MacBook Pros). And individual data sets are getting bigger. The 2010 census alone was 330 terabytes, nearly 500 MacBook Pros’ worth of storage and almost triple the size of the 1940 census data set.
The confounding zettabyte? It’s yet another step up — 1,000 exabytes. And it’s already here. The exploding amount of data NARA receives is only about 2 to 3 percent of the data created within the government. Which means individual agencies are dealing with the time-consuming and fallible process of determining what is and isn’t a permanent record.
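For readers who want to check the unit arithmetic behind these figures, here is a minimal sketch in Python. It assumes decimal (SI) storage units, which is what the article's conversions imply; the constant and function names are illustrative only and do not come from NARA or any agency system.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary
# units such as gibibytes would give slightly different results).
GB = 10**9    # gigabyte
TB = 10**12   # terabyte
EB = 10**18   # exabyte
ZB = 10**21   # zettabyte


def to_gigabytes(n_bytes: int) -> int:
    """Convert a byte count to whole gigabytes."""
    return n_bytes // GB


# The 400 exabytes "in the pipeline", expressed in gigabytes.
pipeline_gb = to_gigabytes(400 * EB)

# How many exabytes make up one zettabyte.
exabytes_per_zettabyte = ZB // EB

# The 330-terabyte 2010 census data set, expressed in gigabytes.
census_2010_gb = to_gigabytes(330 * TB)

print(f"400 EB = {pipeline_gb:,} GB")
print(f"1 ZB = {exabytes_per_zettabyte:,} EB")
print(f"330 TB = {census_2010_gb:,} GB")
```

Under these assumptions, 400 exabytes works out to 400 billion gigabytes and a zettabyte to 1,000 exabytes, matching the figures quoted above.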
Automating this process can reduce error and free up employees to spend more time on their jobs, according to Interior’s Mazer. In May 2012, his department implemented a cloud solution to “capture, manage and preserve all records,” he said. The new system, which uses Google Apps for Government, collects all emails and stores them in a searchable database.
Previously, when the Interior Department received a Freedom of Information Act request, employees had to manually search their email for relevant records. Now that process is automated, saving employees countless hours.
“Our cloud service we have can be leveraged by other agencies,” Mazer said. “We used common business standards that are probably replicable across the government.”
But the issue across the government is as much cultural as it is technological, according to DOD’s Carey.
“In this information age we’re in … one extra step with your thumb is creating an additional step and thought process that we are not good at,” he said.
For instance, Carey has struggled to get DOD employees to take the extra step on a BlackBerry to ensure a sent email is encrypted. To get true cultural buy-in, CIOs have to present a record collection process that is seamless and goes unnoticed.
Which is where industry partners come in. If they can help deliver a seamless process, or enhance a system to automatically sort and classify records, Mazer and Carey believe government efficiency will drastically improve.
“Engines that can classify, tag, index, and then allow me to just put in search terms and get it based on my security credentials are the way of the future,” Carey said.
In this future, DOD employees could find — in seconds and with complete certainty — something specific (a document from April 23, 1987) or something more vague (overhead satellite images of an entire region).
“I look at tools we use for tagging and indexing, which allows absolute sure retrieval of information and data, as a way of turning our traditional process on its head,” Carey said.