‘We’re lucky someone wasn’t killed’: A look at the patent office’s Christmas outage
The U.S. Patent and Trademark Office’s massive data center outage last Christmas was more than just an inconvenience, said the agency’s chief tech exec — it could have been deadly.
“Metal vaporized, doors blown off hinges … we’re lucky someone wasn’t killed,” John Owens, the chief information officer of the Patent and Trademark Office, said at a Patent Public Advisory Committee meeting in a harrowing account that sounds more like the start of a horror flick than a quarterly update from a CIO’s office.
The result was that many key patent and trademark systems — like those that allow people to file and search for applications — were unavailable for nearly a week. The agency’s IT staff and vendors came in over the Christmas holiday to make fixes and restore service.
Intellectual property firms in the meantime had to mail in paper copies of their applications if they needed to secure a particular filing date, and the office moved due dates for responding to correspondence.
During the committee’s February hearing, the transcript of which was posted on the USPTO’s website, Owens laid out what happened: It started when electrical wire had been chafing against the metal tube that held it, wearing down the insulation. It short-circuited and there was “so much power and voltage it obliterated the control system for one of the two flywheels that we use on the power.” Then, somehow, “that charge traveled well over 100 feet into the neighboring flywheel, which is a completely separate concrete room and blew it out.”
In a flash, thousands of servers across the data center went off, according to the transcript from a separate meeting of the trademark advisory committee. The agency’s headquarters, based in Alexandria, Virginia, lost network connectivity.
The tech team scrambled to respond, Owens said, digging in from their own inventory to make repairs that would bring systems back up.
“In the meantime, the companies that work with our facility here flew in parts and personnel from, literally, Europe and the United States” to make fixes to the power infrastructure, he said.
He said that applications in the cloud — like email, which uses Microsoft Office 365, and an experimental patent search application, which is on the Amazon Web Services cloud — were functioning. But data or applications tied to legacy IT took much longer to get back online.
During the hearings, Owens emphasized that the outage was caused by a power failure — not IT. His team enacted its continuity of operations plan, or COOP, to bring the data center back up. Indeed, he told FedScoop at USPTO’s headquarters that the team’s response was a “good news story.”
“There’s very few agencies I know of that the CIO can say, ‘I enacted the COOP plan, and it worked,’” Owens said. “And certainly it was proof I didn’t want to have.”
He added, “In the future, though, none of that will ever happen.”
Already, he said his team has been standing up next-generation systems to replace old ones, but some are still tied to data in the legacy IT. In the future, he said, that won’t be the case.
“My first reaction was actually sympathy for them more than anything,” Tony Cole, vice president and global government CTO at FireEye, told FedScoop after reading the transcript of the hearing.
Years ago, Cole said, he was working in a government facility that “was critical to national security” and, even though it had several layers of backup power systems, it experienced a somewhat similar outage. Making the repairs was so critical that parts were sent on a jet to speed the process.
“Quite frankly, nothing, nothing is perfect for a solution,” Cole said. “You can think you have the best engineered solution across the board that you’ve done everything in your power to ensure that nothing is going to go down, and it’s still a distinct possibility.”
But he said the biggest lesson he hopes the government learns is that it needs to upgrade its IT infrastructure. He touted U.S. CIO Tony Scott’s push to create a $3.1 billion IT Modernization Fund, a proposal that Scott has been trying to sell to Congress. Cole also recommended that the government make a stronger push into the cloud.
John Pescatore, director of emerging security trends at the SANS Institute, agreed that using the cloud was key. He said it’s critical that USPTO have the ability to get its most important, or “crown jewel,” applications quickly back online. For the USPTO, he said, that’s their online patent filing system.
“To not be able to restore your operations in that amount of time [for almost a week] — that’s a long time,” he said.
“[It] is fair to say they are still partying like it is 1999,” Waylon Krush, CEO of cybersecurity company Lunarline, said in an email.
He added, “Many legacy systems are very difficult if not [im]possible to recover after a significant event happens like a major power outage. The USPTO was lucky they were able to find the spare parts” and “even more lucky” it had staff to make the fixes.
At the same time, committee members in the hearings lauded Owens’ efforts, calling his work “heroic” or “Herculean,” according to the transcript. Mark Goodson, a member of the patent advisory committee and founder of Texas-based Goodson Engineering, said Owens’ work to restore service was downright surgical.
“The man was in his operating gown going into surgery,” he said during the hearing — noting, “it could have been a whole lot worse.”
Contact the reporter on this story via email Whitney.Wyckoff@fedscoop.com, or follow her on Twitter @whitneywyckoff.