
The Air Force isn’t shying away from bad data

Taking the bad data with the good makes it easier to decide what to run machine-learning analysis on, according to CTO Frank Konieczny.
Frank Konieczny speaks April 25, 2019, at the Security Through Innovation Summit presented by McAfee and produced by FedScoop and CyberScoop. (FedScoop)

Air Force analysts are taking a fresh approach to the service’s data by refusing to fix the “bad” material until after it has been presented to leadership, Chief Technology Officer Frank Konieczny says. 

One example could be two related but differing datasets about the on-board hardware of aircraft, Konieczny says. Data can be bad for various reasons: it might be stored in an old or obsolete format or system, be missing fields, or contain incorrect information. It might also be mismatched, contain duplicate entries or include typos. Whatever the case, the Air Force is trying to resist the urge to hide those flaws, he says.

“We want the seniors to actually see the bad data so that they yell at people to get it fixed,” Konieczny said at an Armed Forces Communications and Electronics Association event Tuesday. “Most people like to fix it and then show the seniors the results.”

The Air Force’s Office of Information Dominance is working to clean things up, but for now the service has more intelligence, surveillance and reconnaissance data than it can process, Konieczny said. The service needs “a bunch of” fit-for-purpose clouds, separate from the Joint Enterprise Defense Infrastructure (JEDI) general-purpose cloud that will eventually support the entire Department of Defense, equipped with graphics processing units (GPUs) to run machine learning faster, he said.


The goal is to avoid having a single data lake that could go stale quickly, Konieczny said, so his team is meta-tagging data where it resides.

“Because we realize we can’t move all the data in the Air Force into a central cloud,” he said. “It’s just impossible. We tried it initially.”

With meta-tagging, users can “dynamically” cherry-pick the “right” data they want while the “wrong” data is still presented, Konieczny said.

“We have to do things better with the data faster and look at the bad data,” he said.
