How one data-driven agency — the Census Bureau — found extra value in machine learning
Like many agencies, the Census Bureau looks for reductions in expenses and workloads when it makes decisions about machine learning. But the agency has discovered another advantage in the technology: It can find data that employees never knew they needed.
The Census Bureau runs more than 100 different surveys through siloed programs, and the capture, instrumentation, processing and summation of the resulting data is “really hard to manage,” said Zachary Whitman, the bureau’s chief data officer, at an AFCEA Bethesda event Wednesday.
The bureau’s dissemination branch exports data into a consolidated system where discovery and preparation are “difficult” for employees, Whitman said. So the agency is piloting ML that flags valuable information employees may not have been searching for in the first place.
“How do you get people to translate into information they might not know about but would be very valuable to them?” Whitman said. “That’s where a lot of our AI is coming into play, not only with our search services, but also with our user engagement.”
When users write to the bureau about one of its products — maybe they found the title of a table confusing — a feedback algorithm analyzes the comment. It classifies the feedback as positive or negative, identifies who the author is and why they used the tool, determines whether the comment concerns a feature or a bug, and infers how the experience might be improved.
That information is then relayed upstream to inform the development of new, customer-driven applications.
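To make the pipeline concrete, here is a minimal, hypothetical sketch of the kind of classification step the article describes. The label set and keyword rules are invented for illustration; the bureau’s actual system would presumably use a trained model rather than keyword matching.

```python
# Hypothetical feedback-triage sketch: tag a user comment with a
# sentiment and a feature-vs-bug topic so it can be routed upstream.
# The cue lists below are illustrative, not the bureau's real rules.

NEGATIVE_CUES = ("confusing", "broken", "crash", "error", "slow")
BUG_CUES = ("broken", "crash", "error", "fails")

def classify_feedback(comment: str) -> dict:
    """Return illustrative sentiment and topic labels for a comment."""
    text = comment.lower()
    sentiment = "negative" if any(c in text for c in NEGATIVE_CUES) else "positive"
    topic = "bug" if any(c in text for c in BUG_CUES) else "feature"
    return {"sentiment": sentiment, "topic": topic}

print(classify_feedback("The title of this table is confusing"))
# {'sentiment': 'negative', 'topic': 'feature'}
```

A real deployment would feed these tags, plus user context, to the product teams building the customer-driven applications the article mentions.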
When it comes to ML, the bureau continues to try to make the value of a dollar go further, Whitman said. The process of onboarding data from systems that look and feel different — and have been operating on their own for years — can only scale up if operations and maintenance costs continue to decline.
“Trying to converge [systems] into a consolidated data model is a lot of work — manual work — because they’ll deliver to a spec that will fail 10, 15, 20 times, and each time it will require someone to go in and help them debug to understand what it is about the XML [formatting] that is failing the data,” Whitman said. “That inefficiency is gross and something that we are desperate to move off of, because we can ultimately never scale to where we need to go.”
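The repeated fail-and-debug loop Whitman describes is the kind of task that can be automated with a validator that names the failing field directly. The sketch below is purely illustrative — the spec, element names and record layout are made up, and only well-formedness plus required-field checks are shown.

```python
# Illustrative sketch: check an XML delivery against a minimal "spec"
# (required elements per record) and report exactly what failed, rather
# than having a person debug each rejection by hand. All names are
# hypothetical, not the Census Bureau's actual schema.

import xml.etree.ElementTree as ET

REQUIRED_FIELDS = ["survey_id", "period", "value"]  # made-up spec

def check_delivery(xml_text: str) -> list:
    """Return a list of problem descriptions; an empty list means pass."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    for record in root.iter("record"):
        for field in REQUIRED_FIELDS:
            if record.find(field) is None:
                problems.append(f"record missing <{field}>")
    return problems

good = ("<data><record><survey_id>ACS</survey_id>"
        "<period>2023</period><value>1</value></record></data>")
bad = "<data><record><survey_id>ACS</survey_id></record></data>"
print(check_delivery(good))  # []
print(check_delivery(bad))   # ['record missing <period>', 'record missing <value>']
```

Pointing submitters at the exact missing element turns a 10-to-20-round manual debugging exchange into a single automated report.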
The alternative, he added, is killing and refreshing the systems — a process that is much more costly and time-consuming.