CISA’s chief data officer: Bias in AI models won’t be the same for every agency

Monitoring and logging are critical for agencies as they assess datasets, though “bias-free data might be a place we don’t get to,” the federal cyber agency’s CDO says.

As chief data officer for the Cybersecurity and Infrastructure Security Agency, Preston Werntz has made it his business to understand bias in the datasets that fuel artificial intelligence systems. With a dozen AI use cases listed in CISA’s inventory and more on the way, one data-related realization has become especially clear.

“Bias means different things for different agencies,” Werntz said during a virtual agency event Tuesday. Bias that “deals with people and rights” will be relevant for many agencies, he added, but for CISA, the questions become: “Did I collect data from a number of large federal agencies versus a small federal agency [and] did I collect a lot of data in one critical infrastructure sector versus in another?”

Internal gut checks of this kind are likely to become increasingly important for chief data officers across the federal government. CDO Council callouts in President Joe Biden’s AI executive order cover everything from the hiring of data scientists to the development of guidelines for performing security reviews.

For Werntz, those added AI-related responsibilities come with an acknowledgment that “bias-free data might be a place we don’t get to,” making it all the more important for CISA to “have that conversation with the vendors internally about … where that bias is.”

“I might have a large dataset that I think is enough to train a model,” Werntz said. “But if I realize that data is skewed in some way and there’s some bias … I might have to go out and get other datasets that help fill in some of the gaps.”

Given the high-profile nature of agency AI use cases — and critiques that inventories are not fully comprehensive or accurate — Werntz said there’s an expectation of additional scrutiny on data asset purchases and AI procurement. As CISA acquires more data to train AI models, that will have to be “tracked properly” in the agency’s inventory so IT officials “know which models have been trained by which data assets.” 

Adopting “data best practices and fundamentals” and monitoring for model drift and other potential AI problems are also top of mind for Werntz, who emphasized the importance of performance and security logging. That comes back to having an awareness of AI models’ “data lineage,” especially as data is “handed off between systems.”

Beyond CISA’s walls, Werntz said he’s focused on sharing lessons learned with other agencies, especially when it comes to how they acquire, consume, deploy and maintain AI tools. He’s also keeping an eye out for technologies that will support data-specific efforts, including those involving tagging, categorization and lineage.

“There’s a lot of onus on humans to do this kind of work,” he said. “I think there’s a lot of AI technologies that can help us with the volume of data we’ve got.” CISA wants “to be better about open data,” Werntz added, making more of it available to security researchers and the general public. 

The agency also wants its workforce to be trained on commercial generative AI tools, with some guardrails in place. As AI “becomes more prolific,” Werntz said internal trainings are all about “changing the culture” at CISA to instill more comfort in working with the technology.

“We want to adopt this. We want to embrace this,” Werntz said. “We just need to make sure we do it in a secure, smart way where we’re not introducing privacy and safety and ethical kinds of concerns.” 
