IRS dinged by GAO for subpar documentation of AI audit models

The tax agency has taken steps to address the watchdog’s concerns over how AI is used to select audit cases.

June 7, 2024

A view of the Internal Revenue Service building. (Chip Somodevilla / Getty Images)

An IRS pilot program that uses artificial intelligence to select audit cases and identify noncompliance didn’t properly document elements of the technology’s sample selection models, a new watchdog report found.

Because the tax agency had “not completed its documentation of several elements” of the models used for its National Research Program audits, the IRS could struggle to “retain organizational knowledge, ensure the models are implemented consistently, and make the process more transparent to future users,” according to the Government Accountability Office.

The IRS first piloted AI techniques for sampling tax returns in NRP audits during the 2019 filing season. The tax agency selected 4,000 returns for audit through that new AI-powered methodology, while an equal share was chosen through its traditional selection process. The following year, the NRP sample totaled approximately 1,500 returns, all selected with the AI-informed process, and in 2021, 4,000 returns were picked based on two different AI samples.

The GAO noted that the implementation of redesigned sample selection processes “can be a complex undertaking,” especially when an emerging technology like AI is added to the mix. With that in mind, the watchdog pointed to the usefulness of its AI accountability framework.

“The AI Framework emphasizes the importance of documentation to help ensure that the AI system’s objectives are met,” the GAO wrote. “It further emphasizes that documentation can offer a way for agencies to provide transparency, such as (1) what the system is for, (2) what it is not for, (3) how it was designed, and (4) what its limitations are.”

The GAO’s audit found that the IRS had fallen short in two framework areas: clearly defining and documenting roles and responsibilities for each step of the AI sample selection process, and documenting the variables used to develop and run those selection models.

While reviewing the GAO report in April and responding with comments, the IRS made two changes to address the watchdog’s concerns: writing a draft memo that listed the people responsible for steps in the AI development and sample selection process, and updating a technical document with specifics on variables and the code behind the AI models.

“These actions will increase IRS’s ability to effectively implement and ensure operational effectiveness of the AI models,” the GAO said.