The government’s AI efficiency numbers look good. That should worry you.

Current AI training across agencies is not sufficient and would benefit from more “original intelligence.”

Federal agencies reported 3,611 AI use cases in 2025, an almost 70% increase from 2024’s count, and over six times the number reported in 2023. The State Department is moving toward agentic systems, while the Department of Health and Human Services now holds one of the largest AI inventories in government. 

Meanwhile, federal agencies have committed billions to AI, including more than $3 billion in civilian spending in the latest budget cycle, with the Department of Defense requesting $13.4 billion for AI and autonomous systems in fiscal 2026 as a standalone budget category. By any conventional measure of adoption, the numbers are extraordinary.

So why am I not impressed? Because AI adoption on its own is not successful AI transformation. Don’t believe me? Ask the Government Accountability Office.

A recent GAO report examined how four major agencies — DOD, the Department of Homeland Security, the General Services Administration, and the Department of Veterans Affairs — have been acquiring AI capabilities. It found a consistent pattern: agencies are learning hard lessons in isolation and then failing to share what they’ve learned. What are they learning? That AI is not a mere efficiency tool, and that there is a significant disconnect between what AI promises and the outcomes it is delivering. This is a symptom of organizations moving faster than their own capacity to understand what they are doing.

When an agency reports a reduction in processing time or a lower cost per transaction, it’s measuring throughput. That’s a legitimate data point. But efficiency metrics alone don’t expose an inherent limitation of GenAI models: their tendency to produce increasingly similar outputs. For many government functions — such as benefits determinations, regulatory decisions, enforcement actions, and policy analysis — the “right” answer often depends on differentiation and context. Government service often requires nuance and insight that AI cannot provide.

Agencies are under enormous pressure. Workforces have been reduced significantly. Leaders are being asked to demonstrate efficiency gains with fewer people, and GenAI is the obvious bridge. That creates a structural incentive to let AI carry more weight than it’s ready to, and for employees to defer to AI outputs rather than evaluate them critically. I’ve watched this happen in the private sector. We see it in the phenomenon of “AI slop” and in the surprisingly small share of AI adoption efforts that deliver a financial return on investment.

Agentic AI has only amplified this problem. These systems can carry out multi-step tasks with limited human oversight, and they further compound GenAI’s predisposition to sameness. Moreover, GAO’s own science and technology arm has noted that even the best-performing AI agents can complete only about 30% of complex tasks autonomously without error. That’s a useful number to hold in mind when agency leaders talk about rolling out agents across high-stakes workflows. The remaining 70% doesn’t disappear. In fact, it tends to land on a human being who may or may not have the context, the training, or the cognitive clarity to catch the error before it propagates downstream.

Agencies are scaling the tools faster than they’re scaling the human readiness to work alongside them effectively. The GAO findings aren’t a procurement story; they’re a human performance story wearing procurement clothes.

How do we change this narrative?

It starts with accepting that “AI training” as currently practiced is not sufficient. Most agency training programs teach employees how to use the tools. That’s necessary but insufficient. The real question is not whether employees can operate AI, but whether their ability to provide differentiation improves or degrades in AI-assisted workflows. Those are very different questions, and right now, most agencies have no objective way to answer the second one.

The reality is that the ability to provide differentiation — an attribute I describe as “Original Intelligence” — can be measured. People can be trained to apply their Original Intelligence so that AI serves as an assistant, not a substitute. AI can handle the routine efficiently, while humans focus on what matters most: creating differentiation.

Moreover, because people use their Original Intelligence differently, you can match individuals to roles that will allow them to flourish. Agencies can establish baseline pictures of where human thinking is strongest, where over-reliance on AI is emerging, and which roles carry the highest risk if autonomous systems are given too much latitude. That’s not a soft assessment — it’s actionable workforce intelligence that lets leaders make smarter sequencing decisions about where AI gets introduced, at what pace, and with what level of human review built in.

The agencies that get AI transformation right will not be the ones that deploy the most tools the fastest. They’ll be the ones that understand their human capabilities clearly enough to know where AI amplifies them and where it quietly replaces differentiation that shouldn’t be replaced.

American citizens deserve better than that. And frankly, so do federal employees.

Jonathan Aberman is co-founder and CEO of Hupside, a partner at Ruxton Ventures, and founding dean of Marymount University’s School of Business, Innovation, Leadership and Technology. 
