Anthropic developing a new tool to detect concerning AI talk of nuclear weapons

The system is already deployed on Claude traffic, the company says.
(Photo by RICCARDO MILANI/Hans Lucas/AFP via Getty Images)

As part of its ongoing work with the National Nuclear Security Administration, the small but critical agency charged with monitoring the country’s nuclear stockpile, Anthropic is now working on a new tool designed to help detect when new AI systems output troubling discussions of nuclear weapons.

Artificial intelligence systems have the potential to uncover all sorts of new chemical compounds. While many of those discoveries might be promising, yielding, for example, advances that could support nuclear energy research, the same systems also risk outputting information that could make it easier to design a nuclear weapon.

In a new blog post, the company said that along with the NNSA and the Energy Department’s national laboratories, it has developed a classifier that can automatically determine whether a nuclear-related conversation with an AI chatbot is benign or concerning, with 96% accuracy.

The system was developed based on an NNSA-curated list of nuclear risk indicators. More than 300 synthetic prompts were developed to validate the system. These prompts were generated — rather than sourced from actual user conversations — in order to protect privacy, an Anthropic spokesperson said. 

The agency then helped refine the classifier, confirming whether its evaluations of synthetic test prompts were correct. The work built on a previously announced red-teaming partnership between Anthropic and the NNSA. 

Notably, the system can make mistakes, flagging conversations that aren’t actually attempts to elicit concerning nuclear-related information.

The tool is now deployed on Claude traffic and is performing well, the company said. Other companies could easily implement it, a company spokesperson added.

Written by Rebecca Heilweil

Rebecca Heilweil is an investigative reporter for FedScoop. She writes about the intersection of government, tech policy, and emerging technologies. Previously she was a reporter at Vox's tech site, Recode. She’s also written for Slate, Wired, the Wall Street Journal, and other publications. You can reach her at rebecca.heilweil@fedscoop.com. Message her if you’d like to chat on Signal.
