Anthropic developing a new tool to detect concerning AI talk of nuclear weapons

The system is already deployed on Claude traffic, the company says.
(Photo by RICCARDO MILANI/Hans Lucas/AFP via Getty Images)

As part of its ongoing work with the National Nuclear Security Administration, the small but critical agency charged with monitoring the country’s nuclear stockpile, Anthropic is now working on a new tool designed to help detect when new AI systems output troubling discussions of nuclear weapons.

Artificial intelligence systems have the potential to uncover all sorts of new chemical compounds. While many of those discoveries might be promising, yielding, for example, advances that could support nuclear energy research, the same systems also risk outputting information that could make it easier to design a nuclear weapon.

In a new blog post, the company said that along with the NNSA and the Energy Department’s national laboratories, it has developed a classifier that can automatically determine whether a nuclear-related conversation with an AI chatbot is benign or concerning, with 96% accuracy.

The system was developed based on an NNSA-curated list of nuclear risk indicators. More than 300 synthetic prompts were developed to validate the system. These prompts were generated — rather than sourced from actual user conversations — in order to protect privacy, an Anthropic spokesperson said. 

The agency then helped refine the classifier, confirming whether its evaluations of synthetic test prompts were correct. The work built on a previously announced red-teaming partnership between Anthropic and the NNSA. 

Notably, the system can make mistakes, flagging conversations that aren’t actually attempts to elicit concerning nuclear-related information.

The tool is now deployed on Claude traffic and is performing well, the company said. Other companies could easily implement it, a company spokesperson added.

Written by Rebecca Heilweil

Rebecca Heilweil is an investigative reporter for FedScoop. She writes about the intersection of government, tech policy, and emerging technologies. Previously she was a reporter at Vox's tech site, Recode. She’s also written for Slate, Wired, the Wall Street Journal, and other publications. You can reach her at rebecca.heilweil@fedscoop.com. Message her if you’d like to chat on Signal.
