Anthropic vs. Pentagon: AI Ethics, Autonomous Weapons, and Surveillance

Explore the critical debate between Anthropic and the Pentagon over AI ethics, autonomous weapons, and mass surveillance. This guide details policy conflicts, AI reliability challenges, and industry responses to military AI demands.


Introduction

This document analyzes the ongoing dispute between AI developer Anthropic and the US Department of War (Pentagon) regarding the deployment of advanced AI models. It highlights the ethical and practical implications of using AI for autonomous weaponry and mass surveillance, alongside the critical challenges in ensuring AI reliability and the broader industry's response to these issues.

Configuration Checklist

Policy / Directive	Summary
Key Policies & Directives
DoD Directive 3000.09	Requires human judgment over the use of force for autonomous weapon systems.
DoD Directive 5240.01	Prohibits intelligence components from collecting information on U.S. persons except under specific legal authorities (FISA or Title 50).
Trump-era Responsible AI Implementation Pathway	Reinforces DoD Directive 5240.01.
Anthropic's Responsible Scaling Policy (RSP)	Originally committed Anthropic to not releasing AI models unless proper safety measures could be guaranteed; later amended so that the commitment no longer applies if Anthropic believes it lacks a significant lead over competitors.
Defense Production Act	US law allowing the President to compel companies to prioritize government contracts.

Step-by-Step Guide

Step 1 — Anthropic Rejects Pentagon's 'Final Offer'

Anthropic, the developer of the Claude AI models, rejected the Pentagon's 'final offer' for use of its AI. Anthropic insisted on keeping usage-policy restrictions that prohibit applications such as autonomous weapons and mass surveillance. The Pentagon had agreed to these terms in a prior contract, but the new offer sought to remove the safeguards.

Step 2 — Pentagon's Stance and Threats

The Department of War (Pentagon) has stated it will only contract with AI companies that agree to 'any lawful use' and remove safeguards. It threatened to remove Anthropic's models from its systems if the safeguards were maintained. It also threatened to designate Anthropic a 'supply chain risk' (a label typically reserved for US adversaries, never before applied to an American company) and to invoke the Defense Production Act to force the safeguards' removal. Invoking the act would compel Anthropic to tailor its models to the military's needs, including for mass surveillance and autonomous killing.

Step 3 — Employee Solidarity

In response to the escalating tensions, employees from Google and OpenAI, two other leading AI companies, launched a petition titled 'We Will Not Be Divided'. The open letter asks their leaders (Sundar Pichai and Sam Altman) to put aside differences and stand together in refusing the Department of War's demands for permission to use their models for domestic mass surveillance and for autonomously killing people without human oversight. At the time the source video was recorded, the petition had gathered over 340 signatures, indicating notable support for Anthropic's stance among employees of rival labs.

Step 4 — Existing DoD Directives

The Verge reported that the Pentagon's demands appear to contradict its own established policies. DoD Directive 3000.09 requires that all autonomous weapon systems be designed so that commanders and operators can exercise appropriate levels of human judgment over the use of force. Additionally, DoD Directive 5240.01 and the Trump-era Responsible AI Implementation Pathway prohibit intelligence components from collecting information on U.S. persons except under specific legal authorities. This suggests the Pentagon is pushing for uses that are already against its own stated principles and regulations.

Step 5 — Anthropic's Ethical and Practical Objections

Anthropic's CEO, Dario Amodei, articulated two main objections to the Pentagon's demands:

Mass domestic surveillance: Anthropic supports AI for lawful foreign intelligence and counterintelligence missions. However, they argue that using these systems for mass domestic surveillance is 'incompatible with democratic values'. They concede that such surveillance might currently be legal, but only because 'the law has not yet caught up with the rapidly growing capabilities of AI'. They highlight that powerful AI can assemble scattered, individually innocuous data into a comprehensive picture of any person's life—automatically and at a massive scale—without a warrant.
Fully autonomous weapons: Anthropic states that 'frontier AI systems are simply not reliable enough to power fully autonomous weapons'. They argue that such systems would make too many mistakes, putting 'America's warfighters and civilians at risk'. They refuse to knowingly provide a product that poses such a risk, emphasizing the need for proper oversight and the critical judgment of highly trained personnel.

Step 6 — AI Agent Reliability Challenges

Papers such as 'Agents of Chaos' and 'Towards a Science of AI Agent Reliability' (Princeton University, Feb 24, 2026) support Anthropic's concerns about AI reliability. They report that AI agents, including those from OpenAI and Anthropic, can exhibit unpredictable and potentially harmful behaviors even when they appear to perform well on benchmarks. Key reliability dimensions include:

Consistency: Repeatable outcomes under nominal conditions, with low variance across repeated trials. AI agents often show only modest consistency.
Robustness: Graceful degradation under input, environment, or tool perturbations. AI agents remain susceptible to surface-level prompt reformulations, leading to performance degradation.
Predictability: Prediction confidence aligned with accuracy; ability to detect limits and defer/escalate under uncertainty. This is crucial for understanding how an AI might behave in a war zone.
Safety: Bounded harm even when failures occur; worst-case severity remains acceptable. Even a small percentage of catastrophic failures can have devastating consequences in safety-critical applications.

These studies demonstrate that while AI models show impressive headline accuracy, their performance across these critical reliability dimensions is often lacking, making them unsuitable for high-stakes applications like autonomous weapons or mass surveillance.
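The gap between headline accuracy and the consistency and predictability dimensions can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names, the trial outcomes, and the confidence values are hypothetical and are not taken from either paper, and the papers may define their metrics differently.

```python
from statistics import mean


def headline_accuracy(trials):
    """Mean success rate over every trial of every task."""
    return mean([outcome for task in trials for outcome in task])


def all_trials_pass_rate(trials):
    """Fraction of tasks solved on every repeated trial (a strict consistency measure)."""
    return mean([1.0 if all(task) else 0.0 for task in trials])


def calibration_gap(confidences, outcomes):
    """Absolute gap between mean stated confidence and observed accuracy."""
    return abs(mean(confidences) - mean(outcomes))


# Hypothetical data: 4 tasks, 5 repeated trials each (1 = success, 0 = failure).
trials = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 0],
]

# Hypothetical stated confidences and matching outcomes for 4 single runs.
confidences = [0.9, 0.9, 0.9, 0.9]
outcomes = [1, 1, 0, 0]

print("headline accuracy:", headline_accuracy(trials))
print("all-trials pass rate:", all_trials_pass_rate(trials))
print("calibration gap:", calibration_gap(confidences, outcomes))
```

In this toy data the agent succeeds on 17 of 20 trials, yet only one task in four is solved every time, which is the kind of divergence the consistency dimension is meant to expose.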

Step 7 — Anthropic's Policy Shift

Despite its firm stance against the Pentagon, Anthropic recently loosened its own central safety policy, known as the Responsible Scaling Policy (RSP). The company had previously committed to not releasing AI models if it couldn't guarantee proper risk mitigations in advance. However, in a Tuesday blog post (Feb 25, 2026), Anthropic announced it would 'no longer do so if it believes it lacks a significant lead over a competitor'. This change, reported by Bloomberg and TIME, indicates a shift towards prioritizing competitiveness and economic growth over absolute safety guarantees, especially if competitors like OpenAI, Google, and xAI Corp are 'blazing ahead'.

Comparison Tables

AI Agent Reliability Dimensions

Dimension	Cross-Domain Notion	Domain-Specific Exemplars
Consistency	Repeatable outcomes under nominal conditions; low variance across repeated trials.	FAA requires deterministic execution of flight-critical software; NRC sets mandatory response times for digital computers in nuclear reactors.
Robustness	Graceful degradation under input, environment, tool perturbations; stable performance across the full operational envelope.	NASA investigation of software-related unintended acceleration in Toyota cars leads to recall; FAA mandates aviation sensor testing at extreme temperatures, turbulence, and vibration.
Predictability	Prediction confidence aligned with accuracy; detect limits and defer/escalate under uncertainty.	NRC models thousands of potential failure modes in nuclear reactors; aviation uses tiered risk classification with explicit probabilities.
Safety	Bounded harm even when failures occur; worst-case severity remains acceptable.	SIL 4 standard requires dangerous failure probability less than 10^-9; FAA uses a one catastrophic error per billion flight hours target.
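To give a sense of scale for the safety targets in the table, the following Python sketch works through the arithmetic for a one-in-a-billion-per-hour failure rate. It is a rough illustration under stated assumptions: failures are modeled as a Poisson process with a constant rate, the function names are hypothetical, and the exposure figures are made up. It does not reproduce any regulator's actual certification method.

```python
import math


def expected_failures(rate_per_hour, exposure_hours):
    """Expected number of failures at a constant per-hour rate."""
    return rate_per_hour * exposure_hours


def failure_free_hours_needed(target_rate_per_hour, confidence=0.95):
    """Failure-free operating hours needed to bound the rate at the target.

    Assumes a Poisson process: with zero failures observed in T hours,
    the upper confidence bound on the rate is -ln(1 - confidence) / T.
    """
    return -math.log(1.0 - confidence) / target_rate_per_hour


TARGET = 1e-9  # one catastrophic failure per billion hours, as in the table

# At the target rate, a billion hours of operation yields about one failure.
print("expected failures in 1e9 hours:", expected_failures(TARGET, 1e9))

# Roughly three billion failure-free hours are needed to support the
# target at 95% confidence (the so-called "rule of three").
print("failure-free hours needed:", failure_free_hours_needed(TARGET))
```

The point of the calculation is that a target this strict cannot realistically be demonstrated by testing alone, which is why the safety-critical fields cited in the table pair testing with design analysis.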

AI Models' Stance on Pentagon Request (as of Feb 27, 2026)

AI Model	Autonomous AI Weapons	Domestic Surveillance
Gemini 1.1 Pro Preview	Neither – only considers much narrower, tightly governed variants that meet strict conditions.	Neither – only considers much narrower, tightly governed variants that meet strict conditions.
GPT-4.2	Neither – at least not as stated.	Neither – at least not as stated.
Claude 4.5 Opus	Neither – with important caveats.	Neither – with important caveats.
Grok 4	Cautions about 'not turning it into a sci-fi nightmare'.	Cautions about 'not turning it into a sci-fi nightmare'.

⚠️ Common Mistakes & Pitfalls

Over-reliance on Headline Accuracy: Focusing solely on high benchmark scores (e.g., 93% success rate) without evaluating consistency, robustness, predictability, and safety can lead to dangerous deployments in critical systems.
Ignoring Contextual Degradation: AI models may perform well in controlled environments but degrade significantly with subtle changes in input, environment, or tool calls, leading to unpredictable and potentially catastrophic outcomes.
Lack of Human Oversight: Deploying fully autonomous weapons without appropriate levels of human judgment over the use of force goes against established DoD directives and introduces unacceptable risks due to AI's current unreliability.
Unforeseen Privacy Erosion: AI's ability to aggregate scattered, innocuous data into comprehensive personal profiles, even if currently legal due to outdated laws, poses serious risks to fundamental liberties and can lead to mass domestic surveillance without proper warrants.
Unilateral AI Commitments: Companies making unilateral commitments to safety measures that are later abandoned due to competitive pressures can undermine trust and accelerate the development of potentially dangerous AI without sufficient safeguards.
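The first pitfall can be made concrete with a short calculation. The Python sketch below assumes each run succeeds independently with the same probability, which is a simplification (real agent failures are often correlated), and the function name is hypothetical. It shows how quickly a 93% per-run success rate decays when a task must succeed on every one of several consecutive runs.

```python
def prob_all_succeed(per_run_success, runs):
    """Probability that every one of `runs` independent runs succeeds."""
    return per_run_success ** runs


PER_RUN_SUCCESS = 0.93  # the example headline figure from the pitfall above

for runs in (1, 10, 50, 100):
    p = prob_all_succeed(PER_RUN_SUCCESS, runs)
    print(f"{runs:>3} runs: P(all succeed) = {p:.4f}")
```

Under these assumptions, the chance of ten consecutive successes is already below one half, so a headline figure that looks strong on a single run says little about sustained operation.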

Glossary

Autonomous AI Weapons: AI systems designed to select and engage targets without human intervention, making lethal decisions independently.
Mass Domestic Surveillance: The widespread, automated collection and analysis of data on a country's own citizens, often using AI, for intelligence or law enforcement purposes.
Responsible Scaling Policy (RSP): A framework adopted by AI development companies to guide the safe and ethical advancement of increasingly powerful AI systems, often including commitments to risk mitigation and transparency.

Key Takeaways

Anthropic is resisting Pentagon demands for unrestricted use of its Claude AI for autonomous weapons and mass domestic surveillance, citing ethical and reliability concerns.
The Pentagon has threatened significant repercussions, including labeling Anthropic a 'supply chain risk' and invoking the Defense Production Act, to compel compliance.
Employees from Google and OpenAI have publicly supported Anthropic's stance, calling for industry solidarity against military AI applications without human oversight.
Existing DoD directives already mandate human judgment for autonomous weapons and restrict domestic surveillance by intelligence components, suggesting internal policy conflicts within the Pentagon.
Anthropic argues that current frontier AI systems are 'simply not reliable enough' for fully autonomous weapons, posing risks to warfighters and civilians.
Research indicates that AI models, despite high benchmark accuracy, often lack consistency, robustness, predictability, and safety, which are critical for high-stakes applications.
Anthropic recently revised its Responsible Scaling Policy, indicating it may no longer hold back model releases on safety grounds if it believes it lacks a significant lead over competitors, which highlights the tension between safety and competitive pressure.

Resources

Axios article: https://www.axios.com/2026/02/27/anthropic-pentagon-ai-safeguards-fight
Anthropic Statement: https://www.anthropic.com/news/statement-from-dario-amodei-on-our-discussions-with-the-department-of-war
The Verge article: https://www.theverge.com/2026/02/26/24083226/anthropic-pentagon-ai-military-contract-negotiations
Politico article: https://www.politico.com/news/2026/02/26/coherent-hegseth-anthropic-ultimatum-confounds-ai-policymakers-00138715
Open Letter 'We Will Not Be Divided': https://wewillnotbedivided.org/
'Agents of Chaos' paper: https://arxiv.org/pdf/2402.13532v1.pdf
'Towards a Science of AI Agent Reliability' paper: https://arxiv.org/pdf/2402.15266v2.pdf
LM Council.ai: https://lmcouncil.ai/
Bloomberg article: https://www.bloomberg.com/news/articles/2026-02-25/anthropic-adds-caveat-to-ai-safety-policy-in-race-against-rivals
TIME article: https://time.com/6890226/anthropic-ai-safety-policy-jared-kaplan-interview/
