Anthropic's Responsible Scaling Policy: What Version 3.1 Actually Means
Anthropic just updated its Responsible Scaling Policy for the fourth time. Here's what the new AI safety thresholds, Frontier Safety Roadmaps, and sabotage assessments actually mean — and how it stacks up against OpenAI and Google.
Anthropic has now updated its Responsible Scaling Policy four times in less than three years. Each revision gets a little more specific, a little harder to dismiss as corporate theater — and a little more revealing about what Anthropic actually thinks it’s building. Version 3.1, released April 2, 2026, is the latest iteration, following the sweeping Version 3.0 rewrite from February. If you care about what the most safety-focused major AI lab in the world has committed to doing when its models get genuinely dangerous, this is the document you should be reading.
Most people aren’t. That’s a mistake.
What the RSP Actually Is (And Isn’t)
The Responsible Scaling Policy is Anthropic’s self-imposed commitment not to train or deploy AI models that exceed certain capability thresholds without first proving they have adequate safeguards in place. Think of it as a conditional promise: “We will not release a model that can help someone build a bioweapon unless we have controls in place that make that sufficiently unlikely.”
It’s not regulation. It’s not law. There’s no external enforcement body that drags Anthropic to court if it violates the policy. The RSP is a reputational commitment, and the weight of that commitment depends entirely on how much you trust Anthropic to follow it when the competitive pressure says otherwise.
That said, dismissing it as pure PR misses something important. The RSP has teeth that most corporate safety pledges lack: specific, measurable thresholds, published assessment methodologies, third-party evaluator access, and now a published record of every version and every update. When Version 2.2 quietly narrowed the ASL-3 security scope in May 2025 to exclude “sophisticated insiders and state-compromised insiders,” Anthropic published the change. That’s more transparency than most labs offer on anything.
The ASL Framework: How Dangerous Is Dangerous?
The core architecture of the RSP is the AI Safety Level (ASL) system, borrowed conceptually from biosafety levels. The idea is that different capability levels require different safeguard tiers.
ASL-1 covers systems that pose no meaningful uplift over existing technology — a chess engine, a basic image classifier. No special controls needed.
ASL-2 is the baseline tier for models that can discuss dangerous topics but don’t provide meaningful uplift beyond what a motivated person could find in a library. It means industry-standard security practices, existing deployment controls, and the baseline safety work already baked into training. Anthropic kept its deployed models at this level until May 2025, when it launched Claude Opus 4 under ASL-3 protections as a precaution.
ASL-3 kicks in when a model can “meaningfully assist someone with a basic technical background” in creating or deploying chemical, biological, radiological, or nuclear (CBRN) weapons. At this level, Anthropic commits to internal access controls, enhanced model weight protection, real-time monitoring, rapid response protocols, and mandatory pre-deployment red-teaming. Importantly: they cannot deploy an ASL-3 model until these controls are verified to work.
ASL-4 would apply to a model capable of “autonomously conducting complex AI research tasks typically requiring human expertise,” which is essentially a model that can meaningfully accelerate AI development itself. The Version 3.1 clarification on this threshold is worth noting: Anthropic now specifies that this means “compressing two years of 2018-2024 AI progress into a single year.” That’s a concrete benchmark, not a vague gesture.
Higher ASLs are deliberately left open-ended. Anthropic’s position is that no one yet knows what the right safeguards for ASL-5 and above look like, so it won’t pretend to define them now.
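The tiered logic above can be sketched in a few lines of Python. This is an illustration only, not Anthropic’s tooling: the `ASL` enum, the `REQUIRED_SAFEGUARDS` table, and the `missing_safeguards` and `may_deploy` functions are hypothetical names, and the safeguard labels are shorthand for the commitments listed in this article. ASL-4 is left out because the article describes its threshold but not its safeguards.

```python
from enum import IntEnum


class ASL(IntEnum):
    """AI Safety Levels as summarized in this article (illustrative only)."""
    ASL_1 = 1
    ASL_2 = 2
    ASL_3 = 3


# Hypothetical shorthand labels for the safeguards the article names per tier.
REQUIRED_SAFEGUARDS: dict[ASL, frozenset[str]] = {
    ASL.ASL_1: frozenset(),
    ASL.ASL_2: frozenset({"industry_standard_security", "deployment_controls"}),
    ASL.ASL_3: frozenset({
        "internal_access_controls",
        "model_weight_protection",
        "real_time_monitoring",
        "rapid_response_protocols",
        "pre_deployment_red_teaming",
    }),
}


def missing_safeguards(level: ASL, verified: set[str]) -> set[str]:
    """Return the safeguards required at `level` that are not yet verified."""
    return set(REQUIRED_SAFEGUARDS[level]) - verified


def may_deploy(level: ASL, verified: set[str]) -> bool:
    """Allow deployment only when every safeguard for the tier is verified."""
    return not missing_safeguards(level, verified)


if __name__ == "__main__":
    verified = {"internal_access_controls", "real_time_monitoring"}
    print(may_deploy(ASL.ASL_3, verified))  # False
    print(sorted(missing_safeguards(ASL.ASL_3, verified)))
```

The point of the sketch is the gate itself: an ASL-3 model with only two of its five safeguards verified does not ship, which mirrors the article’s reading that deployment waits until the controls are verified to work.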
What Version 3.0 Actually Changed (The Big One)
The February 2026 Version 3.0 update wasn’t an incremental revision — Anthropic called it a comprehensive rewrite, and they weren’t exaggerating. Two structural additions stand out.
Frontier Safety Roadmaps are now required. Rather than just publishing what thresholds exist, Anthropic must now maintain and publish detailed safety objectives — specific research goals and timelines for staying ahead of the capability curve. The April 2026 update noted that two prior roadmap goals had already been achieved: launching dedicated moonshot R&D safety projects, and completing a data retention policy assessment. This makes the safety work legible in a way it wasn’t before. You can now track whether they’re hitting their own stated targets.
Risk Reports quantify risk across deployed models. Instead of “here’s our policy,” you now get “here’s our current assessment of how dangerous our deployed models actually are.” The February 2026 Sabotage Risk Assessment for Claude Opus 4.6 — finding that it does not cross the AI R&D-4 threshold — is the first concrete output of this system. Critically, the report also acknowledges that “ruling out threshold crossing with confidence is becoming increasingly difficult” as models get more capable. That’s an honest admission of a serious measurement problem.
The Sabotage Assessment: Why It Matters
The February 2026 Sabotage Risk Report deserves its own attention. Anthropic tested whether Claude Opus 4.6 could meaningfully undermine human oversight of AI development — what they call the “sabotage” risk. The finding that it doesn’t cross the threshold is reassuring. The footnote — that confidently ruling out threshold crossing is getting harder — is not.
This is the central tension in any RSP-style framework: the evaluations are only as reliable as the methods used to conduct them, and those methods lag behind model capabilities. A model that can subtly deceive evaluators presents a self-defeating measurement problem. Anthropic is at least naming this openly, which is more than their competitors typically do.
How to Use This Information
For most users, the RSP has no direct practical effect on how you use Claude today. What it does affect:
Enterprise procurement decisions. If you’re a company evaluating AI vendors for high-stakes applications, the RSP gives you a concrete framework for asking hard questions: Has this model been assessed against CBRN uplift thresholds? What’s the current ASL designation? What monitoring is in place? Anthropic’s published summaries at anthropic.com/rsp-updates give you something to reference.
Assessing future model releases. When Anthropic announces a new model, the RSP tells you what evaluations it passed before being deployed. A model deployed under ASL-2 tells you something about Anthropic’s confidence in its safety profile. For a model deployed under ASL-3 controls, the associated requirements tell you exactly what additional safeguards are in place.
Holding the lab accountable. The version history is public. Version 2.2 narrowed the scope of ASL-3 security requirements. That change is logged, dated, and available. If Anthropic makes a commitment in Version 3.1 and walks it back in Version 4.0, the record exists.
Reading the tea leaves on capability trajectory. When Anthropic says that the AI R&D threshold is “compressing two years of 2018-2024 AI progress into a single year,” that’s a signal about what they expect their models to eventually be capable of — and how close they think they are to it.
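The compression benchmark quoted above reduces to simple arithmetic. Here is a minimal sketch, assuming the benchmark is read as a ratio of progress (measured in years at the 2018-2024 pace) to calendar time. The function name and this reading are an illustration, not a formula taken from the RSP.

```python
def acceleration_factor(baseline_years_of_progress: float, calendar_years: float) -> float:
    """Ratio of progress achieved (in years at the baseline pace) to elapsed time."""
    if calendar_years <= 0:
        raise ValueError("calendar_years must be positive")
    return baseline_years_of_progress / calendar_years


# Two years of 2018-2024-pace progress delivered in one calendar year.
print(acceleration_factor(2.0, 1.0))  # 2.0
```

On that reading, crossing the threshold would mean AI development running at roughly twice its 2018-2024 pace.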
How This Compares to Competitors
OpenAI has its Preparedness Framework, Google DeepMind has its Frontier Safety Framework, and Meta has… significantly less structured public commitments. The differences matter.
OpenAI’s Preparedness Framework operates on similar principles — capability thresholds mapped to required safeguards — but the evaluation process is less transparent and the threshold definitions are vaguer. OpenAI’s “critical” designation has no public equivalent of Anthropic’s specific CBRN uplift test or the AI R&D compression benchmark. The version history is also not as cleanly maintained.
Google DeepMind’s Frontier Safety Framework is arguably the most technically rigorous in published form, with detailed evaluation methodology for dangerous capabilities. But it lacks the granular ASL system and the public commitment not to deploy without passing safeguards — DeepMind frames its framework more as ongoing research than as binding commitment.
Meta publishes responsible use policies and does model safety evaluations, but there’s no equivalent of the RSP’s don’t-deploy-if-you-fail-the-evaluation structure. Meta’s open-weight approach also makes deployment control a different kind of problem — once Llama is out, it’s out.
The honest comparison: Anthropic’s RSP has the most specific thresholds, the most transparent version history, and the most explicit commitment to halt deployment if thresholds are exceeded without corresponding safeguards. Whether that commitment holds under competitive pressure is the question none of these frameworks can answer from the outside.
The Honest Take
What’s genuinely impressive: The threshold specificity in Version 3.1 is real. “Compressing two years of 2018-2024 AI progress into a single year” is an actual testable benchmark, not a vibes-based assessment. The Frontier Safety Roadmap requirement forces Anthropic to publish concrete safety objectives and then track whether they hit them. The Sabotage Risk Report, including its honest acknowledgment of measurement limitations, is better epistemics than most institutions manage.
What’s legitimately concerning: The entire framework rests on self-certification. Anthropic evaluates its own models against its own thresholds using its own methodology. External AI Safety Institutes have access to shared methodology, but that’s not the same as independent verification with teeth. The May 2025 Version 2.2 change, which narrowed the scope of ASL-3 security requirements to exclude sophisticated and state-compromised insiders, was made unilaterally and published after the fact. That’s transparency, but it’s also a reminder that the commitments can change.
What’s overhyped: The idea that publishing an RSP meaningfully constrains Anthropic’s behavior in the way external regulation would. If competitive dynamics push toward cutting corners, a public policy document provides friction, not a hard stop. The RSP is a commitment to a process, not a guarantee of an outcome.
What’s undersold: The internal governance changes. Jared Kaplan as Responsible Scaling Officer, the new Head of Responsible Scaling role, the expanded noncompliance reporting channels — these are organizational choices that shape decision-making in ways that public commitments don’t. Building the RSP into the org structure rather than treating it as a comms output is the part that actually makes the framework more than theater.
What This Means for AI Users
The RSP tells you something important about where the industry’s most safety-focused lab thinks AI capability is heading. The fact that they’re building detailed frameworks for ASL-4 and beyond — models capable of autonomously accelerating AI research — isn’t a hypothetical exercise. They expect to get there. The framework exists because they believe the threshold will eventually be crossed.
For users of Claude today, the practical takeaway is this: the model you’re using has been assessed against dangerous capability thresholds and found not to cross them. That assessment has a published methodology and a version history, and it is being updated as capabilities improve. That’s not nothing.
For anyone watching the broader AI race, the RSP is worth reading not just as a safety document but as a capability roadmap. Every threshold Anthropic defines is a threshold they believe future models will approach. The version history tracks their evolving understanding of what those models can do. Read the updates. The tea leaves are there.
The alternative — waiting for the lab to tell you what matters when it becomes unavoidable — is exactly what Anthropic is betting against. At minimum, they’re asking the right questions out loud. That’s still rare enough to notice.