Capability Floors: Why Your Model Sets Your Security Posture


A Phase 2 organization deploys a frontier model. They have identity controls, basic supply chain verification, and competence testing. They feel ready.

The model can write code, execute it in a sandbox, browse the web, and manage credentials. It operates at Phase 4 capability.

The organization is now exposed to Phase 4 threats with Phase 2 defenses.

This is the capability floor problem.

The Floor Mechanism

Security maturity models traditionally measure what an organization has built — controls implemented, processes documented, certifications achieved. You assess your posture, identify gaps, and work toward the next phase.

Capability floors invert this. They measure what the model can do, and impose a minimum security phase regardless of the organization’s self-assessment.

The logic is simple: if the model can autonomously execute code, then your security posture must account for autonomous code execution threats — whether you planned for them or not. The model’s capabilities determine your threat surface. Your threat surface determines your minimum viable security phase.

| Model Capability | Minimum Phase | Why |
|---|---|---|
| Text generation only | Phase 1 | Standard prompt injection surface |
| Tool use (APIs, search) | Phase 2 | Supply chain and association threats |
| Code execution + networking | Phase 3 | Autonomous action with external effects |
| Credential management + persistence | Phase 4A | Stateful resource threats |
| Extended reasoning (deep think, CoT) | Phase 4B | Reasoning fidelity threats |
| Autonomous multi-step execution | Phase 4C | Unsupervised consequential action |
| Multi-agent delegation | Phase 4D | Compositional trust threats |
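
The lookup in the table can be sketched in a few lines of Python. This is a minimal illustration, not part of the framework: the capability identifiers, the `Phase` enum, and both helper functions are hypothetical names, and taking the maximum across capabilities is an assumption about how individual floors combine.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Phases in ascending order; 4A-4D are treated as progressive stages."""
    P1 = 1
    P2 = 2
    P3 = 3
    P4A = 4
    P4B = 5
    P4C = 6
    P4D = 7


# Hypothetical identifiers for the rows of the capability table.
CAPABILITY_FLOOR = {
    "text_generation": Phase.P1,
    "tool_use": Phase.P2,
    "code_execution_networking": Phase.P3,
    "credential_management_persistence": Phase.P4A,
    "extended_reasoning": Phase.P4B,
    "autonomous_multistep_execution": Phase.P4C,
    "multi_agent_delegation": Phase.P4D,
}


def minimum_phase(capabilities):
    """Return the highest floor imposed by any capability (assumed rule)."""
    capabilities = list(capabilities)
    if not capabilities:
        raise ValueError("at least one capability is required")
    return max(CAPABILITY_FLOOR[c] for c in capabilities)


def floor_gap(org_phase, capabilities):
    """Phase steps between the organization and the floor (0 if the floor is met)."""
    return max(0, minimum_phase(capabilities) - org_phase)
```

For the opening scenario, `floor_gap(Phase.P2, ["code_execution_networking", "credential_management_persistence"])` reports a nonzero gap: the self-assessed phase sits below the floor the model imposes.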

The Granularity Problem

Before sub-phasing, “Phase 4” was a single bucket of 21 unordered controls. An organization deploying a model with credential management and an organization deploying multi-agent orchestration were both told “you need Phase 4.” That is not actionable.

Sub-phasing solves this. Phase 4 becomes four progressive stages:

Phase 4A — Stateful Resources. The model manages persistent state: memory, RAG, cached credentials. The threat is state manipulation — poisoning memory, corrupting retrieval, stealing cached tokens. Controls: memory integrity verification, credential lifecycle management, state isolation.

Phase 4B — Extended Reasoning. The model performs deep thinking, chain-of-thought reasoning, multi-step analysis. The threat is reasoning manipulation — unfaithful chain-of-thought, steganographic channels in thinking tokens, legibility decay. Controls: reasoning trace verification, faithfulness probing, thinking token analysis.

Phase 4C — Autonomous Execution. The model executes actions with real-world consequences without human oversight: code execution, network requests, financial transactions. The threat is unsupervised consequential action — the 3am problem. Controls: mandate-based delegation, pipeline enforcement, cryptographic attestation.

Phase 4D — Multi-Agent Composition. Multiple agents delegate to each other, forming trust chains. The threat is compositional trust decay — an agent delegates to a sub-agent that delegates to a tool that accesses a resource. Each hop weakens the trust chain. Controls: delegation chain attestation, compositional identity, trust propagation verification.

An organization deploying a model with credential management needs 4A. One deploying multi-agent orchestration needs 4A through 4D. The floor is specific, not generic.
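
Because the sub-phases are progressive, the controls a deployment owes can be read as a prefix of the 4A to 4D sequence. The sketch below assumes that cumulative reading; `SUBPHASE_ORDER`, `SUBPHASE_CONTROLS`, and `required_controls` are hypothetical names, and the control lists contain only the controls named in the descriptions above, not the full Phase 4 set.

```python
# Sub-phases in the progressive order given in the text.
SUBPHASE_ORDER = ["4A", "4B", "4C", "4D"]

# Only the controls named in the sub-phase descriptions.
SUBPHASE_CONTROLS = {
    "4A": [
        "memory integrity verification",
        "credential lifecycle management",
        "state isolation",
    ],
    "4B": [
        "reasoning trace verification",
        "faithfulness probing",
        "thinking token analysis",
    ],
    "4C": [
        "mandate-based delegation",
        "pipeline enforcement",
        "cryptographic attestation",
    ],
    "4D": [
        "delegation chain attestation",
        "compositional identity",
        "trust propagation verification",
    ],
}


def required_controls(floor):
    """Return controls for every sub-phase up to and including `floor`."""
    if floor not in SUBPHASE_ORDER:
        raise ValueError(f"unknown sub-phase: {floor!r}")
    upto = SUBPHASE_ORDER.index(floor) + 1
    return {sp: list(SUBPHASE_CONTROLS[sp]) for sp in SUBPHASE_ORDER[:upto]}
```

Calling `required_controls("4A")` returns only the stateful-resource controls, while `required_controls("4D")` returns all four groups in order.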

The Ratchet Effect

Capability floors rise but never fall.

When your organization adopted a text-only model, the floor was Phase 1. When you added tool use, the floor rose to Phase 2. When you enabled code execution, Phase 3. Each capability threshold, once crossed, permanently elevates the minimum security posture.

You cannot lower the floor by restricting the model’s access. The model’s capability determines the floor, not its configuration. A frontier model configured to only answer questions can still write code if jailbroken. The capability exists. The threat surface exists. The floor holds.

This creates a forcing function. Organizations that adopt more capable models are structurally required to advance their security posture. Not by policy — by the architecture of risk.
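
The ratchet behaves like a running maximum. A minimal sketch, assuming phases are represented as plain integers (1, 2, 3) as in the example above; the `FloorRatchet` class and its `observe` method are hypothetical names.

```python
class FloorRatchet:
    """Tracks a capability floor that can rise but never fall."""

    def __init__(self, floor=1):
        # Assumption: a text-only deployment starts at Phase 1.
        self._floor = floor

    @property
    def floor(self):
        return self._floor

    def observe(self, phase):
        """Record the phase implied by a newly crossed capability threshold."""
        # Keep the larger value, so a lower observation cannot reduce the floor.
        self._floor = max(self._floor, phase)
        return self._floor


ratchet = FloorRatchet()
ratchet.observe(2)  # tool use added
ratchet.observe(3)  # code execution enabled
ratchet.observe(1)  # restricting the model back to text does not lower the floor
```

After the three calls, `ratchet.floor` is still 3, which is the "rise but never fall" property in code.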

The Recalibration Pipeline

Floors are not static. Every frontier model release triggers a recalibration:

  1. Capability Assessment — What can this model do that the previous version could not?
  2. Floor Computation — Which phase boundaries does this capability cross?
  3. Floor Publication — Updated floor specifications published to the maturity model
  4. Cohort Notification — Organizations deploying this model class are notified of floor changes
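
The four steps above can be sketched as one function. Everything here is illustrative: the capability identifiers, `floor_of`, `recalibrate`, the shape of the publication record, and the `deployments` mapping of organization to model class are assumptions, not an interface the framework defines.

```python
PHASES = ["1", "2", "3", "4A", "4B", "4C", "4D"]

# Hypothetical identifiers mapped to the minimum phase each one imposes.
CAPABILITY_PHASE = {
    "text_generation": "1",
    "tool_use": "2",
    "code_execution_networking": "3",
    "credential_management_persistence": "4A",
    "extended_reasoning": "4B",
    "autonomous_multistep_execution": "4C",
    "multi_agent_delegation": "4D",
}


def floor_of(capabilities):
    """Highest phase imposed by any capability; Phase 1 if none are given."""
    ranks = [PHASES.index(CAPABILITY_PHASE[c]) for c in capabilities]
    return PHASES[max(ranks, default=0)]


def recalibrate(model_class, previous_caps, new_caps, deployments):
    # 1. Capability assessment: what can the new version do that the old could not?
    gained = sorted(set(new_caps) - set(previous_caps))

    # 2. Floor computation: which phase boundaries does the new capability set cross?
    old_floor = floor_of(previous_caps)
    new_floor = floor_of(new_caps)
    crossed = PHASES[PHASES.index(old_floor) + 1 : PHASES.index(new_floor) + 1]

    # 3. Floor publication: the record that would be published to the maturity model.
    publication = {
        "model_class": model_class,
        "gained": gained,
        "floor": new_floor,
        "crossed": crossed,
    }

    # 4. Cohort notification: organizations deploying this model class,
    #    notified only when at least one boundary was crossed.
    cohort = []
    if crossed:
        cohort = sorted(org for org, deployed in deployments.items()
                        if deployed == model_class)
    return publication, cohort
```

A release that adds code execution to a tool-using model would cross the Phase 3 boundary and notify only the organizations in `deployments` that run that model class.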

The compound surface, the combination of model, mode, and tools, makes this complex. The same model in different configurations yields different floors:

| Model | Mode | Tools | Floor |
|---|---|---|---|
| Frontier | Chat only | None | Phase 1 |
| Frontier | Agent | Search, APIs | Phase 2 |
| Frontier | Agent | Code execution, networking | Phase 3 |
| Frontier | Agent | Code + credentials + persistence | Phase 4A-4C |
| Frontier | Multi-agent | Delegation + composition | Phase 4D |

The model is the same. The configuration changes the floor.
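
The configuration table can be expressed as an ordered set of rules, checked from the most demanding row down. This is a rough sketch under stated assumptions: the `Deployment` type, the mode and tool identifiers, and the choice to match a row when any of its distinguishing tools is present are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    mode: str          # assumed values: "chat", "agent", "multi-agent"
    tools: frozenset   # assumed tool identifiers, e.g. "search", "credentials"


def configuration_floor(deployment):
    """Return the floor label from the configuration table for a deployment."""
    tools = deployment.tools
    if deployment.mode == "multi-agent":
        return "Phase 4D"
    if deployment.mode == "agent":
        # Rows are checked from the highest floor to the lowest.
        if {"credentials", "persistence"} & tools:
            return "Phase 4A-4C"
        if {"code_execution", "networking"} & tools:
            return "Phase 3"
        if {"search", "apis"} & tools:
            return "Phase 2"
    # Chat-only, or an agent with no recognized tools (assumption).
    return "Phase 1"
```

For example, `configuration_floor(Deployment("agent", frozenset({"search", "apis"})))` returns `"Phase 2"`, matching the second row of the table.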

Why This Matters Now

The industry is moving fast. Organizations that spent months building Phase 2 security are deploying Phase 4 models because the capability is available and the business pressure is real.

The capability floor framework does not slow this down. It makes the risk explicit. You can deploy the frontier model. You accept the floor. You build the controls. The framework tells you which controls, in what order, at what granularity.

The alternative is deploying Phase 4 capabilities with Phase 2 defenses and discovering the gap when something goes wrong. The floor is not a gate — it is a mirror. It shows you where you are relative to where the model has already placed you.


This extends the “Zero Trust for Agentic AI” series. The maturity model, phasing, and capability floor dynamics are part of the broader framework.

The full model is grounded in a larger document corpus backed by a live implementation.