Warden by Bitmill
Documentation

Runtime Policy

Every tool call your agent makes passes through Warden’s runtime layer, which evaluates it before it executes. The evaluation produces one of three outcomes:

Decision | What happens | When
Deny | Command is blocked. Agent sees an explanation of why. | Destructive actions, hallucinated flags, secret exposure, unsafe patterns.
Allow + Advisory | Command runs, but the agent receives a targeted correction. | Session is drifting, tool choice is suboptimal, verification debt is accumulating.
Silent Allow | Command passes through with zero overhead. | Healthy work. No intervention needed.

This is not post-hoc analysis. The decision happens on the live path, before the command reaches your environment.
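The three outcomes can be pictured as a gate that sits between the agent and the environment. The sketch below is illustrative only: `Decision`, `Verdict`, `gate`, `evaluate`, and `execute` are hypothetical names, not Warden's actual API. It shows the one property described above: a denied command never reaches the executor.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    DENY = "deny"
    ALLOW_WITH_ADVISORY = "allow_with_advisory"
    SILENT_ALLOW = "silent_allow"


@dataclass
class Verdict:
    decision: Decision
    message: Optional[str] = None  # explanation (deny) or hint (advisory)


def gate(command: str,
         evaluate: Callable[[str], Verdict],
         execute: Callable[[str], str]) -> str:
    """Evaluate a command before running it and return what the agent sees."""
    verdict = evaluate(command)
    if verdict.decision is Decision.DENY:
        # execute() is never called; the agent sees only the explanation.
        return f"BLOCKED: {verdict.message}"
    output = execute(command)
    if verdict.decision is Decision.ALLOW_WITH_ADVISORY:
        # The command ran; the hint is delivered alongside its output.
        return f"{output}\nAdvisory: {verdict.message}"
    return output  # silent allow: output passes through unchanged
```

Because `evaluate` runs before `execute`, the decision is made on the live path rather than reconstructed from logs afterwards.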

Why Runtime Policy Matters

Static rules in a CLAUDE.md or system prompt are suggestions. The agent can ignore them, misinterpret them, or forget them after context compaction. They also consume context budget on every turn — hundreds of tokens of instructions that the model re-reads but may not follow.

Runtime policy is different. It operates outside the model’s context window, on the actual tool call. The agent cannot bypass it because the evaluation happens before the command executes. An rm -rf / in a static rule file is a polite request. An rm -rf / caught by runtime policy is a wall.

Decision Types in Detail

Deny blocks the command entirely. The agent receives a structured explanation with the rule ID, what pattern matched, and what to do instead. The command never executes.

BLOCKED: rm -rf on broad paths. Remove specific files by name.
Rule: safety.0 | Pattern: \brm\s+-rf?\s+[~*/.]

The agent sees this message in place of the command output and adjusts its approach. Most agents self-correct immediately after a deny.
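To see which commands the pattern in the deny message would catch, it can be tried with Python's `re` module. This is a small experiment, not Warden's matcher: the regex is copied from the message above, while `is_broad_rm` and the sample commands are made up for illustration.

```python
import re

# Pattern copied from the deny message above (rule safety.0).
BROAD_RM = re.compile(r"\brm\s+-rf?\s+[~*/.]")


def is_broad_rm(command: str) -> bool:
    """Return True if the command matches the broad-path rm pattern."""
    return BROAD_RM.search(command) is not None


samples = [
    "rm -rf /",            # root
    "rm -r ~/projects",    # home-relative path
    "rm -rf ./build",      # path starting with "."
    "rm -rf build/cache",  # named relative path
    "rm notes.txt",        # no recursive flag
]
for cmd in samples:
    print(f"{cmd!r}: {'deny' if is_broad_rm(cmd) else 'no match'}")
```

The first three samples match and the last two do not. The pattern keys on the first character of the target, so any path beginning with `~`, `*`, `/`, or `.` is treated as broad. As written, this single regex does not match reordered flags such as `-fr`; whether other rules cover that case is not stated here.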

Allow + Advisory lets the command run but injects a hint into the response. The agent gets both the command output and the advisory. This is for situations where the action isn’t dangerous but could be done better.

Advisory: Use aidex_query for symbol lookups instead of rg.

The agent receives this alongside the rg results. It can choose to follow the advice or ignore it. Advisories are budgeted, so Warden doesn’t flood the context with hints. The injection budget is tied to trust score: high-trust sessions get fewer advisories, struggling sessions get more.

Silent Allow adds no overhead beyond the rule match itself: no message is injected and no further processing occurs. In a healthy session, 90% or more of tool calls hit this path. The agent works as if Warden isn’t there.

The Trust Model

Warden maintains a per-session trust score (0-100) that starts at 100 and adjusts based on agent behavior:

Event | Impact
Command errors | -5 per error
Verification debt (edits without tests) | -3 per unverified file
Subsystem switches (jumping between unrelated code areas) | -2 per switch
Dead ends (reverting changes, undoing work) | -4 per dead end

High trust (85+) means the session is healthy: Warden injects at most 1 advisory per turn and stays out of the way. Normal trust (50-84) allows up to 3 advisories per turn. Degraded trust (25-49) allows up to 5. Below 25, the budget is uncapped because the session is struggling and needs active intervention.

This means a well-behaved agent barely notices Warden. An agent that’s drifting, looping, or making mistakes gets progressively more guidance.
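The arithmetic can be worked through in a few lines. The sketch below uses the penalties and tier boundaries listed above; the function names, the event keys, and the clamping to the 0-100 range are assumptions, and the sketch ignores any way the score might recover, since none is described here.

```python
from typing import Dict, Optional

# Penalty per event, taken from the table above. Key names are hypothetical.
PENALTIES = {
    "command_error": 5,
    "unverified_file": 3,
    "subsystem_switch": 2,
    "dead_end": 4,
}


def trust_score(events: Dict[str, int]) -> int:
    """Start at 100, subtract penalties, clamp to 0-100 (clamping is assumed)."""
    total = sum(PENALTIES[name] * count for name, count in events.items())
    return max(0, min(100, 100 - total))


def advisory_budget(score: int) -> Optional[int]:
    """Map a trust score to the advisory cap; None means uncapped."""
    if score >= 85:
        return 1
    if score >= 50:
        return 3
    if score >= 25:
        return 5
    return None


# Two command errors, three unverified files, one dead end:
# 100 - (2*5 + 3*3 + 1*4) = 77, which falls in the normal tier.
score = trust_score({"command_error": 2, "unverified_file": 3, "dead_end": 1})
print(score, advisory_budget(score))
```

Under these assumptions a session needs only four command errors to drop from the high tier (100) to the normal tier (80).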

How Policies Compose

Rules come from three sources, merged at startup:

  1. Compiled defaults — baked into the binary. Safety rules, substitutions, hallucination detection. These are the immutable floor. You cannot disable core safety rules.
  2. Global rules (~/.warden/rules.toml) — your personal overrides that apply to all projects. Add custom patterns, disable non-safety rules, adjust thresholds.
  3. Project rules (.warden/rules.toml in the project root) — project-specific overrides. A monorepo might need different thresholds than a small library.

Each section supports replace = true to fully override the defaults for that category, or the default append mode to add rules on top.
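One way to picture the merge is as a fold over the three layers. The sketch below is a guess at the semantics, not Warden's implementation: the layer representation, the `merge_layers` function, the category names, and the rule that `replace` is ignored for a protected `safety` category are all assumptions made for illustration. It also assumes the global layer is applied before the project layer.

```python
from typing import Dict, List, TypedDict


class Section(TypedDict, total=False):
    replace: bool     # True: discard rules accumulated so far for this category
    rules: List[str]  # rule identifiers (hypothetical representation)


Layer = Dict[str, Section]

# Assumption: the core safety category can be extended but never replaced.
PROTECTED = {"safety"}


def merge_layers(defaults: Dict[str, List[str]], *layers: Layer) -> Dict[str, List[str]]:
    """Merge compiled defaults with override layers, applied in order."""
    merged = {category: list(rules) for category, rules in defaults.items()}
    for layer in layers:  # e.g. global rules first, then project rules
        for category, section in layer.items():
            rules = section.get("rules", [])
            if section.get("replace", False) and category not in PROTECTED:
                merged[category] = list(rules)  # full override
            else:
                merged.setdefault(category, []).extend(rules)  # default append
    return merged


defaults = {"safety": ["safety.0"], "substitutions": ["default-sub"]}
global_rules: Layer = {"substitutions": {"replace": True, "rules": ["my-sub"]}}
project_rules: Layer = {"safety": {"replace": True, "rules": ["project-safety"]}}
print(merge_layers(defaults, global_rules, project_rules))
```

In this sketch the global layer swaps out the default substitutions, while the project layer's attempt to replace `safety` only appends to it, so `safety.0` survives.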

The Injection Budget

Not every advisory is worth injecting. Each advisory consumes context tokens, and excessive injection degrades the session it’s trying to help. Warden uses a budget system:

  • Advisories are ranked by relevance (based on the current session phase and what the agent is doing).
  • The trust score determines how many advisories can fire per turn.
  • Once the budget is spent, remaining advisories are logged but not injected.
  • Rules that consistently don’t improve outcomes get demoted by the learning system over time.

The net effect: early in a session when everything is going well, Warden is invisible. As the session degrades, Warden becomes more vocal — but only with the most impactful corrections.
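The ranking and cutoff steps can be sketched as a sort followed by a slice. This is a minimal illustration under stated assumptions: the `Advisory` type, its numeric `relevance` field, and `select_advisories` are hypothetical, and how Warden actually scores relevance is not described here. A budget of `None` stands for the uncapped case.

```python
from typing import List, NamedTuple, Optional, Tuple


class Advisory(NamedTuple):
    text: str
    relevance: float  # hypothetical score; higher means more relevant


def select_advisories(
    candidates: List[Advisory], budget: Optional[int]
) -> Tuple[List[Advisory], List[Advisory]]:
    """Return (injected, logged_only) for one turn, most relevant first."""
    ranked = sorted(candidates, key=lambda a: a.relevance, reverse=True)
    if budget is None:  # uncapped: everything is injected
        return ranked, []
    return ranked[:budget], ranked[budget:]


candidates = [
    Advisory("Run the tests for the files you edited.", 0.9),
    Advisory("Use aidex_query for symbol lookups instead of rg.", 0.6),
    Advisory("You have switched subsystems several times.", 0.4),
]
injected, logged = select_advisories(candidates, budget=1)
print([a.text for a in injected])
print([a.text for a in logged])
```

With a budget of 1, only the highest-ranked advisory is injected and the other two are kept for the log.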