Agent Interposition ARF

Architecture

The interposition
model.

Runner

Claude Code
Codex CLI
Gemini CLI
Ollama

$ANTHROPIC_BASE_URL
→ localhost:4554

HTTPS →
wire protocol

ARF Proxy

arf

✓ Protocol translate
✓ Governance eval
✓ Sign & record
✓ Intercept & steer

localhost:4554

→ HTTPS
canonical IR

Engine

Anthropic API
OpenAI API
Gemini API
Local Models

api.anthropic.com
api.openai.com
generativelanguage…

The proxy is fully transparent to the runner. Any AI coding CLI sees a standard API endpoint. ARF handles all protocol translation internally through a canonical intermediate representation, then forwards to whatever upstream engine you've configured.

Protocol Translation

Anthropic, OpenAI, Gemini.
One canonical IR.

Anthropic uses server-sent events with content_block_delta streaming. OpenAI uses chat completion chunks. Gemini has its own protobuf-flavored JSON. They're not compatible.

ARF's translation layer converts between all of them through a Canonical Intermediate Representation a single internal message format that captures the full semantics of any supported protocol. Translation is lossless for governance-relevant fields.

This means you can run Claude Code against OpenAI's API, or Codex CLI against Anthropic's. The runner doesn't know. The engine doesn't know. ARF does the work.

Bidirectional translation: inbound and outbound messages
Streaming-aware: SSE, chunked transfer, WebSocket
Tool/function call bridging across formats
System prompt normalization and injection
Preserves token counts, stop reasons, finish conditions

arf · protocol bridge

-- INBOUND (from claude-code) ----------------------

POST /v1/messages HTTP/1.1
x-arf-runner: claude-code
x-api-key: [redacted]
 
{
  "model": "anthropic/claude",
  "messages": [...]
}

-- CANONICAL IR --------------------------------

CIR {
  runner: claude-code
  engine: anthropic -> openai
  session: 01HX...QPBZ
  policy: standard ✓
}

-- OUTBOUND (to openai endpoint) ----

POST /v1/chat/completions HTTP/1.1
{
  "model": "openai/gpt",
  "messages": [translated],
  "stream": true
}

✓ Translation complete · 2.1ms

Transparent Operation

No agent code changes.
Ever.

ARF works by intercepting the HTTP traffic at the environment level. You set one environment variable ANTHROPIC_BASE_URL=http://localhost:4554 and every CLI that respects that variable is now governed by ARF.

Claude Code doesn't need a plugin. Codex CLI doesn't need a flag. Gemini CLI doesn't need configuration. The interposition is at the network layer, not the application layer. This is the Agent Runtime Firewall model: everything goes through the fence, nothing bypasses it.

ARF handles TLS termination, certificate injection for mTLS environments, and credential substitution so the runner sees a plain HTTP endpoint while ARF manages the secure upstream connections.

One env var: ANTHROPIC_BASE_URL
TLS termination and cert injection
Credential vault integration keys never touch disk
Works with any HTTP client in any language

Quick Start

# Install ARF
cargo install arf

# Start the proxy
arf start --profile standard

# Point any CLI at ARF
export ANTHROPIC_BASE_URL=http://localhost:4554
export OPENAI_BASE_URL=http://localhost:4554/openai
export GOOGLE_API_BASE=http://localhost:4554/gemini

# Run Claude Code as normal -- ARF intercepts everything
claude .

# Or route Claude through OpenAI's API
ARF_ENGINE=openai/gpt claude .
          

Multi-Runner Support

One ARF.
Every runner you use.

The Agent Runtime Firewall doesn't care which runner is connecting. It identifies the runner from the User-Agent header and incoming protocol shape, assigns the session to the appropriate protocol adapter, and proceeds. All runners share the same governance policy, the same audit chain, the same credential vault.

Running three agents in parallel (Claude on a feature branch, Codex doing code review, a DeepSeek model writing tests)? ARF tracks them as separate sessions under a shared work graph, applies governance uniformly, and records each decision to the same proof chain. The TUI shows all three at once.

Claude Code · Codex CLI · Gemini CLI
Ollama · DeepSeek · Qwen · any OAI-compatible local model
Concurrent multi-session tracking with ULID attribution
Per-runner policy overrides via TOML profiles

arf · active sessions

Runner Registry

SESS  RUNNER          ENGINE              MSGS  TOKENS
●  01HX…AB12  claude-code       anthropic            142   28.4k
●  01HX…CD34  codex-cli         openai                88   12.1k
◐  01HX…EF56  gemini-cli        google                34    8.7k
●  01HX…GH78  ai-cli            ollama                21    4.2k

4 active sessions  ·  policy: standard  ·  uptime: 3h 42m

4 sessions 3 healthy 1 throttled ARF · Agent Runtime Firewall

What the Firewall Does

Everything between
CLI and cloud.

Protocol Translation

Bidirectional conversion between Anthropic, OpenAI, and Gemini wire formats. Every message passes through the canonical IR. No information lost.

Governance Evaluation

Every request and response is evaluated against your TOML policy rules before being forwarded. Policy violations trip circuit breakers in real time.

Cryptographic Recording

Each message is signed with Ed25519 and added to the SHA-256 hash chain. The record is tamper-evident from the first request to the last seal.

Credential Injection

API keys are pulled from the encrypted vault just-in-time, injected into outbound headers, and never written to disk. Runners hold no credentials.

Message Steering

Inbound prompts and outbound completions can be rewritten, augmented, or filtered by ARF rules before the runner or engine sees them.

Session Attribution

Every request is tagged with a ULID session ID, git commit context, and caller identity. Attribution is precise enough for compliance audits.

Why This Matters

The agent watchdog
sees everything.

Your runners are agnostic about which model runs them
Use Claude's UX with OpenAI's backend, or Codex's workflow against Anthropic's API. ARF translates everything. Mix, match, experiment without touching agent config.
Governance applies regardless of which runner or engine
Your cost caps, content rules, and circuit breakers are defined once in TOML and enforced at the proxy layer. An agent can't route around policy by switching models.
Every API call is on the record
The hash-chained proof bundle captures every message request and response with timing, token counts, policy decisions, and cryptographic signatures. Nothing slips through.
Local models are first-class citizens
Ollama, DeepSeek, Qwen, and any OAI-compatible local model plug into ARF the same way the cloud APIs do. Route sensitive work locally, experimental work to the cloud. Policy follows either way.

One proxy.Every agent. Every API.

The interpositionmodel.

Anthropic, OpenAI, Gemini.One canonical IR.

No agent code changes.Ever.

One ARF.Every runner you use.

Everything betweenCLI and cloud.

The agent watchdogsees everything.