How AgentCTX Works

Follow one agent's journey — from starting empty to full-stack compression — in 7 interactive steps.

Step 1 of 7

Starting Empty

Every session, an agent starts with nothing.

No memory of yesterday. No recall of decisions made. The agent is handed 40+ tool descriptions it may never use — burning nearly 10,000 tokens before it can even begin to reason. On smaller models (128K), this overhead triggers context overflow within 30 turns. On frontier models (200K–1M), the extra room just means more space for the agent to lose important details in the noise — the "lost in the middle" problem gets worse, not better.

Your 200K context over a 100-turn session:

- No Optimization: 200,000 tokens (100% — context overflow)
- Provider Compaction: 150,000 tokens (75% — lossy compaction loop)
- With CTX: 8,000 tokens (4% — CTX-native compression)

* CTX-native: all tool calls, responses, and memory encoded in CTX with external memory via SurrealDB

- 9,950 tokens wasted on tool descriptions per session
- 100% context overflow after extended agentic sessions
- 6,500 tokens to rebuild context from scratch
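The overflow claim is back-of-the-envelope arithmetic. A minimal sketch: the 9,950-token overhead and the window sizes come from this page, while the 4,000 tokens-per-turn figure is an assumption chosen for illustration.

```python
# Context budget arithmetic. Only the 9,950-token tool-description
# overhead and the 128K/200K windows come from the page; the
# per-turn cost is an assumed, illustrative figure.
TOOL_DESCRIPTION_OVERHEAD = 9_950   # tokens burned before turn 1
TOKENS_PER_TURN = 4_000             # assumed: call + response + reasoning

def turns_until_overflow(window: int) -> int:
    """Full turns that fit after the fixed tool-description overhead."""
    return (window - TOOL_DESCRIPTION_OVERHEAD) // TOKENS_PER_TURN

print(turns_until_overflow(128_000))  # smaller model: 29 turns
print(turns_until_overflow(200_000))  # frontier model: 47 turns
```

Under these assumptions a 128K model runs out of room in 29 turns, matching the "within 30 turns" claim; a 200K window only delays the same wall.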
Step 2 of 7

The Agent Speaks CTX

What if it could ask in 10 tokens instead of 10,000?

CTX is a structured query language designed for how agents actually think. Instead of verbose JSON-RPC payloads, the agent writes concise operations — each one a thought and an action fused into a single expression.

Standard MCP (46 tokens):
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "search_issues",
    "arguments": {
      "query": "state:open is:issue",
      "limit": 10
    }
  },
  "id": 1
}
CTX (9 tokens):
?t github.search_issues "state:open is:issue" ^10
- 76% fewer output tokens per operation
- 7 operators × 7 planes = entire API
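The percentages are easy to recompute from the token counts on this page. A quick sketch; note that the 76% headline comes from the per-call 49 → 12 figure, which rounds up from 75.5%.

```python
# Recompute the savings figures from the before/after token counts above.
def percent_saved(before: int, after: int) -> float:
    """Percent reduction in tokens, rounded to one decimal place."""
    return round(100 * (1 - after / before), 1)

print(percent_saved(46, 9))   # the MCP-vs-CTX example above: 80.4
print(percent_saved(49, 12))  # per-call output figure: 75.5 (the 76% headline)
```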
Step 3 of 7

The Gateway Routes

One gateway. Seven planes. The agent doesn't need to know which backend handles what.

The CTX Gateway is a zero-trust router. It parses the agent's statement, identifies the target plane, runs it through 8-layer security middleware, and dispatches to the right backend — all in one hop. Bigger context windows don't solve this: routing 70 tool descriptions through a 1M-token window still costs the same per token, and the agent still has to parse them all.

?k "auth middleware" #code ^3
- 1 gateway endpoint instead of 70 tool descriptions
- 8 security middleware layers per request
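Plane routing can be pictured as a one-line dispatch on the statement's operator and plane letter. This sketch infers the grammar from the CTX examples on this page (`?t`, `?m`, `+m`, `?k`); the real gateway's plane set and parser are certainly richer.

```python
import re

# Plane letters inferred from this page's examples; assumed, not the real set.
PLANES = {"t": "tools", "m": "memory", "k": "knowledge"}

def route(statement: str) -> str:
    """Return the backend plane a CTX statement targets."""
    m = re.match(r"[?+]([a-z])\b", statement)
    if not m or m.group(1) not in PLANES:
        raise ValueError(f"unroutable statement: {statement!r}")
    return PLANES[m.group(1)]

print(route('?k "auth middleware" #code ^3'))                      # knowledge
print(route('?t github.search_issues "state:open is:issue" ^10'))  # tools
```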
Step 4 of 7

Memory That Persists

The agent remembers — across sessions, across days.

CTX memory is a 5-layer cognitive architecture: from split-second perception up to long-term procedural knowledge. Every memory is graph-structured, tag-searchable, and recalled in sub-millisecond time — flat to 50,000 entries. This is what provider compaction can't do: it summarizes away the details, while CTX stores them externally and recalls exactly what the agent needs, when it needs it.

Agent stores a memory (10 tokens):
+m "lesson" #architecture "singleton pattern breaks integration tests"
Agent recalls it days later — 3 results, <1ms (6 tokens):
?m "architecture" @7d ^3
- 0.78ms memory recall, flat to 50K entries
- 5 cognitive layers (perception → procedural)
- 97% fewer tokens for session resume
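The store/recall pair maps onto a very small data structure. A toy sketch of tag plus time-window recall; field names and matching rules here are assumptions, and the real store is graph-structured and backed by SurrealDB rather than a Python list.

```python
import time

class MemoryStore:
    """Toy stand-in for the CTX memory plane: +m stores, ?m recalls."""

    def __init__(self):
        self.entries = []

    def store(self, kind: str, tag: str, text: str) -> None:
        # +m "lesson" #architecture "..."
        self.entries.append(
            {"kind": kind, "tag": tag, "text": text, "ts": time.time()}
        )

    def recall(self, tag: str, window_days: float, limit: int) -> list:
        # ?m "architecture" @7d ^3  =>  tag match, time window, result cap
        cutoff = time.time() - window_days * 86_400
        hits = [e for e in self.entries if e["tag"] == tag and e["ts"] >= cutoff]
        return hits[-limit:]

mem = MemoryStore()
mem.store("lesson", "architecture", "singleton pattern breaks integration tests")
print(mem.recall("architecture", window_days=7, limit=3))
```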
Step 5 of 7

The Sidecar Translates

Compiled, not interpreted. Then cryptographically signed.

The sidecar is a deterministic compiler — no LLM in the translation path. It takes every CTX operation and compiles it into 7 target formats, including human-readable English. Every translation is Ed25519 signed, creating an immutable audit trail.

CTX input: ?m "arch" @7d ^3
English output: The agent recalled 3 architecture-related memories from the last 7 days.
- 7 compilation targets (SurrealQL, SQL, REST, GraphQL, JSON-RPC, English, CTXB)
- 0 LLMs in the translation path — pure compiler
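The sign-every-translation idea can be sketched in a few lines. This page specifies Ed25519; HMAC-SHA256 from the standard library stands in here so the example runs without extra dependencies, and the one-target compiler is a hypothetical toy, not the sidecar's real logic.

```python
# Sketch of a signed translation record. HMAC-SHA256 is a stdlib
# stand-in for the Ed25519 signing the page describes; the compile
# step is a deterministic toy with a single hypothetical rule.
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in for the sidecar's private key

def compile_to_english(ctx: str) -> str:
    # Deterministic, no LLM: a fixed template keyed on the operation.
    if ctx.startswith("?m"):
        return "The agent recalled memories matching the query."
    raise NotImplementedError(ctx)

def signed_translation(ctx: str) -> dict:
    record = {"input": ctx, "english": compile_to_english(ctx)}
    payload = json.dumps(record, sort_keys=True).encode()
    record["sig"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record

print(signed_translation('?m "arch" @7d ^3'))
```

Because the compile step is a pure function of its input, the same CTX statement always yields the same record and the same signature, which is what makes an append-only audit trail verifiable.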
Step 6 of 7

Full-Stack Compression

Not just input. Every direction.

CTX compresses in all directions: input tokens (tool descriptions), output tokens (agent writes), response tokens (data returned), and delegation tokens (agent-to-agent handoffs). A traditional prose handoff costs 2,000 tokens — with CTX delegation, it's 15. Whether your model has 128K or 1M tokens, every token costs money. Larger windows mean larger bills.

- Input: 91% compression (9,950 → 850 tokens)
- Output: 76% compression (49 → 12 tokens/call)
- Response: 25–60% compression (JSON → CTX encoding)
- Delegation: 99% compression (2,000 → 15 tokens per handoff)
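To make "every token costs money" concrete, here is a rough bill for the tool-description overhead alone. The $3 per million input tokens and 1,000 sessions per day are assumptions for illustration; the 9,950 and 850 figures come from this page.

```python
# Rough monthly cost of per-session tool-description overhead.
# Price and session volume are assumed; token counts are from the page.
PRICE_PER_MTOK = 3.00       # assumed: $ per million input tokens
SESSIONS_PER_DAY = 1_000    # assumed fleet volume

def monthly_cost(tokens_per_session: int) -> float:
    """Dollars per 30-day month spent on this overhead alone."""
    return tokens_per_session * SESSIONS_PER_DAY * 30 * PRICE_PER_MTOK / 1_000_000

print(round(monthly_cost(9_950), 2))  # without CTX: 895.5
print(round(monthly_cost(850), 2))    # with CTX: 76.5
```

Under these assumptions the overhead alone drops from roughly $900 to under $80 a month, before counting output, response, or delegation savings.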
Step 7 of 7

Agents Talk to Agents

One agent is useful. A network of agents with a shared language is transformative.

When agents coordinate through CTX, a handoff is 15 tokens instead of 2,000. The memory bus means one agent writes what it learned and the next reads it instantly. Federation means agents across organizations can query each other over mTLS.

Traditional handoff (~2,000 tokens):
// Prose handoff: ~2,000 tokens
{
  "type": "handoff",
  "from": "agent-alpha",
  "context": "I have completed the parser implementation 
    including all edge cases for nested pipes. The SSE 
    transport layer needs wiring next. Authentication 
    is handled via OIDC auto-discovery. Note that the 
    singleton pattern caused test failures, switched to 
    factory pattern. See files: src/parser/...",
  // ... 1,900 more tokens of context
}
CTX handoff (15 tokens):
+m "handoff" #state "parser done, SSE needs wiring"
- 15 tokens per agent-to-agent handoff
- 2,000 tokens for the same handoff in prose
- 99% compression on coordination messages
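The memory-bus handoff reduces to a write followed by a read on shared storage. A toy sketch; the class and method names are illustrative, not the real API.

```python
# One agent writes a terse state line; the next reads it instantly.
class MemoryBus:
    """Toy shared bus for agent-to-agent handoffs."""

    def __init__(self):
        self.log = []

    def write(self, agent: str, kind: str, tag: str, text: str) -> None:
        # +m "handoff" #state "parser done, SSE needs wiring"
        self.log.append((agent, kind, tag, text))

    def read(self, tag: str) -> list:
        return [(agent, text) for agent, _, t, text in self.log if t == tag]

bus = MemoryBus()
bus.write("agent-alpha", "handoff", "state", "parser done, SSE needs wiring")
print(bus.read("state"))  # agent-beta picks up the 15-token handoff
```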

Ready to give your agents a brain?

Install AgentCTX in 30 seconds. Your agents start saving tokens immediately — no behavior changes required.