Methodology

This page describes how AgentCTX benchmarks are measured to ensure reproducibility and statistical validity.

The measurements follow four principles:

  1. Measure, don’t estimate: all numbers come from instrumented test runs
  2. Control variables: same model, same prompts, same hardware
  3. Statistical validity: multiple runs, summarized as p50/p95/p99 percentiles
  4. Reproducible: benchmark scripts live in bench/ and can be run by anyone

The benchmark suite lives in bench/runner.ts:

# Run all benchmarks
npx tsx bench/runner.ts
# Run specific benchmark
npx tsx bench/runner.ts --suite parser
npx tsx bench/runner.ts --suite gateway
npx tsx bench/runner.ts --suite sidecar

Token usage is measured in two passes over the same workload.

Baseline run (raw MCP, no gateway):

  1. Configure an agent with raw MCP connections (no gateway)
  2. Run a standardized workload (tool discovery, tool calls, knowledge search, memory operations)
  3. Count total tokens via API usage reports (OpenAI, Anthropic)
  4. Record input tokens, output tokens, and total cost

AgentCTX run (through the gateway):

  1. Configure the same agent with the AgentCTX gateway
  2. Run the identical workload
  3. Count tokens via the same API usage reports
  4. Record savings per category
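
The final step, recording savings per category, can be sketched as a small comparison function. This is only an illustration: the `TokenUsage`, `UsageByCategory`, and `Savings` shapes and the `savingsPerCategory` name are assumptions made for this example, not the types used in bench/runner.ts.

```typescript
// Hypothetical shape of the token usage reported by the provider API for one category.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical: usage keyed by workload category, e.g. "Tool Discovery".
type UsageByCategory = Record<string, TokenUsage>;

interface Savings {
  baselineTokens: number;
  gatewayTokens: number;
  savedTokens: number;
  savedPercent: number; // 0 when the baseline used no tokens
}

function totalTokens(usage: TokenUsage): number {
  return usage.inputTokens + usage.outputTokens;
}

// Compare the baseline run against the gateway run, category by category.
function savingsPerCategory(
  baseline: UsageByCategory,
  gateway: UsageByCategory,
): Record<string, Savings> {
  const result: Record<string, Savings> = {};
  for (const [category, base] of Object.entries(baseline)) {
    const gw = gateway[category];
    if (gw === undefined) continue; // category missing from the gateway run
    const baselineTokens = totalTokens(base);
    const gatewayTokens = totalTokens(gw);
    const savedTokens = baselineTokens - gatewayTokens;
    result[category] = {
      baselineTokens,
      gatewayTokens,
      savedTokens,
      savedPercent:
        baselineTokens === 0 ? 0 : (savedTokens / baselineTokens) * 100,
    };
  }
  return result;
}
```

Both runs must report usage under the same category names; a category that appears in only one run is skipped rather than counted as a saving.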

Standardized workload:

| Task | Description | Operations |
|---|---|---|
| Tool Discovery | Search and inspect 70 MCP tools | 70 ?t + 10 !t |
| Tool Execution | Call tools with various arguments | 150 >t |
| Knowledge Search | Search project documentation | 50 ?k |
| Memory Operations | Store and retrieve memories | 100 +m + 50 ?m |
| Total | | 430 operations |
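
The workload above can also be written down as data, which makes the 430-operation total easy to check. The counts and operator strings are copied from the table; the `WorkloadTask` type and `countOperations` helper are illustrative names, not part of the benchmark suite.

```typescript
// Illustrative type: one row of the workload table.
interface WorkloadTask {
  task: string;
  // Operator string as written in the table -> number of operations.
  operations: Record<string, number>;
}

const workload: WorkloadTask[] = [
  { task: "Tool Discovery", operations: { "?t": 70, "!t": 10 } },
  { task: "Tool Execution", operations: { ">t": 150 } },
  { task: "Knowledge Search", operations: { "?k": 50 } },
  { task: "Memory Operations", operations: { "+m": 100, "?m": 50 } },
];

// Sum every per-operator count across all tasks.
function countOperations(tasks: WorkloadTask[]): number {
  return tasks
    .flatMap((t) => Object.values(t.operations))
    .reduce((sum, n) => sum + n, 0);
}

console.log(countOperations(workload)); // 430
```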

Latency measurement:

  • Instrumentation: performance.now() wrapped around each operation
  • Warmup: 100 operations discarded before measurement
  • Sample size: 1,000 operations per measurement point
  • Percentiles: p50, p95, p99 computed from sorted samples
  • Units: milliseconds (ms) for gateway, microseconds (μs) for parser/CTXB
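
A minimal sketch of this latency procedure follows, assuming a synchronous operation and the nearest-rank percentile definition. The actual suite may interpolate percentiles differently or await asynchronous operations, and `measureLatency` and `percentile` are names chosen for this example only.

```typescript
// Nearest-rank percentile over an ascending-sorted array.
// Assumption: one common definition; the real suite may use another.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

interface LatencyStats {
  p50: number;
  p95: number;
  p99: number;
}

// Times `op` with performance.now() (a global in Node.js 16+ and browsers).
// Results are in milliseconds; multiply by 1000 for microseconds.
function measureLatency(
  op: () => void,
  warmup = 100,
  samples = 1000,
): LatencyStats {
  // Warmup operations run but are not recorded.
  for (let i = 0; i < warmup; i++) op();

  const timings: number[] = [];
  for (let i = 0; i < samples; i++) {
    const start = performance.now();
    op();
    timings.push(performance.now() - start);
  }

  // Numeric sort; the default sort would compare as strings.
  timings.sort((a, b) => a - b);
  return {
    p50: percentile(timings, 50),
    p95: percentile(timings, 95),
    p99: percentile(timings, 99),
  };
}
```

Reporting p95 and p99 alongside the median shows tail latency that an average would hide, which is why the samples are sorted rather than simply summed.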

Throughput measurement:

  • Method: Saturate a single thread with sequential operations
  • Duration: 10 seconds per measurement
  • Metric: Operations per second (ops/sec)
  • Variants: TypeScript vs Rust native (same machine, same workload)
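
The throughput method can be sketched the same way: run one operation back to back on a single thread for a fixed duration and divide the completed count by the elapsed time. This assumes a synchronous operation; `measureThroughput` is a hypothetical name and not the function used in bench/runner.ts.

```typescript
// Run `op` sequentially on the current thread until `durationMs` has elapsed,
// then return completed operations per second.
function measureThroughput(op: () => void, durationMs = 10000): number {
  if (durationMs <= 0) throw new RangeError("durationMs must be positive");

  const start = performance.now();
  let completed = 0;
  let elapsed = 0;
  do {
    op();
    completed++;
    elapsed = performance.now() - start;
  } while (elapsed < durationMs);

  // Use the actual elapsed time, which may slightly exceed durationMs.
  return completed / (elapsed / 1000);
}
```

Reading the clock on every iteration adds a small fixed cost per operation. For very fast operations, such as parser calls measured in microseconds, checking the clock only once per batch of operations would reduce that overhead.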

All benchmarks run on the same hardware:

  • CPU: AMD Ryzen 9 7950X, no pinning of benchmark threads to specific cores or SMT (hyperthreading) siblings
  • RAM: 64 GB DDR5, no swap
  • Storage: NVMe SSD (Samsung 990 Pro)
  • OS: Ubuntu 22.04 (kernel 6.x) and Windows 11
  • Isolated: no other significant processes running

To reproduce the results:

git clone https://github.com/ryan-haver/agentctx.git
cd agentctx
npm install
npm run build
npx tsx bench/runner.ts --output results.json

Results are written as JSON for analysis and comparison.