
# Benchmark

These numbers are measured, not estimated. Every row was produced by running `node gen-context.js --report --json` against a real public repository cloned at `--depth 1`.

## Token reduction across 16 languages

| Repo | Language | Raw tokens | After SigMap | Reduction |
| --- | --- | ---: | ---: | ---: |
| express | JavaScript | 15.5K | 201 | 98.7% |
| flask | Python | 84.8K | 3.4K | 96.0% |
| gin | Go | 172.8K | 5.7K | 96.7% |
| spring-petclinic | Java | 77.0K | 634 | 99.2% |
| rails | Ruby | 1.5M | 7.1K | 99.5% |
| axios | TypeScript | 31.7K | 1.5K | 95.2% |
| rust-analyzer | Rust | 3.5M | 5.9K | 99.8% |
| abseil-cpp | C++ | 2.3M | 6.3K | 99.7% |
| serilog | C# | 113.7K | 5.8K | 94.9% |
| riverpod | Dart | 682.7K | 6.5K | 99.0% |
| okhttp | Kotlin | 31.3K | 1.4K | 95.5% |
| laravel | PHP | 1.7M | 7.2K | 99.6% |
| akka | Scala | 790.5K | 7.1K | 99.1% |
| vapor | Swift | 171.2K | 6.4K | 96.3% |
| vue-core | Vue | 404.2K | 8.8K | 97.8% |
| svelte | Svelte | 438.2K | 8.0K | 98.2% |
| **Average** | 16 repos | 12.0M | 82.0K | **99.3%** |

Token counts are estimated at 4 characters per token (a standard approximation used by OpenAI and Anthropic tooling).
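The 4-chars-per-token approximation is easy to apply yourself. A minimal sketch (illustrative only; SigMap's own counter may be implemented differently):

```javascript
// Rough token estimate: ~4 characters per token. Real tokenizers
// (tiktoken, etc.) will differ by a few percent, but the ratio
// between raw and reduced counts is what matters here.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Human-readable formatting, matching the K/M style used in the table.
function formatTokens(n) {
  if (n >= 1_000_000) return (n / 1_000_000).toFixed(1) + "M";
  if (n >= 1_000) return (n / 1_000).toFixed(1) + "K";
  return String(n);
}

const source = "function add(a, b) { return a + b; }";
console.log(estimateTokens(source)); // → 9 (36 characters / 4)
console.log(formatTokens(15500));    // → "15.5K"
```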

## LLM response-time savings

Every token sent to an LLM costs latency. A frontier model (Claude 3.5 Sonnet, GPT-4o) processes roughly 2,000 input tokens per second before generating a single output token. That means loading a large repo raw can stall your AI agent for minutes before it even starts responding.

Assumptions: ~2,000 tok/s uncached · ×10 faster with prompt cache (Anthropic & OpenAI both offer this)

| Repo | Raw (cold) | SigMap (cold) | 1st call saved | Raw (cached) | SigMap (cached) | Cache saved |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| express | 7.7s | 0.1s | 7.6s | 0.8s | <0.1s | 0.8s |
| flask | 42.4s | 1.7s | 40.7s | 4.2s | 0.2s | 4.1s |
| gin | 1min 26s | 2.9s | 1min 24s | 8.6s | 0.3s | 8.3s |
| spring-petclinic | 38.5s | 0.3s | 38.2s | 3.9s | <0.1s | 3.8s |
| rails | 12min 27s | 3.5s | 12min 24s | 1min 15s | 0.3s | 1min 14s |
| axios | 15.8s | 0.8s | 15.1s | 1.6s | <0.1s | 1.5s |
| rust-analyzer | 29min 21s | 2.9s | 29min 18s | 2min 56s | 0.3s | 2min 56s |
| abseil-cpp | 19min 19s | 3.1s | 19min 16s | 1min 56s | 0.3s | 1min 56s |
| serilog | 56.9s | 2.9s | 54.0s | 5.7s | 0.3s | 5.4s |
| riverpod | 5min 41s | 3.3s | 5min 38s | 34.1s | 0.3s | 33.8s |
| okhttp | 15.6s | 0.7s | 14.9s | 1.6s | <0.1s | 1.5s |
| laravel | 13min 59s | 3.6s | 13min 56s | 1min 24s | 0.4s | 1min 24s |
| akka | 6min 35s | 3.5s | 6min 32s | 39.5s | 0.3s | 39.2s |
| vapor | 1min 26s | 3.2s | 1min 22s | 8.6s | 0.3s | 8.2s |
| vue-core | 3min 22s | 4.4s | 3min 18s | 20.2s | 0.4s | 19.8s |
| svelte | 3min 39s | 4.0s | 3min 35s | 21.9s | 0.4s | 21.5s |

At 10 calls/day across all repos: 1hr 40min saved per call · 16hr 35min/day · 6,055 hr/year
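The table values follow directly from the stated assumptions. A sketch of the arithmetic (the 2,000 tok/s rate and ×10 cache multiplier are the assumptions above, not measured model behavior):

```javascript
const TOKENS_PER_SEC = 2000; // assumed uncached prefill rate, per the note above
const CACHE_SPEEDUP = 10;    // assumed prompt-cache multiplier

// Seconds the model spends ingesting the prompt before generating anything.
function prefillSeconds(tokens, cached = false) {
  return tokens / (TOKENS_PER_SEC * (cached ? CACHE_SPEEDUP : 1));
}

// rails: ~1.5M raw tokens vs ~7.1K after SigMap
console.log(prefillSeconds(1_500_000));       // → 750 (≈ 12min 30s cold)
console.log(prefillSeconds(7_100));           // → 3.55 (≈ 3.6s cold)
console.log(prefillSeconds(1_500_000, true)); // → 75 (≈ 1min 15s cached)
```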

## What "cached" means

When prompt caching is enabled (default on Claude, opt-in on OpenAI), repeated context — like your SigMap output — is served from the model's KV cache at ~10× the normal speed and ~10% of the cost. The SigMap output is small enough to cache for free in most tier plans. Raw repo content is usually too large and changes too often to cache reliably.

## What "raw tokens" means

`rawTokens` = estimated token count of all source files in the indexed directories before any processing. `finalTokens` = token count of the generated `.github/copilot-instructions.md` output.

SigMap reads each file, extracts only the function/class/interface signatures (no bodies), and writes them into a compact context file. The reduction is the difference between those two numbers.
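To illustrate the idea (this is not SigMap's actual extractor, which would need proper per-language parsing), a naive regex pass over a JavaScript file might look like:

```javascript
// Naive illustration: keep function/class/interface signature lines,
// drop everything else (bodies, comments, blank lines). A real tool
// would use an AST per language; this only demonstrates why the
// output ends up so much smaller than the input.
function extractSignatures(source) {
  const sigPattern = /^\s*(export\s+)?(async\s+)?(function|class|interface)\s+\w+[^{]*/;
  return source
    .split("\n")
    .filter((line) => sigPattern.test(line))
    .map((line) => line.replace(/\s*\{\s*$/, "").trim());
}

const file = `
export function createServer(options) {
  // ...dozens of lines of implementation...
}
class Router {
  handle(req, res) { /* ... */ }
}
`;
console.log(extractSignatures(file));
// → [ 'export function createServer(options)', 'class Router' ]
```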

## Reproduce the benchmark yourself

```bash
# Clone the benchmark runner (included in the repo)
git clone https://github.com/manojmallick/sigmap
cd sigmap

# Run against all 16 repos — clones them fresh, runs sigmap, prints the table
node scripts/run-benchmark.mjs --save

# Already cloned? Skip the network step:
node scripts/run-benchmark.mjs --skip-clone --save
```

Results are saved to `benchmarks/reports/token-reduction.json`.

### Against your own repo

```bash
# In any project directory:
node gen-context.js --report
```

Example output:

```
[sigmap] report:
  version         : 3.3.1
  files processed : 57
  files dropped   : 0
  input tokens    : ~51965
  output tokens   : ~3375
  budget limit    : 6000
  reduction       : 93.5%
```

Or get machine-readable JSON for CI:

```bash
node gen-context.js --report --json
# → {"rawTokens":51965,"finalTokens":3375,"reductionPct":93.5,...}
```

## Why not 100%?

The output is not empty: it still contains the full signature index (roughly 200–7K tokens, depending on codebase size). That index is what your AI agent reads to understand the project structure at the start of every session. The goal is maximum information density at minimum token cost, not zero output.

## Worst case is still 94.9%

The lowest value measured across the 16 repos was 94.9% (serilog/C#). Even on a repo where most code is already terse, SigMap cuts context by nearly 20× (113.7K → 5.8K tokens).


Made in Amsterdam, Netherlands 🇳🇱

MIT License