Generalization

Why this matters

SigMap was not tuned for one repo. This benchmark matters because it shows the same workflow transfers across different languages, repo sizes, and architectures without manual tuning.

Official v6.10.10 benchmark snapshot

Benchmark ID: sigmap-v6.10-main · Date: 2026-05-22 (with R language)

Metric	Value
Hit@5	80% vs 13.6% baseline
Retrieval lift	5.9×
Prompt reduction	41.4% (2.84 → 1.67)
Task success proxy	53.3%
Overall token reduction	96.5%
GPT-4o overflow (without → with)	16/21 → 0/21

The important part of SigMap's benchmark story is not just the topline score. It is that the same retrieval approach works across a mixed set of repos rather than one curated demo project.

What "generalization" means here

SigMap's signature extractors are hand-written regex patterns, not ML models. Generalization means: do the patterns hold up on codebases the authors never inspected? The answer across these 90 tasks is yes — 80% hit@5 with no per-repo tuning in the latest saved v6.10.10 run.

21 repos (including 3 R language repos)
31 languages (added R and GDScript)
multiple domains
78.9% overall hit@5
no per-repo tuning

That snapshot is shared with the retrieval benchmark and the task benchmark, so the public docs now use one release number set instead of mixing older runs.

Why this matters

SigMap uses hand-written extractors and lightweight ranking rather than a hosted retrieval stack. The strongest proof of generalization is therefore breadth:

frameworks and application repos
libraries and dev tools
small, medium, and large codebases
languages with very different syntax shapes

Representative coverage

Category	Example repos
Web frameworks	`express`, `flask`, `gin`, `rails`, `laravel`, `fastapi`, `fastify`, `vapor`
Libraries / tooling	`axios`, `okhttp`, `serilog`, `riverpod`, `rust-analyzer`, `abseil-cpp`, `akka`
UI frameworks	`vue-core`, `svelte`

Practical takeaway

If you want one number to carry into launch messaging, use the shared v6.5.0 snapshot rather than an older per-page variant:

Domain	Repos	Hit@5	Example repo
Dev tools	1	100%	rust-analyzer
Systems lib	1	100%	abseil-cpp
State management	1	100%	riverpod
Concurrency	1	100%	akka
Web framework	8	83%	express, rails, gin, laravel, flask, vapor, fastify, fastapi
HTTP client	2	80%	axios, okhttp
Logging	1	80%	serilog
UI framework	2	80%	vue-core, svelte
Web app	1	60%	spring-petclinic

No domain scores below 60%. The variation is explained by repo structure (fragmented vs modular signatures) rather than language or domain category.

By repo size — small to 1,533 files

Size	File count	Repos	Avg hit@5
Small	≤25 files	5	80%
Medium	26–200 files	5	76%
Large	>200 files	8	93%

Large repos benefit most. Without SigMap, the random baseline for a 1,000-file repo is effectively 0% (5/1000 = 0.5%). SigMap's ranked retrieval closes that gap entirely, scoring 100% hit@5 on rails (1,179 files) and laravel (1,533 files).

Anti-overfitting evidence

SigMap's extractors use hand-written regex patterns per language — not ML models, not embeddings. They were written against a small set of internal fixtures. The 18 benchmark repos were never inspected during development.

Key signals that the results are not overfit:

Zero per-repo tuning — the same gen-context.js command with default config ran on all 18 repos
Blind selection — repos were chosen by GitHub star count and language diversity, not by testing which ones scored well
Failure modes are honest — Swift/vapor 60%, JavaScript/svelte 60%, fastify 60%, spring-petclinic 60% — genuine weak spots, not massaged away
Large repos score higher — if the extractor patterns were memorized, they'd degrade on unseen large codebases; instead they improve (93% vs 84% for small repos)

Repo inventory

Repo	Language	Domain	Files	Hit@5
express	JavaScript	Web framework	6	80%
flask	Python	Web framework	19	100%
gin	Go	Web framework	107	100%
spring-petclinic	Java	Web app	13	60%
rails	Ruby	Web framework	1,179	80%
axios	TypeScript	HTTP client	25	60%
rust-analyzer	Rust	Dev tools	635	100%
abseil-cpp	C++	Systems lib	700	100%
serilog	C#	Logging	99	80%
riverpod	Dart	State management	446	100%
okhttp	Kotlin	HTTP client	18	100%
laravel	PHP	Web framework	1,533	100%
akka	Scala	Concurrency	211	100%
vapor	Swift	Web framework	131	60%
vue-core	Vue	UI framework	232	100%
svelte	Svelte	UI framework	370	60%
fastify	JavaScript	Web framework	31	60%
fastapi	Python	Web framework	48	80%

Made in Amsterdam, Netherlands 🇳🇱

Generalization ​

Why this matters ​

Representative coverage ​

Practical takeaway ​

By repo size — small to 1,533 files ​

Anti-overfitting evidence ​

Repo inventory ​

Generalization

Why this matters

Representative coverage

Practical takeaway

By repo size — small to 1,533 files

Anti-overfitting evidence

Repo inventory