Generalization

Why this matters

SigMap was not tuned for one repo. This benchmark matters because it shows the same workflow transfers across different languages, repo sizes, and architectures without manual tuning.

Official v6.10.10 benchmark snapshot

Benchmark ID: sigmap-v6.10-main  ·  Date: 2026-05-22 (with R language)

| Metric | Value |
|---|---|
| Hit@5 | 80% vs 13.6% baseline |
| Retrieval lift | 5.9× |
| Prompt reduction | 41.4% (2.84 → 1.67) |
| Task success proxy | 53.3% |
| Overall token reduction | 96.5% |
| GPT-4o overflow (without → with) | 16/21 → 0/21 |
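To make the first two rows concrete, here is a minimal sketch of how a hit@5 rate and a retrieval lift could be computed. It assumes hit@5 means "at least one relevant file appears in the top five ranked results", which is the usual definition but is not spelled out on this page. The function names and the toy tasks are hypothetical and not part of SigMap.

```python
def hit_at_k(ranked_files, relevant_files, k=5):
    """True if any relevant file appears among the top-k ranked results."""
    return any(f in relevant_files for f in ranked_files[:k])


def hit_rate(tasks, k=5):
    """Fraction of tasks with at least one relevant file in the top k."""
    hits = sum(hit_at_k(ranked, relevant, k) for ranked, relevant in tasks)
    return hits / len(tasks)


def lift(score, baseline):
    """How many times better the score is than the baseline."""
    return score / baseline


# Toy tasks with made-up file names: (ranked results, set of relevant files)
tasks = [
    (["a.py", "b.py", "c.py"], {"b.py"}),  # hit: b.py is ranked second
    (["x.py", "y.py"], {"z.py"}),          # miss: z.py was never retrieved
]

print(hit_rate(tasks))             # 0.5
print(round(lift(80.0, 13.6), 1))  # 5.9, matching the "Retrieval lift" row
```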

The important part of SigMap's benchmark story is not just the topline score. It is that the same retrieval approach works across a mixed set of repos rather than one curated demo project.

What "generalization" means here

SigMap's signature extractors are hand-written regex patterns, not ML models. Generalization means: do the patterns hold up on codebases the authors never inspected? The answer across these 90 tasks is yes — 80% hit@5 with no per-repo tuning in the latest saved v6.10.10 run.

  • 21 repos (including 3 R language repos)
  • 31 languages (added R and GDScript)
  • multiple domains
  • 78.9% overall hit@5
  • no per-repo tuning

That snapshot is shared with the retrieval benchmark and the task benchmark, so the public docs now use one release number set instead of mixing older runs.
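For readers unfamiliar with regex-based signature extraction, the following is an illustrative sketch of the general technique. The patterns, the `PATTERNS` table, and `extract_signatures` are hypothetical and written for this example only; they are not SigMap's actual extractors, which are not shown on this page.

```python
import re

# Hypothetical per-language patterns, for illustration only.
PATTERNS = {
    "python": re.compile(
        r"^[ \t]*((?:async[ \t]+)?def[ \t]+\w+[ \t]*\([^)]*\)"
        r"(?:[ \t]*->[ \t]*[^:\n]+)?"
        r"|class[ \t]+\w+(?:\([^)]*\))?)",
        re.MULTILINE,
    ),
    "javascript": re.compile(
        r"^[ \t]*((?:export[ \t]+)?(?:async[ \t]+)?function[ \t]+\w+[ \t]*\([^)]*\))",
        re.MULTILINE,
    ),
}


def extract_signatures(source, language):
    """Return declaration lines (without bodies) matched by the language's pattern."""
    return [m.group(1).strip() for m in PATTERNS[language].finditer(source)]


py_sample = '''
class Cache:
    def get(self, key, default=None):
        return self.data.get(key, default)

async def fetch(url):
    pass
'''

js_sample = '''
export async function load(id) {
  return id;
}
function helper(a, b) {}
'''

print(extract_signatures(py_sample, "python"))
print(extract_signatures(js_sample, "javascript"))
```

Patterns like these are cheap to run and need no model, but they miss edge cases such as nested parentheses in default arguments. That is why results on unseen repos are the relevant test.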

Why this matters

SigMap uses hand-written extractors and lightweight ranking rather than a hosted retrieval stack. The strongest proof of generalization is therefore breadth:

  • frameworks and application repos
  • libraries and dev tools
  • small, medium, and large codebases
  • languages with very different syntax shapes

Representative coverage

| Category | Example repos |
|---|---|
| Web frameworks | express, flask, gin, rails, laravel, fastapi, fastify, vapor |
| Libraries / tooling | axios, okhttp, serilog, riverpod, rust-analyzer, abseil-cpp, akka |
| UI frameworks | vue-core, svelte |

Practical takeaway

If you want one number to carry into launch messaging, use the shared v6.5.0 snapshot rather than an older per-page variant:

| Domain | Repos | Hit@5 | Example repo |
|---|---|---|---|
| Dev tools | 1 | 100% | rust-analyzer |
| Systems lib | 1 | 100% | abseil-cpp |
| State management | 1 | 100% | riverpod |
| Concurrency | 1 | 100% | akka |
| Web framework | 8 | 83% | express, rails, gin, laravel, flask, vapor, fastify, fastapi |
| HTTP client | 2 | 80% | axios, okhttp |
| Logging | 1 | 80% | serilog |
| UI framework | 2 | 80% | vue-core, svelte |
| Web app | 1 | 60% | spring-petclinic |

No domain scores below 60%. The variation is explained by repo structure (fragmented vs modular signatures) rather than language or domain category.


By repo size — small to 1,533 files

| Size | File count | Repos | Avg hit@5 |
|---|---|---|---|
| Small | ≤25 files | 5 | 80% |
| Medium | 26–200 files | 5 | 76% |
| Large | >200 files | 8 | 93% |

Large repos benefit most. Without SigMap, the random baseline for a 1,000-file repo is effectively 0% (5/1000 = 0.5%). SigMap's ranked retrieval closes most of that gap: per the repo inventory, it scores 100% hit@5 on laravel (1,533 files) and 80% on rails (1,179 files).
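The random-baseline arithmetic can be sketched as follows. This assumes each task has exactly one relevant file and that the baseline picks five files uniformly at random without replacement; both are simplifying assumptions, and the helper name is hypothetical.

```python
def random_hit_at_k(n_files, k=5):
    """Chance that k files drawn uniformly at random include one specific target file."""
    return min(k / n_files, 1.0)


# 1,000 is the round figure used in the text; 1,179 and 1,533 are rails and laravel.
for n in (1000, 1179, 1533):
    print(f"{n} files: {random_hit_at_k(n):.2%}")
```

The chance shrinks in proportion to repo size, so any fixed hit@5 score represents a larger improvement over random selection on a larger repo.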


Anti-overfitting evidence

SigMap's extractors use hand-written regex patterns per language — not ML models, not embeddings. They were written against a small set of internal fixtures. The 18 benchmark repos were never inspected during development.

Key signals that the results are not overfit:

  • Zero per-repo tuning: the same gen-context.js command with default config ran on all 18 repos
  • Blind selection: repos were chosen by GitHub star count and language diversity, not by testing which ones scored well
  • Failure modes are honest: axios, vapor, svelte, fastify, and spring-petclinic each score 60%. These are genuine weak spots, not massaged away
  • Large repos score higher: if the extractor patterns were overfit to the internal fixtures, they would degrade on unseen large codebases; instead they improve (93% average for large repos vs 80% for small repos)

Repo inventory

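| Repo | Language | Domain | Files | Hit@5 |
|---|---|---|---|---|
| express | JavaScript | Web framework | 6 | 80% |
| flask | Python | Web framework | 19 | 100% |
| gin | Go | Web framework | 107 | 100% |
| spring-petclinic | Java | Web app | 13 | 60% |
| rails | Ruby | Web framework | 1,179 | 80% |
| axios | TypeScript | HTTP client | 25 | 60% |
| rust-analyzer | Rust | Dev tools | 635 | 100% |
| abseil-cpp | C++ | Systems lib | 700 | 100% |
| serilog | C# | Logging | 99 | 80% |
| riverpod | Dart | State management | 446 | 100% |
| okhttp | Kotlin | HTTP client | 18 | 100% |
| laravel | PHP | Web framework | 1,533 | 100% |
| akka | Scala | Concurrency | 211 | 100% |
| vapor | Swift | Web framework | 131 | 60% |
| vue-core | Vue | UI framework | 232 | 100% |
| svelte | Svelte | UI framework | 370 | 60% |
| fastify | JavaScript | Web framework | 31 | 60% |
| fastapi | Python | Web framework | 48 | 80% |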
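As a cross-check, the per-size and per-domain averages can be recomputed from the inventory above. This is a small sketch: the data is copied from the inventory table, while the helper names and the bucketing code are assumptions that mirror the size thresholds stated on this page (≤25, 26–200, >200 files).

```python
from collections import defaultdict

# (repo, language, domain, files, hit@5 in percent), copied from the repo inventory
INVENTORY = [
    ("express", "JavaScript", "Web framework", 6, 80),
    ("flask", "Python", "Web framework", 19, 100),
    ("gin", "Go", "Web framework", 107, 100),
    ("spring-petclinic", "Java", "Web app", 13, 60),
    ("rails", "Ruby", "Web framework", 1179, 80),
    ("axios", "TypeScript", "HTTP client", 25, 60),
    ("rust-analyzer", "Rust", "Dev tools", 635, 100),
    ("abseil-cpp", "C++", "Systems lib", 700, 100),
    ("serilog", "C#", "Logging", 99, 80),
    ("riverpod", "Dart", "State management", 446, 100),
    ("okhttp", "Kotlin", "HTTP client", 18, 100),
    ("laravel", "PHP", "Web framework", 1533, 100),
    ("akka", "Scala", "Concurrency", 211, 100),
    ("vapor", "Swift", "Web framework", 131, 60),
    ("vue-core", "Vue", "UI framework", 232, 100),
    ("svelte", "Svelte", "UI framework", 370, 60),
    ("fastify", "JavaScript", "Web framework", 31, 60),
    ("fastapi", "Python", "Web framework", 48, 80),
]


def size_bucket(files):
    """Map a file count to the size classes used in the repo-size table."""
    if files <= 25:
        return "Small"
    if files <= 200:
        return "Medium"
    return "Large"


def average_by(rows, key):
    """Group rows by key(row) and return the mean hit@5 per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[4])
    return {name: sum(scores) / len(scores) for name, scores in groups.items()}


by_size = average_by(INVENTORY, lambda row: size_bucket(row[3]))
by_domain = average_by(INVENTORY, lambda row: row[2])

for name, avg in by_size.items():
    print(f"{name}: {avg:.1f}%")
for name, avg in by_domain.items():
    print(f"{name}: {avg:.1f}%")
```

The unrounded averages for large repos (92.5%) and web frameworks (82.5%) correspond to the 93% and 83% shown in the summary tables.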

MIT License