
Inspector

Added in v0.6.0 · Feature IDs: F-DX-09 (core) · F-DX-10 (UI)
Extended in v0.7.0 · F-OBS-10 (agent DAG) · F-DX-11 (replay) · F-DX-08 (production dashboard) · F-DX-12 (test-case export)

The Gavio Inspector is an embedded, zero-dependency dev-time visualizer. While a request moves through the interceptor chain, the gateway emits span events — which interceptor fired, in what order, what each one changed, how long each took, what the provider returned. A bounded ring buffer assembles them into traces, and a localhost HTTP server renders them in a self-contained web UI.
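To make the buffering idea concrete, here is a minimal sketch of a bounded buffer that groups span events by trace and evicts the oldest trace when full. It is an illustration only, not Gavio's implementation: the `TraceBuffer` class and the `trace_id` and `span` field names are assumptions made for this example.

```python
from collections import OrderedDict


class TraceBuffer:
    """Bounded buffer that groups span events by trace id (illustrative only)."""

    def __init__(self, max_traces: int = 1000) -> None:
        self.max_traces = max_traces
        self._traces = OrderedDict()  # trace id -> list of events, oldest first

    def add(self, event: dict) -> None:
        trace_id = event["trace_id"]  # hypothetical field name
        if trace_id not in self._traces:
            if len(self._traces) >= self.max_traces:
                self._traces.popitem(last=False)  # evict the oldest trace
            self._traces[trace_id] = []
        self._traces[trace_id].append(event)

    def get(self, trace_id: str) -> list:
        return list(self._traces.get(trace_id, []))

    def trace_ids(self) -> list:
        return list(self._traces)


buf = TraceBuffer(max_traces=2)
for tid, name in [("t1", "pii_guard"), ("t1", "provider"), ("t2", "cache"), ("t3", "audit")]:
    buf.add({"trace_id": tid, "span": name})
print(buf.trace_ids())  # ['t2', 't3'] because t1 was evicted when t3 arrived
```

Evicting whole traces rather than individual events keeps every trace that is still in the buffer complete.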

Since v0.7.0 it also renders multi-agent call graphs and sessions, can replay any captured request, aggregates RED stats, exports traces as test cases, and doubles as a read-only production dashboard over a persisted audit store.

It is off by default — dev mode does not enable it implicitly.

Quickstart

python
from gavio import Gateway

gw = (
    Gateway.builder()
    .dev_mode(True)
    .inspect(True)          # defaults: full capture, http://127.0.0.1:7411
    .build()
)

typescript
import { Gateway } from 'gavio'

const gw = new Gateway({ devMode: true, inspect: true })
// defaults: full capture, http://127.0.0.1:7411

java
Gateway gw = Gateway.builder()
    .devMode(true)
    .inspect(true)          // defaults: full capture, http://127.0.0.1:7411
    .build();

Or via the environment — no code change:

bash
GAVIO_INSPECT=1 python your_app.py     # GAVIO_INSPECT_PORT / GAVIO_INSPECT_MODE to tune

Open http://127.0.0.1:7411 and send a request through the gateway. You get:

  • a live trace list (SSE) with status, latency, cost, PII and cache badges,
  • a waterfall per trace — every interceptor span with its duration, plus the provider call,
  • the redaction diff — original vs redacted text, side by side, per PII match,
  • decision records — what the cache, cost router, or your own interceptor decided,
  • a pipeline view with ordering lints (e.g. "audit registered before pii_guard — audit will hash unredacted prompts").
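An ordering lint of the kind described in the last bullet can be sketched as a small rule check over the interceptor names. The rule table and the `lint_order` function below are hypothetical and only illustrate the idea; Gavio's actual lint rules are not shown in this page.

```python
def lint_order(chain: list) -> list:
    """Return warnings for interceptor orderings that look unsafe."""
    # Hypothetical rule table: (must_come_first, must_come_later, message).
    rules = [
        (
            "pii_guard",
            "audit",
            "audit registered before pii_guard: audit will hash unredacted prompts",
        ),
    ]
    warnings = []
    for first, later, message in rules:
        both_present = first in chain and later in chain
        if both_present and chain.index(later) < chain.index(first):
            warnings.append(message)
    return warnings


print(lint_order(["audit", "pii_guard"]))  # one warning
print(lint_order(["pii_guard", "audit"]))  # []
```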

Capture modes

| Mode | Intended for | Content captured |
| --- | --- | --- |
| full | Local dev, debugging | Messages, response, mutation diffs (secrets always masked) |
| redacted | Staging | Post-redaction content only; diffs omit the original text |
| metadata | Production | No content ever: the content-bearing fields are structurally absent from every event |

Safety rails:

  • full outside dev mode refuses to start unless you set unsafe_content_capture_ack explicitly — raw-content capture cannot be enabled by accident.
  • Secret values (API keys, JWTs, private keys) are masked in every mode.
  • The server binds to 127.0.0.1 by default; a non-loopback bind without an auth_token is a startup error, not a warning.
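The two startup checks above can be sketched as a single validation function. This is a sketch under stated assumptions, not the library's code: the `InspectorConfig` dataclass and `validate` function are hypothetical names, the option names are taken from the Configuration table, and `bind` is assumed to be an IP literal.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class InspectorConfig:  # hypothetical container for the documented options
    mode: str = "metadata"
    bind: str = "127.0.0.1"
    auth_token: Optional[str] = None
    unsafe_content_capture_ack: bool = False


def validate(cfg: InspectorConfig, dev_mode: bool) -> None:
    """Raise ValueError for configurations the safety rails forbid."""
    if cfg.mode not in {"full", "redacted", "metadata"}:
        raise ValueError(f"unknown capture mode: {cfg.mode!r}")
    # Raw-content capture outside dev mode needs an explicit acknowledgement.
    if cfg.mode == "full" and not dev_mode and not cfg.unsafe_content_capture_ack:
        raise ValueError("full capture outside dev mode requires unsafe_content_capture_ack")
    # Any address other than loopback must be protected by a bearer token.
    if not ipaddress.ip_address(cfg.bind).is_loopback and not cfg.auth_token:
        raise ValueError("non-loopback bind requires auth_token")


validate(InspectorConfig(mode="full"), dev_mode=True)  # accepted
```

Raising at startup rather than logging a warning means an unsafe configuration never serves a single request.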

JSON API

Everything the UI shows is plain JSON — same endpoints in all three SDKs. Every response carries an X-Gavio-Inspector-Mode header; when auth_token is set, all requests require Authorization: Bearer <token>.

| Endpoint | Returns |
| --- | --- |
| GET /api/health | status, SDK, version, mode, trace count |
| GET /api/pipeline | chain composition + ordering lints |
| GET /api/traces?limit=N&q= | trace summaries, chronological; q matches trace-id or content-hash prefixes |
| GET /api/traces/{id} | one assembled trace: summary + ordered events |
| GET /api/dag?root=<id> or ?session_id=<id> | agent call graph with subtree cost/latency/status rollups (v0.7.0) |
| GET /api/sessions | sessions with trace counts, errors, agents, cost, duration (v0.7.0) |
| GET /api/stats?group_by=&since= | RED aggregates: rate, error %, latency p50/p95/p99, tokens, cost, cache hit-rate, PII counts (v0.7.0) |
| POST /api/replay | re-fires a captured trace through the live gateway (full mode only) (v0.7.0) |
| GET /api/simulate-cost?trace_id=&model= | recosts a trace under a different model (v0.7.0) |
| GET /api/traces/{id}/export?format= | trace as a test case (see below) (v0.7.0) |
| GET /api/chain/verify | walks the audit hash-chain; store mode only (v0.7.0) |
| GET /api/stream | live event feed (Server-Sent Events) |
| GET / | the bundled UI |
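Because the endpoints return plain JSON, they can be queried with the standard library alone. The helper names `build_request` and `fetch_json` below are made up for this sketch; the default base URL, the bearer-token scheme, and the `X-Gavio-Inspector-Mode` header come from the text above.

```python
import json
import urllib.request
from typing import Optional


def build_request(
    path: str,
    base: str = "http://127.0.0.1:7411",
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a GET request for an inspector endpoint, adding auth if configured."""
    req = urllib.request.Request(base + path, headers={"Accept": "application/json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def fetch_json(path: str, **kwargs):
    """Return (capture mode reported by the server, decoded JSON body)."""
    with urllib.request.urlopen(build_request(path, **kwargs), timeout=5) as resp:
        mode = resp.headers.get("X-Gavio-Inspector-Mode")
        return mode, json.load(resp)


# With an inspector running locally:
#   mode, health = fetch_json("/api/health")
```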

The event contract is canonical JSON Schema: spec/InspectorEvent.schema.json. Cross-SDK parity is enforced by shared event-sequence test vectors that all three suites run.

Agent call graphs & sessions (v0.7.0)

Pass agent_id, parent_trace_id, and session_id on your gateway calls and the DAG tab reconstructs the multi-agent call graph — every node shows its own cost/latency/status plus subtree rollups (total traces, cost, latency, errors under that agent). The Sessions tab groups traces by session_id with per-session totals; clicking a session opens its graph.
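The subtree rollups can be pictured as a depth-first walk over the parent links. The sketch below is illustrative: the record fields `trace_id`, `cost`, `latency_ms`, and `status` are assumed names, latency is rolled up as a plain sum (the real rollup may differ), and the graph is assumed to be acyclic.

```python
from collections import defaultdict


def subtree_rollups(traces: list) -> dict:
    """Map each trace id to totals over itself and all of its descendants."""
    by_id = {t["trace_id"]: t for t in traces}
    children = defaultdict(list)
    for t in traces:
        parent = t.get("parent_trace_id")
        if parent in by_id:
            children[parent].append(t["trace_id"])

    rollups = {}

    def visit(trace_id):
        node = by_id[trace_id]
        total = {
            "traces": 1,
            "cost": node["cost"],
            "latency_ms": node["latency_ms"],
            "errors": 1 if node["status"] == "error" else 0,
        }
        for child in children.get(trace_id, []):
            sub = visit(child)
            for key in total:
                total[key] += sub[key]
        rollups[trace_id] = total
        return total

    # Roots are traces whose parent is missing from the captured set.
    for t in traces:
        if t.get("parent_trace_id") not in by_id:
            visit(t["trace_id"])
    return rollups


traces = [
    {"trace_id": "a", "parent_trace_id": None, "cost": 0.5, "latency_ms": 100, "status": "ok"},
    {"trace_id": "b", "parent_trace_id": "a", "cost": 0.25, "latency_ms": 40, "status": "error"},
    {"trace_id": "c", "parent_trace_id": "b", "cost": 0.25, "latency_ms": 10, "status": "ok"},
]
rollups = subtree_rollups(traces)
print(rollups["a"])  # totals for the whole graph under agent call "a"
```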

Trace replay & edit-resend (v0.7.0)

From the trace detail (or POST /api/replay {"traceId", "overrides"?}) you can re-fire any captured request through the live gateway — the full interceptor chain runs again, PII guard included, never bypassed. Optionally edit the messages or the model first ("would haiku have been good enough?"). The result opens as a new trace. Replay requires full capture mode — the endpoint returns 403 otherwise, and the store-backed dashboard never replays.
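A request body for POST /api/replay might be assembled as follows. The top-level `traceId` and `overrides` keys come from the text above; the keys inside `overrides` (`model`, `messages`) are assumptions for this sketch, since the page does not spell them out, and `replay_body` is a hypothetical helper.

```python
import json
from typing import Optional


def replay_body(
    trace_id: str,
    model: Optional[str] = None,
    messages: Optional[list] = None,
) -> bytes:
    """Encode a replay request; overrides are included only when something is edited."""
    body = {"traceId": trace_id}
    overrides = {}
    if model is not None:
        overrides["model"] = model  # assumed key name
    if messages is not None:
        overrides["messages"] = messages  # assumed key name
    if overrides:
        body["overrides"] = overrides
    return json.dumps(body).encode("utf-8")


print(replay_body("t1", model="haiku").decode("utf-8"))
```

A client should be prepared for a 403 response when the inspector is not in full capture mode.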

Export a trace as a test case (v0.7.0)

GET /api/traces/{id}/export?format=test-vector|testkit-py|testkit-java|testkit-js renders a captured trace as a shared test-vectors/ JSON case or a runnable GavioTestKit unit test in any of the three languages. Detected PII values are replaced with the repo's synthetic fixtures (e.g. jan@example.com, NL91ABNA0417164300) before export, so real data never lands in a test file. Debug → regression test in one click. Requires full or redacted mode.
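The fixture substitution can be sketched as a replacement pass over the detected PII values. The two fixture values are the ones quoted above; the `FIXTURES` table, the `email` and `iban` kind labels, and the `scrub` function are hypothetical names for illustration.

```python
# Synthetic stand-ins per PII kind; the kind labels are assumed for this sketch.
FIXTURES = {
    "email": "jan@example.com",
    "iban": "NL91ABNA0417164300",
}


def scrub(text: str, matches: list) -> str:
    """Replace each detected (kind, value) pair with a synthetic fixture."""
    for kind, value in matches:
        # Fall back to a visible placeholder for kinds without a fixture.
        text = text.replace(value, FIXTURES.get(kind, f"<{kind}>"))
    return text


original = "Mail alice@corp.test about DE89370400440532013000"
detected = [("email", "alice@corp.test"), ("iban", "DE89370400440532013000")]
print(scrub(original, detected))
# Mail jan@example.com about NL91ABNA0417164300
```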

Cost simulator (v0.7.0)

GET /api/simulate-cost?trace_id=<id>&model=<model> recomputes a trace's cost from its captured token usage under a different model's pricing — e.g. "this call cost $0.0042 on sonnet; haiku would have been $0.0004".
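The arithmetic behind such a simulation is a re-pricing of the captured token counts. In the sketch below the model names, the prices, and the `input_tokens` and `output_tokens` field names are all invented for illustration; they are not real pricing and not the endpoint's actual schema.

```python
# Hypothetical prices in USD per million tokens; not real pricing.
PRICING = {
    "model-a": {"input": 3.00, "output": 15.00},
    "model-b": {"input": 0.25, "output": 1.25},
}


def simulate_cost(usage: dict, model: str) -> float:
    """Price captured token usage under the given model's rates."""
    price = PRICING[model]
    total = usage["input_tokens"] * price["input"] + usage["output_tokens"] * price["output"]
    return total / 1_000_000


usage = {"input_tokens": 1000, "output_tokens": 200}  # as captured on a trace
for model in PRICING:
    print(model, f"${simulate_cost(usage, model):.4f}")
```

Because only token counts are needed, this kind of re-pricing never has to re-run the request.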

Production mode: the read-only dashboard (v0.7.0)

Write your audit trail to a JSONL store, then serve the dashboard from it — no running gateway, metadata mode, no content, no replay:

python
from gavio import Gateway
from gavio.interceptors.audit import AuditInterceptor, JsonlSink

gw = (
    Gateway.builder()
    .provider("anthropic")
    .use(AuditInterceptor(sink=JsonlSink("audit.jsonl"), hash_chain=True))
    .build()
)

bash
gavio inspect --store audit.jsonl          # http://127.0.0.1:7411
gavio inspect --store audit.jsonl --port 7412 --token $TOKEN

The store-backed server adds what production debugging needs: RED stats (/api/stats), search-by-content-hash (/api/traces?q=<hash prefix> — hash the prompt locally, look it up without content ever reaching the server), and the hash-chain verifier (/api/chain/verify, F-OBS-02 surfaced) that walks every previous_hash link and reports the first tampered record.
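Hash-chain verification can be pictured as recomputing each record's hash and comparing it with the next record's previous_hash. The sketch below is not Gavio's record format or hashing scheme: SHA-256 over sorted-key JSON, the `record_hash`, `append`, and `verify_chain` helpers, and the record fields other than `previous_hash` are assumptions. Note that this simplified walk reports the position where the link breaks, which is the record after the altered one.

```python
import hashlib
import json
from typing import Optional


def record_hash(record: dict) -> str:
    """Hash a record deterministically (assumed scheme: SHA-256 of sorted-key JSON)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(records: list, body: dict) -> None:
    """Append a record that links to the hash of its predecessor."""
    prev = record_hash(records[-1]) if records else None
    records.append({**body, "previous_hash": prev})


def verify_chain(records: list) -> Optional[int]:
    """Return the index of the first broken previous_hash link, or None if intact."""
    expected = None
    for i, rec in enumerate(records):
        if i > 0 and rec.get("previous_hash") != expected:
            return i
        expected = record_hash(rec)
    return None


chain = []
for trace_id in ["t1", "t2", "t3"]:
    append(chain, {"trace_id": trace_id, "status": "ok"})
print(verify_chain(chain))  # None: every link matches

chain[1]["status"] = "error"  # simulate tampering with the second record
print(verify_chain(chain))  # 2: the third record no longer links to the second
```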

For fleet-level observability, pair with the Prometheus metrics (F-OBS-08) and the prebuilt Grafana dashboard; the InspectorEvent → OpenTelemetry span mapping is documented in docs/otel-mapping.md.

Recording decisions from your own interceptor

Anything you pass to ctx.inspect(key, value) during a hook surfaces as the decision record on that hook's span — a harmless no-op when the inspector is off. Interceptor state keyed by the interceptor's name (e.g. the cost router's cost_router entry) surfaces automatically.

python
async def before(self, request, ctx):
    ctx.inspect("triage", {"rule": "length-check", "routed": "standard"})
    return request

typescript
before(request, ctx) {
  ctx.inspect('triage', { rule: 'length-check', routed: 'standard' })
  return request
}

java
public CompletableFuture<GavioRequest> before(GavioRequest request, InterceptorContext ctx) {
    ctx.inspect("triage", Map.of("rule", "length-check", "routed", "standard"));
    return CompletableFuture.completedFuture(request);
}

Configuration

| Option | Default | Description |
| --- | --- | --- |
| mode | full in dev mode, else metadata | Capture level (see above) |
| port | 7411 | HTTP port; 0 picks an ephemeral port |
| bind | 127.0.0.1 | Non-loopback requires auth_token |
| auth_token | none | Bearer token for every endpoint |
| max_traces | 1000 | Ring-buffer capacity; oldest evicted |
| unsafe_content_capture_ack | false | Required for full outside dev mode |
| start_server | true | false = events/buffer only (used in tests) |

With the inspector disabled, the request path is untouched: event emission is a no-op when there are no subscribers, and all three SDK test suites run with the inspector off by default.

Overhead (v0.8.0)

The performance budget (disabled ≈ 0, metadata < 1% p50, full < 5%) is enforced by the benchmarks/inspector/ harnesses, which run in CI for all three SDKs against a delay-padded mock provider. Measured p50 overhead per request: metadata 0.13–0.58%, full 0–2.7% — inside the budget in every SDK.
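The budget check itself is simple arithmetic on median latencies. The following sketch shows one way to compute it; it is not the benchmarks/inspector/ harness, the helper names are hypothetical, and the sample timings are invented. Only the budget thresholds are taken from the text above.

```python
from statistics import median

# Budget thresholds from the text: p50 overhead per request, in percent.
BUDGET_PCT = {"metadata": 1.0, "full": 5.0}


def p50_overhead_pct(baseline_ms: list, inspected_ms: list) -> float:
    """Relative increase of the median latency with the inspector enabled."""
    base = median(baseline_ms)
    return (median(inspected_ms) - base) / base * 100.0


def within_budget(mode: str, baseline_ms: list, inspected_ms: list) -> bool:
    return p50_overhead_pct(baseline_ms, inspected_ms) < BUDGET_PCT[mode]


baseline = [100.0, 100.0, 100.0]  # invented timings against a mock provider
with_metadata = [100.4, 100.5, 100.6]
print(within_budget("metadata", baseline, with_metadata))  # True: about 0.5% overhead
```

Padding the mock provider with a fixed delay keeps the baseline realistic, so the percentage reflects overhead relative to a typical request rather than to an instant response.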

Released under the MIT License.