Observability

The observability stack the analyzers ride on

Most people meet TokenJam through cost optimization. Underneath is a full local-first observability tool for AI agents: traces, drift, alerts, budgets, schema validation, and a web UI. The five analyzers ride on top of this substrate. Free, open-source, MIT-licensed.

What's included

Six capabilities. All shipping in the open-source CLI today.

Real-time cost tracking

Every LLM call is priced as it happens using TokenJam's local pricing table. Spend breaks down by agent, model, session, and tool. Budget alerts fire before you hit the limit, not after.
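
To make the pricing step concrete, here is a minimal Python sketch of per-call costing, a breakdown by agent, and a warning that fires before the budget is spent. The rate table, model names, the LLMCall record, and the 80% warning ratio are all illustrative assumptions, not TokenJam's actual pricing data or internals.

```python
from dataclasses import dataclass

# Hypothetical per-million-token rates in USD; not TokenJam's real pricing table.
PRICING = {
    "example-small": {"input": 0.50, "output": 1.50},
    "example-large": {"input": 5.00, "output": 15.00},
}


@dataclass
class LLMCall:
    agent: str
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(call: LLMCall) -> float:
    """Price one call from its token counts and the local rate table."""
    rates = PRICING[call.model]
    return (call.input_tokens * rates["input"]
            + call.output_tokens * rates["output"]) / 1_000_000


def spend_by(calls, key):
    """Sum cost grouped by an attribute such as 'agent' or 'model'."""
    totals = {}
    for call in calls:
        group = getattr(call, key)
        totals[group] = totals.get(group, 0.0) + call_cost(call)
    return totals


def budget_status(total, budget, warn_ratio=0.8):
    """Return 'ok', 'warning' (before the limit), or 'exceeded'."""
    if total >= budget:
        return "exceeded"
    if total >= warn_ratio * budget:
        return "warning"
    return "ok"


calls = [
    LLMCall("researcher", "example-large", 200_000, 40_000),
    LLMCall("summarizer", "example-small", 1_000_000, 200_000),
]
by_agent = spend_by(calls, "agent")
total = sum(by_agent.values())
print(by_agent, budget_status(total, budget=2.50))
```

With these made-up rates the two calls total about $2.40, which is past 80% of a $2.50 budget but still under it, so the status is "warning" rather than "exceeded".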

tj cost --since 7d

Trace waterfalls

Every session is captured as a full OpenTelemetry span tree. See which tools ran, in what order, with what arguments, and how long each step took, in the local web UI or the CLI.

tj traces

Sensitive-action alerts

Configure any tool call as a sensitive action (send_email, delete_file, submit_form) and get notified instantly. 13 alert types, 6 channels: ntfy push (free, phone-friendly), Discord, Telegram, webhook, file, stdout.

tj alerts

Behavioral drift detection

TokenJam builds a Z-score baseline from your agent's real behavior (token counts, tool sequences, output shapes). When something drifts (a prompt tweak, a model update, a dep bump), you get a drift_detected alert at session end. No LLM-on-LLM evaluation required.
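
For readers unfamiliar with Z-scores, the idea can be sketched in a few lines of Python. This is a generic illustration on a single metric (total tokens per session), not TokenJam's implementation: the function names, the sample numbers, and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev


def z_score(value, baseline):
    """Standard score of value against a list of baseline observations."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        # A perfectly flat baseline: any change at all is infinitely unusual.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def drifted(value, baseline, threshold=3.0):
    """Flag a session metric whose |z| exceeds the (assumed) threshold."""
    return abs(z_score(value, baseline)) > threshold


# Total tokens per session from earlier runs (made-up numbers).
baseline_tokens = [1180, 1220, 1195, 1210, 1205, 1190]

print(drifted(1215, baseline_tokens))  # close to the baseline mean
print(drifted(2400, baseline_tokens))  # roughly double the usual usage
```

Here the baseline mean is 1200 tokens with a standard deviation of about 14.5, so a 1215-token session sits about one standard deviation out and is not flagged, while a 2400-token session is far beyond the threshold. Tool sequences and output shapes would first need to be turned into numbers before the same test could apply; how TokenJam does that is not covered here.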

tj drift

Schema validation

Declare a JSON Schema for any tool's output or let TokenJam infer one from a few sessions. Schema violations are caught at ingest and surface as schema_violation alerts.
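
As a rough picture of what catching a violation at ingest involves, the Python sketch below checks a tool output against a schema. It handles only three JSON Schema keywords (type, required, properties) and is a hand-rolled stand-in, not a full JSON Schema validator and not TokenJam's code; the search-tool schema and field names are hypothetical.

```python
# Maps JSON Schema type names to Python types (a deliberately small subset).
TYPE_MAP = {
    "object": dict, "array": list, "string": str,
    "number": (int, float), "integer": int, "boolean": bool,
}


def violations(value, schema, path="$"):
    """Return a list of human-readable violations; empty means valid."""
    problems = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, TYPE_MAP[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("number", "integer") and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}: missing required field '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                problems.extend(violations(value[key], subschema, f"{path}.{key}"))
    return problems


# Hypothetical schema for a search tool's output.
search_schema = {
    "type": "object",
    "required": ["query", "results"],
    "properties": {
        "query": {"type": "string"},
        "results": {"type": "array"},
    },
}

good = {"query": "duckdb", "results": []}
bad = {"query": 42}
print(violations(good, search_schema))
print(violations(bad, search_schema))
```

The first output is an empty list. The second reports two problems: the missing results field and a query that is an integer instead of a string. A real setup would rely on a complete JSON Schema validator, which also covers keywords such as items, enum, and additionalProperties.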

tj tools

Local web UI + REST API

`tj serve` runs a local dashboard at 127.0.0.1:7391 with status, traces, cost breakdown, alerts, budget, and drift. Prometheus metrics at /metrics. No cloud, no signup, runs entirely on your machine.

tj serve

It's also what powers the analyzers

Each analyzer needs a particular shape of telemetry. The observability stack is what collects it, so the analyzers can run against your real usage instead of a synthetic benchmark.

Downsize

Needs token counts, tool-call shape, session classification.

Cache

Needs captured prompts and prefix-stability tracking.

Script

Needs tool-name + argument-shape signatures per session.

Trim

Needs captured prompt content for token-significance scoring.

Reuse

Needs plan-skeleton clustering across completed sessions.
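
To give a feel for the kind of telemetry shape these analyzers depend on, here is one possible reading of the "tool-name + argument-shape signature" that Script needs, sketched in Python. The page does not say how TokenJam computes signatures, so the functions, the type-name encoding, and the truncated SHA-256 digest are all assumptions made for illustration.

```python
import hashlib
import json


def arg_shape(value):
    """Replace concrete values with type names, keeping the structure."""
    if isinstance(value, dict):
        return {key: arg_shape(val) for key, val in sorted(value.items())}
    if isinstance(value, list):
        # Describe a list by the shape of its first element, if any.
        return [arg_shape(value[0])] if value else []
    return type(value).__name__


def session_signature(tool_calls):
    """Hash the ordered (tool name, argument shape) pairs of one session."""
    steps = [[name, arg_shape(args)] for name, args in tool_calls]
    blob = json.dumps(steps, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:12]


# Two sessions that call the same tools with different argument values.
session_a = [("read_file", {"path": "a.txt"}), ("send_email", {"to": "x@example.com", "cc": []})]
session_b = [("read_file", {"path": "b.txt"}), ("send_email", {"to": "y@example.com", "cc": []})]
# A session with a different tool order.
session_c = list(reversed(session_a))

print(session_signature(session_a) == session_signature(session_b))
print(session_signature(session_a) == session_signature(session_c))
```

Sessions A and B get the same signature because only the argument values differ, while session C gets a different one because the tool order changed. Grouping sessions this way is what would let repeated, structurally identical runs stand out.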

Local-first, by design.

Your spans contain prompts, completions, tool inputs, and customer data. Shipping that to a SaaS observability vendor is a data-egress decision most teams aren't ready to make. TokenJam captures, stores, and analyzes everything on your machine: DuckDB on disk, REST API on 127.0.0.1, no telemetry leaving by default.

When you do want to forward telemetry (to Grafana, Datadog, an OTLP collector), `tj export` ships it on your terms.
