Driving Token Efficiency For AI Agents

TokenJam reads your agent's telemetry and tells you when to downsize, when to trim prompts, what to cache, what to script, and what to reuse. The result is a lower AI bill.

Open-source. Runs locally. Full observability stack included.

[Diagram: telemetry from your agents (Anthropic, OpenAI, Google, AWS, Microsoft, NVIDIA, LangChain, LiteLLM, Langfuse, OTLP) flows into TokenJam's observability layer and optimization analyzers. Results are visualized in Lens at 127.0.0.1:7391/#/overview and are available via CLI, MCP, and exported config.]

Reads telemetry from every major agent runtime, framework, provider, and observability tool.

Claude Code, Codex CLI, OpenClaw, OpenAI Agents SDK, Google ADK, Strands Agent SDK, LlamaIndex, Haystack, Pydantic AI, Semantic Kernel, LangChain, LangGraph, CrewAI, AutoGen, NemoClaw, Anthropic, OpenAI, Google Gemini, AWS Bedrock, LiteLLM, Langfuse, Helicone, LangSmith, Phoenix, ccusage, OTLP

Five analyzers + Lens. One install.

Each analyzer reads the same captured telemetry and surfaces a different shape of waste. Lens brings them together in a local dashboard you can open in your browser.

Downsize

Find tasks where a cheaper model would suffice.

tj optimize — downsize

$ tj optimize
Analyzing 247 sessions, 9.8K spans, 4.0M tokens (last 30d)…

① Model downgrade: 47% of sessions look Haiku-eligible
    • 116 of 247 sessions match structural heuristics
    • Currently: claude-opus-4-7, claude-sonnet-4-7
    • Suggested target: claude-haiku-4-5
    • At current usage: $1,886 → $254 (-86% on flagged sessions)
    • Estimated monthly savings: $1,632 (-38% total)

    ⚠ Structural heuristic only. Run `tj optimize --validate`
      to replay-test on your actual sessions before applying.
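The savings figures in the sample output are plain arithmetic on the flagged sessions. A minimal sketch that reproduces them from the dollar amounts shown above; the `savings` function is a hypothetical helper for illustration, not part of TokenJam:

```python
def savings(current_cost: float, projected_cost: float) -> tuple[float, float]:
    """Return (absolute savings, fractional reduction) for the flagged sessions."""
    saved = current_cost - projected_cost
    return saved, saved / current_cost


# Figures taken from the sample output: $1,886 today, $254 on the suggested model.
saved, reduction = savings(1886.0, 254.0)
print(f"${saved:,.0f} saved ({reduction:.1%} reduction on flagged sessions)")
```

The reduction applies only to the flagged sessions. The smaller "-38% total" figure in the sample is the same dollar amount measured against all spend.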

Trim

Identify token waste in your system prompts.

tj optimize — trim

Prompt bloat detected in claude-code-myproj:
  • Your CLAUDE.md is 4,213 tokens (up 38% in 30 days)
  • Section "Coding conventions > Error handling" appears identically
    in 91 of 247 sessions (1,108 tokens × 91 = ~100K repeated tokens)
  • Significance analysis suggests ~340 of those 1,108 tokens carry
    the signal; the rest could be trimmed
  • Estimated cost: ~$8.50/mo at current usage on Sonnet

  Detail: open `tj report --bloat claude-code-myproj` to see the
  highlighted prompt with high-significance tokens bold,
  low-significance dimmed.

Cache

Detect cacheable prompt prefixes. Save 30–60% with provider-native caching.

tj optimize — cache

Cache opportunities in last 30d:
  • Identical 2,400-token prefix detected across 94% of your calls
    (your CLAUDE.md + tools + system prompt)
    → You're already using prompt caching for 11% of cacheable opportunities.
    → Increasing cache_control breakpoints could save ~$42/mo (90% reduction
      on the 89% of calls currently paying full price).
    → See `tj report --cache claude-code-myproj` for the specific config.

  • Semantic similarity ≥0.95 detected on 47 instances of "format SQL query"
    style requests in last 30d.
    → Estimated savings: $8/mo at TTL ≥ 1 day.
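One way to picture the "identical prefix across N% of calls" finding is to count how many prompts begin with the same leading tokens. The sketch below is an illustration under stated assumptions, not TokenJam's analyzer: `shared_prefix_share` is a hypothetical name, and whitespace-split words stand in for real tokenizer output.

```python
from collections import Counter


def shared_prefix_share(prompts, prefix_len):
    """Return (most common leading prefix, fraction of prompts that start with it)."""
    if not prompts:
        return (), 0.0
    # Count each distinct prefix of exactly prefix_len tokens.
    prefixes = Counter(
        tuple(tokens[:prefix_len]) for tokens in prompts if len(tokens) >= prefix_len
    )
    if not prefixes:
        return (), 0.0
    prefix, count = prefixes.most_common(1)[0]
    return prefix, count / len(prompts)


# Toy data: three calls share a four-word preamble, one does not.
calls = [
    "SYSTEM rules TOOLS list fix the login bug".split(),
    "SYSTEM rules TOOLS list format this SQL query".split(),
    "SYSTEM rules TOOLS list write a unit test".split(),
    "You are a poet write a haiku".split(),
]
prefix, share = shared_prefix_share(calls, prefix_len=4)
print(prefix, f"{share:.0%}")
```

A large shared prefix with a high share is the kind of pattern that provider-native prompt caching is designed to exploit.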

Script

Surface recurring agent sessions that should have been simple (deterministic) scripts.

tj optimize — script

Deterministic workflow candidates (high confidence only):
  • 23 sessions in last 30d executed identical 5-step sequence:
    git pull → npm install → cat .env.staging → npm run build → pm2 restart

    Zero argument variation. Zero observed branching.
    Estimated current cost: ~$87/mo (23 sessions × ~$3.80 average)

  → This looks like a deployment script, not an agent task.
    Suggested: replace with `scripts/deploy-staging.sh`.
    Estimated savings: $87/mo, plus ~30s latency per execution.
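The "identical sequence, zero argument variation" signal can be approximated by grouping sessions on their exact command sequence. This is a simplified sketch, assuming each session is already reduced to an ordered list of command strings; `repeated_sequences` is a hypothetical helper, not a TokenJam API.

```python
from collections import Counter


def repeated_sequences(sessions, min_count=2):
    """Return (sequence, count) pairs for command sequences seen at least min_count times."""
    # Exact tuple equality means any argument change yields a different sequence.
    counts = Counter(tuple(steps) for steps in sessions)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


deploy = ["git pull", "npm install", "cat .env.staging", "npm run build", "pm2 restart"]
sessions = [deploy, deploy, ["git status", "git diff"], deploy]

candidates = repeated_sequences(sessions, min_count=3)
for seq, n in candidates:
    print(f"{n} sessions: " + " -> ".join(seq))
```

Exact matching is deliberately strict: a sequence only qualifies when every step and argument repeats, which mirrors the "high confidence only" framing of the sample output.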

Reuse

Find the planning your agent keeps redoing, and paying for, every time.

tj optimize — reuse

Repeated planning detected (last 30d):
  • Cluster "patch-release": 31 sessions share one plan skeleton
    (read changelog → bump version → run tests → tag → push)
  • Planning portion: ~2,100 tokens/session on claude-opus-4-7
  • The skeleton is identical across all 31 runs; only the version
    string and date change.
  • You paid to generate this plan 31 times.
  • Estimated recoverable: ~$54/mo if served from a plan cache
    (or ~$61/mo if converted to a slash command)

    ⚠ Structural analysis only. Review the exported templates
      in `tj report --reuse` before reusing them.

Lens

See your spend, your recoverable waste, and your alerts at a glance, in a local browser dashboard.

127.0.0.1:7391/#/overview

Overview · Last 7d

Spend · last 7d: $284.71
At this pace, ~$1,219 by end of June (linear run-rate, not a forecast)

Recoverable waste (estimated)
  Downsize: $42 / mo
  Cache: ✓ at 100%
  Reuse: $54 / mo

3 unread alerts · 0 agents drifting

All six are in the open-source CLI. Install once, analyze everything.

How it works

Gathering Data

TokenJam reads from wherever your agent telemetry lives: local log files if you're on Claude Code or Codex; ingestion from Langfuse, Helicone, LangSmith, or Phoenix if you already run one of those for observability; or direct OpenTelemetry capture if you don't have an observability tool.

Analyzing Data

Five analyzers, each grounded in published research and validated against your own data. Findings carry explicit confidence levels. TokenJam never claims a smaller model would have produced an identical answer; it shows the candidates with evidence, and you decide what to apply.

Optimizing Token Use

Recommendations land where you'll act on them. Run tj optimize in your terminal. Query TokenJam from inside Claude Code or any MCP-capable agent. Or export a routing config that drops into your existing Claude Code, LiteLLM, or framework setup.

Also included

A full observability stack

Everything you'd expect from a tracing tool, running locally alongside the analyzers.

Real-time cost tracking

Every LLM call priced as it happens. Spend per agent, model, session, and tool, visible the moment it occurs.

Trace waterfalls

Full span tree per session. See exactly which tools ran, in what order, and how long each step took.

Sensitive-action alerts

13 alert types, 6 channels (ntfy, Discord, Telegram, webhook, file, stdout). Get pinged before damage is done.

Drift detection

Z-score baselines flag when your agent starts behaving differently. No LLM-on-LLM evaluation required.
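A z-score baseline compares a new observation to the mean and standard deviation of past observations. The sketch below shows the general technique only; the metric (tokens per session), the threshold of 3.0, and the function names are assumptions for illustration, not TokenJam's actual configuration.

```python
from statistics import mean, stdev


def z_score(baseline: list[float], observed: float) -> float:
    """Number of sample standard deviations that `observed` sits from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline points
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma


def is_drifting(baseline: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag the observation when it lies more than `threshold` deviations from the mean."""
    return abs(z_score(baseline, observed)) > threshold


# Hypothetical tokens-per-session history for one agent.
history = [1000.0, 1100.0, 950.0, 1050.0, 1000.0]
print(is_drifting(history, 1500.0))  # far above the baseline
print(is_drifting(history, 1060.0))  # within normal variation
```

Because the check is purely statistical, it needs no second model to judge the agent's output, which is the point the feature description makes.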

Schema validation

Declare a JSON Schema for your tools or let TokenJam infer one. Violations are caught the instant they occur.

Local web UI & REST API

tj serve runs a dashboard at 127.0.0.1:7391. Prometheus metrics at /metrics. No cloud, no signup.

Install

Three commands. Local. No signup.

install

$ pipx install 'tokenjam[mcp]'
$ tj onboard --claude-code
$ tj optimize

Open source · MIT · runs 100% locally · no signup