TokenJam Downsize
Pay Opus prices for Opus work, not for everything else.
Find tasks where a cheaper model would suffice.
The problem
Coding agents default to premium models on every call. Most calls don't need it. A "fix this typo" task gets routed to Claude Opus 4 the same way a "refactor the entire authentication module" task does.
The structural shape of those two requests is wildly different. The first one runs fine on Haiku for a fraction of the price. Downsize finds the calls that look like the cheap kind, computes what they would have cost on a smaller model, and shows you the difference — grounded in your own session history, not a vendor benchmark.
How it works
Walk every LLM call in your captured trace history. Classify each session by structural shape: input token count, output token count, tool-call count, single-turn vs multi-turn.
Sessions matching a known small-task pattern get flagged as candidates for a cheaper model. Compute what those sessions would have cost on the target model and report the difference. The recommendation is a structural heuristic, not a quality claim — so you decide whether to apply it.
Confidence levels
Every finding carries an explicit confidence level. TokenJam never claims a smaller model would have produced an identical answer; it shows the candidates with evidence, and you decide what to apply.
Structural
Ships today. Flags candidates by structural heuristic — token counts, tool-call shape, turn count. Conservative thresholds keep false positives low.
Replay-validated
Samples flagged sessions and replays them through the cheaper model. Compares tool-call sequences for AST-equivalence. Now the recommendation is backed by evidence from your own data, not a guess.
User-validated
Tracks which recommendations you actually applied and didn't roll back. Promotes structural candidates to high-confidence rules over time, scoped to your codebase.
Example output
Verbatim from a real run against a real Claude Code project. No screenshots, no cherry-picks.
$ pip install "tokenjam[mcp]"$ tj onboard --claude-code → Detects Claude Code installation → Reads ~/.claude/projects/<project>/<session>.jsonl files (last 30d) → Ingests ~9,800 spans across ~247 sessions into the local DuckDB Done. 4M tokens analyzed, $4,287 implied API value, 23 days of history. $ tj optimize Analyzing 247 sessions, 9.8K spans, 4.0M tokens (last 30d, claude-code, api)… ① Model downgrade: 47% of sessions look Haiku-eligible • 116 of 247 sessions match structural heuristics: short input (<4K tokens), short output (<800 tokens), ≤2 tool calls • Currently running on: claude-opus-4-7, claude-sonnet-4-7 • Suggested target: claude-haiku-4-5 • At current usage: $1,886 actual → $254 estimated (-86% on flagged) • Estimated monthly savings: $1,632 (-38% total) ⚠ Structural heuristic only. Run `tj optimize --validate` to replay-test on your actual sessions before applying.
What you do with it
Recommendations land in your existing tools — terminal, MCP-capable agent, or as an exportable config.
- CLI
tj optimize --finding model-downgrade - MCP
should_downgrade_for_taskquery from inside any MCP-capable agent
- Export
tj optimize export-config --target claude-code-settingsdrops into your Claude Code settings.json
The research behind it
-
FrugalGPT
Chen, Zaharia, Zou — Stanford 2023
Introduced the cascade-confidence framework that grounds the validation work.
-
RouteLLM
LMSys — ICLR 2025
Matrix-factorization router with transfer properties. Our primary routing engine; MIT-licensed.
-
BFCL — Berkeley Function-Calling Leaderboard
Patil et al. — ICML 2025
AST-based evaluation of tool-call structure without execution. Makes replay validation cheap for coding agents.