TokenJam Downsize

Pay Opus prices for Opus work, not for everything else.

Find tasks where a cheaper model would suffice.

The problem

Coding agents default to premium models on every call. Most calls don't need it. A "fix this typo" task gets routed to Claude Opus 4 the same way a "refactor the entire authentication module" task does.

The structural shape of those two requests is wildly different. The first one runs fine on Haiku for a fraction of the price. Downsize finds the calls that look like the cheap kind, computes what they would have cost on a smaller model, and shows you the difference — grounded in your own session history, not a vendor benchmark.

How it works

Walk every LLM call in your captured trace history. Classify each session by structural shape: input token count, output token count, tool-call count, single-turn vs multi-turn.

Sessions matching a known small-task pattern get flagged as candidates for a cheaper model. Compute what those sessions would have cost on the target model and report the difference. The recommendation is a structural heuristic, not a quality claim — so you decide whether to apply it.

Confidence levels

Every finding carries an explicit confidence level. TokenJam never claims a smaller model would have produced an identical answer; it shows the candidates with evidence, and you decide what to apply.

Level 1 OSS

Structural

Ships today. Flags candidates by structural heuristic — token counts, tool-call shape, turn count. Conservative thresholds keep false positives low.

Level 2 Pro

Replay-validated

Samples flagged sessions and replays them through the cheaper model. Compares tool-call sequences for AST-equivalence. Now the recommendation is backed by evidence from your own data, not a guess.

Level 3 Pro

User-validated

Tracks which recommendations you actually applied and didn't roll back. Promotes structural candidates to high-confidence rules over time, scoped to your codebase.

Example output

Verbatim from a real run against a real Claude Code project. No screenshots, no cherry-picks.

tj optimize
$ pip install "tokenjam[mcp]"$ tj onboard --claude-code  → Detects Claude Code installation  → Reads ~/.claude/projects/<project>/<session>.jsonl files (last 30d)  → Ingests ~9,800 spans across ~247 sessions into the local DuckDB  Done. 4M tokens analyzed, $4,287 implied API value, 23 days of history. $ tj optimize Analyzing 247 sessions, 9.8K spans, 4.0M tokens (last 30d, claude-code, api)…   ① Model downgrade: 47% of sessions look Haiku-eligible     • 116 of 247 sessions match structural heuristics:       short input (<4K tokens), short output (<800 tokens), ≤2 tool calls     • Currently running on: claude-opus-4-7, claude-sonnet-4-7     • Suggested target: claude-haiku-4-5     • At current usage: $1,886 actual → $254 estimated (-86% on flagged)     • Estimated monthly savings: $1,632 (-38% total)      ⚠ Structural heuristic only. Run `tj optimize --validate` to replay-test       on your actual sessions before applying.

What you do with it

Recommendations land in your existing tools — terminal, MCP-capable agent, or as an exportable config.

  • CLI
    tj optimize --finding model-downgrade
  • MCP
    should_downgrade_for_task

    query from inside any MCP-capable agent

  • Export
    tj optimize export-config --target claude-code-settings

    drops into your Claude Code settings.json

The research behind it

  • FrugalGPT

    Chen, Zaharia, Zou — Stanford 2023

    Introduced the cascade-confidence framework that grounds the validation work.

  • RouteLLM

    LMSys — ICLR 2025

    Matrix-factorization router with transfer properties. Our primary routing engine; MIT-licensed.

  • BFCL — Berkeley Function-Calling Leaderboard

    Patil et al. — ICML 2025

    AST-based evaluation of tool-call structure without execution. Makes replay validation cheap for coding agents.

Downsize is in the open-source CLI. Install once, analyze everything.