TokenJam Cache

Cut 30–60% of your bill with caching the provider already supports.

Detect cacheable prompt prefixes. Save 30–60% with provider-native caching.

The problem

Anthropic, OpenAI, and Google all offer prompt caching. The cached portion of a prompt is billed at roughly 10% of the normal input rate. For a Claude Code user, the system prompt plus tool schemas plus CLAUDE.md is 2–4K tokens that are identical across every call in a session.

Without explicit cache_control markers, you pay full price for that prefix on every call. Cache walks your prompt history, finds the stable prefixes, and tells you exactly where to place the cache_control markers — with the exact savings calculation per provider.

How it works

Walk every prompt in the window. Compute prefix hashes at common breakpoint positions (after the system message, after tool schemas, after project context). Identify identical prefixes across calls within each provider's cache TTL window.

For each identified prefix, compute how many calls share it, how many tokens it represents, and what you'd save by placing a cache_control marker there. Output the specific config snippet that does it. For workloads with semantic-but-not-identical similarity (FAQ-style bots, repeated query patterns), Cache also detects clusters using GPTCache's cosine-similarity approach.

Confidence levels

Every finding carries an explicit confidence level. TokenJam never claims a smaller model would have produced an identical answer; it shows the candidates with evidence, and you decide what to apply.

Level 1 OSS

Structural

The math is deterministic. The provider's cached-read pricing is published; the savings calculation is arithmetic. High confidence by default.

Example output

Verbatim from a real run against a real Claude Code project. No screenshots, no cherry-picks.

tj optimize --finding cache-opportunity
Cache opportunities in last 30d:  • Identical 2,400-token prefix detected across 94% of your calls    (your CLAUDE.md + tools + system prompt)  → You're already using prompt caching for 11% of cacheable opportunities.  → Increasing cache_control breakpoints could save ~$42/mo (90% reduction     on the 89% of calls currently paying full price).  → See `tj report --cache claude-code-myproj` for the specific config.   • Semantic similarity ≥0.95 detected on 47 instances of "format SQL query"    style requests in last 30d.  → Candidate for opportunistic local semantic cache (TokenJam Pro).  → Estimated savings: $8/mo at TTL ≥ 1 day.

What you do with it

Recommendations land in your existing tools — terminal, MCP-capable agent, or as an exportable config.

  • CLI
    tj optimize --finding cache-opportunity
  • MCP
    find_cache_opportunity
  • Export
    cache_control snippets

    drop into your existing prompt-building code or Claude Code settings

The research behind it

  • GPTCache

    Zilliz — 2023

    Semantic-similarity threshold for opportunistic local caching (cosine ≥ 0.8 default).

  • Provider-native caching docs

    Anthropic, OpenAI, Google

    Cached-read pricing (~10% of normal input rate) and cache_control placement rules.

Cache is in the open-source CLI. Install once, analyze everything.