TokenJam Cache
Cut 30–60% of your bill with caching the provider already supports.
Detect cacheable prompt prefixes. Save 30–60% with provider-native caching.
The problem
Anthropic, OpenAI, and Google all offer prompt caching. The cached portion of a prompt is billed at roughly 10% of the normal input rate. For a Claude Code user, the system prompt plus tool schemas plus CLAUDE.md is 2–4K tokens that are identical across every call in a session.
Without explicit cache_control markers, you pay full price for that prefix on every call. Cache walks your prompt history, finds the stable prefixes, and tells you exactly where to place the cache_control markers — with the exact savings calculation per provider.
How it works
Walk every prompt in the window. Compute prefix hashes at common breakpoint positions (after the system message, after tool schemas, after project context). Identify identical prefixes across calls within each provider's cache TTL window.
For each identified prefix, compute how many calls share it, how many tokens it represents, and what you'd save by placing a cache_control marker there. Output the specific config snippet that does it. For workloads with semantic-but-not-identical similarity (FAQ-style bots, repeated query patterns), Cache also detects clusters using GPTCache's cosine-similarity approach.
Confidence levels
Every finding carries an explicit confidence level. TokenJam never claims a smaller model would have produced an identical answer; it shows the candidates with evidence, and you decide what to apply.
Structural
The math is deterministic. The provider's cached-read pricing is published; the savings calculation is arithmetic. High confidence by default.
Example output
Verbatim from a real run against a real Claude Code project. No screenshots, no cherry-picks.
Cache opportunities in last 30d: • Identical 2,400-token prefix detected across 94% of your calls (your CLAUDE.md + tools + system prompt) → You're already using prompt caching for 11% of cacheable opportunities. → Increasing cache_control breakpoints could save ~$42/mo (90% reduction on the 89% of calls currently paying full price). → See `tj report --cache claude-code-myproj` for the specific config. • Semantic similarity ≥0.95 detected on 47 instances of "format SQL query" style requests in last 30d. → Candidate for opportunistic local semantic cache (TokenJam Pro). → Estimated savings: $8/mo at TTL ≥ 1 day.
What you do with it
Recommendations land in your existing tools — terminal, MCP-capable agent, or as an exportable config.
- CLI
tj optimize --finding cache-opportunity - MCP
find_cache_opportunity - Export
cache_control snippetsdrop into your existing prompt-building code or Claude Code settings
The research behind it
-
GPTCache
Zilliz — 2023
Semantic-similarity threshold for opportunistic local caching (cosine ≥ 0.8 default).
-
Provider-native caching docs
Anthropic, OpenAI, Google
Cached-read pricing (~10% of normal input rate) and cache_control placement rules.