Optimization technique

TokenJam Cache

Find the stable prompt prefixes you're paying full price for, and where to cache them.

Detect cacheable prompt prefixes and the breakpoints that would cut input cost with provider-native caching.

tj optimize cache

Cache opportunities in last 30d:  • Identical 2,400-token prefix detected across 94% of your calls    (your CLAUDE.md + tools + system prompt)  → You're already using prompt caching for 11% of cacheable opportunities.  → Increasing cache_control breakpoints could save ~$42/mo (90% reduction     on the 89% of calls currently paying full price).  → See `tj report --cache claude-code-myproj` for the specific config.   • Semantic similarity ≥0.95 detected on 47 instances of "format SQL query"    style requests in last 30d.  → Candidate for opportunistic local semantic cache (TokenJam Pro).  → Estimated savings: $8/mo at TTL ≥ 1 day.

The problem

Anthropic, OpenAI, and Google all offer prompt caching. The cached portion of a prompt is billed at roughly 10% of the normal input rate. For a Claude Code user, the system prompt plus tool schemas plus CLAUDE.md is 2–4K tokens that are identical across every call in a session.

Without explicit cache_control markers, you pay full price for that prefix on every call. Cache walks your prompt history, finds the stable prefixes, and tells you exactly where to place the cache_control markers, with the exact savings calculation per provider.

How it works

Walk every prompt in the window. Compute prefix hashes at common breakpoint positions (after the system message, after tool schemas, after project context). Identify identical prefixes across calls within each provider's cache TTL window.

For each identified prefix, compute how many calls share it, how many tokens it represents, and what you'd save by placing a cache_control marker there. Output the specific config snippet that does it. For workloads with semantic-but-not-identical similarity (FAQ-style bots, repeated query patterns), Cache also detects clusters using GPTCache's cosine-similarity approach.

What you do with it

Recommendations land in your existing tools: the terminal, an MCP-capable agent, or an exportable config.

CLI
tj optimize cache
MCP
find_cache_opportunity
Export
cache_control snippets
drop into your existing prompt-building code or Claude Code settings

The research behind it

GPTCache

Zilliz — 2023

Semantic-similarity threshold for opportunistic local caching (cosine ≥ 0.8 default).
Provider-native caching docs

Anthropic, OpenAI, Google

Cached-read pricing (~10% of normal input rate) and cache_control placement rules.

Cache is in the open-source CLI. Install once, analyze everything.

Get started GitHub

The problem

How it works

What you do with it

The research behind it

GPTCache

Provider-native caching docs

Cache is in the open-source CLI. Install once, analyze everything.