TokenJam Trim
Your system prompt grew over six months. Half of it isn't doing work.
Identify token waste in your system prompts.
The problem
Prompts accumulate. Every new edge case adds an instruction. Every project picks up a CLAUDE.md that gets longer. Tool schemas repeat across calls.
The actual signal in a 4,000-token system prompt might be 800 tokens of real instructions and 3,200 tokens of historical scar tissue. You pay for the whole thing on every call. Trim runs significance analysis on your captured prompts and shows you which sections carry the load and which are dead weight.
How it works
Trim runs LLMLingua-2's token-classification model (BERT-class, MIT-licensed, runs locally on CPU) over your captured prompts. Each token gets a significance score: the classifier's estimated probability that the token must be kept to preserve the prompt's meaning.
Sections of consistently low-significance tokens get flagged as bloat candidates. The output is a highlighted view of your prompt with high-significance regions in bold and low-significance regions dimmed. You decide what to remove; Trim never edits your prompts at runtime.
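As a rough illustration of the flagging step, the sketch below groups per-token scores by section and flags sections whose average score is low. The `Section` type, the `flag_bloat` helper, the sample scores, and the 0.3 threshold are all hypothetical; Trim's actual aggregation rule is not described here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Section:
    title: str
    scores: list[float]  # hypothetical per-token keep-probabilities in [0, 1]


def flag_bloat(sections: list[Section], threshold: float = 0.3) -> list[str]:
    """Return titles of sections whose mean token score is below the threshold.

    The mean-plus-threshold rule is an assumption made for illustration.
    """
    return [s.title for s in sections if s.scores and mean(s.scores) < threshold]


# Made-up scores standing in for classifier output.
sections = [
    Section("Role", [0.9, 0.8, 0.7]),
    Section("Legacy workarounds", [0.1, 0.2, 0.1, 0.05]),
    Section("Output format", [0.6, 0.4, 0.7]),
]

bloat = flag_bloat(sections)
print(bloat)
```

Averaging per section rather than dropping individual tokens keeps the result readable: a flagged section is a candidate for review, and the decision to delete it stays with you.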
Confidence levels
Every finding carries an explicit confidence level. TokenJam never claims a smaller model would have produced an identical answer; it shows the candidates with evidence, and you decide what to apply.
Structural
Token significance is mathematical, not a quality judgment. We recommend; you trim by hand. We never auto-compress prompts at runtime.
Example output
Verbatim from a real run against a real Claude Code project. No screenshots, no cherry-picks.
Prompt bloat detected in claude-code-myproj:
• Your CLAUDE.md is 4,213 tokens (up 38% in 30 days)
• Section "Coding conventions > Error handling" appears identically in 91 of 247 sessions (1,108 tokens × 91 = ~100K repeated tokens)
• Significance analysis suggests ~340 of those 1,108 tokens carry the signal; the rest could be trimmed
• Estimated cost: ~$8.50/mo at current usage on Sonnet
Detail: open `tj report --bloat claude-code-myproj` to see the highlighted prompt with high-significance tokens bold, low-significance dimmed.
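The "appears identically in 91 of 247 sessions" finding can be approximated by hashing each section's exact text and counting matches across sessions. The following is a minimal sketch under that assumption; `section_digest`, `repeated_sections`, and the toy session data are hypothetical and not TokenJam's implementation. The arithmetic at the end reuses the token counts from the report.

```python
import hashlib
from collections import Counter


def section_digest(text: str) -> str:
    """Hash a section's exact text so identical copies share one key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def repeated_sections(sessions: list[dict[str, str]]) -> Counter:
    """Count how many sessions contain each exact (title, text) section."""
    counts: Counter = Counter()
    for session in sessions:
        for title, text in session.items():
            counts[(title, section_digest(text))] += 1
    return counts


# Toy data: three sessions, two of which share an identical section.
sessions = [
    {"Error handling": "Always wrap IO in try/except."},
    {"Error handling": "Always wrap IO in try/except."},
    {"Error handling": "Prefer Result types."},
]
counts = repeated_sections(sessions)
top_key, top_count = counts.most_common(1)[0]

# Token counts taken from the report: 1,108 tokens in 91 sessions, ~340 carrying signal.
section_tokens, session_hits, signal_tokens = 1108, 91, 340
repeated_tokens = section_tokens * session_hits         # 100,828, reported as ~100K
trimmable_per_session = section_tokens - signal_tokens  # 768 tokens per session
print(top_key[0], top_count, repeated_tokens, trimmable_per_session)
```

Exact-match hashing only catches byte-identical copies; a section that differs by whitespace or a single word would count as distinct in this sketch.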
What you do with it
Recommendations land in your existing tools — terminal, MCP-capable agent, or as an exportable config.
- CLI: `tj optimize --include-bloat`
- Report: `tj report --bloat <agent_id>` opens a local HTML file with the highlighted prompt
- MCP: surfaces in `get_optimize_report` when content capture is enabled
The research behind it
- LLMLingua-2 (Microsoft Research, ACL 2024)
  Token classification via GPT-4 distillation. 3–6× faster than LLMLingua-1. We use the same scoring mechanism for detection only, leaving the editing decision with you.