Optimization technique

TokenJam Trim

Your system prompt grew over six months. Half of it isn't doing work.

Identify token waste in your system prompts.

tj optimize trim

Prompt bloat detected in claude-code-myproj:
  • Your CLAUDE.md is 4,213 tokens (up 38% in 30 days)
  • Section "Coding conventions > Error handling" appears identically
    in 91 of 247 sessions (1,108 tokens × 91 = ~100K repeated tokens)
  • Significance analysis suggests ~340 of those 1,108 tokens carry
    the signal; the rest could be trimmed
  • Estimated cost: ~$8.50/mo at current usage on Sonnet

  Detail: open `tj report --trim claude-code-myproj` to see the
  highlighted prompt with high-significance tokens bold,
  low-significance dimmed.

The problem

Prompts accumulate. Every new edge case adds an instruction. Every project picks up a CLAUDE.md that gets longer. Tool schemas repeat across calls.

The actual signal in a 4,000-token system prompt might be 800 tokens of real instructions and 3,200 tokens of historical scar tissue. You pay for the whole thing on every call. Trim runs significance analysis on your captured prompts and shows you which sections carry the load and which are dead weight.
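To make the "you pay for the whole thing" point concrete, here is a rough back-of-the-envelope sketch using the figures from the sample report above. The price constant is a hypothetical placeholder, not a quoted rate, and the sketch counts the section only once per session. A real session usually sends the prompt on many calls, so treat the result as a lower bound; it is not an attempt to reproduce the report's monthly estimate.

```python
# Figures taken from the sample report earlier on this page.
section_tokens = 1_108       # tokens in the repeated section
sessions_with_section = 91   # sessions where it appeared identically
signal_tokens = 340          # tokens the analysis marks as carrying signal

# Tokens sent repeatedly, and the share of them that looks trimmable.
repeated_tokens = section_tokens * sessions_with_section
trimmable_tokens = (section_tokens - signal_tokens) * sessions_with_section

# ASSUMPTION: illustrative input price in USD per million tokens.
# Substitute the actual rate for the model you use.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00


def input_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> float:
    """Cost in USD of sending `tokens` input tokens once."""
    return tokens / 1_000_000 * price_per_million


print(f"repeated tokens:  {repeated_tokens:,}")    # 100,828
print(f"trimmable tokens: {trimmable_tokens:,}")   # 69,888
print(f"trimmable cost:   ${input_cost(trimmable_tokens):.2f} (one send per session)")
```

Multiply by the average number of calls per session to approximate real spend.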

How it works

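Trim runs LLMLingua-2's token-classification model (BERT-class, MIT-licensed, runs locally on CPU) over your captured prompts. Each token gets a score: the model's estimated probability that the token should be preserved when the prompt is compressed. Trim treats that probability as the token's significance.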

Sections of consistently low-significance tokens get flagged as bloat candidates. The output is a highlighted view of your prompt with high-significance regions in bold and low-significance regions dimmed. You decide what to remove; Trim never edits your prompts at runtime.
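The flagging step can be pictured with a minimal sketch like the one below. It is not Trim's actual implementation: the `Section` type, the `flag_bloat` function, the thresholds, and the scores are all hypothetical, and the per-token scores are hard-coded rather than produced by a model.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    scores: list[float]  # one significance score per token, in [0, 1]


def flag_bloat(sections, threshold=0.3, min_low_fraction=0.7):
    """Return titles of sections whose tokens are mostly low-significance.

    ASSUMPTION: "consistently low" is modelled here as at least
    `min_low_fraction` of a section's tokens scoring below `threshold`.
    """
    flagged = []
    for section in sections:
        if not section.scores:
            continue  # nothing to judge in an empty section
        low = sum(1 for score in section.scores if score < threshold)
        if low / len(section.scores) >= min_low_fraction:
            flagged.append(section.title)
    return flagged


# Made-up scores for illustration only.
sections = [
    Section("Error handling", [0.1, 0.2, 0.05, 0.9, 0.15, 0.1, 0.2, 0.1, 0.25, 0.1]),
    Section("Output format", [0.8, 0.7, 0.9, 0.2, 0.85]),
]

print(flag_bloat(sections))  # ['Error handling']
```

Using a fraction of low-scoring tokens rather than a plain average keeps one or two high-significance tokens from hiding an otherwise dead section, and the function only reports candidates, which matches the "you decide what to remove" model.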

What you do with it

Recommendations land in your existing tools: the terminal, an MCP-capable agent, or an exportable config.

CLI: tj optimize trim
Report: tj report --trim <agent_id> (opens a local HTML file with the highlighted prompt)
MCP: surfaces in get_optimize_report when content capture is enabled
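For a sense of what the highlighted HTML report might contain, here is a small sketch that turns tokens and scores into bold and dimmed markup. The function name, the threshold, and the exact markup are assumptions made for illustration; the real report format may differ.

```python
from html import escape


def render_highlight(tokens, scores, threshold=0.5):
    """Render tokens as HTML: bold if significant, dimmed otherwise.

    ASSUMPTION: a single cut-off separates "high" from "low" significance.
    """
    parts = []
    for token, score in zip(tokens, scores):
        text = escape(token)  # keep prompt text from being parsed as markup
        if score >= threshold:
            parts.append(f"<b>{text}</b>")
        else:
            parts.append(f'<span style="opacity:0.4">{text}</span>')
    return " ".join(parts)


# Made-up tokens and scores for illustration only.
print(render_highlight(["Never", "swallow", "exceptions", "please"],
                       [0.9, 0.8, 0.95, 0.1]))
```

Escaping each token matters because system prompts often contain angle brackets or XML-style tags that would otherwise break the page.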

The research behind it

LLMLingua-2

Microsoft Research — ACL 2024

Token classification trained via data distillation from GPT-4. Reported as 3–6× faster than earlier prompt compression methods, including the original LLMLingua. We use the same scoring mechanism for detection only, leaving the editing decision with you.

Trim is in the open-source CLI. Install once, analyze everything.
