Help your AI agents make the most of their tokens.
Every observability tool is built for LLM developers. TokenJam is built for people whose agents have real-world side effects — and real-world bills.
Coding agents and autonomous workflows run for hours unattended. They edit files, send emails, hit APIs. Without observability, you find out what happened when something breaks — or when the bill arrives.
A Claude Code session can rack up $45 in an hour. Most calls don't need the most expensive model — but without per-task cost attribution, you can't tell which ones do.
Behavioral drift detection, sensitive-action alerts, eval-to-production correlation — elsewhere, they all require API keys, hosted backends, and credit cards. TokenJam runs on your machine.
tj optimize analyzes your real sessions, flags model-downgrade candidates, and projects your monthly budget per provider. Shipped.
pip install. No signup, no proxy, no SaaS account. Full-featured CLI (tj status / traces / cost / drift) with JSON output. Local REST API + Prometheus /metrics.
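The local /metrics endpoint speaks the standard Prometheus text exposition format, so any Prometheus-compatible scraper (or a few lines of script) can consume it. A minimal parsing sketch in Python — the metric name and labels below are hypothetical, not TokenJam's actual metric names:

```python
import re

# Sample Prometheus text-format payload, as a local /metrics endpoint
# might serve it. Metric and label names here are invented for illustration.
SAMPLE = """\
# HELP tokenjam_llm_cost_usd_total Cumulative LLM spend in USD
# TYPE tokenjam_llm_cost_usd_total counter
tokenjam_llm_cost_usd_total{agent="coder",model="claude-sonnet"} 12.40
tokenjam_llm_cost_usd_total{agent="coder",model="claude-haiku"} 0.87
"""

LINE = re.compile(r'^(\w+)(?:\{([^}]*)\})?\s+([0-9.eE+-]+)$')

def parse_metrics(text: str) -> list[tuple[str, dict, float]]:
    """Parse Prometheus text format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE metadata
        m = LINE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        # Simple label split; assumes no commas inside label values.
        labels = dict(kv.split("=", 1) for kv in (raw_labels or "").split(",") if kv)
        labels = {k: v.strip('"') for k, v in labels.items()}
        samples.append((name, labels, float(value)))
    return samples

total = sum(v for _, _, v in parse_metrics(SAMPLE))
print(f"total spend: ${total:.2f}")  # → total spend: $13.27
```

In practice you would point a Prometheus scrape job (or `curl | grep`) at the endpoint instead of parsing by hand; the sketch just shows that the format is plain text and trivially scriptable.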
The only observability tool built for agents with real-world side effects. Configurable alerts fire on email sends, file writes, form submissions, and payment actions.
unique to TokenJam

Real-time USD cost per LLM call, attributed to the agent and tool that triggered it. Configurable daily/session/per-agent budget alerts fire before you get the bill.
per-model pricing TOML

Deterministic, no-cloud drift detection. Automatically baselines token usage, tool-call sequences, output schema, and session duration — alerts when agents deviate.
no API key required

Full GenAI Semantic Conventions compliance from day one. Agent spans, tool calls, token metrics — exportable to Grafana, Jaeger, Datadog, or any OTel backend without transformation.
OTel SemConv v1.37+

JSON Schema validation for tool outputs and agent responses. Declare schemas per agent/tool in config, or use inference mode to auto-derive them from observed sessions.
JSON Schema draft-07

A full-featured CLI (tj status / traces / cost / drift) with JSON output on every command. Local API at localhost with a Prometheus /metrics endpoint; OpenAPI spec included.
pipe-friendly · scriptable

Most coding-agent calls don't need the most expensive model. tj optimize analyzes your real sessions and flags model-downgrade candidates plus a per-provider monthly budget projection — driven by your actual traces, not generic heuristics. Every recommendation ships with a quality-equivalence caveat, so you decide what to apply.
Works with every major agent runtime
The tools your team already uses are built for LLM developers. TokenJam fills the gap they all leave open.
| Feature | TokenJam | Langfuse | LangSmith | Helicone | Guardrails AI |
|---|---|---|---|---|---|
| Observability | | | | | |
| OTel GenAI SemConv native (compliant from day one) | ✓ | ~ | ~ | — | — |
| LLM call tracing | ✓ | ✓ | ✓ | ✓ | — |
| Token & cost tracking | ✓ | ✓ | ✓ | ✓ | — |
| Framework agnostic | ✓ | ✓ | — | ✓ | ✓ |
| Autonomous agent safety | | | | | |
| Sensitive action alerts (email, file write, payment, form submit) | ✓ | — | — | — | — |
| Cost budget alerts (daily / session / per-agent) | ✓ | — | — | — | — |
| NemoClaw sandbox events | ✓ | — | — | — | — |
| Retry loop detection | ✓ | — | — | — | — |
| Runtime verification | | | | | |
| Behavioral drift detection | ✓ | — | — | — | — |
| Output schema validation | ✓ | — | — | — | ✓ |
| Token economics | | | | | |
| Trace-driven cost recommendations (model-downgrade candidates + per-provider budget projection) | ✓ | — | — | ~ | — |
| Eval-to-production correlation (import Inspect / DeepEval / Promptfoo) | ~ | — | ~ | — | — |
| MCP server for agent self-introspection (13 MCP tools shipped) | ✓ | — | — | — | — |
| Multi-agent fleet aggregation (cloud.tokenjam.dev — coming soon) | cloud | ~ | ✓ | — | — |
| Developer experience | | | | | |
| Fully local, no signup | ✓ | ✓ | — | ~ | ✓ |
| CLI interface | ✓ | — | — | — | — |
| OTLP export to any backend (Grafana, Jaeger, Datadog…) | ✓ | ✓ | — | — | — |
| Open source / self-hostable | ✓ | ✓ | — | ✓ | ✓ |
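Deterministic drift detection of the kind listed above can be as simple as a rolling statistical baseline over session metrics — no model calls, no cloud. An illustrative sketch (the signals and thresholds TokenJam actually uses are not specified here):

```python
from statistics import mean, stdev

def drift_alerts(baseline_sessions: list[dict], current: dict,
                 threshold_sigmas: float = 3.0) -> list[str]:
    """Flag numeric session metrics that deviate from the baseline
    by more than `threshold_sigmas` standard deviations."""
    alerts = []
    for metric in ("total_tokens", "tool_calls", "duration_s"):
        history = [s[metric] for s in baseline_sessions]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # degenerate baseline: every session identical
        z = abs(current[metric] - mu) / sigma
        if z > threshold_sigmas:
            alerts.append(
                f"{metric}: {current[metric]} is {z:.1f}σ from baseline mean {mu:.0f}"
            )
    return alerts

# Twenty synthetic baseline sessions, then one session that suddenly
# burns 4x the usual tokens.
baseline = [
    {"total_tokens": 12_000 + i * 150, "tool_calls": 8 + (i % 3), "duration_s": 60 + i}
    for i in range(20)
]
print(drift_alerts(baseline, {"total_tokens": 48_000, "tool_calls": 9, "duration_s": 65}))
```

A real detector also baselines non-numeric signals (tool-call sequences, output schemas), but the principle is the same: learn what normal looks like from observed sessions, then alert on deviation.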
Connect what graded your agent offline with what it's doing in production. In active development; the OSS roadmap is public.
Ingest results from Inspect AI, DeepEval, Promptfoo, HUD, and Coval, and correlate failed eval cases with the production sessions that match them. No other OSS layer makes that connection.
5 importers planned
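At its core, eval-to-production correlation is a join between graded eval cases and production traces on a shared key. A toy sketch of that join — the record shapes and field names below are invented, not any importer's actual format:

```python
# Toy records: what an eval importer and a trace store might yield.
# Field names are hypothetical.
eval_cases = [
    {"case_id": "refund-flow", "passed": False, "score": 0.2},
    {"case_id": "summarize",   "passed": True,  "score": 0.95},
]
sessions = [
    {"session": "s-101", "case_id": "refund-flow", "cost_usd": 1.40},
    {"session": "s-102", "case_id": "refund-flow", "cost_usd": 0.90},
    {"session": "s-103", "case_id": "summarize",   "cost_usd": 0.30},
]

def failed_in_production(cases: list[dict], traces: list[dict]) -> list[dict]:
    """Match failed eval cases to the production sessions that ran them."""
    failed = {c["case_id"] for c in cases if not c["passed"]}
    return [t for t in traces if t["case_id"] in failed]

for t in failed_in_production(eval_cases, sessions):
    print(f"{t['session']} ran failing case {t['case_id']}, spent ${t['cost_usd']:.2f}")
```

The hard part in practice is establishing the shared key across five different eval formats — which is what the importers are for.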