#evaluation

2 posts

Jun 1, 2026 · 8 min read

Reddit is 40% of your agent's retrieval surface

What 150K LLM citations tell builders about prompt-time grounding, eval coverage, and the source biases their agents inherit by default.

May 12, 2026 · 11 min read

What is agent evaluation?

Agent evaluation: measuring multi-step trajectories, tool use, and open-ended outputs. Why benchmarks alone don't tell you whether an agent works in production.