#evaluation
2 posts
-
Reddit is 40% of your agent's retrieval surface
What 150K LLM citations tell builders about prompt-time grounding, eval coverage, and the source biases their agents inherit by default.
-
What is agent evaluation?
Agent evaluation: measuring multi-step trajectories, tool use, and open-ended outputs. Why benchmarks alone don't tell you whether an agent works in production.