The AI Eval Tax: The Hidden Cost Every Agent Team Is Paying show agents open source iris-eval.com

by irparent 6 days ago

Ian Parent

6 days ago

I built Iris because I kept shipping agents with no way to know if the outputs were actually good. Infrastructure monitoring sees 200 OK and moves on — it has no idea the agent just leaked PII or burned $40 on a single query.

Iris sits at the MCP protocol layer and evaluates every agent output against 12 deterministic rules. No LLM-as-judge — regex, thresholds, keyword checks that fire or don't. Independently rated AAA by Glama.

One command to try it: npx @iris-eval/mcp-server