Tag: evaluation
All the articles with the tag "evaluation".
-
Database Reporting Agent: a multi-agent text-to-SQL pipeline
A multi-agent text-to-SQL pipeline over an enterprise database of 15+ tables, with schema resolution, query generation, guardrails, validation, caching, and an evaluation suite.
-
Steering chain-of-thought length — and what it does to faithfulness
Selective weight editing of short-reasoning attention heads to steer chain-of-thought length, and a follow-up look at how the resulting longer reasoning affects faithfulness.