Tag: llm
All the articles with the tag "llm".
-
Scalable oversight via adversarial deception in resume screening
Applying the Engels et al. (2025) scalable oversight framework to resume screening. We model the task as an adversarial Houdini–Guard game and measure how well a weaker Guard can detect a stronger Houdini's deceptive selections, fitting domain Elo curves across 8 models and 200 games per pair.
-
Steering chain-of-thought length — and what it does to faithfulness
Reproducing ThinkEdit's interpretable weight edits to mitigate overly short chain-of-thought reasoning, then extending the analysis with ChainScope's IPHR faithfulness evaluation across the Qwen3 family (0.6B–8B).