Tag: llm

All the articles with the tag "llm".

Scalable oversight via adversarial deception in resume screening

7 Dec, 2025

Applying the Engels et al. (2025) scalable oversight framework to resume screening. We model the task as an adversarial Houdini–Guard game and measure how well a weaker Guard can detect a stronger Houdini's deceptive selections, fitting domain Elo curves across 8 models and 200 games per pair.

Steering chain-of-thought length — and what it does to faithfulness

4 Jun, 2025

Reproducing ThinkEdit's interpretable weight edits to mitigate overly short chain-of-thought reasoning, then extending the analysis with ChainScope's IPHR faithfulness evaluation across the Qwen3 family (0.6B–8B).
