Tag: ai-safety
All the articles with the tag "ai-safety".
-
Does differential privacy solve copyright?
A walkthrough of why generative AI scrambles two centuries of US copyright doctrine, the proposed technical fixes (differential privacy, near access-freeness, clean-room training), and why none of them actually amounts to copyright protection. Memorization ≠ infringement. Privacy ≠ copyright.
-
Data extraction after exact unlearning
Reproducing and extending Wu et al.'s Reversed Model Guidance (RMG) attack against exact unlearning. Across WMDP and a synthetic medical dataset, RMG reliably outperforms unguided pre-unlearning generation, lifting A-ESR by up to ~63%. The results also reveal a "sweet spot" in forget-set ratio and an inverse relationship between memorization and the optimal guidance scale.
-
Scalable oversight via adversarial deception in resume screening
Applying the Engels et al. (2025) scalable oversight framework to resume screening. We model the task as an adversarial Houdini–Guard game and measure how well a weaker Guard can detect a stronger Houdini's deceptive selections, fitting domain Elo curves across 8 models and 200 games per pair.