Tag: ai-safety

All the articles with the tag "ai-safety".

Does differential privacy solve copyright?

19 Apr, 2026

A walkthrough of why generative AI scrambles two centuries of US copyright doctrine, the proposed technical fixes — differential privacy, near access-freeness, clean-room training — and why none of them are actually copyright protection. Memorization ≠ infringement. Privacy ≠ copyright.
Data extraction after exact unlearning

11 Dec, 2025

Reproducing and extending Wu et al.'s Reversed Model Guidance attack against exact unlearning. Across WMDP and a synthetic medical dataset, RMG reliably outperforms unguided pre-unlearning generation, lifting A-ESR by up to ~63%, and reveals a "sweet spot" in forget-set ratio plus an inverse relationship between memorization and the optimal guidance scale.
Scalable oversight via adversarial deception in resume screening

7 Dec, 2025

Applying the Engels et al. (2025) scalable oversight framework to resume screening. We model the task as an adversarial Houdini–Guard game and measure how well a weaker Guard can detect a stronger Houdini's deceptive selections, fitting domain Elo curves across 8 models and 200 games per pair.
