MATS

Sycophancy Steering: …

A 20-hour mechanistic interpretability investigation of DeepSeek-R1-Distill-Llama-8B. I isolate a steering vector that reduces sycophancy, but I observe a trade-off with coherence at higher steering strengths.
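For readers unfamiliar with the term, the following is a minimal sketch of what a steering vector is and how a strength parameter scales it. The summary does not say how the vector was derived, so the difference-of-means construction here is an assumption, and the function names and toy numbers are hypothetical and are not results from the model.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(sycophantic_acts, neutral_acts):
    """Difference of means (assumed method): points from neutral toward sycophantic."""
    mu_syc = mean_vector(sycophantic_acts)
    mu_neu = mean_vector(neutral_acts)
    return [s - n for s, n in zip(mu_syc, mu_neu)]


def steer(hidden, vector, strength):
    """Shift one hidden state along the vector; a negative strength steers away."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional activations (made-up numbers for illustration only).
sycophantic = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
neutral = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

v = steering_vector(sycophantic, neutral)
steered = steer([1.0, 1.0, 1.0], v, strength=-0.5)
```

In a real model the shift would typically be applied to the residual stream at a chosen layer during generation. A larger magnitude of `strength` moves the hidden state further from where it would otherwise be, which is one plausible reason coherence could suffer at higher strengths.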