Lucas Bunzel

May 1, 2025

Diagnosing AI: Advancing Interpretability and Evaluations

Responding to Dario Amodei's urgent call to commit more resources to AI interpretability, we agree on its importance while stressing the indispensable role of evaluations. Discover why understanding AI's internals and rigorously measuring its behavior are both necessary to ensure a future where AI is safe, steerable, and aligned with human values.