
Udari Madhushani Sehwag1*, Shayan Shabihi2*, Alex McAvoy3, Vikash Sehwag4, Yuancheng Xu5, Dalton Towers6, Furong Huang2
1Scale AI, 2University of Maryland, College Park, 3University of North Carolina at Chapel Hill, 4Google DeepMind, 5Netflix, 6University of Texas at Austin. *Equal contribution
Recent advances in Large Language Models (LLMs) have sparked concerns over their potential to acquire and misuse dangerous or high-risk capabilities, posing frontier risks. Current safety evaluations primarily test what a model can do (its capabilities) without assessing what it would do if endowed with high-risk capabilities. This leaves a critical blind spot: models may strategically conceal capabilities or rapidly acquire them, while harboring latent inclinations toward misuse. We argue that propensity, a model's likelihood of pursuing harmful actions if empowered, is a critical yet underexplored axis of safety evaluation. We present PropensityBench, a novel benchmark framework that assesses models' proclivity to engage in risky behaviors when equipped, via proxy tools, with simulated dangerous capabilities. Our framework includes 5,874 scenarios with 6,648 tools spanning four high-risk domains: cybersecurity, self-proliferation, biosecurity, and chemical security. We simulate access to powerful capabilities in a controlled agentic environment and evaluate models' choices under varying operational pressures that reflect real-world constraints or incentives they may encounter, such as resource scarcity or the prospect of greater autonomy. Across open-source and proprietary frontier models, we uncover alarming signs of propensity: models frequently choose high-risk tools when under pressure, despite lacking the capability to execute such actions unaided. These findings call for a shift from static capability audits toward dynamic propensity assessments as a prerequisite for deploying frontier AI systems safely.