Company Updates & Technology Articles
December 12, 2025
Scale AI outlines how exporting the full U.S. AI tech stack can secure global leadership, standards, and economic competitiveness.
December 9, 2025
We often discuss "AI Risk" as if it were a single, shapeless shoggoth. But the truth is that risk comes from specific sources, each requiring a different defense. This article dismantles the monolith, categorizing the six distinct vectors of danger: Adversaries, Unforced Errors, Misaligned Goals, Dependencies, Societal Impact, and Emergent Behavior. Learn to distinguish between these threats so you can move from panic to precise preparation.
November 25, 2025
To measure the propensity of agents to make unsafe choices, Scale, the University of Maryland, and other collaborators developed PropensityBench. This benchmark simulates real-world pressure by allowing agents to choose between a safe approach that consistently fails and a functional, harmful shortcut, revealing their true inclinations. The benchmark reveals that agents' commitment to safe behavior degrades significantly under pressure.
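The pressure-trial design described above can be sketched in a few lines. This is a hypothetical harness, not Scale's actual benchmark code: the tool names, attempt limit, and policy are illustrative assumptions that just mirror the stated mechanic of a safe-but-failing option versus a harmful shortcut.

```python
# Minimal sketch of a PropensityBench-style pressure trial (hypothetical
# harness; names and scoring are illustrative, not the benchmark's code).
from dataclasses import dataclass, field

@dataclass
class PressureTrial:
    max_attempts: int = 5                 # pressure mounts with each failure
    choices: list = field(default_factory=list)

    def safe_tool(self) -> bool:
        """The sanctioned approach, rigged to fail every time."""
        return False

    def run(self, agent_policy) -> bool:
        """Return True if the agent ever takes the harmful shortcut."""
        for attempt in range(self.max_attempts):
            choice = agent_policy(attempt)    # 'safe' or 'shortcut'
            self.choices.append(choice)
            if choice == "shortcut":
                return True                   # unsafe propensity revealed
            self.safe_tool()                  # fails; pressure increases
        return False

# A policy that caves once repeated failures cross a threshold.
caves_under_pressure = lambda attempt: "shortcut" if attempt >= 3 else "safe"
trial = PressureTrial()
print(trial.run(caves_under_pressure))  # → True
```

The key measurement is not whether the agent can act safely, but at what level of sustained pressure it stops doing so.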
November 24, 2025
The next generation of AI agents is shifting from passive workers that receive user commands and generate outputs to active agents that plan, act, observe, and improve on their own. Agents now choose how to complete a task, which tools to use, and whom (or which agent) to collaborate with. LLMs didn’t invent agency, but they democratized it by turning frontier-level reasoning into a simple API call, letting teams compose complex systems from simple building blocks.
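The plan-act-observe-improve loop described above can be sketched as a simple control loop. This is an illustrative skeleton under assumed names: in a real agent, `plan` would be the frontier-model API call mentioned in the article, and the tool set would be richer.

```python
# Illustrative plan → act → observe → improve loop (hypothetical names;
# a real agent would make an LLM API call inside `plan`).
def plan(goal, memory):
    """Choose the next action given the goal and prior observations."""
    return {"tool": "done"} if memory else {"tool": "search", "query": goal}

def act(action):
    """Execute the chosen tool and return its raw result."""
    return f"result of {action['tool']}"

def run_agent(goal, max_steps=5):
    memory = []
    for _ in range(max_steps):
        action = plan(goal, memory)       # plan
        if action["tool"] == "done":
            break
        observation = act(action)         # act
        memory.append(observation)        # observe; feeds the next plan
    return memory

print(run_agent("find latest benchmark results"))  # → ['result of search']
```

The loop is what distinguishes an active agent from a passive worker: each observation changes the next plan, so the agent chooses its own path to the goal rather than executing a fixed one.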
November 20, 2025
The Human Frontier Collective is a premier community of PhDs, academics, and industry leaders advancing AI through research, collaboration, and shared expertise.
Today, we add several new models to Showdown. A surprising finding is that users consistently rank GPT-5 significantly lower than other models. In this blog post, we share our preliminary analysis of GPT-5's ranking on Showdown, where we examine the effect of thinking effort, task type, and evaluation setting.
Earlier this week, we open-sourced Agentex to enable long-running enterprise agents. Today, we’re releasing a tutorial we created with Temporal that shows how to build a long-running procurement agent. It’s a concrete example of an agent that manages extended workflows, responds to external signals, and escalates to humans only when needed.
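The escalation pattern that blurb describes can be sketched in plain Python. This is not the Agentex or Temporal API, just an illustration of the pattern under assumed names (the `approval_limit` threshold and event shape are invented for the example); Temporal's role in the real tutorial is making these waits and signal handlers durable across restarts.

```python
# Sketch of a long-running procurement agent's signal handler: act
# autonomously on routine events, escalate to a human past a threshold.
# (Hypothetical names; not the Agentex or Temporal API.)
from dataclasses import dataclass, field

@dataclass
class ProcurementAgent:
    approval_limit: float = 1000.0       # assumed escalation threshold
    log: list = field(default_factory=list)

    def on_signal(self, event: dict) -> str:
        """Handle an external event, e.g. a purchase-request webhook."""
        amount = event["amount"]
        if amount > self.approval_limit:
            self.log.append(("escalated", amount))   # human in the loop
            return "escalated_to_human"
        self.log.append(("auto_approved", amount))   # agent acts alone
        return "auto_approved"

agent = ProcurementAgent()
print(agent.on_signal({"amount": 250.0}))    # → auto_approved
print(agent.on_signal({"amount": 5000.0}))   # → escalated_to_human
```

The design choice worth noting is that escalation is the exception path, not the default: the agent handles the extended workflow itself and only surfaces the cases that exceed its mandate.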
November 19, 2025
In collaboration with Princeton University, UMD, SecureBio, and the Center for AI Safety, we introduce BioRiskEval, the first comprehensive framework for assessing dual-use risks in open-weight bio-foundation models. Our stress tests on the Evo 2 model reveal a critical vulnerability: dangerous knowledge removed via data filtering often persists in hidden layers or can be rapidly restored with minimal compute. These findings challenge the reliance on simple data curation and underscore the urgent need for "defense-in-depth" strategies to secure the future of biological AI.
November 18, 2025
Today on the podcast, the team is talking about what happens when enterprise GenAI goes wrong. The team digs into recent public AI failures, reviewing the impact of each, whether they could have been prevented, and if so, how.
Scale AI is strengthening its commitment to the contributors behind Outlier, investing in improvements that make work more consistent, transparent, and rewarding.