December 19, 2025
The Agentic Era: Building the Foundation for Autonomous Mission Assurance
Agentic AI marks a shift from reactive chatbots to autonomous mission partners. Government agencies must adopt a unified Agentic Infrastructure, combining resilient agent execution with governed AgentOps, to enable decisions at machine speed. Platforms like Scale's SGP and Agentex deliver interoperable, durable, and accountable autonomy for mission assurance.
April 25, 2025
Scale’s Role In Building a Safer Internet
Training AI models to behave responsibly in the real world means preparing them for the full range of online content — including the challenging parts. It’s not easy work, but it’s necessary. At Scale, we believe that building AI systems that avoid harmful, abusive, or dangerous behavior is one of the most important challenges of our time. And we’re proud to support the people who make this possible.
April 24, 2025
Introducing the Scale AI and University of Missouri-St. Louis Geospatial Collaborative
As part of Scale’s ongoing investment in its AI workforce in St. Louis, Scale and the University of Missouri-St. Louis (UMSL) are officially launching a collaborative education effort.
April 3, 2025
Outlier Updates to Empower Contributors
Since its inception in 2023, Outlier has become a cornerstone of the AI industry, connecting hundreds of thousands of people across the globe with meaningful and flexible work. Hailing from cities and small towns around the world, Outlier contributors have collectively earned hundreds of millions of dollars to help build the foundation of today's most advanced AI models.
April 2, 2025
Advancing Frontier Model Evaluation
Frontier AI development has reached an inflection point: as models rapidly advance in capabilities, the need for sophisticated evaluation has become a decisive factor in competitive success. That’s why today we're announcing updates to Scale Evaluation, our platform that helps teams identify model weaknesses and validate improvements. Our updated platform introduces four key capabilities: instant model comparison across thousands of tests, multi-dimensional performance visualization, automated error discovery, and targeted improvement guidance—all designed to help teams identify weaknesses faster and make more confident release decisions. These updates build on Scale Evaluation’s foundation introduced last year, broadening access to frontier evaluation capabilities.
March 26, 2025
Scale AI products approved for purchase on AWS Marketplace for the U.S. National Security Community
Scale AI products have been approved for purchase on AWS Marketplace for the U.S. Intelligence Community (ICMP). ICMP is a digital catalog that makes it easy for customers in the U.S. national security community to find, test, buy, and deploy software that runs on AWS.
March 5, 2025
Introducing Thunderforge: AI for American Defense
Scale is proud to have been awarded a prime contract by the Defense Innovation Unit (DIU) for Thunderforge, the DoD's flagship program leveraging AI for military planning and wargaming. Thunderforge represents our commitment to advancing U.S. military capabilities. Following its initial deployment, Thunderforge will expand throughout combatant commands, leveraging Scale AI's agentic applications and GenAI evaluation expertise.
February 27, 2025
Scale AI & Center for Strategic and International Studies (CSIS) Introduce Foreign Policy Decision Benchmark
Scale AI, in collaboration with the Center for Strategic and International Studies (CSIS), is proud to introduce the Critical Foreign Policy Decision (CFPD) Benchmark—a pioneering effort to evaluate large language models (LLMs) on national security and foreign policy decision-making tendencies.
February 23, 2025
MCIT & Scale AI: Paving the Way for Qatar’s Digital Future
The Ministry of Communications and Information Technology (MCIT) and Scale AI, the leader in frontier AI solutions, are announcing a strategic, long-term partnership to drive Qatar’s digital transformation.
February 11, 2025
Jailbreaking to Jailbreak: A Novel Approach to Safety Testing
Scale researchers have discovered a groundbreaking method for AI safety testing called J2 (Jailbreaking to Jailbreak), where language models are taught to systematically test their own and other models' safety measures. This hybrid approach combines human-like strategic reasoning with automated scalability, achieving success rates of over 90% in vulnerability testing, nearly matching professional human red-teaming effectiveness. While highlighting significant advances in automated security testing, these findings also reveal important challenges for the future of AI safety.