Blog
Company Updates & Technology Articles
January 22, 2026
Scale’s Next Era: Building for 2026
Scale CEO Jason Droege reflects on a record-breaking 2025 and shares how Scale is building reliable, production-ready AI systems for 2026.
January 13, 2026
The Next Phase of U.S. AI Policy: Governance, Implementation, and Global Leadership
What it will take for the United States to move from AI experimentation to real governance, government-wide implementation, and lasting global leadership.
January 12, 2026
What's different about enterprise healthcare AI? | Human in the Loop Episode 17
The team is kicking off 2026 like the rest of us: by focusing on health(care)! They discuss why adopting AI in healthcare is different from other enterprise AI initiatives and how leaders can account for those differences. And as always, they react to some of the internet's hottest takes on AI (healthcare edition).
January 8, 2026
Securing America’s Decision Advantage
How agentic AI systems give the U.S. military decision advantage through faster planning, alerting, and command and control.
December 22, 2025
MoReBench: Evaluating the Process of AI Moral Reasoning
MoReBench is a large-scale benchmark for evaluating AI moral reasoning beyond final outcomes. Instead of scoring answers alone, it assesses the intermediate reasoning traces models produce when navigating 1,000 morally ambiguous, real-world scenarios. Our findings show that moral reasoning is a distinct and underdeveloped capability, largely uncorrelated with performance on traditional math and coding benchmarks.
December 19, 2025
The Agentic Era: Building the Foundation for Autonomous Mission Assurance
Agentic AI marks a shift from reactive chatbots to autonomous mission partners. Government must adopt a unified Agentic Infrastructure that combines resilient agent execution with governed AgentOps to enable machine-speed decisions. Platforms like Scale's SGP and Agentex deliver interoperable, durable, and accountable autonomy for mission assurance.
December 19, 2025
Open-Sourcing MCP-Atlas: A Benchmark for Real Tool Use
We’re open-sourcing MCP-Atlas, including the dataset, evaluation environment, and updated results for a benchmark designed to measure how reliably AI agents use real tools. MCP-Atlas evaluates realistic, multi-step workflows that run against real Model Context Protocol servers, exposing where agents succeed—and where they still fail—when tool discovery, parameterization, and execution must work together.
December 19, 2025
We predicted the future of AI in 2025…were we right? Plus our 2026 predictions | Human in the Loop Episode 16
Today on the podcast, the Enterprise team revisits the AI predictions they made six months ago to see which came true. Spoiler: we got a lot wrong...but a few unexpected things right! They close by predicting what to expect from enterprise AI in 2026.
December 18, 2025
Real Speech Breaks AI (And What We're Doing to Fix It)
Audio MultiChallenge is a new benchmark designed to stress-test native Speech-to-Speech models on what actually makes voice hard: mid-sentence corrections, audio-only cues, instruction drift, and long-horizon self-consistency. By evaluating real human conversations rather than synthetic text-to-speech audio, we uncover where current audio systems still fail, and what it will take to build voice agents that truly listen.
December 16, 2025
Introducing the 2025 SEAL Models of the Year Awards
2025 was a landmark year for AI reasoning, coding, and multimodal capabilities, but which models actually delivered the best results? After introducing 15 new benchmarks and publishing over 450 evaluations across the industry's top 50 models, the data is finally in. We are proud to announce the winners of the inaugural SEAL Models of the Year Awards. From the "Best Agentic Model" to the "People’s Favorite," find out which LLMs proved their dominance on the leaderboards and claimed the top spots in our rigorous analysis.