Blog

Company Updates & Technology Articles

January 22, 2026

Company

Scale’s Next Era: Building for 2026

Scale CEO Jason Droege reflects on a record-breaking 2025 and shares how Scale is building reliable, production-ready AI systems for 2026.

January 13, 2026

Government

The Next Phase of U.S. AI Policy: Governance, Implementation, and Global Leadership

What it will take for the United States to move from AI experimentation to real governance, government-wide implementation, and lasting global leadership.

January 12, 2026

General

What's different about enterprise healthcare AI? | Human in the Loop Episode 17

The team is kicking off 2026 like the rest of us: by focusing on health(care)! They discuss why adopting AI in healthcare is different from other enterprise AI initiatives and how leaders can account for those differences. And as always, they react to some of the internet's hottest takes on AI (healthcare edition).

January 8, 2026

Government

Securing America’s Decision Advantage

How agentic AI systems give the U.S. military decision advantage through faster planning, alerting, and command and control.

December 22, 2025

Research

MoReBench: Evaluating the Process of AI Moral Reasoning

MoReBench is a large-scale benchmark for evaluating AI moral reasoning beyond final outcomes. Instead of scoring answers alone, it assesses the intermediate reasoning traces models produce when navigating 1,000 morally ambiguous, real-world scenarios. Our findings show that moral reasoning is a distinct and underdeveloped capability, largely uncorrelated with performance on traditional math and coding benchmarks.

December 19, 2025

Government

The Agentic Era: Building the Foundation for Autonomous Mission Assurance

Agentic AI marks a shift from reactive chatbots to autonomous mission partners. Government must adopt unified Agentic Infrastructure—combining resilient agent execution and governed AgentOps—to enable machine-speed decisions. Platforms like Scale’s SGP and Agentex deliver interoperable, durable, and accountable autonomy for mission assurance.

December 19, 2025

Research

Open-Sourcing MCP-Atlas: A Benchmark for Real Tool Use

We’re open-sourcing MCP-Atlas, including the dataset, evaluation environment, and updated results for a benchmark designed to measure how reliably AI agents use real tools. MCP-Atlas evaluates realistic, multi-step workflows that run against real Model Context Protocol servers, exposing where agents succeed—and where they still fail—when tool discovery, parameterization, and execution must work together.

December 19, 2025

General

We predicted the future of AI in 2025…were we right? Plus our 2026 predictions | Human in the Loop Episode 16

Today on the podcast, the Enterprise team reviews the AI predictions they made six months ago and checks which ones came true. Spoiler: we got a lot wrong...but some unexpected things right! They conclude by predicting what we can expect to see in 2026 with enterprise AI.

December 18, 2025

Research

Real Speech Breaks AI (And What We're Doing to Fix It)

Audio MultiChallenge is a new benchmark designed to stress-test native Speech-to-Speech models on what actually makes voice hard: mid-sentence corrections, audio-only cues, instruction drift, and long-horizon self-consistency. By evaluating real human conversations rather than synthetic text-to-speech, we uncover where current audio systems still fail and what it will take to build voice agents that truly listen.

December 16, 2025

Research

Introducing the 2025 SEAL Models of the Year Awards

2025 was a landmark year for AI reasoning, coding, and multimodal capabilities, but which models actually delivered the best results? After introducing 15 new benchmarks and publishing over 450 evaluations across the industry's top 50 models, the data is finally in. We are proud to announce the winners of the inaugural SEAL Models of the Year Awards. From the "Best Agentic Model" to the "People’s Favorite," find out which LLMs proved their dominance on the leaderboards and claimed the top spots in our rigorous analysis.
