Company Updates & Technology Articles
January 8, 2026
How agentic AI systems give the U.S. military decision advantage through faster planning, alerting, and command and control.
December 22, 2025
MoReBench is a large-scale benchmark for evaluating AI moral reasoning beyond final outcomes. Instead of scoring answers alone, it assesses the intermediate reasoning traces models produce when navigating 1,000 morally ambiguous, real-world scenarios. Our findings show that moral reasoning is a distinct and underdeveloped capability, largely uncorrelated with performance on traditional math and coding benchmarks.
December 19, 2025
Agentic AI marks a shift from reactive chatbots to autonomous mission partners. Governments must adopt a unified Agentic Infrastructure, combining resilient agent execution with governed AgentOps, to enable machine-speed decisions. Platforms like Scale’s SGP and Agentex deliver interoperable, durable, and accountable autonomy for mission assurance.
We’re open-sourcing MCP-Atlas, including the dataset, evaluation environment, and updated results for a benchmark designed to measure how reliably AI agents use real tools. MCP-Atlas evaluates realistic, multi-step workflows that run against real Model Context Protocol servers, exposing where agents succeed, and where they still fail, when tool discovery, parameterization, and execution must work together.
Today on the podcast, the Enterprise team revisits the AI predictions they made six months ago to see whether they came true. Spoiler: we got a lot wrong, but some unexpected things right! They conclude by predicting what to expect from enterprise AI in 2026.
December 18, 2025
Audio MultiChallenge is a new benchmark designed to stress-test native Speech-to-Speech models on what actually makes voice hard: mid-sentence corrections, audio-only cues, instruction drift, and long-horizon self-consistency. By evaluating real human conversations rather than synthetic text-to-speech, we uncover where current audio systems still fail, and what it will take to build voice agents that truly listen.
December 16, 2025
2025 was a landmark year for AI reasoning, coding, and multimodal capabilities, but which models actually delivered the best results? After introducing 15 new benchmarks and publishing over 450 evaluations across the industry's top 50 models, the data is finally in. We are proud to announce the winners of the inaugural SEAL Models of the Year Awards. From the "Best Agentic Model" to the "People’s Favorite," find out which LLMs proved their dominance on the leaderboards and claimed the top spots in our rigorous analysis.
December 15, 2025
A first-of-its-kind study from Oxford Economics shows how data annotation drives AI innovation and creates flexible earning opportunities for people across the US. Meet the contributors behind the industry and learn about their impact on the economy.
December 12, 2025
Scale AI outlines how exporting the full U.S. AI tech stack can secure global leadership, standards, and economic competitiveness.
December 9, 2025
We often discuss "AI Risk" as if it were a single, shapeless shoggoth. But the truth is that risk comes from specific sources, each requiring a different defense. This article dismantles the monolith, categorizing the six distinct vectors of danger: Adversaries, Unforced Errors, Misaligned Goals, Dependencies, Societal Impact, and Emergent Behavior. Learn to distinguish between these threats so you can move from panic to precise preparation.