Company Updates & Technology Articles
November 14, 2025
AI excels on academic tests, but it fails at real professional jobs. That's the stark finding from PRBench, our new benchmark series designed to move AI testing out of the lab and into the real world. We're launching the series with two of the most complex domains: Law and Finance. Using 1,100 high-stakes tasks sourced from 182 professionals, we tested how today's frontier models handle the nuanced reasoning that defines these fields. While models are great at following instructions, they fail at the expert judgment, auditable reasoning, and deep diligence required for tasks with real economic consequences.
November 13, 2025
We are open-sourcing Agentex, the agentic infrastructure layer in Scale GenAI Platform. Our Enterprise team sits down to demo Agentex and share how our enterprise customers use it today. We also dive into our decision to open-source it and our hopes for collaborating with the community.
November 7, 2025
While general-purpose AI models are powerful, they often fail to deliver on complex, specialized enterprise workflows that use private data. We share results from our real-world work in the insurance and legal industries, highlight how our RL-tuned agents outperformed leading LLMs, and dive into how we achieved these performance gains.
November 5, 2025
Scale AI is expanding offices in New York, London, Washington D.C., and St. Louis to support growth, innovation, and reliable AI development worldwide.
October 29, 2025
Talal AlBakr joins Scale AI to build production-ready AI applications that power Saudi Arabia’s Vision 2030.
Can AI actually automate complex, professional jobs? The new Remote Labor Index (RLI) from Scale and the Center for AI Safety (CAIS) provides the first data-driven answer. By testing AI agents against 240 real-world, paid freelance projects, the RLI found that the best-performing agents could only successfully automate 2.5% of them. This new benchmark reveals a critical gap between AI's generative skill and the end-to-end reliability required for professional work, showing the immediate impact is augmentation, not mass automation.
October 28, 2025
Scale AI Partners with Korea’s AI Safety Institute to Advance Global AI Evaluation and Governance
October 27, 2025
Your cybersecurity playbook is obsolete. In the age of AI, the greatest risks aren't traditional code exploits but unpredictable model behaviors—from prompt injections and data leakage to emergent misuse. Drawing on insights from live red teaming exercises with members of Congress, NATO, and the UK Parliament, AI security expert David Campbell explains why we must treat the model itself as the new attack surface. This post unveils an enterprise playbook for proactive AI red teaming, moving beyond static checks to continuously test systems like an adversary. Learn how to map, score, and measure AI risks to get ahead of the threat before an incident occurs.
October 22, 2025
As part of our Pledge to America’s Youth, Scale AI is helping bring AI literacy into classrooms across America, starting in St. Louis.
October 16, 2025
Today on the podcast, the team talks about the latest in enterprise agents, including the problem you're probably not thinking about but should be: agentic infrastructure.