Scale AI Blog
Matthew Siegel
01 SWE Atlas is Complete: Measuring Coding Agents Across the Engineering Loop
Research, May 7, 2026
02 HiL-Bench: Your Agent is Smart. It Just Won't Ask for Help.
Research, Apr 20, 2026
03 Voice Showdown: The First Arena for Voice AI
Product, Mar 20, 2026
04 Can Coding Agents Become Engineers? We’re Finding Out.
Research, Mar 4, 2026
05 Crumbling Under Pressure: PropensityBench Reveals AI’s Weaknesses
Research, Nov 25, 2025
06 The Remote Labor Index: Measuring the Automation of Work
Research, Oct 29, 2025
07 Enterprise Reinforcement Learning with Rubrics as Rewards
Research, Oct 7, 2025
08 Smoothing Out LLM Variance for Reliable Enterprise Evals
Research, Sep 14, 2025
09 TutorBench: Grading the Next Generation of AI Tutors
Research, Sep 12, 2025
10 Using Rubrics to Build Better Models
Research, Sep 2, 2025
11 AI Doesn’t Live in Text Alone
Engineering, Aug 19, 2025
12 New Benchmarks Envision the Future of AI in Healthcare
Research, Aug 4, 2025
13 The AI Risk Matrix: Evolving AI Safety and Security for Today
Research, Aug 4, 2025
14 The Future is Multilingual: Scale's New Evaluation Benchmark
Research, Jul 23, 2025
15 I’m Afraid I Can’t Let You Do That
Research, Jul 1, 2025
16 The Future of AI Learning Environments: Verifiable Reward + Multi-Agent Interaction
Research, Jun 23, 2025