Scale AI Blog

Matthew Siegel

Jun 2026

Can AI Agents Do the Work of Drug Discovery?

Research
Jun 2026

When AI Agents Ask, Attackers Can Answer

Jun 2026

Deployment Lessons from Global Governments

Global
May 2026

SWE Atlas is Complete: Measuring Coding Agents Across the Engineering Loop

Apr 2026

HiL-Bench: Your Agent is Smart. It Just Won't Ask for Help.

Mar 2026

Voice Showdown: The First Arena for Voice AI

Testing & Evals
Mar 2026

Can Coding Agents Become Engineers? We’re Finding Out.

Research
Dec 2025

MoReBench: Evaluating the Process of AI Moral Reasoning

Research
Nov 2025

Crumbling Under Pressure: PropensityBench Reveals AI’s Weaknesses

Oct 2025

The Remote Labor Index: Measuring the Automation of Work

Research
Oct 2025

Enterprise Reinforcement Learning with Rubrics as Rewards

Enterprise
Sep 2025

Smoothing Out LLM Variance for Reliable Enterprise Evals

Research
Sep 2025

TutorBench: Grading the Next Generation of AI Tutors

Research
Sep 2025

Using Rubrics to Build Better Models

Research
Aug 2025

AI Doesn’t Live in Text Alone

Research
Aug 2025

New Benchmarks Envision the Future of AI in Healthcare

Enterprise
Aug 2025

The AI Risk Matrix: Evolving AI Safety and Security for Today

Research
Jul 2025

The Future is Multilingual: Scale's New Evaluation Benchmark

Research
Jul 2025

I’m Afraid I Can’t Let You Do That

Testing & Evals
Jun 2025

The Future of AI Learning Environments: Verifiable Reward + Multi-Agent Interaction

Research