Scale AI Research
Scale AI’s mission is to accelerate the development of AI applications. By advancing research, we aim to create AI systems capable of solving complex, human-level problems.


Reliable Weak-to-Strong Monitoring of LLM Agents
August 26, 2025
Safety, Evaluation and Alignment
Oversight

Search-Time Data Contamination
August 13, 2025
Safety, Evaluation and Alignment
Oversight

MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs
July 23, 2025
Reasoning
Safety, Evaluation and Alignment

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
July 23, 2025
Science of Data
Post-Training

WebGuard: Building a Generalizable Guardrail for Web Agents
July 21, 2025
Agents
Safety, Evaluation and Alignment

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
July 15, 2025
Reasoning
Oversight
Safety, Evaluation and Alignment

Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning
June 28, 2025
Post-Training
Reasoning

FORTRESS: Frontier Risk Evaluation for National Security and Public Safety
June 18, 2025
Safety, Evaluation and Alignment

Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models
June 16, 2025
Reasoning