Scale AI Research
Scale AI’s mission is to accelerate the development of AI applications. By advancing research, we aim to create AI systems capable of solving complex, human-level problems.


FORTRESS: Frontier Risk Evaluation for National Security and Public Safety
June 18, 2025
Safety, Evaluation and Alignment

Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models
June 16, 2025
Reasoning

Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards
June 13, 2025
Agents
Post-Training
Reasoning

A Red Teaming Roadmap Towards System-Level Safety
June 5, 2025
Safety, Evaluation and Alignment

Assessing Robustness to Spurious Correlations in Post-Training Language Models
May 9, 2025
Post-Training
Science of Data

Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking
March 14, 2025
Reasoning

Critical Foreign Policy Decisions (CFPD)-Benchmark: Measuring Diplomatic Preferences in Large Language Models
March 8, 2025
Safety, Evaluation and Alignment

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
March 5, 2025
Safety, Evaluation and Alignment

ENIGMAEVAL: A Benchmark of Long Multimodal Reasoning Challenges
February 13, 2025
Reasoning
Safety, Evaluation and Alignment