Scale AI Research
Scale AI’s mission is to accelerate the development of AI applications. By advancing research, we aim to create AI systems capable of solving complex, human-level problems.


A Red Teaming Roadmap Towards System-Level Safety
June 5, 2025
Safety, Evaluation and Alignment

Assessing Robustness to Spurious Correlations in Post-Training Language Models
May 9, 2025
Post-Training
Science of Data

Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking
March 14, 2025
Reasoning

Critical Foreign Policy Decisions (CFPD)-Benchmark: Measuring Diplomatic Preferences in Large Language Models
March 8, 2025
Safety, Evaluation and Alignment

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
March 5, 2025
Safety, Evaluation and Alignment

ENIGMAEVAL: A Benchmark of Long Multimodal Reasoning Challenges
February 13, 2025
Reasoning
Safety, Evaluation and Alignment

J2: Jailbreaking to Jailbreak
February 11, 2025
Safety, Evaluation and Alignment

ProjectTest: A Project-level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms
February 10, 2025
Safety, Evaluation and Alignment

MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
January 29, 2025
Safety, Evaluation and Alignment
Reasoning