Scale AI Research
Scale AI’s mission is to accelerate the development of AI applications. By advancing research, we aim to create AI systems capable of solving complex, human-level problems.
ENIGMAEVAL: A Benchmark of Long Multimodal Reasoning Challenges
February 13, 2025
Reasoning
Safety, Evaluation and Alignment

J2: Jailbreaking to Jailbreak
February 11, 2025
Safety, Evaluation and Alignment

MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
January 29, 2025
Safety, Evaluation and Alignment
Reasoning

Humanity's Last Exam
January 23, 2025
Safety, Evaluation and Alignment
Reasoning

ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark
January 2, 2025
Safety, Evaluation and Alignment
Reasoning
Oversight

Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
October 11, 2024
Safety, Evaluation and Alignment

Balancing Cost and Effectiveness of Synthetic Data Generation Strategies for LLMs
September 29, 2024
Post-Training
Science of Data

Revisiting the Superficial Alignment Hypothesis
September 27, 2024
Post-Training

Planning In Natural Language Improves LLM Search For Code Generation
September 5, 2024
Post-Training