Scale at ICLR 2026

Scale AI is laying the foundation for AI innovation, serving as the engine for building, deploying, and evaluating AI.


Join our Sessions at ICLR

Scale AI's mission is to develop reliable AI systems for the world's most important decisions.

Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training

Presentation Details

  • Thursday, April 23, 2026, 10:30 AM – 1:00 PM BRT

  • Pavilion 3 P3-#1416

MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

Presentation Details

  • Thursday, April 23, 2026, 3:15 PM – 5:45 PM BRT

  • Pavilion 4 P4-#4202

ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents

Presentation Details

  • Thursday, April 23, 2026, 3:15 PM – 5:45 PM BRT

  • Pavilion 3 P3-#1004

Reliable Weak-to-Strong Monitoring of LLM Agents

  • Poster: Sat, Apr 25, 2026 • 10:30 AM – 1:00 PM BRT; Pavilion 4 P4-#5018

  • Oral Presentation: Sat, Apr 25, 2026 • 3:15 PM – 3:25 PM BRT; Room 204 A/B

New Frontier of AI: Eval, RL, and What's Next

  • Fri, Apr 24, 2026 • 12:45 PM – 1:30 PM BRT

  • Room 202C

  • Speaker: Bing Liu, Head of Research

Agents in the Wild: Safety, Security, and Beyond

  • Sun, Apr 26, 2026 • 9:00 AM – 5:00 PM BRT

  • Room 204 A/B

  • Speaker: Bing Liu, Head of Research

Lifelong Agents: Learning, Aligning, Evolving

  • Sun, Apr 26, 2026 • 9:00 AM – 5:00 PM BRT

  • Room 101C

  • Speaker: Mike Lee, Research Scientist

PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

Product Overview

Frontier Data

Scale's frontier research produces specialized training data for the next generation of AI systems.

Agent Data

Training data that enables AI to interact with computers the way humans do, teaching models to use tools, navigate interfaces, and execute real-world tasks through direct computer interaction.

Complex Reasoning Data

Datasets that teach LLMs to solve complex problems through structured, step-by-step thinking, enabling models to break down challenging tasks and validate their reasoning.

PRODUCT OVERVIEW

Generative AI Data Engine

Enables rapid creation of tailored, high-quality datasets curated by vetted subject matter experts to train the world's most advanced models.

Build AI

Improve Your Models By Improving Your Data

High-quality training data, curated by subject matter experts, is crucial for developing powerful, accurate generative AI models.

View Open Positions