
The Next Frontier of Data Training: RL Environments

By Chetan Rane·February 27, 2026·5 min read

AI is evolving from conversational interfaces into agents that take actions, from navigating software and coordinating across tools to carrying out complex, multi-step workflows that mirror real professional work. As this shift accelerates, the way models are trained is evolving as well. Frontier models increasingly need to learn through trial and error in realistic simulated environments rather than relying solely on static datasets or human preference feedback. AI labs are turning to reinforcement learning environments to safely teach agents how to plan, act, and improve without the risk of deploying unproven behavior in live production systems.

Over the past several months, we’ve been working with leading model developers to build safe, high-fidelity environments designed specifically for training and evaluating agent behavior across capabilities like tool use, computer use, and coding. Today, nearly half of our new data training projects involve reinforcement learning environments, reflecting how quickly model development is shifting towards training agents in real-world settings. That’s why we’re introducing Scale RL Environments, a collection of simulated worlds where agents can be trained and evaluated on a wide range of tool use and computer use workflows. These environments span consumer, enterprise, and domain-specific use cases, allowing model builders to train and test agents in highly realistic systems without the risk, instability, or scale constraints of production software.

Our RL Environments offer:

  • Realistic data universes: Our environments replicate real-world workflows and are populated with data that mirrors real business and consumer activity. Domain experts curate artifacts that capture the complexity, ambiguity, and edge cases of real professional work.
  • Production-grade infrastructure: Run multiple training and evaluation jobs in parallel on your own setup, with seamless integration into existing agent execution frameworks and minimal engineering lift. Receive custom support from Scale’s engineering team to fit your existing environment setup.
  • Expert-designed training data: Train on tasks designed by domain experts, each paired with both process and outcome verifiers.
[Figure: "Tool Use" versus "Computer Use." Left: icons for Google Maps, Git, Notion, and the local filesystem, representing specific API-driven tool integrations. Right: a cursor navigating between a spreadsheet and a digital receipt, representing the general capability to operate a computer interface like a human user.]

Inside Scale RL Environments

Each RL Environment models the workflows and task structures found in real applications and tools, while providing the control and repeatability required for reinforcement learning and evaluation. An environment pairs simulated real-world workflows with defined data and system state, allowing it to be reset and inspected between runs.
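Scale has not published its environment API, but the reset-and-inspect contract described above can be sketched as follows. Every name here (`SimulatedCRM`, `close_ticket`, and so on) is illustrative, not Scale's actual interface:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SimulatedCRM:
    """Toy stand-in for a simulated application with inspectable state."""
    tickets: dict = field(default_factory=dict)


class RLEnvironment:
    """Sketch of an environment pairing workflows with defined system state."""

    def __init__(self, initial_state: SimulatedCRM):
        self._initial_state = initial_state
        self.state = None

    def reset(self) -> SimulatedCRM:
        # Restore the defined data/system state, so every run starts
        # from the same known configuration.
        self.state = copy.deepcopy(self._initial_state)
        return self.state

    def step(self, action: dict) -> SimulatedCRM:
        # Apply an agent action and return the new observation;
        # the state stays fully inspectable between steps.
        if action["type"] == "close_ticket":
            self.state.tickets[action["id"]]["status"] = "closed"
        return self.state


env = RLEnvironment(SimulatedCRM(tickets={"T1": {"status": "open"}}))
env.reset()
env.step({"type": "close_ticket", "id": "T1"})
env.reset()  # resetting restores the original state between runs
assert env.state.tickets["T1"]["status"] == "open"
```

The deep-copy on reset is what makes runs repeatable: mutations from one episode can never leak into the next.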

[Figure: the reinforcement learning loop. An AI agent sends an action to an RL environment; the environment returns an observation and a reward signal, forming a continuous feedback cycle.]
RL Environments generate structured experience: trajectories of decisions, actions, and rewards that models learn from to improve agentic capabilities.

Within each environment, tasks define concrete goals and starting conditions. Agent interactions are captured as trajectories that record every action, state change, and intermediate outcome. Results are evaluated using expert-designed verifiers that check for real, measurable changes in the system—not just whether an agent produced plausible text.
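The distinction between process and outcome verification can be made concrete with a small sketch. The task here (marking an invoice as paid in a simulated system) and all names are hypothetical, chosen only to illustrate the pattern:

```python
import copy

# Illustrative task: the agent must mark an invoice as paid.
state = {"invoice": {"status": "unpaid", "amount": 120.0}}
trajectory = []


def apply_action(action: str) -> None:
    """Apply an agent action and record it in the trajectory."""
    before = copy.deepcopy(state)
    if action == "pay_invoice":
        state["invoice"]["status"] = "paid"
    trajectory.append({"action": action,
                       "before": before,
                       "after": copy.deepcopy(state)})


def outcome_verifier(final_state: dict) -> bool:
    # Checks for a real, measurable change in system state, not
    # whether the agent merely produced plausible text about paying.
    return final_state["invoice"]["status"] == "paid"


def process_verifier(traj: list) -> bool:
    # Checks the path taken, not just the end state: the recorded
    # trajectory must contain the payment action itself.
    return any(step["action"] == "pay_invoice" for step in traj)


apply_action("pay_invoice")
assert outcome_verifier(state) and process_verifier(trajectory)
```

Pairing both verifier types, as the post describes, rewards agents that reach the right end state by a legitimate route rather than by accident.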

Because environments are fully controlled, teams can run experiments in parallel, replay trajectories, inspect failures, and iterate quickly. RL Environments integrate directly into existing agent execution and training pipelines. Teams can use Scale’s built-in executor or their own frameworks, bringing existing models, prompts, and training loops while relying on Scale for the environments, data, and evaluation scaffolding.
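Because each rollout owns an isolated environment instance, parallel experiments are straightforward. A minimal sketch of that pattern, with hypothetical `make_env` and action names standing in for a real agent loop:

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_env(seed: int) -> dict:
    # Stand-in for constructing an isolated environment instance;
    # seeding makes each rollout reproducible for later replay.
    return {"rng": random.Random(seed), "steps": []}


def rollout(seed: int) -> float:
    # Each rollout runs against its own environment copy, so
    # parallel experiments never interfere with one another.
    env = make_env(seed)
    reward = 0.0
    for _ in range(3):
        action = env["rng"].choice(["tool_call", "type_text"])
        env["steps"].append(action)  # recorded trajectory for replay
        reward += 1.0 if action == "tool_call" else 0.0
    return reward


with ThreadPoolExecutor(max_workers=4) as pool:
    rewards = list(pool.map(rollout, range(8)))

assert len(rewards) == 8
```

Re-running `rollout(seed)` with the same seed reproduces the same trajectory, which is what makes replaying and inspecting failures possible.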

Why Scale RL Environments Work

Scale’s RL Environments reflect the conditions of real work: incomplete data, conflicting records, and tasks that span multiple tools and interfaces. They require agents to maintain context, adapt mid-task, and recover from errors.

Each task begins from a known system configuration, and agent behavior is evaluated using system-level checks that measure real outcomes in the environment. This produces consistent, repeatable signals that make it possible to compare approaches and improve agents over time. Because those signals are reliable, teams can train agents at scale without relying on live systems.

So far, we’ve run a series of controlled training and evaluation experiments using open-source models across our off-the-shelf agent datasets and RL environments. In these runs, models trained within Scale RL Environments consistently showed measurable performance improvements on the corresponding public benchmarks, validating both the effectiveness of the environments and the quality of the training signal.

[Figure: a Gemini 3 Pro agent executing a calendar-scheduling task, shown in three panels: the prompt with scheduling constraints, an execution log with the agent's step-by-step reasoning and a calendar screenshot, and a verifier panel of JSON checks validating the agent's actions.]
Track the model’s steps and actions in the environment, alongside results and verifiers, to evaluate performance.

Getting Started

Teams can start exploring RL Environments by inspecting simulated applications, reviewing expert-curated artifacts, and observing how agents interact with realistic systems. For training and evaluation at scale, environments can be run programmatically using Docker and existing agent execution frameworks.
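The exact images and flags would come from Scale's documentation, but driving a containerized environment programmatically might look like the sketch below. The image name and `ENV_NAME` variable are hypothetical; only the `docker run` flags themselves are standard:

```python
def docker_run_cmd(image: str, env_name: str, port: int = 8080) -> list:
    # Build (but do not execute) a docker run invocation for a
    # containerized environment. --rm discards the container after
    # the run, so every launch starts from the image's known state.
    return ["docker", "run", "--rm",
            "-p", f"{port}:{port}",
            "-e", f"ENV_NAME={env_name}",
            image]


cmd = docker_run_cmd("example/rl-env:latest", "crm_tickets")
# subprocess.run(cmd, check=True)  # uncomment to actually launch
assert cmd[0] == "docker" and "ENV_NAME=crm_tickets" in cmd
```

Using `--rm` mirrors the reset guarantee described earlier: each container launch begins from the image's defined state rather than inheriting leftovers from a previous run.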

To see how we enable agent capabilities, explore Scale RL Environments.
