Pioneering the Era of Experience: Where Human Data Meets Agentic Interaction

June 26, 2025

The remarkable capabilities of today’s AI systems are already transforming how people and businesses interact with technology. Driven by data generated by humans, Scale and our global contributors have played a central role in bringing us to this “Era of Human Data.”

Pioneering AI researchers David Silver and Richard Sutton contend, however, that we are at an inflection point where AI is reaching the upper limits of what can be achieved via static human-generated data alone. They envision a new frontier: “The Era of Experience,” in which AI agents learn through interacting with the world. 

At Scale, we see both opportunities and challenges in the shift toward more autonomous AI agents that can generate their own insights. Successfully scaling this new era requires building the infrastructure, evaluation frameworks, and data paradigms necessary to realize it safely, responsibly, and for the benefit of humanity. 

In this post, we’ll explore what we believe is not the end of the era of human data, but rather a paradigm shift in the form factor of human data towards rich, interactive environments.

The Call for Experiential AI

Silver and Sutton's call for an “Era of Experience” stems from their argument that relying solely on existing human knowledge creates a fundamental bottleneck. They highlight that while training on human-generated datasets has enabled AI to replicate many human capabilities, there are larger breakthroughs ahead that we can only begin to conceptualize. 

They identify several key limitations with the current framework:

  • A slowing pace of progress when development relies largely on existing static human-generated datasets for training.

  • Approaching limits on extractable knowledge from existing static datasets, especially when attempting to push into new frontiers within critical domains.

  • Vast undiscovered insights that lie beyond our documented collective knowledge, rendering them inaccessible to systems trained only on such data.

This vision for the “Era of Experience” points to a fundamental evolution in AI beyond simply overcoming these bottlenecks. It suggests combining the deep, self-generated understanding that mastered complex simulated environments (as with AlphaGo) with the intelligence developed through learning from static datasets. This synthesis is key to building agents that can significantly surpass human capabilities and discover truly novel insights.

What the “Era of Experience” is All About

To get closer to superhuman intelligence, we first must measure models against top human experts—the “superhumans” already among us. This is why, with the Center for AI Safety, we created Humanity’s Last Exam (HLE), a benchmark that rigorously challenges the reasoning depth and expert knowledge of frontier models.

But measuring against human experts isn't the final goal. To truly surpass them, as Silver and Sutton argue, AI must learn beyond the confines of pre-existing human knowledge, possibly even beyond human language. This new paradigm, the "Era of Experience," is one where AI generates its own understanding by interacting with the world itself.

This approach does not replace the foundational role of static data; rather, it adds a powerful mechanism to innovate, especially in areas where human-generated examples are scarce or non-existent. It's a shift from learning from us to learning with and beyond us, setting the stage for new kinds of human-AI collaboration.

How Scale is Building the “Era of Experience”

The saturation of traditional benchmarks and the insights from evals like HLE underscore that ushering in the "Era of Experience" requires creating the conditions for experiential learning to occur. We are uniquely positioned at Scale to build these environments. 

One of the most significant hurdles in operationalizing experiential learning is the “sparse rewards” problem: real-world signals indicating success or failure can be infrequent, delayed, or difficult to interpret, which makes learning hard for current training methods. Further, in professional domains, even obtaining sparse rewards that are accurate can be challenging. We tackle this in the following ways (a brief sketch of one densification technique follows the list):

  • Creating Environments & Richer Feedback Loops: Our initiatives include developing and instrumenting digital, and eventually physical, environments that provide denser, more informative feedback signals for the most advanced professional application areas. Our latest research on multi-agent learning tackles this challenge by creating learning contexts where agents learn behaviors from grounded consequences and from guidance by other models embedded in the same experiential environment.

  • Sophisticated Experiential Data Generation: Our expertise in large-scale data annotation, curation, and quality assurance can help capture, structure, and even define the value within complex interaction traces, turning raw experience into learnable data. This process, guided by targeted human insight, facilitates autonomous learning.
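
To make the densification idea concrete, here is a minimal sketch of potential-based reward shaping, one standard way to turn sparse outcomes into denser learning signals. The State fields and progress heuristic are hypothetical placeholders, not a description of our production systems.

```python
from dataclasses import dataclass

GAMMA = 0.99  # discount factor

@dataclass
class State:
    tests_passed: int  # e.g. unit tests passing in a coding environment
    total_tests: int

def potential(s: State) -> float:
    """Heuristic progress estimate: fraction of tests currently passing."""
    return s.tests_passed / max(s.total_tests, 1)

def shaped_reward(s: State, s_next: State, sparse_r: float) -> float:
    """Sparse environment reward plus a dense shaping term.

    The form r + gamma * phi(s') - phi(s) is known to preserve optimal
    policies while giving the learner a signal long before the sparse
    terminal reward arrives.
    """
    return sparse_r + GAMMA * potential(s_next) - potential(s)

# A partial fix already yields a positive signal, even with no sparse reward.
print(shaped_reward(State(3, 10), State(5, 10), sparse_r=0.0))  # ~0.195
```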

Additionally, our constantly evolving evaluation suite is designed to guide AI development toward the capabilities needed for this new era.

  • EnigmaEval (Puzzle Solving & Creative Reasoning): To surpass human capabilities, agents will likely need to develop "non-human languages" of thought and novel problem-solving strategies. EnigmaEval provides a unique pathway to assess and foster these creative reasoning skills.

  • Fortress (Frontier Risk Evaluation for National Security and Public Safety): A national security and public safety benchmark that evaluates the critical trade-off between a model's safeguards against misuse and its utility, using paired adversarial and benign prompts to measure both risk and over-refusal.

  • Humanity's Last Exam (HLE): While it showcases current limitations, it also defines a clear target for what AI needs to learn beyond readily available human knowledge, pushing towards the deep reasoning required by experiential agents.

  • MASK (Model Alignment between Statements and Knowledge): As agents interact autonomously and learn from experience, ensuring their honesty even when pressured to be dishonest is integral for safety and trust. 

  • MultiChallenge (Realistic Multi-turn Conversation): The "streams of experience" Silver and Sutton describe require agents to maintain context, coherence, and memory over extended interactions. MultiChallenge pushes models in this direction, evaluating their fitness for long-term, dynamic engagement.

  • VISTA (Visual Language Understanding): For agents to have "richly grounded actions and observations," they must deeply comprehend and reason about multimodal information. VISTA’s approach to complex visual tasks helps ensure models are developing nuanced understanding of the visually observable world.

See how we evaluate and rank models on our SEAL Leaderboards.

Four Pillars of Experiential AI

Silver and Sutton’s vision for the “Era of Experience” is built upon interconnected shifts in how artificial intelligence will learn and interact with the world. Realizing each of these pillars presents unique challenges and opportunities, and Scale is directing its expertise and efforts toward them:

  1. Lifelong Streams of Experience

AI agents will learn from continuous, lifelong streams of experience rather than isolated interactions. These systems:

  • Accumulate knowledge throughout their "lifetime," enabling ongoing adaptation

  • Focus on long-term goals like improving user health over months or facilitating year-long language learning, valuing actions for their contribution to broader objectives

  • Contrast with contemporary AI systems that remain stateless between interactions or optimize for single-turn exchanges, lacking the ability to strategize based on future consequences

To support this, our MultiChallenge benchmark tests for the crucial coherence and memory needed in extended conversations, representing a step towards evaluating agents over longer experiential streams. Our expertise in experiential data management will also be key to tracking and learning from these lifelong interactions.
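
To make the contrast with stateless systems concrete, here is a minimal sketch of an agent that persists its memory across sessions; the class, memory format, and respond() stub are hypothetical illustrations, not a Scale API.

```python
import json
from pathlib import Path

class LifelongAgent:
    """Sketch of an agent that carries state across sessions, unlike a
    stateless single-turn system. The memory format is a toy placeholder."""

    def __init__(self, memory_path: str = "agent_memory.json"):
        self.memory_path = Path(memory_path)
        # Reload accumulated experience from any previous "lifetime".
        if self.memory_path.exists():
            self.memory = json.loads(self.memory_path.read_text())
        else:
            self.memory = {"interactions": [], "long_term_goals": []}

    def step(self, user_input: str) -> str:
        # Condition the response on accumulated history, not just this turn.
        reply = self.respond(user_input, self.memory)
        self.memory["interactions"].append({"user": user_input, "agent": reply})
        self.memory_path.write_text(json.dumps(self.memory))  # persist
        return reply

    def respond(self, user_input: str, memory: dict) -> str:
        # Placeholder for a model call that takes history into account.
        return f"(reply informed by {len(memory['interactions'])} past turns)"

agent = LifelongAgent()
print(agent.step("Help me plan month three of my language-learning goal."))
```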

  2. Richly Grounded Actions and Observations

AI agents will perceive and act multimodally and autonomously in their environments, going far beyond today’s text-based communication, through:

  • Diverse sensors and effectors, as with natural intelligence, rather than human dialogue alone

  • Direct interaction with digital and physical worlds—controlling interfaces, executing code, operating laboratory equipment

  • Active exploration enabling discovery of novel solutions through both human-collaborative and machine-native operations

Enabling this requires advanced multimodal understanding. Our VISTA benchmark pushes the boundaries of visual reasoning essential for such agents. Additionally, our efforts to help build and instrument rich digital environments are pivotal for facilitating and measuring these complex, grounded interactions.
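
As a rough illustration, the observation/action contract for such a grounded agent might look like the following sketch; every type and field here is a hypothetical placeholder, not an existing API.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Observation:
    text: Optional[str] = None               # dialogue or tool output
    screenshot: Optional[bytes] = None       # pixels from a digital interface
    sensor_readings: Optional[dict] = None   # e.g. lab instrument telemetry

@dataclass
class ClickAction:    # operate a user interface
    x: int
    y: int

@dataclass
class RunCodeAction:  # act by executing code
    source: str

Action = Union[ClickAction, RunCodeAction]

def policy(obs: Observation) -> Action:
    """Placeholder policy mapping rich observations to concrete actions."""
    if obs.text and "error" in obs.text.lower():
        return RunCodeAction(source="print('collecting diagnostics')")
    return ClickAction(x=0, y=0)

print(policy(Observation(text="Build error: missing dependency")))
```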

  3. Environment-Derived Grounded Rewards

These experiential agents will center on environment-derived grounded rewards; learning signals will arise directly from real-world consequences rather than relying solely on human pre-judgment. This approach means:

  • The "impenetrable ceiling" that often limits current AI systems (when rewards are based only on human pre-judgment of actions) can be surpassed, enabling AI to move beyond existing human knowledge.

  • Systems can rely on “grounded rewards” that stem from diverse real-world outcomes (e.g., in scientific research or personalized health), enabling discoveries beyond human anticipation.

  • Human feedback will remain valuable when it reflects genuine consequences.

  • Alignment can be maintained via user-guided flexible reward functions, which can allow a small amount of human-generated data to enable a lot of autonomous learning.

Operationalizing these grounded rewards, especially when environmental signals are sparse, is a challenge we are tackling by helping to design instrumented environments and develop novel reward mechanisms. Ensuring agent honesty through evaluations like MASK is vital for the integrity of any system learning from environmental feedback. Similarly, frameworks like Fortress are essential for verifying that as agents learn to achieve goals, they still adhere to critical safeguards against misuse.
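
As a simple illustration of such a user-guided reward function, the sketch below combines measured environment outcomes with a small set of human-chosen weights that can be adjusted incrementally; the metric names and numbers are hypothetical.

```python
def grounded_reward(outcomes: dict[str, float],
                    user_weights: dict[str, float]) -> float:
    """Weighted combination of measured real-world outcomes.

    `outcomes` holds grounded signals from the environment (e.g. test
    pass rate, runtime, user-reported satisfaction); `user_weights`
    encodes a small amount of human guidance over those signals and can
    be tuned incrementally without retraining from scratch.
    """
    return sum(user_weights.get(name, 0.0) * value
               for name, value in outcomes.items())

# The same measured outcomes can be re-weighted as human priorities shift.
outcomes = {"tests_passed": 0.8, "runtime_seconds": -12.0, "user_rating": 0.9}
weights = {"tests_passed": 1.0, "runtime_seconds": 0.01, "user_rating": 0.5}
print(grounded_reward(outcomes, weights))  # 0.8 - 0.12 + 0.45 = 1.13
```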

  4. Experience-Grounded Planning and Reasoning

Agents will develop novel, potentially non-human-like cognitive modes learned through direct environmental interaction, where:

  • Human language and thought patterns may not be optimal frameworks (as seen with AlphaProof's novel mathematical approaches)

  • Systems need not rely on imitating human cognition, reducing the risk of inheriting human biases

  • Systems will be grounded in real-world interaction: forming hypotheses, testing through action, and updating models based on outcomes

  • World models provide planning capacity by simulating potential futures

Fostering and evaluating these new forms of reasoning calls for innovative approaches. Our EnigmaEval benchmark assesses creative problem-solving that may diverge from human patterns, while HLE tests reasoning at the frontiers of knowledge. Our capabilities in agent-based testing within simulated environments will further support the development of such grounded reasoning.
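
To make the world-model idea concrete, here is a minimal sketch of random-shooting planning: candidate action sequences are rolled out in imagination and the first action of the best sequence is chosen. The dynamics and reward functions are toy stand-ins for learned models.

```python
import random

def world_model(state: int, action: int) -> int:
    """Stand-in for a learned transition model predicting the next state."""
    return state + action

def reward(state: int) -> float:
    """Stand-in grounded reward: closeness to a target state of 10."""
    return -abs(10 - state)

def plan(state: int, actions=(-1, 0, 1), horizon=5, n_candidates=100) -> int:
    """Simulate random action sequences, return the best first action."""
    best_return, best_first = float("-inf"), actions[0]
    for _ in range(n_candidates):
        seq = [random.choice(actions) for _ in range(horizon)]
        s, total = state, 0.0
        for a in seq:
            s = world_model(s, a)   # imagine the next state
            total += reward(s)      # score the imagined future
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first

print(plan(state=0))  # almost always 1: step toward the target
```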

Understanding these foundational elements of Silver and Sutton’s vision is essential to exploring humanity's evolving role in guiding, collaborating with, and ensuring the safety and benefit of these powerful emerging systems.

A Shifting Landscape

Of course, this new era presents a transformed safety landscape with both unique challenges and new opportunities for alignment. The challenges are clear: autonomous agents pursuing long-term goals may offer fewer intervention points, while their novel reasoning could defy existing alignment frameworks. We believe these risks can be mitigated through thoughtfully designed environments and rigorous evaluation protocols.

However, as Silver and Sutton note, experiential learning also introduces powerful safety advantages. Unlike static systems, these agents can adapt to real-world changes. Ensuring they adapt correctly requires robust mechanisms for sensing human feedback. Additionally, their reward functions can be incrementally adjusted, an important safety lever we are actively exploring to ensure real-world feedback leads to meaningful and safe adjustments. Even the fact that learning from experience takes time is a natural constraint, offering opportunities for human co-evolution.

Scaling this new era responsibly means developing strong ethical frameworks and a societal commitment to navigating these advancements thoughtfully. The journey is not just about building more powerful AI, but safely integrating this new form of intelligence. Scale contributes to this mission by:

  • Championing and developing cutting-edge safety evaluations

  • Partnering with organizations like the Center for AI Safety (CAIS) and U.S. AI Safety Institute

  • Creating testing environments that allow for safer exploration of AI capabilities

  • Providing the tools and data expertise that enable the entire ecosystem to build and deploy more human-centric AI systems

A Step Forward for AI

The paradigm shift toward the “Era of Experience” calls for our community to consider anew our roles as guides and collaborators. For Scale, this means continuing our mission to pioneer the robust evaluation methodologies, data frameworks, and interactive environments necessary for these experiential systems to develop safely and effectively. The expertise we have developed will remain essential, but applied in new contexts: creating the most valuable environments for agents to learn from, developing metrics that can evaluate increasingly autonomous systems, and building frameworks that balance learning from experience with appropriate human guidance.

