by Ben Levin, Aseem Doriwala and Garrett Matsuda

Leveraging human demonstration data offers a more scalable path for real-world Physical AI development than robot teleoperation. However, the training and data recipe needed for sustainable robot improvement has been unclear.
Scale AI has partnered with leading academic and industry labs to introduce EgoVerse, a large-scale dataset and framework for training Physical AI from egocentric data. EgoVerse releases high-quality training data and open-sources the learning pipeline for sustainable improvement across robots.
The work is a joint effort across leading academic and industry labs, including Georgia Tech, Stanford, ETH Zurich, MIT, UC San Diego, and Meta, among others.

The EgoVerse study provides one of the first large, cross-lab validations that combining human and robot data improves robot performance. The EgoVerse architecture uses high-fidelity human data to unlock co-training: human data and robot data are mapped into a unified action space that the model learns from.
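To make the co-training idea concrete, here is a minimal Python sketch of mixing human and robot samples in one batch once both are expressed in a shared action space. It is an illustration only, not the EgoVerse pipeline: the 4-dimensional action layout (3D position plus a normalized opening), the function names, the `max_pinch_m` constant, and the `human_fraction` mixing ratio are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    source: str          # "human" or "robot"
    action: List[float]  # action in the (hypothetical) shared space: [x, y, z, opening]


def clamp01(value: float) -> float:
    """Clamp a value to the range [0, 1]."""
    return min(max(value, 0.0), 1.0)


def human_to_shared(wrist_xyz: Sequence[float], pinch_distance_m: float,
                    max_pinch_m: float = 0.1) -> List[float]:
    """Assumed mapping: wrist position plus thumb-index distance as a normalized opening."""
    return [*wrist_xyz, clamp01(pinch_distance_m / max_pinch_m)]


def robot_to_shared(ee_xyz: Sequence[float], gripper_opening: float) -> List[float]:
    """Assumed mapping: end-effector position plus gripper opening already in [0, 1]."""
    return [*ee_xyz, clamp01(gripper_opening)]


def cotrain_batch(human: Sequence[Sample], robot: Sequence[Sample], batch_size: int,
                  human_fraction: float, rng: random.Random) -> List[Sample]:
    """Draw one mixed batch with a fixed share of human samples (sampled with replacement)."""
    n_human = round(batch_size * human_fraction)
    batch = rng.choices(human, k=n_human) + rng.choices(robot, k=batch_size - n_human)
    rng.shuffle(batch)
    return batch


# Toy data standing in for real demonstrations.
human_data = [Sample("human", human_to_shared((0.1, 0.2, 0.3), 0.05)) for _ in range(10)]
robot_data = [Sample("robot", robot_to_shared((0.4, 0.5, 0.6), 1.2)) for _ in range(10)]
batch = cotrain_batch(human_data, robot_data, batch_size=8, human_fraction=0.5,
                      rng=random.Random(0))
```

Because every sample carries an action of the same shape, a single model head can be trained on the whole batch regardless of where each sample came from. The right mixing ratio is an empirical question that this sketch does not address.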
Key findings include:
These results were replicated across multiple robot platforms and independent labs, underscoring that the findings are not vendor-, environment- or system-specific. This points to a broader shift in how robot learning systems should be built: scale alone is insufficient without the right abstractions and infrastructure to make that data usable.
“Scale’s diverse egocentric data offers an ideal dataset to train Physical AI models. The diversity, dense language annotations, and accurate 3D hand tracking have enabled strong learning signals and consistent improvements across a wide variety of challenging tasks in our evaluations.” – Danfei Xu, Professor at Georgia Tech
Scale’s Egocentric Dexterity data is captured at high fidelity using stereo cameras and IMUs, and can be used in co-training together with robot data. Through a proprietary machine learning pipeline, we deliver:

Scale contributed a high volume of diverse human demonstration data, along with infrastructure and engineering support to enable large-scale data aggregation and usability across the consortium.
The EgoVerse dataset, framework, and research findings are available now at https://egoverse.ai/. Learn more about how Scale AI is developing the next generation of robots here: https://scale.com/physical-ai