date 2023-01-01
Products
Data Engine
Collect, curate, and annotate data. Train models and evaluate. Repeat.
Trusted
The Best In The Business
Trusted by the world’s most ambitious AI teams.
Quality
High-quality labels are the core of any dataset, and Scale provides them through domain experts.
Cost Effective
Easily find, categorize, and fix model failures with Scale’s Data Engine. Then, optimize labeling spend with high-value curated data.
Scalability
Scale's Data Engine can support any ML project, from low-volume experiments to high-volume production projects. Scale up or down as needed.
Diversity
Scale delivers the greatest variety and diversity of data to help maximize your model's performance.
CASE STUDIES
Learn More About Our Customers
OpenAI
How do you scalably maintain the quality of labels without having annotators check each other's work? Learn how OpenAI worked with Scale to build InstructGPT.
Toyota Research Institute
The Toyota Research Institute relies on Scale's Data Engine to annotate large volumes of training data without sacrificing quality as it builds a new approach to mobility.
Nuro
Nuro began its collaboration with Scale for 2D and 3D data labeling, progressing to HD map labeling, and today extending to dataset management and curation.
Harvard Medical School
The researchers at Harvard Medical School's Datta Lab leverage Scale's Data Engine to speed up their annotation process and advance their research.
FEATURES
Our Data Engine
RLHF
Powering the next generation of Generative AI.
Scale's Generative AI Data Engine powers the most advanced LLMs and generative models in the world through world-class RLHF, data generation, model evaluation, safety, and alignment.
Data Labeling
The best quality data to fuel the best performing models.
Scale pioneered modern data labeling by combining AI-based techniques with human-in-the-loop review, delivering labeled data at unprecedented quality, scalability, and efficiency.
Data Curation
Unearth the most valuable data by intelligently managing your dataset.
Scale’s suite of dataset management, testing, model evaluation, and model comparison tools enables you to “label what matters.” Maximize the value of your labeling budget by identifying the highest-value data to label, even without ground truth labels.
What is the Data Engine?
The One-Stop-Shop For Building AI
- After initial pre-training, create complex prompt-response pairs from scratch.
- Apply human preferences to model outputs.
- Use prompt injection techniques to find vulnerabilities.
- Evaluate your model against a set of complex and diverse prompts to find weak points.
Data Inputs
Supported Annotation Types
Scale Text
- Classification
- Named Entity Recognition
- Transcription
Scale Audio
- Classification
- Transcription
Scale 3D Sensor Fusion
- Cuboid
Scale Video
- Bounding Box
- Classification
- Cuboid
- Ellipse (Multi-Geometry)
- Lines & Splines
- Point
- Polygon
- Segmentation
Scale Image
- Bounding Box
- Classification
- Cuboid
- Ellipse (Multi-Geometry)
- Lines & Splines
- Point
- Polygon
- Segmentation
RESOURCES
Learn More About The Data Engine
Why is ChatGPT so good?
OpenAI applied reinforcement learning from human feedback (RLHF) to enhance ChatGPT. Understand the role RLHF plays in enhancing large language models and how to implement it.
Guide: Data Annotation
The success of ML models depends on data and label quality. Read our authoritative guide to ensure you get the highest-quality labels.
Guide: Computer Vision
Computer vision focuses on developing systems that can process, analyze, and make sense of visual data. Read our guide to learn how it works and its top use cases by industry.
Guide: Training & Building Models
ML models take data as inputs and deliver a classification, a prediction, or some other indicator as an output. Read our guide to learn more about how to train models.
“One of the things we love about Scale is the fact that we can fully label the world. We can label 2D bounding boxes, 3D bounding boxes, but also semantic segmentation, including in 3D, to understand as much as possible, including scenarios we don’t foresee today.”
Adrien Gaidon
Machine Learning Lead, Toyota Research Institute
“Scale has made it easier for us to gather annotations at a good price point. The UI is simple to navigate, and the built-in worker evaluation pipeline and batch options save us time and help enforce best practices so that we can get high-quality training data.”
Cassandra Ung
Software Engineer, Square
“Our collaboration with Scale began with more and more targeted labels for 2D and 3D data, progressed to HD map labeling, and today extends to dataset management and curation. Identifying and labeling edge cases helps us train more robust and generalizable models for our delivery robots in the real world.”
Jack Guo
Head of Autonomy Platform, Nuro
“After training for years to do this research, it was frustrating how much time I was spending just annotating data. Working with Scale freed up my time to work on the parts of research that require my expertise.”
Caleb Weinreb
Neuroscience Post-Doc, Harvard Medical School
“Scale already provided quality annotations to our perception team, so it was a natural extension to use their platform and solve adjacent pipeline problems of data selection and model performance debugging. The powerful search capabilities and easy-to-use tools made it easy for us to get started with our existing library of annotations.”
Oliver Monson
Sr. Manager, Data Operations, Velodyne LiDAR