Our Mission

API for Ground Truth Data

The age of machine learning is a new phase of computing. Powered by a dynamic research community, computers can now recognize images and audio, translate languages, generate realistic text, and beat humans at games. Machine learning is likely the most significant technological shift happening between 2010 and 2030. Yet aside from a few innovative products from large tech companies, ML has not yet made a tremendous impact.

Labeled Data Is Key

Labeled data is the key bottleneck to the growth of the machine learning industry. In fact, labeled data is even more essential than algorithms.

ImageNet is a repository of 14 million labeled images in more than 20,000 categories.

In 2012, AlexNet, a deep convolutional neural network, was the top performer on the ImageNet leaderboard. This kicked off the deep learning craze.


Data labeling is not only practically important, it is also philosophically important to the field. Machine learning is a form of metaprogramming—the developer doesn't directly write the program; the developer writes a program which itself writes the program.

The developer provides a rough framework for what the program should look like (usually a neural network) and what its goal should be (usually a labeled dataset). Training then spits out a program that is nonsensical to humans, but is better than any program a human could ever write.

It's a bit demoralizing. As Andrej Karpathy once tweeted: "Gradient descent can write code better than you. I'm sorry."

Framework + Goal

Humans can influence both the framework and the goal. The framework is the machine learning architecture and algorithm.

The goal is the labeled dataset. This is the ceiling for how good a model can ever be. The labeled dataset directly programs the final model.

The dataset gets “compiled” into the model via backpropagation. As more and more of the world's software is written with machine learning, the amount of data required to power these systems will also grow.
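
As a rough illustration of the "framework + goal" idea, here is a minimal sketch in plain Python. The framework is assumed to be a one-parameter-pair linear model, the goal is a tiny made-up labeled dataset, and gradient descent with hand-derived gradients stands in for backpropagation (for a single linear layer the two compute the same thing). The names `train`, `labeled_data`, and `learned_program` are hypothetical and chosen only for this example; real systems use neural networks and far larger datasets.

```python
def train(examples, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b to labeled (x, y) pairs by gradient descent.

    The "framework" is the linear form w * x + b; the "goal" is the
    labeled examples. Nobody writes the values of w and b by hand.
    """
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        # Step downhill on the error surface.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Illustrative labeled dataset (inputs paired with the desired outputs).
labeled_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = train(labeled_data)


def learned_program(x):
    """The program that training produced from the labels."""
    return w * x + b


print(f"learned w={w:.3f}, b={b:.3f}, f(10)={learned_program(10):.3f}")
```

Changing the labels in `labeled_data` changes the resulting program without touching the training code, which is the sense in which the dataset "programs" the model.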

What we do

Accelerate The Development of AI

Machine learning replaces a specialized resource constraint (software engineers) with a commoditized one (data labelers). This significantly accelerates technological progress on a global scale. The growth rate of software will accelerate over the next decade as more software is created by machine learning.

Traditional Dev Cycle

Early & Influential

It is early in the age of machine learning. Deeply impactful technology takes time to gestate.

Google was founded 15 years after the invention of the internet. The iPhone was introduced 24 years after the invention of the internet. It’s only been 8 years since AlexNet was first launched.

Being early makes our work exciting. We have the opportunity to be profoundly influential.

AI Dev Cycle


Building infrastructure is more durable than building applications—Scale will amplify the whole impact of machine learning if we succeed, which will reach further than any single application. In this way, we will bend the curve of technology, and therefore the curve of humanity.

Our mission is critically important, and needs to happen now. By structurally increasing the coefficient of growth, we are enabling a new age of software.

Our Investors

We're fortunate to have incredible investors.

Individual Investors

  • Greg Brockman

  • Charlie Cheever

  • Adam d’Angelo

  • Justin Kan

  • Mike Krieger

  • Nat Friedman

  • Drew Houston

  • Jessica McKellar

  • Guillermo Rauch

  • Kevin Systrom

  • Ilya Sukhar

  • Jonathan Swanson

  • Lan Xuezhao

Our Board

  • Alex Wang

  • Daniel Levine

  • Mike Volpi


Scale is growing. Grow with us.

Join us as we accelerate the development of AI applications.