Scale AI’s Series C: Building the data platform to accelerate machine learning

by Alexandr Wang on August 5th, 2019

Our mission at Scale is to accelerate the development of AI applications. We’re proud of what we’ve built over the last three years, and today we’re announcing our Series C funding round to support our continued work toward that mission.

Scale has raised $100M at a valuation of over $1B. Founders Fund led the round, supported by Accel, Coatue Management, Index Ventures, Spark Capital, Thrive Capital, Kevin Systrom, Mike Krieger and Adam D’Angelo.

We are big believers in the transformational impact of AI over the coming decades. Machine learning is likely the most important technological shift of our time, and the overall benefits to the world will be comparable to those of the internet.

After working on AI at some of the most advanced organizations in the world, we noticed that building machine learning systems was challenging because the infrastructure was immature. The critical bottleneck to further progress today is data, and specifically labeled datasets. Many engineers will tell you that getting labeled data is the hardest part of building a machine learning model.

Building AI represents a fundamentally different paradigm than building traditional software. The performance of an AI system depends more on the data than the algorithm—in most cases, without labeled data, there is no model. Similarly, for a model to improve and adapt, it requires more data rather than simply more code. This paradigm has been true since the very beginning of deep learning; the modern deep learning age was ignited by the launch of the ImageNet dataset by Fei-Fei Li’s lab at Stanford in 2009.


We set out to solve this problem by building a platform that provides organizations with the large, high-quality datasets needed to build AI systems. Especially now, as AI is being applied in more and more ways, this matters because:

  • Safe, accurate and unbiased AI systems depend on large volumes of high-quality training data.
  • The process to acquire, label, and verify training data is slow, manual and expensive.
  • Today’s training data bottleneck limits AI’s impact to a small group of well-funded technology companies.

Scale is a solution to these problems. Developers use our API to send their raw data to our platform. We then use a combination of machine learning and human insight to label and annotate this data. We pass this high-quality ground-truth data back to our customers, who can then build world-class models to solve real-world problems.
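To make that workflow concrete, here is a minimal sketch of one common way to combine a model with human review: the model proposes a label, and anything below a confidence threshold is routed to a person. This is an illustration only, not a description of Scale's actual pipeline or API. Every name in it (`LabelingTask`, `label_task`, `toy_model`, `toy_human_review`, and the 0.9 threshold) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class LabelingTask:
    """One piece of raw data submitted for labeling (hypothetical structure)."""
    task_id: str
    raw_data: str
    model_label: Optional[str] = None
    model_confidence: float = 0.0
    final_label: Optional[str] = None
    reviewed_by_human: bool = False


def label_task(
    task: LabelingTask,
    model: Callable[[str], Tuple[str, float]],
    human_review: Callable[[str, str], str],
    confidence_threshold: float = 0.9,  # assumed value, for illustration
) -> LabelingTask:
    """Pre-label with a model, then fall back to a human when confidence is low."""
    task.model_label, task.model_confidence = model(task.raw_data)
    if task.model_confidence >= confidence_threshold:
        # The model is confident enough: accept its label as-is.
        task.final_label = task.model_label
    else:
        # Low confidence: a human sees the data and the model's suggestion.
        task.final_label = human_review(task.raw_data, task.model_label)
        task.reviewed_by_human = True
    return task


def toy_model(raw_data: str) -> Tuple[str, float]:
    """Stand-in for a real model: confident only when it sees the word 'car'."""
    if "car" in raw_data:
        return ("vehicle", 0.95)
    return ("unknown", 0.40)


def toy_human_review(raw_data: str, suggested_label: str) -> str:
    """Stand-in for a human annotator who corrects the model's suggestion."""
    return "pedestrian"


tasks = [
    LabelingTask("t1", "image of a car"),
    LabelingTask("t2", "blurry image"),
]
results = [label_task(t, toy_model, toy_human_review) for t in tasks]
for r in results:
    print(r.task_id, r.final_label, r.reviewed_by_human)
```

In this sketch the threshold is the main design lever: raising it sends more items to human review, which costs more but catches more of the model's mistakes.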

Scale is now used by world-leading machine learning teams at companies like OpenAI, Airbnb, Nuro, and Lyft. Our platform is being used to provide high-quality training data to an ever-growing number of machine learning teams and industries, including:

  • Autonomous Vehicles: Advancements in AI for perception, prediction, and planning have enabled the development of autonomy, which in turn relies on data.
  • Natural Language and Content Understanding: More than ever before, systems can understand and drive decision-making on large corpora of speech, text, images, and other forms of content.
  • E-commerce and Search: Powerful search systems that connect consumers to specific content, images, or products increasingly rely on AI systems to drive understanding of this content. Our platform helps engineers train those systems.
  • Mapping: Mapping with satellite, drone, and street-level imagery, for use cases from agriculture to insurance, relies on high-quality training data to ensure accuracy.
  • Robotics: Robots in factory or warehouse settings need to be able to understand their environments, safely interact with humans and recognize products or components.
  • Augmented Reality and Virtual Reality: AR and VR systems used in gaming, real estate or manufacturing rely on being able to perceive the 3D environment around a camera, headset or user.
  • Offline retail: Cashierless checkouts, inventory management systems and footfall analysis all depend on perception systems, which in turn depend on training data.

We’re proud of the progress we’ve made building the Scale platform for our customers. It’s been an incredibly quick three years since I started the company, and the progress we’ve helped accelerate in machine learning is palpable. But, as we hope is always the case, we’re far more excited about what we will do in the future.

There is a lack of mature infrastructure for building AI, forcing teams to build their own technology stack from scratch before they can have a meaningful impact. Per our mission, we know there are many problems beyond the training data bottleneck for us to solve to continue accelerating AI development. Despite our excitement for the potential of AI, we’re closer to the beginning of its history than we are to the end.

We’re grateful to our customers for building world-leading AI products with Scale, and we’re looking forward to accelerating the development of machine learning in an even broader set of industries and technologies as we grow. If you are interested in being part of creating the infrastructure machine learning needs, take a look at our open positions. If you’re interested in partnering with us, please reach out.