The mission control for your data.

Public Database
  • COCO

    COCO is a large-scale object detection, segmentation, and captioning dataset. It contains 330K images (>200K labeled), 1.5 million object instances, and 80 object categories.
  • Open Images

    Open Images is a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, and visual relationships.
  • BDD100K

    The Berkeley DeepDrive dataset contains over 100K videos of driving experiences, each running 30 seconds at 30 frames per second.
  • PandaSet

    PandaSet is a public large-scale dataset for autonomous driving provided by Hesai & Scale. It enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car.
  • MNIST

    The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems.
  • ImageNet

    ImageNet is an image database organized according to the WordNet hierarchy, in which each node of the hierarchy is depicted by hundreds and thousands of images.

Understanding your data = better ML

Aggregate metrics in ML are not good enough. To improve production ML, you need to understand your models' qualitative failure modes, fix them by gathering more data, and curate diverse scenarios.
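
To make the point about aggregate metrics concrete, here is a minimal sketch in plain Python. It is not part of Nucleus; the function name `per_class_accuracy` and the labels are made up for illustration. It shows how a single overall accuracy number can look healthy while one class fails completely.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return (overall accuracy, {label: accuracy}) for paired label lists."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    overall = sum(correct.values()) / len(y_true)
    by_class = {label: correct[label] / total[label] for label in total}
    return overall, by_class


# Illustrative, made-up labels: nine cars and one cyclist.
y_true = ["car"] * 9 + ["cyclist"]
# A model that predicts "car" for everything.
y_pred = ["car"] * 10

overall, by_class = per_class_accuracy(y_true, y_pred)
print(overall)   # aggregate accuracy
print(by_class)  # accuracy broken down by ground-truth class
```

In this toy case the aggregate number is high, yet the breakdown shows that every cyclist is missed, which is the kind of failure mode an aggregate metric hides.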

Scale Nucleus helps you:

  • Visualize your data
  • Curate interesting slices within your dataset
  • Review and manage annotations
  • Measure and debug your model performance

Nucleus is a new way—the right way—to develop ML models, helping us move away from the concept of one dataset and towards a paradigm of collections of scenarios.

  • Ensure Dataset Health

    Understand the health of your datasets by identifying under- or overrepresented labels and attributes, so you can mitigate bias without building internal tools.

  • Automate Data Curation

    Choose what data to label automatically, based on model performance and attributes of the raw data. Use visual search to quickly identify interesting edge cases.

  • Track Model Performance

    Easily upload models via API. Share data queries across teams, track performance history, compare models, and debug model performance for faster development.
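
As a rough illustration of the dataset-health idea above (spotting under- or overrepresented labels), the following sketch counts label shares in plain Python. It does not use the Nucleus API; the helper names `label_shares` and `flag_imbalance`, the thresholds, and the sample labels are all assumptions chosen for the example.

```python
from collections import Counter


def label_shares(labels):
    """Map each label to its fraction of all annotations."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def flag_imbalance(labels, low=0.05, high=0.5):
    """Return (underrepresented, overrepresented) labels.

    The `low` and `high` thresholds are arbitrary illustrative defaults.
    """
    shares = label_shares(labels)
    under = sorted(label for label, share in shares.items() if share < low)
    over = sorted(label for label, share in shares.items() if share > high)
    return under, over


# Made-up label list standing in for a dataset's annotations.
labels = ["car"] * 80 + ["pedestrian"] * 17 + ["cyclist"] * 3

under, over = flag_imbalance(labels)
print("underrepresented:", under)
print("overrepresented:", over)
```

A check like this is only a starting point; sensible thresholds depend on the task and on which attributes matter for your deployment.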


How It Works

Nucleus provides advanced tooling for understanding, visualizing, curating, and collaborating on your data – allowing teams to build better ML models via a powerful interface and APIs.


I trained EfficientNet on PandaSet, check out the results! It does pretty well for cars but not for cyclists.

This is great! Let's share it with the team and train the model on more images with cyclists to improve that performance.


Scale Nucleus enables collaboration across teams between ML engineers, PMs, operations, and managers. By sharing whole datasets, dataset slices, model predictions, and more, Nucleus is a single source of truth for data within the team, increasing iteration speed.



Robust APIs enable you to integrate your entire machine learning development cycle with Scale Nucleus.

Explore Our Docs
  annotations: [
    {
      label: 'car',
      type: 'box',
      x: 655.2530151367188,
      y: 269.39984741210935,
      width: 70.560546875,
      height: 50.55670166015625,
      reference_id: 'put anything you like here',
      confidence: 0.7009941479671303,
    },
  ]
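
The snippet above shows the shape of a box annotation. As a hedged sketch, the same fields can be built and sanity-checked in plain Python before being sent anywhere. `BoxRecord` is a hypothetical helper written for this example, not a class from the Nucleus client, and the validation rules (positive size, confidence between 0 and 1) are assumptions rather than documented API constraints.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BoxRecord:
    """Hypothetical container mirroring the fields in the snippet above."""

    label: str
    x: float
    y: float
    width: float
    height: float
    reference_id: str
    confidence: float
    type: str = "box"

    def __post_init__(self):
        # Assumed sanity checks, not rules taken from the Nucleus docs.
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


box = BoxRecord(
    label="car",
    x=655.2530151367188,
    y=269.39984741210935,
    width=70.560546875,
    height=50.55670166015625,
    reference_id="put anything you like here",
    confidence=0.7009941479671303,
)

# Serialize to JSON with the same top-level key as the snippet.
payload = json.dumps({"annotations": [asdict(box)]}, indent=2)
print(payload)
```

For real uploads, consult the Nucleus documentation for the exact request format and the client calls to use.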

Scale AI’s mission is to accelerate the development of AI applications.

Trusted by leading machine learning teams to deliver high-quality training data, Scale Nucleus provides the tools for ML teams to manage their entire ML lifecycle, from dataset management, data selection, and data annotation to model development, all in one place.

Nucleus currently supports Image data, with support for 3D Sensor Fusion, Video, Text and Document data on the roadmap.

Contact Us for Your Use Case
