We founded Scale to create the infrastructure needed to build AI in any industry, by anyone. We started tackling this complex problem at the root – turning raw data into high-quality training data for models. In this pursuit, we have spent the last four years building ML-augmented annotation products for all data types, expanding our solutions to major industries, and making significant technological strides in how we scale our own use of ML.
But the problem of building effective, accurate, and unbiased ML models remains, and aggregate metrics alone are not enough to solve it. Better ML starts with understanding your data in depth. To improve production ML, you need to understand a model’s qualitative failure modes, fix them by gathering the right data, and curate diverse scenarios.
Before training a model, ML engineers must curate and sample their data, ensuring that they have the right data to solve a specific problem. This process is too often highly manual. For example, to teach a self-driving car how to handle left turns, an ML team has to manually crawl through its driving sequences and isolate examples of left turns for the training dataset. The data also needs to be representative of the ground truth of the problem you’re trying to solve. If you’re building a model to assign a gender to faces, you have to ensure the data represents all genders in order to produce unbiased outputs. Again, that is too often a manual and highly inefficient process.
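To make that pain concrete, here is a minimal sketch of the kind of one-off curation script this forces teams to write; the scene schema and field names are hypothetical.

```python
# Hand-rolled curation over hypothetical scene metadata: the kind of
# one-off filtering script described above.
from collections import Counter

scenes = [
    {"id": "seq_001", "maneuver": "left_turn", "time_of_day": "day"},
    {"id": "seq_002", "maneuver": "straight", "time_of_day": "night"},
    {"id": "seq_003", "maneuver": "left_turn", "time_of_day": "night"},
]

# Isolate left-turn sequences for a task-specific training set.
left_turns = [s for s in scenes if s["maneuver"] == "left_turn"]

# Check whether the slice is representative across conditions.
print(Counter(s["time_of_day"] for s in left_turns))
# Counter({'day': 1, 'night': 1})
```

Every new slice means another script like this, and nothing guards against the slice itself being unbalanced.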
After training, ML teams test and benchmark model performance, ensuring that the test dataset is sufficiently representative of the problem the model will be measured against. For example, a model learning to tell cars apart from pedestrians needs enough examples of both for accurate benchmarking. This requires ML engineers to spend significant amounts of time building one-off UIs to chart and share performance data.
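A basic representativeness check is easy to sketch (the labels below are illustrative), but teams end up re-implementing it, along with the charting around it, for every project:

```python
from collections import Counter

# Hypothetical test-set labels; in practice these come from annotations.
test_labels = ["car", "car", "pedestrian", "car", "pedestrian"]

counts = Counter(test_labels)
total = sum(counts.values())
for cls, n in counts.items():
    print(f"{cls}: {n} ({n / total:.0%})")
# car: 3 (60%)
# pedestrian: 2 (40%)
```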
After deployment, ML teams must debug the model, identifying failure modes and fixing them. Too often, issues in the data surface only after a model has entered deployment – requiring time-consuming debugging. One Scale customer, for example, found that their vehicle recognition algorithm didn’t perform well in certain environments – it turned out that the model was trained on a dataset where vehicles were mostly in the bottom of the image, so the model associated “bottom of the image” with “likelihood of being a car.”
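A positional bias like this one can be caught before deployment with a quick distribution check over the ground-truth boxes. The sketch below assumes a hypothetical (x, y, width, height) box format, with y measured from the top of the image:

```python
def vertical_center_fraction(box, image_height):
    """Return a box's vertical center as a fraction of image height."""
    _, y, _, h = box
    return (y + h / 2) / image_height

# Hypothetical ground-truth vehicle boxes in (x, y, width, height) format.
boxes = [(120, 600, 80, 50), (300, 580, 90, 60), (50, 200, 70, 40)]

fractions = [vertical_center_fraction(b, image_height=720) for b in boxes]
mean = sum(fractions) / len(fractions)
print(f"mean vertical center: {mean:.2f}")  # values near 1.0 = bottom of image
```

If nearly every vehicle sits near the bottom of the frame, the model has little incentive to learn anything else.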
The Scale team has been working to productize the concept that Andrej Karpathy calls "Operation Vacation." Nucleus is a new way – the right way – to develop ML models: it moves teams away from the concept of a single dataset and toward a paradigm of collections of scenarios, and it gives ML engineers the ability to automate time-consuming manual steps in the ML development process.
Scale Nucleus provides advanced tooling for understanding, visualizing, curating, and collaborating on your data – allowing teams to build better ML models via a powerful interface and APIs (a brief sketch of the Python client follows the list below). With Scale Nucleus, you can:
Visualize your dataset, ground truth, and model predictions to improve model performance
Curate interesting slices within your dataset for active learning and identifying key edge cases
Upload and choose data to be annotated for rare event mining and dataset balancing
Search your data based on metadata or ML-produced attributes
Identify edge cases through visual search
Measure key metrics like dataset balance, class correlation, and confusion via a powerful insights tab that shows the overall health of the data
Debug model performance
Share your dataset seamlessly to provide a single source of truth for data within your team
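Here is a simplified sketch of uploading data with searchable metadata through the Nucleus Python client. It is illustrative rather than a full reference; exact names and parameters may differ in the released client, so consult the Nucleus documentation:

```python
import nucleus  # illustrative: the Nucleus Python client

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.create_dataset("driving-scenes")

# Upload an item with metadata that Nucleus can later slice and search on.
dataset.append([
    nucleus.DatasetItem(
        image_location="s3://your-bucket/scenes/seq_001.jpg",
        reference_id="seq_001",
        metadata={"maneuver": "left_turn", "time_of_day": "night"},
    )
])
```

Once uploaded, the same metadata powers slicing, search, and the insights described above, so curation no longer requires one-off scripts.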
Scale Nucleus is directly integrated with Scale AI’s data labeling pipeline – allowing teams to fix any issues in their data at the source. Nucleus currently supports Image data, with support for 3D Sensor Fusion, Video, Text, and Document data coming soon.
We built Scale Nucleus to allow data scientists and ML engineers to manage data more efficiently and increase the marginal value of their data. Deeper insight into a dataset’s features can have several transformational effects on the development of AI:
For engineers, easier visualization and curation of datasets lowers the barrier to entry for building ML systems.
Spotting hidden failure modes in datasets before deployment (such as a set of driving sequences that doesn’t include any sequences at night) makes it much easier to train high-quality models, and provides a robust way of eliminating issues like bias at the source.
The ease of debugging could significantly improve iteration speed on model fine-tuning post-deployment.
We are excited to take the next step in building the infrastructure to enable efficient, accurate, and unbiased ML development. If you’d like to join us in this journey and try Scale Nucleus, contact us at email@example.com or sign up on our website.