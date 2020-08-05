We founded Scale to create the infrastructure needed to build AI in any industry, by anyone. We started by tackling this complex problem at the root: turning raw data into high-quality training data for models. In this pursuit, we spent the last four years building ML-augmented annotation products for all data types, expanding our solutions for major industries, and making significant technological strides in scaling our use of ML along the way.

But the problem of building effective, accurate, unbiased ML models remains. Aggregate metrics are not good enough to solve it. Better ML starts with understanding your data in depth. To improve production ML, you need to understand your models’ qualitative failure modes, fix them by gathering the right data, and curate diverse scenarios.

Before training a model, ML engineers must curate and sample their data, ensuring that they have the right data to solve a specific problem. This process is too often highly manual. For example, to teach a self-driving car how to handle left turns, an ML team has to manually crawl through its driving sequences to isolate examples of left turns and build a training dataset. The data also needs to be representative of the ground truth of the problem you’re trying to solve. If you’re building a model to assign a gender to faces, you have to ensure representative data for all genders to get unbiased outputs. Again, that’s too often a manual and highly inefficient process.

After training, ML teams test and benchmark model performance, ensuring that the test dataset is also sufficiently representative of the problem the model is being assessed against. For example, a model learning to tell cars apart from pedestrians needs enough examples of both for accurate benchmarking. This requires ML engineers to spend significant amounts of time building one-off UIs to chart and share performance data.

After deployment, teams debug the model, identifying failure modes and fixing them. Too often, issues in the data only surface after a model has been deployed, requiring time-consuming debugging. One Scale customer, for example, found that their vehicle recognition algorithm didn’t perform well in certain environments. It turned out that the model was trained on a dataset where vehicles were mostly in the bottom of the image, so the model associated “bottom of the image” with “likelihood of being a car.”
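The kind of check that would catch the positional bias described above can be sketched in a few lines. This is a minimal illustration, not Nucleus's API: the record layout (`label`, `y_min`, `y_max` in normalized image coordinates) and the helper names `class_balance` and `bottom_half_fraction` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical annotation records: a label plus the vertical extent of its
# bounding box in normalized coordinates (0.0 = top of image, 1.0 = bottom).
annotations = [
    {"label": "car", "y_min": 0.70, "y_max": 0.95},
    {"label": "car", "y_min": 0.65, "y_max": 0.90},
    {"label": "car", "y_min": 0.20, "y_max": 0.40},
    {"label": "pedestrian", "y_min": 0.40, "y_max": 0.80},
]


def class_balance(records):
    """Return the fraction of annotations carrying each label."""
    counts = Counter(r["label"] for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def bottom_half_fraction(records, label):
    """Return the fraction of `label` boxes whose center lies in the lower half."""
    centers = [(r["y_min"] + r["y_max"]) / 2 for r in records if r["label"] == label]
    if not centers:
        return 0.0
    return sum(c > 0.5 for c in centers) / len(centers)


balance = class_balance(annotations)
car_bottom = bottom_half_fraction(annotations, "car")
print(balance)
print(f"cars centered in bottom half: {car_bottom:.0%}")
```

A heavily skewed class balance or a label that sits almost entirely in one region of the image is a hint that the dataset may not be representative, and that more varied examples should be gathered before training.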

The Scale team has been working to productize the concept that Andrej Karpathy calls "Operation Vacation." Nucleus is a new way—the right way—to develop ML models, helping us move away from the concept of one dataset and towards a paradigm of collections of scenarios and giving ML engineers the ability to automate time-consuming manual steps in the ML development process.

Scale Nucleus provides advanced tooling for understanding, visualizing, curating, and collaborating on your data, allowing teams to build better ML models via a powerful interface and APIs. With Scale Nucleus, you can visualize your dataset, ground truth, and model predictions to improve model performance.

Scale Nucleus’s query function, allowing users to return images matching automatically-generated meta-tags.