In traditional software development, developers explicitly specify the instructions a computer follows to produce outputs from inputs. In machine learning (ML), instead of writing explicit instructions, we provide the model with examples (pairs of inputs and outputs), and the model learns to produce the same outputs from the corresponding inputs. These examples are called training data.
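The contrast can be sketched in a few lines. This is a toy illustration only, assuming a simple linear relationship (Celsius to Fahrenheit); the function names and the example pairs are made up for this sketch and do not come from any particular library.

```python
# Traditional approach: the developer writes the rule explicitly.
def fahrenheit_explicit(celsius):
    return celsius * 9 / 5 + 32


# ML approach: the rule is estimated from (input, output) examples.
def fit_line(examples):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    var = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: (celsius, fahrenheit) pairs.
training_data = [(0, 32.0), (10, 50.0), (25, 77.0), (100, 212.0)]
slope, intercept = fit_line(training_data)


def fahrenheit_learned(celsius):
    # Uses parameters recovered from the examples, not a hand-written rule.
    return slope * celsius + intercept
```

Here the learned function reproduces the explicit rule because the examples are clean. With noisy or mislabeled pairs, the fitted parameters would drift, which is why the quality of the examples matters.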
The quality of training data has a large impact on ML model performance. To get the most out of your models, be sure to:
When developing training data, it's not enough just to collect vast amounts of data. The data must also be annotated with an end application in mind (e.g. training data for self-driving cars versus training data for retail).
This is important because you must understand how the mistakes, inconsistencies, and errors in your data interact with the implicit assumptions of your model, and decide how to trade off different kinds of error (false positives versus false negatives, outliers).
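The trade-off between false positives and false negatives can be made concrete with a small sketch. The scores and labels below are invented for illustration, and `confusion_counts` is a hypothetical helper, not part of any Scale API; it assumes a binary classifier that predicts "positive" when its score reaches a threshold.

```python
def confusion_counts(scores, labels, threshold):
    """Return (false_positives, false_negatives) at a decision threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


# Hypothetical model scores and ground-truth labels (1 = positive).
scores = [0.05, 0.20, 0.35, 0.45, 0.55, 0.60, 0.80, 0.95]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

for threshold in (0.3, 0.5, 0.7):
    fp, fn = confusion_counts(scores, labels, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Raising the threshold removes false positives at the cost of more false negatives. Which side to favor depends on the end application, and label errors in the ground truth shift both counts.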
Scale supports a standard group of labels (e.g. pedestrians, cyclists, cars), and labels can also be customized to support any use case.
There are two types of rare cases: expected and unexpected.
Our taskers are highly trained and specialized (particularly for autonomous vehicles and robotics) and will escalate rare cases to better balance datasets.
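One simple way to see whether a dataset is imbalanced is to measure each label's share of the annotations. The sketch below is an assumption-laden illustration, not a description of how Scale's platform works: `rare_classes` and the `min_share` cutoff are hypothetical, and the label counts are invented.

```python
from collections import Counter


def rare_classes(labels, min_share=0.05):
    """Return label names whose share of the dataset is below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sorted(name for name, c in counts.items() if c / total < min_share)


# Hypothetical annotation labels for a driving dataset.
labels = ["car"] * 90 + ["pedestrian"] * 8 + ["cyclist"] * 2

print(rare_classes(labels))                 # classes under 5% of the data
print(rare_classes(labels, min_share=0.1))  # stricter cutoff of 10%
```

Classes flagged this way are candidates for collecting or escalating more examples, so the model sees enough of them during training.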
Our platform will correct for systematic errors.
As unexpected rare cases illustrate, real-world systems are constantly evolving. Models can adapt to these changes as long as they are provided with updated data. To keep models performing well, training datasets therefore need to be continuously updated to cover new scenarios.
Developing high-quality training data is a challenging problem. Certain annotation types (e.g. LiDAR annotation) are complex and difficult to do without developing specialized tooling and making significant investments in training people. Scale's suite of annotation tasks supports a wide variety of endpoints and allows ML engineers to focus on the more impactful and differentiated work of developing models rather than worrying about how to get their data annotated.
For more on specific use cases, take a look at our: