Deep learning models have amazing capacity, getting better with more data seemingly without limit. But to get a well-functioning model, it is not enough to have large amounts of data; you also need accurate annotations. Although large amounts of data help the model smooth out inconsistencies between different annotations, humans can still make the same mistakes repeatedly, and those repeated mistakes become ingrained in the model.
For example, when asked to draw a box around an object, it is natural for humans to always make sure the box fully contains the object - i.e., to err on the side of making the box too big. A model trained on those oversized boxes learns to predict oversized boxes; using such a model for collision avoidance will result in false positives, causing the autonomous vehicle to halt unnecessarily.
The oversizing of bounding boxes is an example of a systematic error, as opposed to a random error. Random and systematic errors impact the trained model differently.
Random vs Systematic Errors
Random errors cause the model to require more data to converge to the same result. A model trained on data containing random errors will ultimately arrive at the same parameters; it just gets there more slowly, after being fed more data.
Systematic errors cause the model to converge to a different result - and this does not improve with data quantity. A model trained on oversized boxes will output oversized boxes regardless of how much data it is trained with.
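To make the difference concrete, here is a minimal simulation, not from the original text, that stands in for training with a simple mean estimator: box widths annotated with zero-mean ("random") noise versus widths with a constant oversizing ("systematic") bias. All numbers are made up.

```python
# Zero-mean noise vs. a constant oversizing bias on annotated box widths.
# The random-error estimate converges to the true width as n grows;
# the systematic-error estimate converges to a shifted value instead.
import numpy as np

rng = np.random.default_rng(0)
true_width = 2.0        # ground-truth object width (arbitrary units)
oversize_bias = 0.3     # assumed systematic oversizing by annotators

for n in [100, 10_000, 1_000_000]:
    random_err = true_width + rng.normal(0.0, 0.5, size=n)
    systematic_err = true_width + oversize_bias + rng.normal(0.0, 0.5, size=n)
    print(f"n={n:>9}: random-error estimate = {random_err.mean():.3f}, "
          f"systematic-error estimate = {systematic_err.mean():.3f}")
```

With more samples, the first estimate gets arbitrarily close to the true width, while the second stays offset by the bias no matter how large n becomes.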
Mitigation Strategies
One simple, if expensive, way to deal with random errors is to use more training data. There are also more sophisticated and cost-efficient approaches - for example, if your training data comes from multiple sources, and each source has a different level of random error, you can mitigate the effect of the errors by weighting the low-error sources more heavily during training, as sketched below.
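Here is a minimal sketch of such source weighting, assuming PyTorch, a per-example source id, and per-source error rates that have already been estimated somehow; weighting each source by one minus its error rate is just one illustrative choice.

```python
# Source-weighted cross-entropy: examples from more reliable annotation
# sources contribute more to the loss.
import torch
import torch.nn.functional as F

# Hypothetical estimated label-error rates per annotation source.
error_rate = {0: 0.02, 1: 0.10, 2: 0.25}
source_weight = {s: 1.0 - r for s, r in error_rate.items()}

def weighted_loss(logits, targets, source_ids):
    """Cross-entropy where each example is weighted by its source's reliability."""
    per_example = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.tensor([source_weight[int(s)] for s in source_ids],
                           dtype=per_example.dtype, device=per_example.device)
    return (weights * per_example).sum() / weights.sum()

# Toy usage: a batch of 4 examples drawn from 3 sources.
logits = torch.randn(4, 5)
targets = torch.tensor([0, 2, 1, 4])
source_ids = torch.tensor([0, 1, 2, 0])
print(weighted_loss(logits, targets, source_ids))
```

The same idea can instead be pushed into a sampler (drawing reliable sources more often) rather than the loss; the weighting in the loss is simply the most direct way to write it down.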
Systematic errors, on the other hand, can only be resolved by fixing the data, or by understanding the nature of the error well enough to turn it into a random error. Many interesting techniques exist in this area, including Golden Label Correction, where you carefully annotate a subset of the data in order to characterize the non-randomness of the rest of the data.
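The details of such corrections vary; the following is only an illustrative sketch of the idea for the oversizing example, assuming the systematic error can be summarized as a single scale factor estimated on a small, carefully re-annotated "golden" subset. All numbers are made up, and the ground truth is kept around here only to verify the correction.

```python
# Estimate the oversizing factor on a golden subset, then de-bias the rest.
import numpy as np

rng = np.random.default_rng(1)

# Crowd annotations: systematically 15% too wide, plus random noise.
true_widths = rng.uniform(1.0, 3.0, size=10_000)
crowd_widths = true_widths * 1.15 + rng.normal(0.0, 0.05, size=true_widths.shape)

# Golden subset: 200 examples that also receive careful, accurate labels.
golden_idx = rng.choice(true_widths.size, size=200, replace=False)
scale = np.mean(crowd_widths[golden_idx] / true_widths[golden_idx])

# Apply the estimated correction to every crowd annotation.
corrected_widths = crowd_widths / scale

print(f"estimated oversizing factor: {scale:.3f}")
print(f"mean error before correction: {np.mean(crowd_widths - true_widths):+.3f}")
print(f"mean error after correction:  {np.mean(corrected_widths - true_widths):+.3f}")
```

After the correction, what remains of the annotation error is (approximately) zero-mean noise, which is exactly the kind of error that more data can wash out.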
At the end of the day, the key to overcoming annotation errors effectively and efficiently is understanding how the errors are generated. Take the earlier example of oversized bounding boxes. Knowing that inexperienced workers tend to oversize boxes, Scale is able to use simple models to check for oversizing. Over time, with feedback from those models, workers make fewer systematic mistakes and consistently draw tight bounding boxes. By matching complex annotation specifications with annotator training and an understanding of the systematic errors, Scale is able to efficiently produce quality annotations in quantity, facilitating cost-effective training of accurate models.
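Scale's actual checks are not described here, but a simple oversizing check could look something like the hypothetical sketch below: compare a submitted box against a reference tight box (say, from a trained detector) and flag it when the area ratio or IoU suggests oversizing. The thresholds are made up.

```python
# Flag submitted boxes that look oversized relative to a reference tight box.
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def iou(a: Box, b: Box) -> float:
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    return inter / (area(a) + area(b) - inter)

def looks_oversized(submitted: Box, reference: Box,
                    max_area_ratio: float = 1.3, min_iou: float = 0.75) -> bool:
    """Flag a submitted box that is much larger than the reference tight box."""
    return (area(submitted) > max_area_ratio * area(reference)
            or iou(submitted, reference) < min_iou)

# Toy usage: a box padded well beyond the reference object.
reference = (10.0, 10.0, 50.0, 50.0)
submitted = (5.0, 5.0, 55.0, 55.0)
print(looks_oversized(submitted, reference))  # True: about 56% larger by area
```

Feeding flags like these back to annotators is what turns a one-off quality check into the kind of training loop described above.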