Building In-house Labeling Operations for High-Quality Training Data
Voxel is Changing How Companies Manage Risk and Operations
Producing Quality Training Data while Automating the Process
Voxel's computer vision team faced two challenges: 1) how to maintain high-quality training data and 2) how to automate their labeling process for faster throughput – all while retaining their in-house annotation team.
Voxel had already invested the time and effort to assemble an in-house annotation team of subject-matter experts who were well-versed in handling Voxel's specific use case. Voxel saw a strategic advantage in keeping its internal labeling operations. With a team in place, Voxel began looking for a solution that could introduce greater efficiency to its labeling operations.
Until then, the team had been using an open-source solution called Computer Vision Annotation Tool (CVAT). However, the computer vision team at Voxel was ramping up the volume of annotations it needed for model training and was running into significant bottlenecks with CVAT.
On the operations side, Voxel could not efficiently and programmatically collect data and insights on the data labeling process, resulting in significant manual effort from the data operations team. The open-source tool couldn't effectively link data quality to individual annotators. Thus, if the team produced a batch of low-quality labels, they couldn't determine whether the cause was the training, the annotators, or something else. This environment made it difficult for Voxel to automate its data labeling process and scale its labeling operations.
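To make the attribution problem concrete, here is a minimal sketch of what linking label quality to individual annotators can look like once review results are recorded per annotator. The record layout, the annotator IDs, and the function names are all hypothetical; this is not how CVAT or Scale Studio stores or reports its data.

```python
from collections import defaultdict

# Hypothetical audit records: (batch_id, annotator_id, label_passed_review).
AUDIT_RECORDS = [
    ("batch-1", "annotator-a", True),
    ("batch-1", "annotator-a", True),
    ("batch-1", "annotator-b", False),
    ("batch-1", "annotator-b", True),
    ("batch-2", "annotator-a", True),
    ("batch-2", "annotator-b", False),
]


def accuracy_by_annotator(records):
    """Return {annotator_id: fraction of reviewed labels that passed}."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for _batch_id, annotator_id, ok in records:
        total[annotator_id] += 1
        if ok:
            passed[annotator_id] += 1
    return {annotator: passed[annotator] / total[annotator] for annotator in total}


def annotators_below(records, threshold):
    """List annotators whose review pass rate falls below the threshold."""
    rates = accuracy_by_annotator(records)
    return sorted(a for a, rate in rates.items() if rate < threshold)


if __name__ == "__main__":
    print(accuracy_by_annotator(AUDIT_RECORDS))
    print(annotators_below(AUDIT_RECORDS, threshold=0.8))
```

With per-annotator pass rates available, a low-quality batch can be traced to specific annotators (a training gap) rather than to the team as a whole (a guideline or tooling gap).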
On the engineering side, with CVAT, Voxel needed to custom-build data pipelines for each new customer project. Given the complexity of these pipelines, it took multiple engineers four weeks to build the required data infrastructure for each project.
Labeling Operations Require APIs, Admin Features, and Integrated Tools
The combination of these two factors led Voxel to look for a partner with strong data annotation expertise who understood the challenges and pain points of managing an in-house annotation team.
Scale Studio was selected because of:
- Studio's comprehensive management features with training courses, benchmark tasks, and annotator metrics (e.g., throughput, efficiency, and accuracy)
- Scale's APIs for easy integration of data pipelines and quick setup of labeling projects
- Scale's ecosystem of integrated ML tools, such as Nucleus for dataset curation and management, and Rapid for Scale-managed dataset annotation
- Scale's experience with processing billions of annotations, confirming Scale's platform reliability and tried-and-true infrastructure
- Finally, Scale's credo of "earn customer love" provides the Voxel team with the responsiveness and support necessary to achieve their ambitious goals
While Studio has proven incredibly easy to use for the project management and annotator teams, Voxel has also called out the strong technical knowledge and responsiveness of Scale's customer success and engineering teams in helping it get the most out of the platform. For example, Scale partnered with the Voxel team to ensure that all frames of complex variable-frame-rate (VFR) videos were extracted to maximize the accuracy of the annotations, and thus the accuracy of the model.
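The VFR point is easy to underestimate. The sketch below is only an illustration of the general problem, not a description of how Scale or Voxel extract frames: it assumes a hypothetical list of frame presentation timestamps and shows that sampling at a fixed nominal rate can skip frames that arrive in a quick burst, which is why every frame has to be pulled by its own timestamp.

```python
def frames_hit_by_fixed_rate_sampling(timestamps, fps):
    """Indices of frames that a fixed-rate sampler would select.

    `timestamps` is a sorted, non-empty list of frame times in seconds.
    At each tick k / fps, the sampler takes the latest frame whose
    timestamp is at or before the tick.
    """
    hit = set()
    duration = timestamps[-1]
    eps = 1e-9  # tolerance for floating-point comparisons
    tick_index = 0
    frame = 0
    while tick_index / fps <= duration + eps:
        tick = tick_index / fps
        # Advance to the latest frame already shown at this tick.
        while frame + 1 < len(timestamps) and timestamps[frame + 1] <= tick + eps:
            frame += 1
        hit.add(frame)
        tick_index += 1
    return hit


def missed_frames(timestamps, fps):
    """Frames present in the video that fixed-rate sampling never selects."""
    hit = frames_hit_by_fixed_rate_sampling(timestamps, fps)
    return sorted(set(range(len(timestamps))) - hit)


if __name__ == "__main__":
    # Hypothetical VFR clip: a burst of frames near 0.1 s, then long gaps.
    vfr_timestamps = [0.0, 0.1, 0.12, 0.14, 0.5, 1.0]
    print(missed_frames(vfr_timestamps, fps=10))
```

In this made-up clip, the frame at 0.12 s falls between two sampling ticks and is never selected, so it would never reach an annotator.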
Efficient Operations for High-Quality Training Data and Time Savings
After kicking off the project with Scale Studio, Voxel onboarded its 20+ subject-matter experts onto the platform. Studio gave Voxel's data operations managers visibility into their in-house labeling team with annotator metrics such as throughput, efficiency, and accuracy. Studio also streamlined the data labeling process with intuitive tools and standardized workflows. Managers could now forecast labeling capacity and plan for variable labeling demand. Compared to their previous ad hoc, manual approach, Voxel's operations managers saved 20% of their time each week using Studio.
"Our data operations managers were spending 20% of time each week to manually log batches of data, assign datasets to annotators, and estimate project completion times. We couldn't standardize the workflow and accurately provide visibility to other teams. With Scale Studio, we streamlined this process, giving us the clarity we needed." - Harishma Dayanidhi, VP Engineering
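The capacity forecasting and completion-time estimates mentioned above come down to simple arithmetic once per-annotator throughput is measured. A minimal sketch follows; the function names and all of the numbers in the usage example are hypothetical and do not come from Voxel or from Scale Studio.

```python
import math


def weekly_capacity(annotators, hours_per_week, tasks_per_hour):
    """Tasks the team can label in one week at the measured throughput."""
    return annotators * hours_per_week * tasks_per_hour


def weeks_to_complete(backlog_tasks, annotators, hours_per_week, tasks_per_hour):
    """Whole weeks needed to clear a backlog, rounded up."""
    capacity = weekly_capacity(annotators, hours_per_week, tasks_per_hour)
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return math.ceil(backlog_tasks / capacity)


if __name__ == "__main__":
    # Hypothetical inputs: 20 annotators, 30 labeling hours each per week,
    # 12 tasks per hour on average, and a backlog of 25,000 tasks.
    print(weekly_capacity(20, 30, 12))
    print(weeks_to_complete(25_000, 20, 30, 12))
```

The value of the throughput metric is that `tasks_per_hour` becomes a measured input rather than a guess, so the estimate can be refreshed as demand changes.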
Studio also helped Voxel's computer vision engineering team increase its capacity. With Scale's APIs, it was easy to integrate multiple data pipelines into their operations, which meant less manual work for engineers. With Studio, the engineering team cut the time needed to kick off new projects by 50%.