Building In-house Labeling Operations for High-Quality Training Data

Overview

Voxel is Changing How Companies Manage Risk and Operations

Voxel is on a mission to leverage AI and computer vision to fundamentally change how companies manage risk and operations. To do this, Voxel enhances its customers' security cameras with real-time AI to detect hazards, risky activities, and operational inefficiencies.

"We enable safety managers to keep more people safe across large warehouses and spaces. Our technology delivers insights that can stop injuries before they happen and ultimately improve the safety culture for hazardous and dynamic environments." - Harishma Dayanidhi, VP Engineering

To develop a robust computer vision system, Voxel needs large amounts of high-quality training data. That data must cover the many situations and edge cases the system has to identify in an industrial setting, including potential hazards, risky activities, and inefficiencies.

The Problem

Producing Quality Training Data while Automating the Process

Voxel's computer vision team faced two challenges: 1) how to maintain high-quality training data and 2) how to automate its labeling process for faster throughput, all while retaining its in-house annotation team.

Voxel had already invested the time and effort to assemble an in-house annotation team of subject-matter experts who were well-versed in handling Voxel's specific use case. Voxel saw a strategic advantage in keeping its internal labeling operations. With a team in place, Voxel began looking for a solution that could introduce greater efficiency to its labeling operations.

Until then, the team had been using an open-source solution called Computer Vision Annotation Tool (CVAT). However, the computer vision team at Voxel was ramping up the volume of annotations it needed for model training and was running into significant bottlenecks with CVAT.

On the operations side, Voxel could not efficiently and programmatically collect data and insights on the data labeling process, resulting in significant manual effort from the data operations team. The open-source tool couldn't effectively link data quality to individual annotators. Thus, if the team produced a batch of low-quality labels, they couldn't determine whether the cause was the training, the annotators, or something else. This made it difficult for Voxel to automate its data labeling process and scale its labeling operations.

On the engineering side, with CVAT, Voxel needed to custom-build data pipelines for new customer projects. Given the complexity of the data pipelines, it took multiple engineers four weeks to build the required data infrastructure for each project.

The Solution

Labeling Operations Require APIs, Admin Features, and Integrated Tools

These two factors led Voxel to look for a partner with strong data annotation expertise that understood the challenges and pain points of managing an in-house annotation team.

Scale Studio was selected because of:

  • Studio's comprehensive management features with training courses, benchmark tasks, and annotator metrics (e.g., throughput, efficiency, and accuracy)
  • Scale's APIs for easy integration of data pipelines and quick set up of labeling projects
  • Scale's ecosystem of integrated ML tools, such as Nucleus for dataset curation and management, and Rapid for Scale-managed dataset annotation
  • Scale's experience with processing billions of annotations, confirming Scale's platform reliability and tried-and-true infrastructure
  • Finally, Scale's credo of "earn customer love" provides the Voxel team with the responsiveness and support necessary to achieve their ambitious goals

While Studio has proven incredibly easy to use for the project management and annotator teams, Voxel has also called out the strong technical knowledge and responsiveness of Scale's customer success and engineering teams in helping it get the most out of the platform. For example, Scale partnered with the Voxel team to ensure that all frames of complex variable-frame-rate (VFR) videos were extracted, maximizing the accuracy of the annotations and thus the accuracy of the model.
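
To illustrate why VFR video needs care, here is a minimal sketch of how sampling at an assumed constant frame rate can skip frames whose timestamps are unevenly spaced. It is not the extraction method Scale or Voxel used, which the case study does not describe. The function names and the timestamps are hypothetical and chosen only for illustration.

```python
def sample_at_constant_rate(timestamps, fps):
    """Pick the nearest frame index at each tick of an assumed constant rate.

    `timestamps` are frame presentation times in seconds, sorted ascending.
    """
    n_ticks = int(timestamps[-1] * fps) + 1
    picked = []
    for k in range(n_ticks):
        t = k / fps
        # Index of the frame whose timestamp is closest to this tick.
        nearest = min(range(len(timestamps)), key=lambda i: abs(timestamps[i] - t))
        picked.append(nearest)
    return picked


def missed_frames(timestamps, fps):
    """Return indices of frames that constant-rate sampling never selects."""
    picked = set(sample_at_constant_rate(timestamps, fps))
    return [i for i in range(len(timestamps)) if i not in picked]


# Hypothetical VFR clip: a burst of closely spaced frames, then sparse ones.
vfr_timestamps = [0.0, 0.1, 0.2, 0.25, 0.3, 0.6, 1.0]
print(missed_frames(vfr_timestamps, fps=10))
```

In this made-up clip, the frame at 0.25 s falls between two sampling ticks and is never selected, while the sparse frames near the end are selected repeatedly. Iterating over the actual frame timestamps, rather than an assumed rate, avoids both problems.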

 

“I would definitely say the support we’ve received in working with the Scale team is the best part of the partnership so far… the responsiveness is amazing. If we have a problem, the Scale team always comes up with a thought-out solution.”
Harishma Dayanidhi
VP Engineering
Voxel

The Result

Efficient Operations for High-Quality Training Data and Time Savings

After kicking off the project with Scale Studio, Voxel onboarded its 20+ subject-matter experts onto the platform. Studio gave Voxel's data operations managers visibility into their in-house labeling team with annotator metrics such as throughput, efficiency, and accuracy. Studio also made it easy to streamline the data labeling process with intuitive tools and standardized workflows. Managers could now forecast labeling capacity and plan for variable labeling demand. Compared to their previous ad hoc and manual approaches, Voxel's operations managers saved 20% of their time each week using Studio.

"Our data operations managers were spending 20% of time each week to manually log batches of data, assign datasets to annotators, and estimate project completion times. We couldn't standardize the workflow and accurately provide visibility to other teams. With Scale Studio, we streamlined this process, giving us the clarity we needed." - Harishma Dayanidhi, VP Engineering
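
As a rough illustration of the kind of per-annotator metrics and capacity forecasting described above, here is a minimal sketch. It does not use Scale's API or reflect how Studio computes its metrics. The `TaskRecord` fields, the function names, and the idea of scoring accuracy against benchmark tasks are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """One completed labeling task (hypothetical schema)."""
    annotator: str
    seconds_spent: float
    is_benchmark: bool               # task has a known-good reference label
    matches_benchmark: bool = False  # only meaningful when is_benchmark is True


def annotator_metrics(records):
    """Compute throughput and benchmark accuracy per annotator."""
    totals = defaultdict(lambda: {"tasks": 0, "seconds": 0.0, "bench": 0, "bench_ok": 0})
    for r in records:
        t = totals[r.annotator]
        t["tasks"] += 1
        t["seconds"] += r.seconds_spent
        if r.is_benchmark:
            t["bench"] += 1
            t["bench_ok"] += int(r.matches_benchmark)

    metrics = {}
    for name, t in totals.items():
        hours = t["seconds"] / 3600
        metrics[name] = {
            "tasks_per_hour": t["tasks"] / hours if hours > 0 else 0.0,
            # None when the annotator has not completed any benchmark task.
            "benchmark_accuracy": t["bench_ok"] / t["bench"] if t["bench"] else None,
        }
    return metrics


def weekly_capacity(metrics, hours_per_annotator):
    """Estimate tasks the team can label in a week at current throughput."""
    return sum(m["tasks_per_hour"] * hours_per_annotator for m in metrics.values())


records = [
    TaskRecord("alice", 1800, is_benchmark=True, matches_benchmark=True),
    TaskRecord("alice", 1800, is_benchmark=True, matches_benchmark=False),
    TaskRecord("bob", 3600, is_benchmark=False),
]
team = annotator_metrics(records)
print(team)
print(weekly_capacity(team, hours_per_annotator=10))
```

Tying each record to an annotator is what makes it possible to tell whether a batch of low-quality labels traces back to specific people or to the training everyone received.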

Studio also helped Voxel's computer vision engineering team increase its capacity. With Scale's APIs, it was easy to integrate multiple data pipelines into their operations, which meant less manual work for engineers. With Studio, the engineering team cut the time needed to kick off new projects by 50%.

"The ease of using Scale's APIs accelerated our timeline for building data pipelines into our operations. It cut our lead time in half. For each new project, our engineers will now free up at least two weeks of their time and can refocus on other priorities."
Harishma Dayanidhi
VP Engineering
Voxel