Yuka Advises Users about the Health Impact of Food Products and Cosmetics They Consume
Yuka Needed Massive Amounts of Data Labels Fast
Yuka’s existing database is massive: it contains over 4 million products, and approximately 1,200 new products are added every day. Yuka’s small team cannot manually review each new product that’s added to the platform. Adding a new product also often requires multiple transcription tasks. Adding a food item, for example, requires the application to scan both a nutritional table and an ingredients list.
Yuka handles part of this volume by first using OCR to scan product images for text describing the product’s nutritional information and ingredients. This process is not perfect, however: OCR is not highly accurate across all labels, and it fails in many instances. For example, OCR performs poorly on images that feature inconsistent lighting, obstructions, or irregular text surfaces.
To ensure only high-quality information is added to the application, Yuka checks that OCR achieves a sufficient detection rate before adding a product to the database and generating its health score. If the OCR results are of insufficient quality, the product images must instead be labeled by a human annotator. Because of OCR’s limited robustness, about 60% of the images submitted to Yuka need to be outsourced to a human annotator. On a given day, this can be as many as 500 to 1,000 images! Labeling this many images manually was out of the question for Yuka’s small team. They wanted transcription results quickly, too: when a user adds a new product to the database, Yuka aims to provide the product’s health score within 2-3 hours!
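The gating step described above can be sketched as a simple threshold check. This is a minimal illustration, not Yuka’s actual implementation: the `OcrResult` type, its `detection_rate` field, and the 0.9 threshold are all assumptions made for the example, since the source does not say how the detection rate is computed or what cutoff is used.

```python
from dataclasses import dataclass

# Hypothetical threshold; the source does not state the value Yuka uses.
MIN_DETECTION_RATE = 0.9


@dataclass
class OcrResult:
    """Hypothetical container for one OCR pass over a product image."""
    image_id: str
    text: str
    detection_rate: float  # assumed to lie in [0.0, 1.0]


def route(result, threshold=MIN_DETECTION_RATE):
    """Return "database" if OCR is good enough, else "human_annotation"."""
    if result.detection_rate >= threshold:
        return "database"
    return "human_annotation"


def split_batch(results, threshold=MIN_DETECTION_RATE):
    """Partition a batch of OCR results into (accepted, needs_human) lists."""
    accepted, needs_human = [], []
    for result in results:
        if route(result, threshold) == "database":
            accepted.append(result)
        else:
            needs_human.append(result)
    return accepted, needs_human
```

With roughly 1,200 new products a day and about 60% of images falling below the bar, the `needs_human` list in a sketch like this would hold on the order of 720 images daily, which is consistent with the 500 to 1,000 range quoted above.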
Scale Rapid Gives Yuka High-Quality Transcriptions Within Hours
When OCR doesn’t achieve a sufficient detection rate on an image of a nutritional table or ingredients list, Yuka sends the image to Scale so that a human annotator can manually transcribe the data. Typically, Yuka sends hundreds or thousands of these images to Scale each day.
When Scale returns the transcribed data, Yuka compares the text against its existing dictionary of known ingredients and nutritional information. If a transcribed word is missing a letter, for example, it will not match a dictionary entry and will be flagged as an error. If the error rate is low enough, Yuka integrates that product into its database. The system then calculates a health rating for the product, which is made available to all users of the platform.
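A dictionary check of this kind might look like the following sketch. The term list, the tokenization rule, and the 10% error threshold are placeholders chosen for illustration; the source does not describe how Yuka’s dictionary is structured or what error rate it accepts.

```python
import re

# Hypothetical dictionary and threshold, used only for this example.
KNOWN_TERMS = {"sugar", "salt", "water", "wheat", "flour", "palm", "oil"}
MAX_ERROR_RATE = 0.1


def tokenize(text):
    """Lowercase the transcription and extract runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def error_rate(text, known_terms=KNOWN_TERMS):
    """Return the fraction of words that are not found in the dictionary."""
    words = tokenize(text)
    if not words:
        return 1.0  # an empty transcription is treated as fully invalid
    unknown = [word for word in words if word not in known_terms]
    return len(unknown) / len(words)


def is_acceptable(text, known_terms=KNOWN_TERMS, max_error_rate=MAX_ERROR_RATE):
    """Accept the transcription only if its error rate is low enough."""
    return error_rate(text, known_terms) <= max_error_rate
```

In this sketch, a transcription such as "wheat flour, palm oil" passes, while "sugr" (missing a letter) matches no entry and is rejected. An exact-match lookup is the simplest possible choice; a production system would likely need per-language dictionaries for the five supported languages.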
Because Yuka aims to provide users with a health rating for new products within 2-3 hours, they need to receive data transcriptions quickly. Scale Rapid can handle massive amounts of data in an exceptionally short period of time, and consistently provides Yuka with transcriptions within 2-3 hours of each request. Additionally, Scale Rapid covers every language that Yuka’s application requires, consistently providing accurate transcriptions in English, French, German, Italian, and Spanish.
Users Can Obtain Health Information For New Products in 3 Hours or Less
Yuka can easily incorporate the transcriptions they receive from Scale Rapid into their application. When new products are added to the application, Yuka automatically sends the necessary data requests to Scale Rapid using the Scale API. Once the transcriptions are returned, they are verified by the application and then immediately added to the database. Using this pipeline, Yuka can provide users with product health scores in just hours. With this capability, Yuka’s database can continue to grow quickly and sustainably, all while providing an excellent user experience.
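The overall flow can be summarized in a short orchestration sketch. Everything here is hypothetical: `run_ocr`, `request_transcription`, and `validate` are caller-supplied stand-ins, not functions from the Scale API or from Yuka’s codebase, and the 0.9 threshold is an assumed value. The point is only to show the order of the steps described above.

```python
from typing import Callable, Optional, Tuple


def process_image(
    image_id: str,
    run_ocr: Callable[[str], Tuple[str, float]],   # returns (text, detection_rate)
    request_transcription: Callable[[str], str],   # stand-in for a human transcription request
    validate: Callable[[str], bool],               # stand-in for the dictionary check
    min_detection_rate: float = 0.9,               # assumed threshold
) -> Optional[str]:
    """Return text that is ready for the database, or None if verification fails."""
    text, rate = run_ocr(image_id)
    if rate >= min_detection_rate:
        return text  # OCR was reliable enough to use directly

    # OCR fell short, so fall back to a human transcription and verify it.
    transcription = request_transcription(image_id)
    return transcription if validate(transcription) else None
```

Passing the three steps in as callables keeps the sketch independent of any particular OCR engine or annotation service, and makes it easy to exercise with stubs.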