With Scale Rapid, Yuka Provides Users with a Massive Product Database


Yuka Advises Users about the Health Impact of Food Products and Cosmetics They Consume

Yuka is a mobile application that lets users scan foods and cosmetics and see each product’s health impact. For items already in the system’s database, users get results in real time. If a product is not yet listed in the application, users instead upload images of the ingredients list and, for food products, the nutritional table.

The application first uses OCR to extract data from uploaded images. It then runs additional machine learning tasks to check that the data is relevant and to process the nutritional information and ingredients list. With this extracted information, the application can calculate a health score for a given product. If the OCR process works perfectly, users can get the health impact of new products in real time!

The Problem

Yuka Needed Massive Amounts of Data Labels Fast

Yuka’s existing database is massive, containing over 4 million products, and it is growing rapidly: approximately 1,200 new products are added every day. That’s a huge amount of data, and Yuka’s small team cannot manually review each new product added to the platform. Adding a new product also often requires multiple transcription tasks. Adding a food item, for example, requires the application to scan both a nutritional table and an ingredients list.

Yuka handles part of this volume by first using OCR to scan product images for text describing the product’s nutritional information and ingredients. This process is not perfect, however: OCR is not highly accurate across all labels, and it fails in many cases. For example, OCR doesn’t perform well on images with inconsistent lighting, obstructions, or irregular text surfaces.

To ensure only high-quality information is added to the application, Yuka checks that OCR achieves a sufficient detection rate before adding a product to the database and generating its health score. If the OCR results are of insufficient quality, the product images must instead be labeled by a human annotator. Because of OCR’s limited robustness, about 60% of the images submitted to Yuka need to be sent to a human annotator. On a given day, this can be as many as 500 to 1,000 images! Labeling this many images manually was out of the question for Yuka’s small team. They also wanted transcription results quickly: when a user adds a new product to the database, Yuka aims to provide the product’s health score within 2-3 hours!
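The routing decision described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Yuka’s actual implementation: the `OcrResult` type, its `detection_rate` field, and the 0.9 threshold are all hypothetical, since the source does not say how the detection rate is measured or what cutoff is used.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the source does not state the real value.
MIN_DETECTION_RATE = 0.9


@dataclass
class OcrResult:
    image_id: str
    text: str
    # Assumed to be a score between 0.0 and 1.0 reported by the OCR step.
    detection_rate: float


def route(result, threshold=MIN_DETECTION_RATE):
    """Return "auto" if the OCR output is good enough, otherwise "human"."""
    return "auto" if result.detection_rate >= threshold else "human"


def split_batch(results, threshold=MIN_DETECTION_RATE):
    """Split OCR results into those kept as-is and those sent to annotators."""
    auto = [r for r in results if route(r, threshold) == "auto"]
    human = [r for r in results if route(r, threshold) == "human"]
    return auto, human
```

With a cutoff like this, every image falls into exactly one of the two lists, so the share sent to human annotators depends directly on where the threshold is set.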

"We send Scale 500-1000 requests per day to extract text from images, and Scale reliably gets us results within hours."

Julie Chapon

Co-founder & Executive Director, Yuka

The Solution

Scale Rapid Gives Yuka High-Quality Transcriptions Within Hours

When OCR doesn’t achieve a sufficient detection rate on an image of a nutritional table or ingredients list, Yuka sends the image to Scale so that a human annotator can manually transcribe the data. Typically, Yuka sends 500 to 1,000 of these images to Scale each day.

When Scale returns the transcribed data, Yuka compares the text against their existing dictionary of known ingredients and nutritional information. If a transcribed word is missing a letter, for example, it will be flagged as an error. If the error rate is low enough, Yuka integrates that product into their database. The system then calculates a health rating for the product, which is made available to all users of the platform.
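One way to picture this dictionary check is the sketch below. It is an assumption-laden illustration, not Yuka’s code: the tiny `KNOWN_TERMS` set, the 10% `MAX_ERROR_RATE`, and the word-level tokenization are all hypothetical, and the real dictionary and threshold are not given in the source.

```python
import re

# Hypothetical stand-ins for the real dictionary and acceptance threshold.
KNOWN_TERMS = {"sugar", "salt", "water", "wheat", "flour", "protein", "fat"}
MAX_ERROR_RATE = 0.1


def tokenize(text):
    """Lowercase the text and return its runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def error_rate(text, known_terms):
    """Fraction of transcribed words that are not in the dictionary."""
    words = tokenize(text)
    if not words:
        return 1.0  # nothing recognizable: treat as fully erroneous
    unknown = [w for w in words if w not in known_terms]
    return len(unknown) / len(words)


def accept(text, known_terms=KNOWN_TERMS, max_error_rate=MAX_ERROR_RATE):
    """Accept a transcription only if its error rate is low enough."""
    return error_rate(text, known_terms) <= max_error_rate
```

In this sketch, a word with a missing letter such as "watr" is simply absent from the dictionary, so it counts toward the error rate in the same way as any other unknown word.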

Because Yuka aims to provide users with a health rating for new products within 2-3 hours, they need to receive data transcriptions quickly. Scale Rapid can handle massive amounts of data in an exceptionally short time, and it consistently provides Yuka with transcriptions within 2-3 hours of each request. Additionally, Scale Rapid supports all the languages of Yuka’s application, consistently providing accurate transcriptions in English, French, German, Italian, and Spanish.

"Scale makes it easy to implement the data annotation system we need. We can easily review tasks in the platform to ensure we receive high-quality transcriptions."

Julie Chapon

Co-founder & Executive Director, Yuka

The Result

Users Can Obtain Health Information For New Products in 3 Hours or Less

Yuka easily incorporates the transcriptions they receive from Scale Rapid into their application. When new products are added to the application, Yuka automatically sends the necessary data requests to Scale Rapid using the Scale API. Once the transcriptions are returned, they are verified by the application and then immediately added to the database. Using this pipeline, Yuka can provide users with product health scores in just hours. With this capability, Yuka’s database can continue to grow quickly and sustainably, all while providing an excellent user experience.
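The overall flow of this pipeline might be orchestrated roughly as follows. This is only a sketch: it does not call the real Scale API, and `run_ocr`, `request_transcription`, and `verify` are hypothetical callables standing in for the OCR step, the request to Scale Rapid, and the application’s verification step. The in-memory dictionary stands in for the product database.

```python
from typing import Callable, Dict, Optional


def process_image(
    image_id: str,
    run_ocr: Callable[[str], Optional[str]],
    request_transcription: Callable[[str], str],
    verify: Callable[[str], bool],
    database: Dict[str, str],
) -> str:
    """Process one image and return "ocr", "human", or "rejected"."""
    # Assumption: run_ocr returns None when OCR misses the required detection rate.
    text = run_ocr(image_id)
    if text is not None:
        database[image_id] = text
        return "ocr"

    # Fall back to a human transcription (stand-in for the request to Scale Rapid).
    text = request_transcription(image_id)
    if not verify(text):
        return "rejected"  # failed verification: do not store it

    database[image_id] = text
    return "human"


if __name__ == "__main__":
    db: Dict[str, str] = {}
    status = process_image(
        "img-1",
        run_ocr=lambda _id: None,  # simulate an OCR failure
        request_transcription=lambda _id: "sugar, salt",
        verify=lambda text: bool(text.strip()),
        database=db,
    )
    print(status, db)
```

Passing the three steps in as callables keeps the sketch independent of any particular OCR engine or annotation client, which also makes each branch easy to exercise with simple stubs.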

"We’ve been working with Scale for 5 years and are extremely happy with the experience. Scale easily manages huge amounts of data, and provides us with the fast results that we need."

Julie Chapon

Co-founder & Executive Director, Yuka