Scaling Menu Transcription Tasks with Scale Document

by Logan Ford and Teddy Lee on November 16th, 2020

When Covid-19 first hit, many companies had to rapidly reorient their operations – from altering their supply chains to implementing social distancing and work-from-home orders.

One sector that was particularly hard hit was the restaurant industry. When in-person dining was suspended, restaurants turned to delivery to survive. But setting up a delivery service from scratch takes time and money that many establishments, with their income drying up, did not have. Suddenly, food delivery platforms became an economic lifeline.

To ensure restaurants could onboard quickly and seamlessly in these critical early days, one of the leading delivery platforms turned to Scale.

Scaling labeling

Every time a restaurant joins one of the delivery platforms or changes its menu, the menu’s contents need to be inputted into the platform so users can select their food. Inputting menus manually is slow, expensive, and leads to mistakes. With restaurants relying on delivery to reach customers and support their staff, a leading delivery platform knew it was vital to ensure an efficient process for its partner restaurants.

This presented Scale with unique latency and quality control challenges: the work required both a very high level of accuracy and a fast turnaround, at a critical time for our customer. Our aim was to combine partial automation with human quality control in our Scale Document labeling pipeline to make dramatic efficiency improvements over manual processes. In all, we nearly halved the time it takes our customer to process menus and reduced critical error rates to below 1% across all items labeled, allowing them to onboard restaurants smoothly exactly when they needed to.

Building the right data infrastructure

While processing standardized documents such as food menus, government IDs, or loan application forms might be intuitive and simple for humans, it is a surprisingly nontrivial problem for algorithms, involving much more than automatic transcription. In particular, the data often contains many dependencies that need to be captured accurately. For example, ordering a pepperoni pizza doesn’t just mean choosing the type, but also the size, the type of base, whether to substitute or customize toppings (which can also vary in price depending on the size of the pizza, creating more interdependencies), and whether to add extras like burgers or sandwiches. It’s important to capture these dependencies in the menu correctly; otherwise, input errors risk causing losses for both the restaurant and the delivery platform.

Algorithmically, this has to be represented as complex decision trees made up of categories, items, and lists of options, capturing the relationships between them. Thankfully, both our document processing pipeline and form capture tools were already set up to capture these types of nested data structures automatically without the need for customization. This meant we were able to quickly pivot our tools for our customer’s requirements.
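To make the nesting concrete, here is a minimal sketch of how categories, items, and option lists might be modeled, including a topping whose price depends on the pizza size. The class names, field names, and prices are illustrative assumptions, not the schema Scale Document actually uses.

```python
from dataclasses import dataclass, field

# Hypothetical schema for illustration only; not Scale Document's real format.

@dataclass
class Option:
    name: str
    # Extra cost keyed by item size; a size that is absent costs nothing extra.
    price_by_size: dict = field(default_factory=dict)

@dataclass
class OptionList:
    name: str
    options: list
    min_choices: int = 0  # 1 or more makes the list mandatory
    max_choices: int = 1

@dataclass
class Item:
    name: str
    base_price_by_size: dict
    option_lists: list = field(default_factory=list)

@dataclass
class Category:
    name: str
    items: list = field(default_factory=list)

def price(item, size, chosen):
    """Price an order. `chosen` maps option-list name -> list of option names."""
    total = item.base_price_by_size[size]
    for option_list in item.option_lists:
        picks = chosen.get(option_list.name, [])
        if not option_list.min_choices <= len(picks) <= option_list.max_choices:
            raise ValueError(f"invalid number of choices for {option_list.name}")
        by_name = {option.name: option for option in option_list.options}
        for pick in picks:
            if pick not in by_name:
                raise ValueError(f"unknown option {pick!r} in {option_list.name}")
            # The same option can cost more on a larger item.
            total += by_name[pick].price_by_size.get(size, 0.0)
    return total

pizza = Item(
    name="Pepperoni pizza",
    base_price_by_size={"small": 10.0, "large": 14.0},
    option_lists=[
        OptionList("Base", [Option("Thin"), Option("Deep dish")], min_choices=1),
        OptionList(
            "Toppings",
            [Option("Extra cheese", {"small": 1.0, "large": 1.5})],
            max_choices=3,
        ),
    ],
)
menu = [Category("Pizzas", [pizza])]

order_total = price(pizza, "large", {"Base": ["Thin"], "Toppings": ["Extra cheese"]})
```

Even in this toy version, a single transcription mistake (a missing size key or a mandatory list marked optional) changes what a customer can order or what they are charged, which is why the relationships matter as much as the text.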

Judicious Automation and Scaling

Once we had the right tooling in place to pull the raw data and label complex data structures, we could start processing the data. Given how many menus delivery platforms were processing at the start of shelter-in-place, and how important it was for restaurants’ continued operations to process this data quickly with no drop-off in quality, we needed to make this process as efficient as possible.

At Scale AI, we believe that a mix of judicious labeling automation with continued human oversight is the only way to provide data at the scale, quality, and low cost needed to enable many enterprise applications of AI. In particular, we aim to automate the lower-skilled parts of the data labeling pipeline to focus human review on the hard parts – quality control, edge cases, and the most complex data types. That way, we can help guarantee quality without sacrificing efficiency.

We use automation to prefill menu taxonomies for expert human labelers to review and confirm.

We trained our form capture tool to automatically understand document structure and predict the next transcription.

And when our labelers needed to add inputs manually, we built a “smart suggestion” feature that provides them with ready prompts for common items that they have already encountered in their batch of menus.

Together, these features improved the efficiency and the accuracy of labeling at the same time, allowing us to transcribe more than 3,000 menus in a single day and return them quickly enough to onboard restaurants within 24 hours from start to finish. By drawing on our wide network of well-trained labelers and augmenting their work with careful automation, we’re able to handle changes in demand dynamically – essential for providing a smooth service during sudden peaks in demand.
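As an illustration of the “smart suggestion” idea, the sketch below remembers entries a labeler has already typed in the current batch and offers the most frequent ones that match a typed prefix. The `SuggestionIndex` class and its ranking rule are assumptions made for this example; the actual feature may work quite differently.

```python
from collections import Counter

class SuggestionIndex:
    """Hypothetical prefix-based suggester over entries seen in one batch."""

    def __init__(self):
        self._counts = Counter()  # normalized text -> times entered
        self._display = {}        # normalized text -> first spelling seen

    def record(self, text):
        """Remember an entry the labeler has just typed."""
        cleaned = text.strip()
        key = cleaned.lower()
        if key:
            self._counts[key] += 1
            self._display.setdefault(key, cleaned)

    def suggest(self, prefix, limit=3):
        """Return up to `limit` earlier entries starting with `prefix`."""
        needle = prefix.strip().lower()
        if not needle:
            return []
        matches = [(key, n) for key, n in self._counts.items() if key.startswith(needle)]
        # Most frequent first; ties are broken alphabetically for stable output.
        matches.sort(key=lambda pair: (-pair[1], pair[0]))
        return [self._display[key] for key, _ in matches[:limit]]

index = SuggestionIndex()
for entry in ["Extra Cheese", "Extra Cheese", "Extra Sauce", "Egg Roll"]:
    index.record(entry)

suggestions = index.suggest("ex")
```

Reusing an earlier entry saves typing, and it also keeps spelling and capitalization consistent across the menus in a batch.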

Custom Benchmarks and Integrations

Once data is flowing smoothly and efficiently through this labeling pipeline, the next step is to ensure that the data is being labeled to an incredibly high standard, with almost zero errors. We’ve developed a range of techniques to guarantee label quality, including the randomized use of benchmark quality tests, confidence-based consensus, and, for ambiguous labeling requirements, our own automated benchmark generation system.
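One simple way to picture confidence-based consensus is as a weighted vote: each labeler's answer counts in proportion to a confidence score, and the answer is accepted only if its share of the total weight clears a threshold. The function below is a sketch under those assumptions; the `consensus` name, the 0.7 threshold, and the use of a single confidence number per labeler are all hypothetical and not a description of Scale's production system.

```python
from collections import defaultdict

def consensus(responses, threshold=0.7):
    """Weighted vote over (answer, confidence) pairs.

    Returns (answer, share) when the leading answer holds at least
    `threshold` of the total confidence, else (None, share) so that the
    task can be escalated for further review.
    """
    weights = defaultdict(float)
    for answer, confidence in responses:
        weights[answer] += confidence
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("need at least one response with positive confidence")
    best_answer, best_weight = max(weights.items(), key=lambda pair: pair[1])
    share = best_weight / total
    return (best_answer if share >= threshold else None), share

# Two trusted labelers agree on a price; a less trusted one disagrees.
accepted, accepted_share = consensus([("$12.50", 0.75), ("$12.50", 0.75), ("$12.00", 0.5)])

# An even split does not clear the threshold, so no answer is accepted.
disputed, disputed_share = consensus([("$9.00", 0.5), ("$9.50", 0.5)])
```

In a sketch like this, the confidence scores could come from each labeler's track record on benchmark tasks, which ties the two techniques together.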

Benchmarking is not always an objective science – the most effective benchmarks are often task-specific. The high stakes of onboarding restaurants during a pandemic required particularly high guarantees of labeling quality. To meet them, we worked closely with our customer to build custom benchmarks for this workflow, turning the most complex mislabeled menus into benchmarks that assessed our labelers’ performance.

In surveying mislabeled menus, we noticed two key groups of errors. The first involved menu sections that are cross-referenced between many items, where labelers mistook how the sections and items relate to each other. The second tended to occur in larger menus, where it can be less obvious when certain features of the menu are missing. By identifying these key groups and creating benchmarks out of the associated menus, we can ensure that tasker performance in such cases meets the standard that menu quality requires.

The menu on the left showcases large amounts of cross-referencing and item optionality, while the menu on the right shows a fairly common amount of menu feature density.
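To show how a benchmark menu might be used to assess a labeler, the sketch below compares a submission against a gold-standard transcription, both flattened into (category, item, option) paths mapped to prices. The flattened representation, the function name, and the sample data are assumptions for illustration. The `missing` list corresponds to the second error group described above, where features are silently left out of a large menu.

```python
def score_against_benchmark(gold, submitted):
    """Compare a submission with a gold menu.

    Both arguments map (category, item, option) tuples to a price.
    """
    missing = [path for path in gold if path not in submitted]
    wrong = [path for path in gold if path in submitted and submitted[path] != gold[path]]
    extra = [path for path in submitted if path not in gold]
    correct = len(gold) - len(missing) - len(wrong)
    accuracy = correct / len(gold) if gold else 1.0
    return {"accuracy": accuracy, "missing": missing, "wrong": wrong, "extra": extra}

# Hypothetical gold transcription of a tiny benchmark menu.
gold_menu = {
    ("Pizzas", "Pepperoni", "small"): 10.0,
    ("Pizzas", "Pepperoni", "large"): 14.0,
    ("Pizzas", "Pepperoni", "Extra cheese (small)"): 1.0,
    ("Pizzas", "Pepperoni", "Extra cheese (large)"): 1.5,
}

# The labeler dropped one size-dependent topping price and mistyped another.
submission = {
    ("Pizzas", "Pepperoni", "small"): 10.0,
    ("Pizzas", "Pepperoni", "large"): 14.0,
    ("Pizzas", "Pepperoni", "Extra cheese (small)"): 1.5,
}

report = score_against_benchmark(gold_menu, submission)
```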

What’s Next

AI models are rapidly becoming critical infrastructure for a huge range of businesses – a trend of automation that the Covid-19 pandemic has accelerated. But gathering effective, accurate, and unbiased data remains so challenging that it is preventing real-world AI systems from reaching their full potential and raising high barriers to entry for smaller companies.

We’re tackling a whole host of problems to help accelerate that progress. Right now, teams at Scale AI are making better use of machine learning to make labeling orders of magnitude more efficient, building tools to process increasingly complex data types, such as 3D point clouds in computer vision, and developing new infrastructure tools to help streamline the management of data. Our first such management tool, Scale Nucleus, is already helping our computer vision customers automate time-consuming manual steps in the ML development process. We’re now working on deploying it for our customers in natural language.

Ultimately, we want to make it as easy to deploy AI as any other type of software. If you’re interested in joining us in solving these problems, take a look at our careers page for our latest open positions. If you have projects that require high-quality data labeling, let our team know!