Scaling Menu Transcription Tasks with Scale Document
When Covid-19 first hit, many companies had to rapidly reorient their
operations – from altering their supply chains to implementing social
distancing and work-from-home orders.
One sector that was particularly hard hit was the restaurant industry. When
in-person dining was suspended, restaurants turned to delivery to survive. But
setting up a delivery service from scratch takes time and money that many
establishments, with their income drying up, did not have. Suddenly, food
delivery platforms became an economic lifeline.
To ensure restaurants could onboard quickly and seamlessly in these critical
early days, one of the leading delivery platforms turned to Scale.
Scaling labeling
Every time a restaurant joins one of the delivery platforms or changes its
menu, the menu’s contents need to be inputted into the platform so users can
select their food. Inputting menus manually is slow, expensive, and leads to
mistakes. With restaurants relying on delivery to reach customers and support
their staff, a leading delivery platform knew it was vital to ensure an
efficient process for its partner restaurants.
This presented Scale with unique latency and quality control challenges: the
work required both an extremely high level of accuracy and a fast turnaround
to provide operational efficiencies at a critical time for our customer. Our
aim was to use the combination of partial automation and human quality
control in our Scale Document labeling pipeline to make dramatic efficiency
improvements over manual processes. In all, we nearly halved the time it
takes our customer to process menus and reduced critical error rates to <1%
for all items labeled, allowing them to onboard restaurants smoothly exactly
when they needed to.
Building the right data infrastructure
While processing standardized documents such as food menus, government IDs or
loan application forms might be intuitive and simple for humans, it is a
surprisingly nontrivial problem for algorithms, involving much more than
automatic transcription. In particular, the data often contains many
dependencies that need to be captured accurately. For example, ordering a
pepperoni pizza doesn’t just mean choosing the type, but also the size, the
type of base, choosing to substitute or customize toppings (which can also
vary in price depending on the size of the pizza, creating more
interdependencies), and whether to add extras like burgers or sandwiches. It’s
important to capture these dependencies in the menu correctly; otherwise,
input errors risk causing losses for both the restaurant and the delivery
platform.
Algorithmically, this has to be represented as complex decision trees made
up of categories, items, and lists of options, capturing the relationships
between them. Thankfully, both our document processing pipeline and form
capture tools were already set up to capture these types of nested data
structures automatically without the need for customization. This meant we
were able to quickly pivot our tools for our customer’s requirements.
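To make the idea of nested categories, items, and option lists concrete, here is a minimal sketch of how such a menu might be modeled. It is only an illustration, not our production schema: the class names (`Category`, `Item`, `OptionList`, `Option`), the `price` helper, and the sample prices are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Option:
    name: str
    # Hypothetical: extra cost keyed by size, since a topping's price
    # can depend on the size of the pizza.
    price_by_size: Dict[str, float]


@dataclass
class OptionList:
    name: str
    options: List[Option]
    min_choices: int = 0
    max_choices: int = 1


@dataclass
class Item:
    name: str
    base_price_by_size: Dict[str, float]
    option_lists: List[OptionList] = field(default_factory=list)


@dataclass
class Category:
    name: str
    items: List[Item] = field(default_factory=list)


def price(item: Item, size: str, chosen: Dict[str, List[str]]) -> float:
    """Total price of an item at a given size with the chosen options."""
    total = item.base_price_by_size[size]
    for option_list in item.option_lists:
        picks = chosen.get(option_list.name, [])
        # Enforce how many options may be selected from this list.
        if not option_list.min_choices <= len(picks) <= option_list.max_choices:
            raise ValueError(f"invalid number of choices for {option_list.name}")
        by_name = {option.name: option for option in option_list.options}
        for pick in picks:
            total += by_name[pick].price_by_size[size]
    return total


# Sample data, invented for illustration only.
toppings = OptionList(
    name="Extra toppings",
    options=[
        Option("Mushrooms", {"small": 1.0, "large": 2.0}),
        Option("Olives", {"small": 1.0, "large": 1.5}),
    ],
    max_choices=2,
)
pizza = Item("Pepperoni pizza", {"small": 10.0, "large": 15.0}, [toppings])
menu = [Category("Pizzas", [pizza])]

large_total = price(pizza, "large", {"Extra toppings": ["Mushrooms", "Olives"]})
print(large_total)
```

Keying every price by size is one simple way to capture the size-dependent topping prices described above; a transcription error in any one branch of the tree then shows up as a wrong total for a specific combination.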
Judicious Automation and Scaling
Once we had the right tooling in place to pull the raw data and label
complex data structures, we could start processing the data. Given how many
menus delivery platforms were processing at the start of shelter-in-place,
and how important it was for restaurants’ continued operations to process
this data quickly with no drop-off in quality, we needed to make this
process as efficient as possible.
At Scale AI, we believe that a mix of judicious labeling automation with
continued human oversight is the only way to provide data at the scale,
quality, and low cost needed to enable many enterprise applications of AI.
In particular, we aim to automate the lower-skilled parts of the data
labeling pipeline to focus human review on the hard parts – quality control,
edge cases, and the most complex data types. That way, we can help guarantee
quality without sacrificing efficiency.
We use automation to prefill menu taxonomies for expert human labelers to
review and confirm.
We trained our form capture tool to automatically understand document
structure and predict the next transcription.
And when our labelers needed to add inputs manually, we built a “smart
suggestion” feature that provides them with ready prompts for common items
that they have already encountered in their batch of menus.
Together, these features improved the efficiency and accuracy of labeling
simultaneously, allowing us to transcribe more than 3,000 menus in a single
day and return menus quickly enough to onboard restaurants within 24 hours
from start to finish. By drawing on our wide network of
well-trained labelers and augmenting their work with careful automation,
we’re able to handle changes in demand dynamically – essential for providing
a smooth service during sudden peaks in demand.
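As a rough illustration of the “smart suggestion” idea, the sketch below remembers entries a labeler has already typed in a batch and offers the most frequent ones that match what is being typed. This is an assumption about how such a feature could work, not a description of our actual tool; the `SuggestionIndex` class and its methods are hypothetical.

```python
from collections import Counter
from typing import List


class SuggestionIndex:
    """Remembers previously entered text and suggests frequent matches."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, text: str) -> None:
        # Count every entry the labeler confirms in the current batch.
        self._counts[text.strip()] += 1

    def suggest(self, prefix: str, limit: int = 3) -> List[str]:
        needle = prefix.strip().lower()
        matches = [
            (text, count)
            for text, count in self._counts.items()
            if text.lower().startswith(needle)
        ]
        # Most frequent first; ties are broken alphabetically.
        matches.sort(key=lambda pair: (-pair[1], pair[0]))
        return [text for text, _ in matches[:limit]]


index = SuggestionIndex()
for entry in ["Extra cheese", "Extra sauce", "Extra cheese", "Espresso"]:
    index.record(entry)

suggestions = index.suggest("ex")
print(suggestions)  # ['Extra cheese', 'Extra sauce']
```

A simple prefix match over a frequency table is enough to show the principle; a real system would likely need fuzzier matching and per-customer vocabularies.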
Custom Benchmarks and Integrations
Once data is flowing smoothly and efficiently through this labeling
pipeline, the next step is to ensure that the data is being labeled to an
incredibly high standard, with almost zero errors. We’ve developed a range
of techniques to guarantee label quality, including the randomized use of
benchmark quality tests, confidence-based consensus, and, for ambiguous
labeling requirements, our own automated benchmark generation system.
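One common way to implement confidence-based consensus is weighted voting: each labeler’s answer counts in proportion to a confidence score, and the answer is accepted only if the winning label holds a large enough share of the total weight. The sketch below assumes that interpretation; the `consensus` function, the 0.7 threshold, and the sample scores are hypothetical and not taken from our pipeline.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def consensus(votes: List[Tuple[str, float]], threshold: float = 0.7) -> Optional[str]:
    """Return the winning label, or None if agreement is too weak.

    Each vote is (label, confidence), where confidence is a non-negative
    weight such as a labeler's historical benchmark accuracy (an assumption).
    """
    weight_by_label: Dict[str, float] = defaultdict(float)
    for label, confidence in votes:
        weight_by_label[label] += confidence
    total = sum(weight_by_label.values())
    if total == 0:
        return None
    best_label, best_weight = max(weight_by_label.items(), key=lambda kv: kv[1])
    # Accept only when the winner's share of the weight clears the threshold;
    # otherwise the item would be escalated for further review.
    return best_label if best_weight / total >= threshold else None


agreed = consensus([("$12.50", 0.9), ("$12.50", 0.8), ("$12.00", 0.4)])
disputed = consensus([("$12.50", 0.5), ("$12.00", 0.5)])
print(agreed, disputed)
```

Returning `None` rather than a best guess keeps uncertain items visible, so they can be routed to additional reviewers.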
Benchmarking is not always an objective science – the most effective
benchmarks are often task-specific. The high stakes of onboarding
restaurants during a pandemic required particularly high guarantees of
labeling quality. To meet them, we worked closely with our customer to build
custom benchmarks for this workflow, turning the most complex mislabeled
menus into benchmarks that assessed our labelers’ performance.
In surveying mislabeled menus, we noticed two key groups of errors. One was
associated with the cross-referencing of menu sections between many items
and mistaking how they relate to each other, while the other group tended to
be mistakes from larger menus where it may be less obvious when certain
features of the menu are missing. By identifying these key groups and
creating benchmarks out of the associated menus, we can ensure that tasker
performance in such cases exceeds the necessary standard to meet menu
quality needs.
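To illustrate how benchmarks grouped by error type could be used to track tasker performance, here is a minimal sketch that computes field-level accuracy per group. The group tags, the `BenchmarkResult` record, and the sample counts are hypothetical, invented purely for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class BenchmarkResult(NamedTuple):
    group: str     # hypothetical tag, e.g. "cross_reference" or "large_menu"
    expected: int  # number of fields in the benchmark's reference labels
    correct: int   # number of those fields the tasker labeled correctly


def accuracy_by_group(results: List[BenchmarkResult]) -> Dict[str, float]:
    """Aggregate field-level accuracy for each benchmark group."""
    expected: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for result in results:
        expected[result.group] += result.expected
        correct[result.group] += result.correct
    return {g: correct[g] / expected[g] for g in expected if expected[g] > 0}


# Invented sample results for one tasker.
results = [
    BenchmarkResult("cross_reference", expected=40, correct=38),
    BenchmarkResult("cross_reference", expected=60, correct=60),
    BenchmarkResult("large_menu", expected=200, correct=190),
]
scores = accuracy_by_group(results)
print(scores)  # {'cross_reference': 0.98, 'large_menu': 0.95}
```

Reporting accuracy per group, rather than as a single overall number, makes it easier to see whether a tasker struggles specifically with cross-referenced sections or with large menus.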
The menu on the left showcases large amounts of cross-referencing and item
optionality, while the menu on the right shows a fairly typical density of
menu features.
What’s Next
AI models are rapidly becoming critical infrastructure for a huge range of
businesses – a trend of automation that the Covid-19 pandemic has
accelerated. But gathering effective, accurate, and unbiased data remains so
challenging that it is preventing real-world AI systems from reaching their
full potential and raising high barriers for smaller companies.
We’re tackling a whole host of problems to help accelerate that progress.
Right now, teams at Scale AI are making better use of machine learning to
make labeling orders of magnitude more efficient, building tools to process
increasingly complex data types, such as 3D point clouds in computer vision,
and developing new infrastructure tools to help streamline the management of
data. Our first such management tool, Scale Nucleus, is already helping our
computer vision customers automate time-consuming manual steps in the ML
development process. We’re now working on deploying it for our customers in
natural language.
Ultimately, we want to make it as easy to deploy AI as any other type of
software. If you’re interested in joining us in solving these problems, take
a look at our careers page for our latest open positions. If you have
projects that require high-quality data labeling, let our team know!