Overview
Scale Document AI for Accounts Payable
SAP is a leading software corporation headquartered in Walldorf, Germany, that develops enterprise software for the “management of business processes, developing solutions that facilitate effective data processing.” Known best for its enterprise resource planning (ERP) software, SAP aims to help companies and organizations of all sizes run their businesses profitably, adapt continuously, and grow sustainably. “We have a rich spectrum of use cases ranging from document processing, analytics, master data matching, process automation, and many more,” said Urko Sanchez, Senior Director, Head of AI Functions, Europe. “For this collaboration with Scale, we focused on improving our products around document processing, especially those dealing with invoices, purchase orders, and payment advices,” he added. “By bringing AI-powered products and services, we are helping companies to modernize and become future-proof,” he concluded.
The Problem
Document variability made high-quality training data a challenge
The team at SAP had a trove of customer documents but needed a partner to create a comprehensive dataset to enhance its accounts payable products while respecting the data ownership, privacy, and sensitivity of its customers. High-quality data is critical for performant models. SAP needed higher-quality training data to train models that process and extract crucial information from purchase orders and invoices in English, German, and Spanish. “Our customers are very heterogeneous. We have customers that provide us with thousands of documents a week, while others might require months for a fraction of the same volume. Sometimes, customers might not even know in advance what amount of data they have available,” explained Sanchez. “Therefore, finding a reliable and flexible partner for creating high-quality training datasets was key for us,” he added.
The Solution
Data extraction at over 95% accuracy from diverse document types in under 60 seconds
With deep expertise in labeling data across a wide range of use cases, the team at Scale delivered high-quality labeled data across three languages, multiple document types, and 200+ unique fields. Adding to the complexity, fields that contained personally identifiable information (PII), such as names, phone numbers, and email addresses, needed to be replaced with semantically similar but different examples to preserve data privacy. Despite these challenges, Scale delivered labeled data at near-perfect accuracy on a quick ramp to high volumes.
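The PII replacement described above can be illustrated with a minimal sketch. It is not SAP's or Scale's actual pipeline: the field names (`contact_name`, `contact_phone`, `contact_email`), the replacement pools, and the hash-based selection are all assumptions chosen for illustration. The idea is that each sensitive value is swapped for a value of the same kind and shape, so the document still looks realistic to a model, and the same input always maps to the same replacement.

```python
import hashlib

# Hypothetical replacement pools; a real pipeline would use much larger,
# locale-aware lists.
FAKE_NAMES = ["Alex Morgan", "Jamie Keller", "Robin Castillo", "Sam Weber"]
FAKE_DOMAINS = ["example.com", "example.org", "example.net"]


def _digest(value: str) -> bytes:
    """Stable hash of the original value, used to pick replacements."""
    return hashlib.sha256(value.encode("utf-8")).digest()


def replace_name(original: str) -> str:
    """Pick a substitute name deterministically; never return the input."""
    index = _digest(original)[0] % len(FAKE_NAMES)
    if FAKE_NAMES[index] == original:
        index = (index + 1) % len(FAKE_NAMES)
    return FAKE_NAMES[index]


def replace_phone(original: str) -> str:
    """Shift every digit by 1 to 9 (mod 10), keeping spacing and punctuation."""
    digest = _digest(original)
    out, position = [], 0
    for ch in original:
        if ch in "0123456789":
            shift = digest[position % len(digest)] % 9 + 1
            out.append(str((int(ch) + shift) % 10))
            position += 1
        else:
            out.append(ch)
    return "".join(out)


def replace_email(original: str) -> str:
    """Build a synthetic address on a reserved example domain."""
    digest = _digest(original)
    domain = FAKE_DOMAINS[digest[0] % len(FAKE_DOMAINS)]
    return f"user{digest.hex()[:8]}@{domain}"


# Hypothetical mapping from field name to replacement strategy.
REPLACERS = {
    "contact_name": replace_name,
    "contact_phone": replace_phone,
    "contact_email": replace_email,
}


def pseudonymize(fields: dict) -> dict:
    """Replace PII fields and pass every other field through unchanged."""
    return {
        key: REPLACERS[key](value) if key in REPLACERS else value
        for key, value in fields.items()
    }
```

Calling `pseudonymize` on a dictionary of extracted fields returns a copy in which only the three assumed PII fields are changed; non-sensitive fields such as totals or dates are left untouched. A production system would also need to detect PII in free text and handle many more field types.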
“We decided to go with Scale because we needed a heavily technical partner that could deal with changing requirements dynamically, guaranteeing availability, quality, and scale,” said Sanchez. Scale’s Document AI team met these needs by partnering closely with SAP to understand the technical and business requirements of SAP’s Business Document Processing (BDP) model, then combining machine learning-powered pre-labeling (ML pre-labeling) with Scale’s global labeling operations to deliver high-quality data.
The Result
Enhanced Business Document Processing Services
ML models learn associations between input and output data. If there are errors in training or validation data, the resulting model will underperform or learn the wrong task. By working with Scale, SAP was able to improve its Business Document Processing (BDP) services portfolio. “Our services are now much more accurate and can deal with new types of documents and languages,” explained Sanchez. “Overall, we have experienced a great boost in accuracy across the board. The whole process has also been a valuable experience to understand better the trade-off between the cost of higher quality data and the accuracy of our models,” concluded Sanchez.