How ML Linting Meaningfully Improves Adaptive Document Processing at Scale
Document AI is powered by machine learning (ML) models that process documents with high accuracy and speed. When these models are fine-tuned for a customer, they become even more accurate, regularly reaching near-perfect accuracy without any human intervention. Scale engineers continuously improve these models by building new tools that keep Document AI highly reliable for the most critical industries we serve. This blog highlights one of those tools, a technique called “linting”, and how it makes Document AI’s models more accurate.
What is Linting?
At Scale, linting is the practice of creating rules or developing machine learning models that automatically detect errors in training data used to create models. In the case of Document AI, we leverage two types of linters: simple, quick linters and more complex ML-based linters.
Here is an example of a simple linter that compares a field’s type against its transcribed value within a document.
In the example above, the letter ‘O’ should not appear in a numeric, monetary amount field, and our field-type linter would flag this example for correction.
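As a rough illustration, a field-type check of this kind might look like the sketch below; the field names, regular expressions, and FieldType enum are illustrative assumptions, not Scale’s actual schema:

```python
import re
from enum import Enum


class FieldType(Enum):
    MONETARY = "monetary"
    QUANTITY = "quantity"
    ZIP_CODE = "zip_code"


# Illustrative patterns that a value of each field type should match.
FIELD_PATTERNS = {
    FieldType.MONETARY: re.compile(r"^\$?\d{1,3}(,\d{3})*(\.\d{2})?$"),
    FieldType.QUANTITY: re.compile(r"^\d+$"),
    FieldType.ZIP_CODE: re.compile(r"^\d{5}(-\d{4})?$"),
}


def lint_field(field_type: FieldType, value: str) -> bool:
    """Return True if the transcribed value is consistent with its field type."""
    return bool(FIELD_PATTERNS[field_type].match(value.strip()))


# "$1,0O0.00" contains the letter 'O' and would be flagged for correction.
assert lint_field(FieldType.MONETARY, "$1,000.00")
assert not lint_field(FieldType.MONETARY, "$1,0O0.00")
```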
Another example of a simple, quick linter is one that enforces known business rules. For example, a linter can ensure an invoice subtotal equals the sum of all line item totals.
In the example above, the subtotal on the right ($16) is inaccurate and our linter would flag this example for correction.
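A business-rule linter of this kind can be sketched in a few lines; the line item amounts below are made up purely for illustration:

```python
from decimal import Decimal


def lint_subtotal(line_item_totals: list[Decimal], subtotal: Decimal) -> bool:
    """Return True if the extracted subtotal equals the sum of the line item totals."""
    return sum(line_item_totals, Decimal("0")) == subtotal


# Illustrative values: if the line items sum to $18, an extracted $16 subtotal
# would be flagged for correction.
line_items = [Decimal("5.00"), Decimal("6.00"), Decimal("7.00")]
assert not lint_subtotal(line_items, Decimal("16.00"))
assert lint_subtotal(line_items, Decimal("18.00"))
```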
These simple linters are helpful wherever straightforward rules can easily be applied. While useful, they can be implemented by most vendors. What makes Scale Document AI unique is how we go beyond the mainstream methods and tools in the document industry to achieve high quality.
Scenarios that require a higher level of capability or understanding are better suited to ML-powered linters than to rule-based ones, and these are Scale’s specialty. Our specialized ML linters are trained to perform specific tasks, such as comparing transcribed text against the visual representation of that text as it appears on the document’s page. When such a linter finds a meaningful discrepancy, it either corrects it automatically or flags the document for review by highly experienced and skilled reviewers. This human-in-the-loop review is optionally available based on customers’ needs and preferences.
In the example shown above, we trained an ML-based linter to use context to determine that “Account #” on the bottom left refers to a bank account number, since it sits in a section with ACH instructions, while “Account Number” on the top right refers to a business account number. If “Account Number” on the top right had been identified as a bank account number, the linter would flag this for correction.
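The transcription-versus-image comparison mentioned earlier can be sketched roughly as follows. This is a minimal stand-in for what is, in practice, a learned comparison: `run_ocr`, the string-similarity measure, and the 0.9 threshold are assumptions for illustration, not Scale’s implementation:

```python
from difflib import SequenceMatcher


def transcription_matches_image(transcribed: str, ocr_text: str,
                                threshold: float = 0.9) -> bool:
    """Return True when the transcription agrees closely enough with the text
    recognized from the cropped field image; otherwise flag it for review."""
    similarity = SequenceMatcher(None, transcribed.lower(), ocr_text.lower()).ratio()
    return similarity >= threshold


# `run_ocr` is a hypothetical helper that re-recognizes the cropped field image:
# ocr_text = run_ocr(field_crop)
assert transcription_matches_image("Invoice #10482", "Invoice #10482")
assert not transcription_matches_image("Invoice #10482", "Involce #I04B2")
```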
ML linters are inherently more difficult to implement than rule-based linters because they carry the same challenges as any other machine learning model. Developing an ML model is a statistical process prone to variation, and that variation can manifest as unreliability or as overfitting. Unreliability can lead to a model producing wildly different predictions on two slightly different documents. With overfitting, the model learns the detail and noise in the training data and fails to perform on new data. In the case of linting, the ML models are trained on data that contains errors so they can identify and flag other errors. Because that training data is inherently not of the highest quality, it’s common for an ML linter to make decent suggestions even though the model itself is not the most performant. This requires our ML team to add restrictions and guardrails, such as setting a high confidence threshold so the linter only flags errors and makes suggestions it is fairly certain about. This prevents the model from producing a flood of unreliable suggestions that human reviewers would simply ignore or distrust.
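As a hedged sketch, that confidence guardrail might look like the following; the LinterSuggestion fields and the 0.95 threshold are assumed for illustration and would be tuned per model in practice:

```python
from dataclasses import dataclass


@dataclass
class LinterSuggestion:
    field_name: str
    suggested_value: str
    confidence: float


# Assumed threshold: only surface suggestions the ML linter is fairly certain
# about, so reviewers are not flooded with unreliable flags.
CONFIDENCE_THRESHOLD = 0.95


def filter_suggestions(suggestions: list[LinterSuggestion]) -> list[LinterSuggestion]:
    """Keep only high-confidence suggestions for human review."""
    return [s for s in suggestions if s.confidence >= CONFIDENCE_THRESHOLD]
```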
Developing and deploying ML linters is also an iterative process with a feedback loop, sketched below. As we deploy ML linters alongside our document processing models, those models improve when retrained on the resulting higher quality data. And as the document processing models improve, they produce better first-pass results and learn to surface additional errors to human reviewers.
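The loop can be summarized in pseudocode-style Python; `train_model`, `run_linters`, and `apply_corrections` are hypothetical placeholders for the actual training, linting, and review steps:

```python
def improve_with_linting(training_data, train_model, run_linters, apply_corrections,
                         num_rounds: int = 3):
    """Sketch of the feedback loop: lint the data, correct flagged items,
    retrain, and repeat as the model and the data improve together."""
    model = train_model(training_data)
    for _ in range(num_rounds):
        flagged = run_linters(model, training_data)                 # surface likely label errors
        training_data = apply_corrections(training_data, flagged)   # human-in-the-loop fixes
        model = train_model(training_data)                          # retrain on cleaner data
    return model
```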
How Do Linters Improve Models?
The power of machine learning to recognize patterns between input and output data is its greatest strength as well as its Achilles’ heel. If the training data contains errors, the model will learn the wrong task. Having the right linters in place increases the accuracy of machine learning models by ensuring the data used for training is of the highest quality.
For Document AI, we began by designing simple linters to check field types such as monetary amounts, quantities, and ZIP codes. We then expanded beyond these simple linters and implemented ML-based linters to check complex fields, such as addresses whose varying formats we need to parse and validate.
To make things more concrete, let’s think about the use case of invoices. At first glance, it seems straightforward to recognize an invoice: they all list the products sold or the services rendered, the price of each, and how payment should be made. However, invoice layouts, details, and content vary widely from business to business. Some invoices are particularly formal, containing everything from company bank account numbers and line item descriptions to account manager names and contact information. On the other end of the spectrum, an informal email between longtime business partners can also serve as an invoice. The unstructured and ever-changing nature of invoices makes them challenging to model comprehensively and accurately.
When building Document AI’s machine learning model for processing invoices, using linters resulted in a 16.3% relative decrease in transcription errors in the training data. This decrease is crucial, because misspelled text makes it impossible for our document parsing models to extract the right information. Linting also removed the need for an additional layer of human quality assurance (QA) review while still ensuring that the training data is pristine.
Training our models on the higher quality training data led to a 14.3% relative increase in our document processing model’s accuracy. This improvement translates to increased model generalizability and robustness, and more accurate models mean fewer manual reviews, providing additional value for our customers.
Linters make Document AI more accurate, allowing us to deliver higher quality models to customers who rely on high-accuracy document processing.
Parting Thoughts
Scale Document AI distills insights from your messy, changing document data and delivers fast, accurate models tailored to your business without the need for templates. If you’re interested in learning more, take a look at our product video or visit our product page.
Acknowledgments:
We thank Malcolm Greaves from Scale’s Document AI Machine Learning team for his experiments that laid the groundwork for this post and for frequent consultations throughout the writing of this blog.