Safely handling autonomous trucking edge cases with synthetic data.

Overview

Kodiak Is Building Perception and Autonomy Systems that Drive the Future of Freight Transportation.

Kodiak Robotics is an autonomous technology company building self-driving technology for the long-haul trucking industry. Based in Mountain View, CA, Kodiak combines a unique sensor fusion system with a lightweight mapping solution to safely navigate all aspects of highway driving and deliver freight efficiently and on time. Kodiak’s team, which includes several self-driving industry veterans, is redefining the long-haul trucking industry by “building the world’s most efficient, reliable, and respected end-to-end delivery solution.”

The Problem

Large Training Datasets, yet Few Examples of Important Edge Cases.

In typical driving scenarios, trucks don’t encounter pedestrians on the highway. When they do, though, detecting and safely navigating the unexpected situation is a requirement for any production-level autonomous vehicle system. Kodiak’s software stack learns to identify and navigate rare scenarios by training models on examples, but it is often difficult to collect enough real-world examples to handle certain edge cases reliably. For Kodiak, one of those challenging edge cases is pedestrians walking on the highway.

“We wanted to iterate quickly without substantial overhead, and that’s what Scale provided to us. Scale’s approach is unique in that they both generate the data and provide technical partnership. Ultimately, our work together was aligned towards driving model improvements.”

Derek Phillips

Senior Software Engineer, Kodiak

The Solution

Increase Model Robustness by Training on Synthesized Rare Scenarios.

The Kodiak team chose Scale to provide synthetic data to augment Kodiak’s existing ground-truth training data with simulated pedestrians. Scale provides a unique human-in-the-loop synthetic data generation process to create diverse and realistic synthetic data. Trained taskers can validate the placement and poses of synthetic pedestrians to ensure the synthetic data is realistic. Scale delivers the data using the same dashboard and APIs as their existing annotation pipeline, making integration seamless.


You can explore synthetic pedestrians augmented on top of the existing open-source PandaSet Lidar dataset above.

The Result

Nucleus Helps Kodiak Identify Where More Synthetic Data will Improve Accuracy.

In Nucleus, Kodiak plans to keep using natural language search and Autotags to find the specific scenes in its dataset that contain edge cases where the model needs to improve. These include, among other scenarios, scenes where construction workers are present and scenes where a vehicle is traveling under a bridge.

For efficiency, the Kodiak team centralized all of its data, including multiple labeling projects and raw, unlabeled data, into a single dataset. This lets the team iterate quickly on model experiments, query for specific attributes or metadata on the fly, and close the loop between data management and model management.
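As a rough illustration of what querying a single centralized dataset by metadata can look like, here is a minimal Python sketch. It does not use the Nucleus API. The item layout, the field names (`labels`, `metadata`, `under_bridge`, `time_of_day`), and the `query_items` helper are all hypothetical and exist only for this example.

```python
from typing import Any, Dict, List

# Hypothetical item layout: one record per frame, labeled or not,
# all stored in the same list so a single query covers everything.
Item = Dict[str, Any]


def query_items(items: List[Item], **filters: Any) -> List[Item]:
    """Return items whose metadata matches every key/value in `filters`."""
    return [
        item
        for item in items
        if all(item.get("metadata", {}).get(key) == value
               for key, value in filters.items())
    ]


def unlabeled(items: List[Item]) -> List[Item]:
    """Return items that have no labels yet (raw data)."""
    return [item for item in items if not item.get("labels")]
```

With labeled and raw frames held together, a call such as `unlabeled(query_items(items, under_bridge=True))` would list the raw under-bridge frames that could be sent for labeling next.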

Going forward, the team can review both insights and model metrics in Nucleus to identify scenes with poor IoU (intersection over union) and curate subsets of data where the model is not performing well and where additional synthetic data might help.
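To make the IoU-based curation concrete, here is a minimal Python sketch. It assumes axis-aligned 2D boxes and ground-truth/prediction pairs that have already been matched. Real lidar pipelines typically use 3D or rotated boxes. The `Box` type, the `low_iou_scenes` helper, and the 0.5 threshold are illustrative assumptions, not part of Nucleus or Kodiak’s stack.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2D box defined by its corner coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def iou(a: Box, b: Box) -> float:
    """Intersection over union: overlap area divided by combined area."""
    inter_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    inter_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def low_iou_scenes(
    scenes: Dict[str, List[Tuple[Box, Box]]],
    threshold: float = 0.5,  # illustrative cutoff, not a recommended value
) -> List[str]:
    """Return ids of scenes whose mean IoU over (ground truth, prediction)
    pairs falls below `threshold`. Scenes with no pairs are skipped."""
    flagged = []
    for scene_id, pairs in scenes.items():
        if not pairs:
            continue
        mean_iou = sum(iou(gt, pred) for gt, pred in pairs) / len(pairs)
        if mean_iou < threshold:
            flagged.append(scene_id)
    return flagged
```

Scenes returned by `low_iou_scenes` are the candidates for a curated subset, the kind of slice where extra synthetic examples might be worth generating.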

“Kodiak uses synthetic data not as an alternative to real-world data, but as a complement. Scale Synthetic is an important enabler of that approach. By using Scale Synthetic to efficiently generate a large number of rare edge cases, the Kodiak Driver leverages the best of both real and synthetic data. Everything integrates seamlessly with our existing data labeling pipeline and data management tooling.”

Akshay Khatri

Perception Software Engineer, Kodiak
