Design Thinking for ML Systems at Scale

Published March 28, 2022


At Scale AI, we’re always looking for ways to apply machine learning (ML) in our internal systems to deliver the best results for our customers. We’ve previously written about Scale’s ML-powered pre-labeling as well as our active tooling, like Autosegment, which helps our Taskers label faster and at higher quality.

In this piece, we discuss how we think about and incorporate some basic tenets of design thinking to develop ML systems that make a difference for our labeling pipeline.

Design thinking is a mindset and set of practices that foster the collaboration required to solve problems in human-centered ways; it was notably popularized by organizations such as IDEO. It’s useful for machine learning engineers who need to understand the full scope of how the system they are building will be used and who are responsible for both business and operational outcomes. We’ll describe how design thinking techniques can be applied to scoping machine learning problems and introduce some concepts to help business stakeholders communicate more effectively with ML engineers.

Figure: Phases of Developing an ML System. Idealized phases of developing an ML system. In practice, it’s not this …


In our experience, the different phases of developing an ML system are:

  1. Discovery - Locate a business or operational pain point.

  2. Design - Map the different ways a new or existing model could fit into the extant pipeline.

  3. Decision & Alignment - Align with operational and engineering stakeholders on scope and resourcing.

  4. Development, Integration, Deployment - Deliver the system: build data pipelines, train models, integrate into the pipeline, etc.

  5. Measurement & Experimentation - Identify the effect of the ML system as an intervention on the existing pipeline.

  6. Continuous improvement - Monitor and retrain to account for distribution shifts and new edge cases.

This is a descriptive mental model, not a prescriptive blueprint. In reality, a project is always moving between different phases and can often be in multiple phases at the same time. For example, measurement and development form a tight iteration loop. Still, it’s useful to think about these phases in isolation.

Concepts from design thinking are mostly employed in the Discovery and Design phases, when problem statements are the most inchoate and ill-formed. In these phases, the need or specific outcome is communicated to an ML engineer, and an initial strategy to address that need or deliver that outcome is designed. We focus on the Discovery and Design phases in this blog and cover the remaining phases briefly at the end.

Phase I: Discovery

An ML system gains life when a need is discovered. By need, we mean the underlying operational or business pain point or problem, such as a new labeling capability requiring ML support, as opposed to a request for a specific feature, such as a linter (a tool we use to ensure data quality). It’s not uncommon for operational or business teams to stumble into the XY problem by pattern matching and requesting a clone of an existing feature, without realizing that their underlying problem is actually quite different. Or they might ask for something that is incredibly difficult, unaware that a slightly modified version would save months of development time. So an ML engineer should always hunt down the true problem that needs to be solved.

Pain points and problems can be project-specific or exist horizontally across business teams or units. Historically, many of the problems that the ML team at Scale has tackled are project-specific and brought to us by our cross-functional stakeholders. For example, a project manager running into issues with a labeling project or a product manager looking to launch a new feature will surface these project-specific issues to our team. Tackling these isolated problems can result in marooned solutions that aren’t readily reusable by other projects.

Solving horizontal issues, in comparison, maximizes our leverage, since the ML systems we build then scale with our business. However, cross-business issues that can be solved with ML are often harder to identify because of the sheer operational complexity. These issues require a more active discovery process, more time, and deeper alignment with partners. The problems with the highest stakes are not obvious to ML engineers, who don’t get day-to-day exposure to them, even as these frictions impose significant pain on the operations or business side. To get a sense for this, imagine explaining the pain of managing multiple Python environments and package versions to an operational lead: it can be incredibly painful for you, yet esoteric, even cryptic, to them.

As another example, while working on an image semantic segmentation project, we learned that pre-labels generated by a model specific to that project were sometimes ignored by Taskers, but the root causes were unclear. A close investigation identified a number of culprits, ranging from the inflexibility of the pre-labels to having optimized for the wrong metric during development.
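One way to make the "ignored pre-labels" symptom measurable is to compare each pre-label with the mask the Tasker finally submitted. The sketch below is a minimal illustration, not the investigation we ran: it assumes masks are available as sets of (row, col) pixel coordinates, and the function names and the 0.9 IoU cutoff for counting a pre-label as "kept" are hypothetical choices.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def mask_iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection-over-union of two masks given as pixel sets."""
    union = len(a | b)
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return len(a & b) / union


def prelabel_retention(
    prelabels: Iterable[Set[Pixel]],
    finals: Iterable[Set[Pixel]],
    threshold: float = 0.9,  # hypothetical cutoff for "the Tasker kept it"
) -> float:
    """Fraction of tasks whose submitted mask still closely matches the pre-label."""
    kept = [mask_iou(p, f) >= threshold for p, f in zip(prelabels, finals)]
    return sum(kept) / len(kept) if kept else 0.0
```

A low retention rate shows that pre-labels are being discarded, not why; finding the causes still takes talking to Taskers and watching them work.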

Autosegment, developed by Sean Li, Saleh Hamadeh, and the two of us, aims to tackle these problems. It addresses a challenge shared by all image segmentation projects rather than one particular project. Learning about this challenge and building the appropriate solution required input from numerous stakeholders: the project leads, who wanted to accelerate Taskers; the Taskers themselves, who used our tools and knew what worked and what didn’t; and the engineers, who were familiar with the implementation of image segmentation in our interface and understood the kinds of tradeoffs that different tools would need to make.

Figure: Stakeholder concerns. Example overlap of concerns between different stakeholders for a new ML-assisted tool.

Identifying the pain points of the end user always requires talking to the end user. Instead of theorizing about what could be a problem for them, it’s more effective to speak to users directly and observe them as they encounter the problem. When building systems to accelerate labeling operations, ML engineers need to build user empathy. A simple way to do so is to spend time tasking and dogfooding our own platform. There is no substitute for firsthand experience in understanding what makes tasking hard.

Phase II: Design

After Discovery, the project enters a Design phase, where an initial approach is sketched out: what kind of model to build, what sort of data is needed, the timelines, and so on.

Designing effective ML systems at Scale requires two things: a clear need, which we’ve discussed above, and knowledge of the many different ways a model can fit into the labeling pipeline.

This last point is worth elaborating. The same machine learning model can be used for a variety of operational purposes. For example, a model that predicts bounding boxes for cars can be used to:

  1. Estimate the number of boxes in a task (task difficulty estimation)

  2. Prepopulate the scene with bounding boxes (pre-labeling)

  3. Suggest real-time fixes to bounding boxes drawn by a Tasker (active tooling)

  4. Identify missing or unexpected boxes in a completed scene (linting)

For each of these, the inference API could have the exact same request and return formats. However, each of the four approaches above requires different operational adjustments, not all of which may be feasible. Task difficulty estimation makes assumptions about what makes a task difficult or easy for humans and implicitly maps these to legible subpopulations of Taskers. Active tooling can require nontrivial operational investment in teaching Taskers how to use the new tool, plus engineering lift to build new frontend components. Building a useful ML system means designing how an ML model will interface with the rest of the operational and engineering systems, in conjunction with actually training the ML model.
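To make the point concrete, here is a minimal sketch in which one list of box predictions, in a single output format, feeds all four uses of the car box model. The `Box` type, the helper names, and every threshold are hypothetical; what the sketch illustrates is that the model output stays the same while the downstream logic, and the operational changes it implies, differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; score is model confidence (1.0 for human-drawn boxes)."""
    x0: float
    y0: float
    x1: float
    y1: float
    score: float = 1.0


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    ix = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    iy = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = ix * iy

    def area(r: Box) -> float:
        return (r.x1 - r.x0) * (r.y1 - r.y0)

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


# 1. Task difficulty estimation: expected box count as a proxy for effort.
def estimate_difficulty(preds: List[Box], min_score: float = 0.5) -> int:
    return sum(1 for p in preds if p.score >= min_score)


# 2. Pre-labeling: seed the task with confident predictions only.
def prelabel(preds: List[Box], min_score: float = 0.8) -> List[Box]:
    return [p for p in preds if p.score >= min_score]


# 3. Active tooling: suggest snapping a Tasker's rough box to the best-matching prediction.
def suggest_fix(drawn: Box, preds: List[Box], min_iou: float = 0.5) -> Optional[Box]:
    best = max(preds, key=lambda p: iou(drawn, p), default=None)
    if best is None or iou(drawn, best) < min_iou:
        return None
    return best


# 4. Linting: compare a submitted scene against confident predictions.
def lint(
    submitted: List[Box], preds: List[Box], min_score: float = 0.8, min_iou: float = 0.5
) -> Tuple[List[Box], List[Box]]:
    confident = [p for p in preds if p.score >= min_score]
    missing = [p for p in confident if all(iou(p, s) < min_iou for s in submitted)]
    unexpected = [s for s in submitted if all(iou(s, p) < min_iou for p in confident)]
    return missing, unexpected
```

In the linter, a confident prediction with no matching submitted box is flagged as possibly missing, and a submitted box with no matching confident prediction as possibly unexpected; both are prompts for human review rather than automatic corrections.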

Figure: One model, many uses. The same model may be applied in different ways to an existing pipeline, requiring different accommodations by engineering or operational teams.

We’ve noticed two particular traps that can lead ML engineers on wild goose chases during the design phase. The first is misalignment between the machine learning problem and the business problem. The second is confusing the customer with the end user of a system.

ML engineers are trained to use models to solve machine learning tasks, like box detection or text classification. For a box detection dataset, you’d use a box detection model; for a text classification dataset, a text classification model. So when the business problem looks like an image classification problem, you might reach for an image classification model, only to discover that what you really need to solve is tabular regression.

Faced with using ML to assist, say, labeling semantic segmentation training data, a common reflex is to train a semantic segmentation model to do the labeling. But if you already had a capable model, you wouldn’t need to collect training data for it! The mistake here is to apply the familiar task paradigm (semantic segmentation) to the business problem (accelerating semantic segmentation labeling).

Many operational questions take the form “Is something X or not X?” To an ML engineer, this sounds like a classification problem, and it’s easy to reach for a classification model that answers the question directly. Putting aside whether modeling X directly is feasible, we often already have an operational or engineering system that could answer this question more effectively given some additional information. In this situation, it may be faster to figure out how to extract that missing information than to replicate the existing system altogether. This is a strong reason to apply a modeling scheme that looks very different from the question as originally posed.
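A minimal sketch of this idea, under purely illustrative assumptions: suppose the question is "Does this task need expert review?", and a hypothetical existing rule already answers it from a rare-class flag and an object count, but the count is unknown before labeling. Rather than training a classifier for the whole yes/no decision, the model only estimates the missing input and the trusted rule does the rest. `TaskInfo`, `needs_expert_review`, `augment`, and the 50-object cutoff are all invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class TaskInfo:
    image_id: str
    has_rare_class: bool                # already known from project setup
    object_count: Optional[int] = None  # unknown until the task is labeled


def needs_expert_review(task: TaskInfo, max_objects: int = 50) -> bool:
    """Hypothetical existing operational rule, assumed to be trusted and tuned."""
    if task.object_count is None:
        raise ValueError("object_count is required")
    return task.has_rare_class or task.object_count > max_objects


def augment(task: TaskInfo, count_model: Callable[[str], int]) -> TaskInfo:
    """Fill in only the missing input with a model estimate; keep known values."""
    if task.object_count is not None:
        return task
    return replace(task, object_count=count_model(task.image_id))
```

Here `count_model` stands in for any model that estimates an object count from an image, for example a detector whose predictions are counted. The model is trained and evaluated as a counting problem, while the decision logic stays with the system that operations already understands.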

Figure: Leveraging existing systems. Modeling the business problem to solve doesn’t always mean reaching for the most obvious machine learning task paradigm.

The second trap is mistaking the people who request or manage a system for its end users. When we build a tool like Autosegment, the end user of the active tooling is the Tasker, but the internal customer is actually the Delivery Operations team. What we build has to meet the customer’s expectations, so even if Taskers enjoy using our tool, we need to make sure it moves the needle for the Delivery Operations team. At the same time, the expectations and needs of the end user may differ or conflict with those of the internal customer (e.g., when trading off latency against cost).

To avoid falling into these two traps, ML engineers have to be deliberate in how they translate the business problem into a machine learning context, and always keep the different stakeholders in mind.

Phases III - VI: Decision and Alignment Through Continuous Improvement

After the initial Discovery and Design phases, the project needs a go/no-go decision from both the ML team and the cross-functional teams. In this phase, all teams must decide whether to allocate the necessary resources to the project. If the ML team decides to dedicate resources but the cross-functional team cannot commit the time or spare the bandwidth, the effort will ultimately fall short. Factors teams must consider when making this decision include prioritization, level of effort needed, and available bandwidth.

Once all teams commit to the project, we can develop the ML-specific components of the ML system. This is when we build data pipelines, train models, and ship an endpoint. Once the model is developed, the project nearly always needs engineering and operational integration.

For example, if we are developing a new pre-labeling model, the model requires integration with our labeling pipelines to attach the pre-label to the task.

Now the project is actually deployed. Deployment should also follow best practices: the feature should initially be enabled for only a small percentage of tasks or users, making it easier to revert if there are any issues. The system should then be measured continuously. Especially in the first few weeks, close attention should be paid to ensure no issues arise.
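A staged rollout and the measurement that follows it can be sketched as below. This is a minimal, hypothetical illustration rather than our deployment tooling: tasks are assigned to the new feature by hashing their IDs, so a task’s assignment is stable, raising the percentage only adds tasks, and setting it to 0 reverts everyone. The held-out tasks then act as a control for estimating the feature’s effect on a metric such as labeling time, using a difference in means with a normal-approximation 95% interval. The function names, the feature string, and the choice of metric are assumptions.

```python
import hashlib
from statistics import mean, variance
from typing import Sequence, Tuple


def in_rollout(task_id: str, feature: str, percent: float) -> bool:
    """Deterministically decide whether a task gets the new feature.

    Hashing "feature:task_id" keeps each task's assignment stable across calls
    and independent across features; buckets below the cutoff are enabled.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{feature}:{task_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def difference_in_means(
    treated: Sequence[float], control: Sequence[float]
) -> Tuple[float, Tuple[float, float]]:
    """Treated-minus-control mean with an approximate 95% interval.

    Uses a normal approximation with unpooled variances; needs at least
    two observations per group.
    """
    diff = mean(treated) - mean(control)
    se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
    return diff, (diff - 1.96 * se, diff + 1.96 * se)
```

Because assignment depends only on the feature name and task ID, widening the rollout from, say, 5% to 20% keeps every task that was already enabled. A cryptographic hash of the ID is also effectively unrelated to task content, which is what lets the held-out tasks serve as a control.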

We then use the metrics and qualitative feedback gathered in the measurement step to continuously improve the model. Models often need to be retrained to account for new edge cases. As we maintain the production model, we often encounter data distribution shifts or taxonomic changes that must be addressed.
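One simple signal that retraining may be due is a shift in the label distribution. The sketch below, a minimal illustration with hypothetical names and an arbitrary 0.1 threshold, compares class frequencies in the training set with those in recently completed tasks using total variation distance, and separately flags class names that never appeared in training, as a taxonomy change would produce. It only looks at labels, so it is one signal among several rather than a complete monitor.

```python
from collections import Counter
from typing import Dict, Iterable


def class_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance: half the L1 distance between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def drift_report(
    train_labels: Iterable[str], recent_labels: Iterable[str], threshold: float = 0.1
) -> dict:
    p = class_distribution(train_labels)
    q = class_distribution(recent_labels)
    tv = total_variation(p, q)
    unseen = sorted(set(q) - set(p))  # e.g. classes introduced by a taxonomy change
    return {
        "tv_distance": tv,
        "unseen_classes": unseen,
        "retrain_suggested": tv > threshold or bool(unseen),
    }
```

Total variation ranges from 0 (identical distributions) to 1 (no classes in common), which makes a fixed threshold easy to reason about; the right threshold for a given project is an empirical question.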


We hope this post provided some insight into how we design our ML systems and how you and your teams can partner effectively with horizontal teams to deploy models that serve your business needs. Throughout the phases of ML development, we strive to ensure alignment with our cross-functional stakeholders. While gaining and maintaining alignment takes time and deliberate effort, it allows us to unlock capabilities on our platform that wouldn’t otherwise be possible.
