AI, ML and Computer Vision

Understanding the Problem

Computer vision is a broad area of research, concerned with the problem of getting computers to “see.”

Computer vision has become far more capable in recent years thanks to advances in artificial intelligence, or AI.

AI is a catch-all term for getting computers to do smart things. The more technically precise term here is machine learning, and specifically deep learning - a set of techniques within machine learning.

Modern computer vision systems rely heavily on deep learning, and leveraging it will be key to powering self-driving cars, robotics, and much more.

Getting Started

How do I develop a computer vision system?

Developing a computer vision system can be accomplished in 4 steps:

1. Collect a reasonable amount of representative data.

For images, this is usually on the order of 10k-50k images to train an initial model. This can go up to millions or even hundreds of millions of images, depending on how robust you want your computer vision system to be.

As a rule of thumb, your training data should come from sources similar to the data your system will see in production. In technical terms, you want your live test set distribution to match your training set distribution. For instance, if your camera is a fisheye camera, your computer vision models will generally learn more from fisheye images than from non-fisheye images. Another example: if you're distinguishing cats from dogs, you'd want many examples of the kinds of cats and dogs you'll see later on, not just one kind of cat or dog.
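One lightweight way to sanity-check this is to compare label frequencies between your training set and a sample of live data. The sketch below is a minimal illustration in Python; the labels and numbers are made up for the example.

```python
from collections import Counter

# Hypothetical labels from the training set and from a sample of live data.
train_labels = ["cat", "cat", "dog", "dog", "dog", "dog"]
live_labels = ["dog", "cat", "cat", "cat", "cat", "dog"]

def label_frequencies(labels):
    """Return the fraction of examples belonging to each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

print("train:", label_frequencies(train_labels))
print("live: ", label_frequencies(live_labels))
# Large gaps between the two frequency tables are a hint that the training
# set does not reflect what the system will see in production.
```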


2. Turn collected data into training data: label, or annotate, your collected data accurately.

Deep learning systems essentially learn to imitate their training datasets, so it's important to label your collected data in the same form that your code will consume the computer vision model's output. For example, as illustrated in the sketch after this list:

  • If you'd like your computer vision systems to draw boxes around objects in an image, then you need to label boxes in the images you've collected.
  • If you'd like it to simply recognize cats and dogs, then you need to categorize your images as cats and dogs.
  • If you'd like to know what every individual pixel corresponds to, then you'll need to label every individual pixel - this is called semantic segmentation.
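To make this concrete, here is a rough sketch of what annotation records for each of these formats might look like. The field names and values are purely illustrative assumptions, not any particular tool's schema.

```python
import numpy as np

# Hypothetical classification label: one category per image.
classification_label = {"image": "img_001.jpg", "label": "cat"}

# Hypothetical bounding-box annotation: one labeled box per object.
bounding_boxes = {
    "image": "img_002.jpg",
    "boxes": [{"label": "dog", "x": 34, "y": 50, "width": 120, "height": 90}],
}

# Semantic segmentation: a class id for every pixel, same height and width
# as the image (here 0 = background, 1 = cat).
segmentation_mask = np.zeros((480, 640), dtype=np.uint8)
segmentation_mask[100:200, 150:300] = 1
```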

3. Train your model on the annotated data.

If you haven't done this before, you can get surprisingly good results by following online tutorials; Google Brain publishes a starter tutorial on image categorization that we like.

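For a rough sense of what this training step looks like in code, here is a minimal sketch using Keras, assuming a folder of images already organized into one subdirectory per class (e.g. data/train/cats and data/train/dogs). It is not the tutorial's exact code, and the paths, layer sizes, and epoch count are illustrative.

```python
import tensorflow as tf

# Assumed layout: data/train/<class>/*.jpg and data/val/<class>/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(160, 160), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/val", image_size=(160, 160), batch_size=32)

# A small convolutional network for two-way classification (cat vs. dog).
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2),  # one logit per class
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(train_ds, validation_data=val_ds, epochs=10)
```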

4. Find gaps in your system, and address them. Repeat.

If you're doing a simple categorization task, your system might be good enough to use after the first time you train it. Unfortunately, in many cases, this won't be true - and this is where you'll need an expert or a lot of patience.

How do you understand why your system is behaving a certain way, and what you should do to improve it? Should you label more data? Should you label more data of a specific edge case? Should you fix mistakes in some of your labeled data? Should you change how you're training your model? Should you change your model? This is the crux of a machine learning engineer's job.

If the project will be worked on for more than a few weeks, it's strongly recommended to come up with an objective metric you can use to track progress. This way, you can see which of your changes have the most impact.
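A common way to do this is to keep a fixed, held-out evaluation set and recompute the same metric after every change. The sketch below is a minimal illustration using scikit-learn's accuracy score and confusion matrix; the labels are made up, and in practice the predictions would come from your trained model.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical ground truth and model predictions on a fixed held-out set
# (0 = cat, 1 = dog). Keeping this set fixed makes runs comparable.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print("accuracy:", accuracy_score(y_true, y_pred))
print("confusion matrix:")
print(confusion_matrix(y_true, y_pred))
# Track these numbers across experiments to see which changes (more data,
# better labels, a different model) actually move the metric.
```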

Our Products

Where does Scale fit in?

Built by engineers who understand the difficulty of building a diverse training dataset, Scale's suite of annotation products speeds up the data labeling step, accelerating the development of computer vision systems.