Computer Vision for Dummies

Siddharth Das
Feb 12, 2020


Source: https://www.mdpi.com/2227-7080/9/1/2/htm#

Overview:

  • What is Computer Vision?
  • How is computer vision different from image processing?
  • Tasks in Computer Vision

What is Computer Vision?

Computer vision is a field of Artificial Intelligence that enables computers to see, identify, and process images much as human vision does, and then produce appropriate output. In other words, it aims to impart human-like visual understanding to a computer.

How is computer vision different from image processing?

Image processing is the process of simplifying or enhancing an image, for example by adjusting its brightness or contrast. It is a type of digital signal processing and is not concerned with understanding the content of an image.

Computer vision, by contrast, is concerned with extracting useful information from an image (or video). A given computer vision system may still require image processing to be applied to its raw input.
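
To make the distinction concrete, here is a minimal sketch of a pure image-processing operation: a linear brightness/contrast adjustment using NumPy and Pillow. It changes how the image looks but extracts no meaning from it (the file names are placeholders):

```python
import numpy as np
from PIL import Image

# Load an image as a float array in [0, 255]; the path is a placeholder.
img = np.asarray(Image.open("input.jpg").convert("RGB"), dtype=np.float32)

# Classic linear adjustment: alpha scales contrast, beta shifts brightness.
alpha, beta = 1.5, 20.0
adjusted = np.clip(alpha * img + beta, 0, 255).astype(np.uint8)

Image.fromarray(adjusted).save("adjusted.jpg")
```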

Tasks in Computer Vision

1. Image Classification:

Categorizing the entire image into a single class such as “people”, “animals”, or “outdoors”.

Input: A color or black-and-white image containing a single object.

Output: A single class label.

Image classification: the example image would be classified as “cat”.
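
A common model choice here is a convolutional network such as ResNet. Below is a minimal sketch using a pretrained ResNet-50 from torchvision; this is an assumed setup, not a prescribed one, and the image path is a placeholder:

```python
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

# Load an ImageNet-pretrained ResNet-50 and its matching preprocessing.
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()

img = Image.open("cat.jpg").convert("RGB")  # placeholder path
batch = preprocess(img).unsqueeze(0)        # shape: [1, 3, H, W]

with torch.no_grad():
    logits = model(batch)

# The highest-scoring ImageNet class is the predicted label.
class_id = logits.argmax(dim=1).item()
print(weights.meta["categories"][class_id])
```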

2. Object Localization:

Locate the presence of objects in an image and indicate their location with a bounding box.

Input: An image containing one or more objects, all of the same category/label.

Output: One or more bounding boxes, each defined either by two corner points or by one point plus a width and height.

Object localization: locating the object (“cat”) in an image.
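
Detection-style models such as Faster R-CNN are a common way to get bounding boxes; for single-object localization one can simply keep the highest-scoring box. A minimal sketch with torchvision's pretrained Faster R-CNN (an assumed setup; the image path is a placeholder):

```python
import torch
from PIL import Image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)
from torchvision.transforms.functional import to_tensor

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()

# The model takes a list of [C, H, W] tensors with values in [0, 1].
img = to_tensor(Image.open("cat.jpg").convert("RGB"))  # placeholder path

with torch.no_grad():
    pred = model([img])[0]  # dict with "boxes", "labels", "scores"

# Localization: keep just the single highest-scoring bounding box.
best = pred["scores"].argmax()
x1, y1, x2, y2 = pred["boxes"][best].tolist()
print(f"box: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f})")
```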

3. Object Detection:

Locate objects in an image with bounding boxes and predict the type or class of each located object. In effect, it combines image classification and object localization.

Input: An image with one or more objects from one or more categories/labels.

Output: One or more bounding boxes (e.g. defined by a point, width, and height), and a class label for each bounding box.

Object detection: a bounding box and class label for each object in the image.
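
Popular detection models include Faster R-CNN, SSD, and YOLO. Extending the localization sketch above, the example below keeps every confident box together with its predicted class name (again an assumed torchvision setup with a placeholder image path):

```python
import torch
from PIL import Image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)
from torchvision.transforms.functional import to_tensor

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]  # COCO class names

img = to_tensor(Image.open("street.jpg").convert("RGB"))  # placeholder path

with torch.no_grad():
    pred = model([img])[0]

# Detection: report every box whose confidence clears a threshold.
for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score >= 0.8:
        coords = [round(v) for v in box.tolist()]
        print(categories[label], coords, f"{score:.2f}")
```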

4. Segmentation:

Identifying the parts of an image and determining which object they belong to. It is further divided into the following categories:

4A. Semantic segmentation — classifies all the pixels of an image into meaningful classes of objects. These classes are “semantically interpretable” and correspond to real-world categories. For instance, you could isolate all the pixels associated with a cat and color them green. This is also known as dense prediction because it predicts the meaning of each pixel.

Semantic segmentation: three colored regions corresponding to three classes: background, cats, and ground.
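
Common semantic segmentation models include FCN, U-Net, and the DeepLab family. As a minimal sketch, the example below runs torchvision's pretrained DeepLabV3 and takes a per-pixel argmax to get the dense prediction (an assumed setup; the image path is a placeholder):

```python
import torch
from PIL import Image
from torchvision.models.segmentation import (
    deeplabv3_resnet50,
    DeepLabV3_ResNet50_Weights,
)

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

img = Image.open("cats.jpg").convert("RGB")  # placeholder path
batch = preprocess(img).unsqueeze(0)

with torch.no_grad():
    out = model(batch)["out"]  # shape: [1, num_classes, H, W]

# Dense prediction: every pixel gets the class with the highest score.
pixel_classes = out.argmax(dim=1)[0]  # shape: [H, W]
print(pixel_classes.unique())         # class indices present in the image
```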

4B. Instance segmentation — identifies each instance of each object in an image. It differs from semantic segmentation in that it does not assign a class to every pixel; instead, it labels only the pixels that belong to detected object instances. If there are three cars in an image, semantic segmentation labels all of them as a single “car” region, while instance segmentation identifies each individual car.

Instance segmentation: identify which pixels belong to which instance.
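
Mask R-CNN is the classic model choice for instance segmentation. A minimal sketch with torchvision's pretrained Mask R-CNN, which returns one soft pixel mask per detected instance (an assumed setup; the image path is a placeholder):

```python
import torch
from PIL import Image
from torchvision.models.detection import (
    maskrcnn_resnet50_fpn,
    MaskRCNN_ResNet50_FPN_Weights,
)
from torchvision.transforms.functional import to_tensor

weights = MaskRCNN_ResNet50_FPN_Weights.DEFAULT
model = maskrcnn_resnet50_fpn(weights=weights).eval()

img = to_tensor(Image.open("cars.jpg").convert("RGB"))  # placeholder path

with torch.no_grad():
    pred = model([img])[0]  # "masks" has shape [N, 1, H, W]

# Each confidently detected instance gets its own binary pixel mask.
for i, (mask, score) in enumerate(zip(pred["masks"], pred["scores"])):
    if score >= 0.8:
        binary = mask[0] > 0.5  # threshold the soft mask
        print(f"instance {i}: {int(binary.sum())} pixels")
```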

4C. Panoptic segmentation — classifies every pixel in the image with a class label while also identifying which instance of that class it belongs to. Panoptic segmentation is thus a combination of semantic and instance segmentation.

Panoptic segmentation
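
torchvision does not ship a pretrained panoptic model, so the sketch below only illustrates the output format with NumPy: a per-pixel class map combined with a per-pixel instance ID, where "stuff" classes such as background and ground keep instance ID 0. All arrays here are invented purely for illustration:

```python
import numpy as np

H, W = 4, 6
# Hypothetical semantic map: 0 = background, 1 = ground, 2 = cat.
semantic = np.zeros((H, W), dtype=int)
semantic[2:, :] = 1     # ground in the lower rows
semantic[1:3, 1:3] = 2  # left cat
semantic[1:3, 4:6] = 2  # right cat

# Hypothetical instance masks, as an instance model would produce them.
instance_masks = [
    (semantic == 2) & (np.arange(W) < 3),   # left cat
    (semantic == 2) & (np.arange(W) >= 3),  # right cat
]

# Panoptic output: a (class, instance_id) pair per pixel; each "thing"
# instance gets a unique id, while "stuff" pixels keep id 0.
instance_ids = np.zeros((H, W), dtype=int)
for i, mask in enumerate(instance_masks, start=1):
    instance_ids[mask] = i

panoptic = np.stack([semantic, instance_ids])  # shape: [2, H, W]
print(panoptic[:, 1, :])  # class and instance id along one image row
```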

5. Keypoint Detection:

Keypoint detection involves simultaneously detecting objects (most commonly people) and localizing their keypoints. Keypoints are spatial locations in the image that define what is interesting or stands out, and good keypoints are ideally invariant to image rotation, scaling, translation, distortion, and so on.
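
For human keypoints, a common model choice is Keypoint R-CNN, which predicts the 17 COCO body keypoints for each detected person. A minimal sketch with torchvision (an assumed setup; the image path is a placeholder):

```python
import torch
from PIL import Image
from torchvision.models.detection import (
    keypointrcnn_resnet50_fpn,
    KeypointRCNN_ResNet50_FPN_Weights,
)
from torchvision.transforms.functional import to_tensor

weights = KeypointRCNN_ResNet50_FPN_Weights.DEFAULT
model = keypointrcnn_resnet50_fpn(weights=weights).eval()

img = to_tensor(Image.open("people.jpg").convert("RGB"))  # placeholder path

with torch.no_grad():
    pred = model([img])[0]  # "keypoints": [N, 17, 3] as (x, y, visibility)

# Print the body keypoints of each confidently detected person.
for person, score in zip(pred["keypoints"], pred["scores"]):
    if score >= 0.8:
        for x, y, visibility in person.tolist():
            print(f"({x:.0f}, {y:.0f}) visibility={visibility:.0f}")
```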


Siddharth Das

Research Associate @EVSTS | Ex Machine Learning Engineer @IntellibotRPA