Computer Vision Tutorial: A Step-by-Step Introduction to Image Segmentation Techniques (Part 1)

Both images use image segmentation to identify and locate the people present.

In image 1, every pixel belongs to a particular class (either background or person), and all pixels belonging to the same class are represented by the same color (background as black, person as pink). This is an example of semantic segmentation.

Image 2 also assigns a particular class to each pixel of the image. However, different objects of the same class have different colors (Person 1 as red, Person 2 as green, background as black, etc.). This is an example of instance segmentation.

Let me quickly summarize what we’ve learned.

If there are 5 people in an image, semantic segmentation will classify all of them as a single class. Instance segmentation, on the other hand, will identify each of these people individually.

So far, we have delved into the theoretical concepts of image processing and segmentation.

Let’s mix things up a bit – we’ll combine learning concepts with implementing them in Python.

I strongly believe that’s the best way to learn and remember any topic.

Region-based Segmentation

One simple way to segment different objects could be to use their pixel values.

An important point to note – the pixel values will be different for the objects and the image’s background if there’s a sharp contrast between them.

In this case, we can set a threshold value.

The pixel values falling below or above that threshold can be classified accordingly (as an object or the background).

This technique is known as Threshold Segmentation.

If we want to divide the image into two regions (object and background), we define a single threshold value.

This is known as the global threshold.

  If we have multiple objects along with the background, we must define multiple thresholds.

These thresholds are collectively known as the local threshold.

Let’s implement what we’ve learned in this section.

Download this image and run the below code.

It will give you a better understanding of how thresholding works (you can use any image of your choice if you feel like experimenting!).

First, we’ll import the required libraries.


Let’s read the downloaded image and plot it:
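A minimal sketch of these first two steps (imports, then reading and plotting) might look like this. It assumes the downloaded image was saved as 1.jpeg in the working directory; if the file is missing, a synthetic RGB array stands in so the snippet still runs end to end.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: the tutorial image was saved as "1.jpeg".
# If it is missing, fall back to a synthetic RGB image of the same shape.
try:
    image = plt.imread("1.jpeg")
except FileNotFoundError:
    image = np.random.default_rng(0).random((192, 263, 3))

plt.imshow(image)
plt.axis("off")
print(image.shape)
```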

It is a three-channel image (RGB).

We need to convert it into grayscale so that we only have a single channel.

Doing this will also help us get a better understanding of how the algorithm works.

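One way to do the conversion is a weighted sum of the three channels (a random array stands in for the loaded image here; the luminance weights below are the same ones skimage’s rgb2gray uses):

```python
import numpy as np

# Stand-in for the RGB image loaded in the previous step.
image = np.random.default_rng(0).random((192, 263, 3))

# Standard luminance weights collapse the 3 channels into 1.
gray = image @ np.array([0.2125, 0.7154, 0.0721])
print(gray.shape)  # (192, 263)
```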

Now, we want to apply a certain threshold to this image.

This threshold should separate the image into two parts – the foreground and the background.

Before we do that, let’s quickly check the shape of this image:

gray.shape
(192, 263)

The height and width of the image are 192 and 263 pixels respectively.

We will take the mean of the pixel values and use that as a threshold.

If the pixel value is more than our threshold, we can say that it belongs to an object.

If the pixel value is less than the threshold, it will be treated as the background.

Let’s code this:
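A sketch of mean thresholding (a random grayscale array stands in for the actual image; swap in your own gray array):

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the grayscale image from the previous step.
gray = np.random.default_rng(0).random((192, 263))

# Pixels brighter than the mean become foreground (1), the rest background (0).
gray_binary = np.where(gray > gray.mean(), 1.0, 0.0)

plt.imshow(gray_binary, cmap="gray")
plt.axis("off")
```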

Nice! The darker region (black) represents the background and the brighter (white) region is the foreground.

We can define multiple thresholds as well to detect multiple objects:
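For instance, three cut-offs split the intensity range into four segments. The threshold values below are arbitrary choices for illustration, and a random array again stands in for the real image:

```python
import numpy as np

gray = np.random.default_rng(0).random((192, 263))  # stand-in grayscale image

# Hypothetical cut-offs; tune these per image.
thresholds = [0.25, 0.5, 0.75]
segmented = np.digitize(gray, thresholds)  # labels 0, 1, 2, 3 -> four segments
print(np.unique(segmented))
```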

There are four different segments in the above image.

You can set different threshold values and check how the segments are made.

Some of the advantages of this method are:

- Simpler calculations
- Fast operation speed
- When the object and background have high contrast, this method performs really well

But there are some limitations to this approach.

When we don’t have significant grayscale difference, or there is an overlap of the grayscale pixel values, it becomes very difficult to get accurate segments.

Edge Detection Segmentation

What divides two objects in an image? There is always an edge between two adjacent regions with different grayscale values (pixel values).

The edges can be considered as the discontinuous local features of an image.

We can make use of this discontinuity to detect edges and hence define a boundary of the object.

This helps us in detecting the shapes of multiple objects present in a given image.

Now the question is: how can we detect these edges? This is where we can make use of filters and convolutions.

Refer to this article if you need to learn about these concepts.

The below visual will help you understand how a filter convolves over an image.

Here’s the step-by-step process of how this works:

1. Take the weight matrix
2. Put it on top of the image
3. Perform element-wise multiplication and get the output
4. Move the weight matrix as per the stride chosen
5. Convolve until all the pixels of the input have been used

The values of the weight matrix define the output of the convolution.
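The steps above can be sketched as a naive NumPy loop. (Strictly speaking this computes cross-correlation, since the kernel is not flipped, which matches the usual deep-learning convention; the toy image and averaging kernel are arbitrary examples.)

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Naive 'valid' convolution mirroring the steps above."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # put the weight matrix on top of the image ...
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            # ... then element-wise multiply and sum
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2)) / 4  # arbitrary averaging filter for illustration
print(convolve2d(image, kernel))
```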

In essence, the convolution helps us extract features from the input.

Researchers have found that choosing some specific values for these weight matrices helps us to detect horizontal or vertical edges (or even the combination of horizontal and vertical edges).

One such weight matrix is the Sobel operator. It is typically used to detect edges. The Sobel operator has two weight matrices – one for detecting horizontal edges and the other for detecting vertical edges.

Let me show how these operators look and we will then implement them in Python.

Sobel filter (horizontal):

 1  2  1
 0  0  0
-1 -2 -1

Sobel filter (vertical):

-1  0  1
-2  0  2
-1  0  1

Edge detection works by convolving these filters over the given image.

Let’s visualize them on this image.


It should be fairly simple for us to understand how the edges are detected in this image.

Let’s convert it into grayscale and define the Sobel filters (both horizontal and vertical) that will be convolved over this image:

Now, convolve this filter over the image using the convolve function of the ndimage package from scipy.


Let’s plot these results:
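Putting these steps together (define the filters, convolve with scipy.ndimage, plot both outputs) might look like the sketch below. A random array stands in for the grayscale image, and mode="reflect" is ndimage.convolve’s default boundary handling, stated explicitly:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage

# Stand-in for the grayscale image; swap in your own converted image.
gray = np.random.default_rng(0).random((192, 263))

sobel_horizontal = np.array([[ 1.,  2.,  1.],
                             [ 0.,  0.,  0.],
                             [-1., -2., -1.]])
sobel_vertical = np.array([[-1., 0., 1.],
                           [-2., 0., 2.],
                           [-1., 0., 1.]])

out_h = ndimage.convolve(gray, sobel_horizontal, mode="reflect")
out_v = ndimage.convolve(gray, sobel_vertical, mode="reflect")

fig, axes = plt.subplots(1, 2)
axes[0].imshow(out_h, cmap="gray")
axes[0].set_title("horizontal edges")
axes[1].imshow(out_v, cmap="gray")
axes[1].set_title("vertical edges")
```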

Here, we are able to identify the horizontal as well as the vertical edges.

There is one more type of filter that can detect both horizontal and vertical edges at the same time.

This is called the Laplace operator:

 1  1  1
 1 -8  1
 1  1  1

Let’s define this filter in Python and convolve it over the same image:

Next, convolve the filter and print the output:
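A sketch of both steps, again with a random array standing in for the grayscale image:

```python
import numpy as np
from scipy import ndimage

gray = np.random.default_rng(0).random((192, 263))  # stand-in grayscale image

# Laplace operator: responds to horizontal and vertical edges at once.
laplace = np.array([[1.,  1., 1.],
                    [1., -8., 1.],
                    [1.,  1., 1.]])

out_l = ndimage.convolve(gray, laplace, mode="reflect")
print(out_l.shape)
```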

Here, we can see that our method has detected both horizontal as well as vertical edges.

I encourage you to try it on different images and share your results with me.

Remember, the best way to learn is by practicing!

Image Segmentation based on Clustering

This idea might have come to you while reading about image segmentation.

Can’t we use clustering techniques to divide images into segments? We certainly can! In this section, we’ll get an intuition of what clustering is (it’s always good to revise certain concepts!) and how we can use it to segment images.

Clustering is the task of dividing the population (data points) into a number of groups, such that data points in the same group are more similar to each other than to data points in other groups.

These groups are known as clusters.

One of the most commonly used clustering algorithms is k-means.

Here, the k represents the number of clusters (not to be confused with k-nearest neighbor).

Let’s understand how k-means works:

1. First, randomly select k initial clusters
2. Randomly assign each data point to any one of the k clusters
3. Calculate the centers of these clusters
4. Calculate the distance of all the points from the center of each cluster
5. Depending on this distance, reassign the points to the nearest cluster
6. Calculate the centers of the newly formed clusters
7. Repeat steps (4), (5) and (6) until either the cluster centers stop changing or we reach the set number of iterations

The key advantage of the k-means algorithm is that it is simple and easy to understand.
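The steps above can be condensed into a short NumPy sketch. The toy data (two well-separated 2-D blobs), the number of iterations, and the seed are all arbitrary choices for illustration:

```python
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    """Bare-bones k-means following the steps listed above."""
    rng = np.random.default_rng(seed)
    # steps 1-3: pick k initial centers from the data points themselves
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # steps 4-5: distance to every center, then nearest-cluster labels
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 6: recompute each cluster's center (keep old center if empty)
        centers = np.array([points[labels == c].mean(axis=0)
                            if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return centers, labels

rng = np.random.default_rng(42)
points = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                    rng.normal(3.0, 0.1, (50, 2))])
centers, labels = kmeans(points, k=2)
```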

We are assigning the points to the clusters which are closest to them.

Let’s put our learning to the test and check how well k-means segments the objects in an image.

We will be using this image, so download it, read it and check its dimensions:
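As before, this sketch assumes the image was saved as 1.jpeg and falls back to a synthetic array of the same shape if the file is missing:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: the tutorial image was saved as "1.jpeg".
try:
    pic = plt.imread("1.jpeg") / 255.0  # scale uint8 pixels to [0, 1]
except FileNotFoundError:
    pic = np.random.default_rng(0).random((192, 263, 3))

print(pic.shape)
```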

It’s a 3-dimensional image of shape (192, 263, 3).

For clustering the image using k-means, we first need to convert it into a 2-dimensional array whose shape will be (length*width, channels).

In our example, this will be (192*263, 3).

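The reshape itself is a one-liner (a random array stands in for the loaded image):

```python
import numpy as np

pic = np.random.default_rng(0).random((192, 263, 3))  # stand-in image

# Flatten the spatial dimensions: (height*width, channels).
pic_n = pic.reshape(pic.shape[0] * pic.shape[1], pic.shape[2])
print(pic_n.shape)  # (50496, 3)
```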

(50496, 3)

We can see that the image has been converted to a 2-dimensional array.

Next, fit the k-means algorithm on this reshaped array and obtain the clusters.

The cluster_centers_ attribute of the fitted k-means object returns the cluster centers, and the labels_ attribute gives us the label for each pixel (it will tell us which pixel of the image belongs to which cluster).

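A sketch using scikit-learn’s KMeans. A random (50496, 3) array stands in for the flattened pixel values, n_clusters=5 follows the article, and random_state/n_init are explicit so the result is reproducible:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the flattened (50496, 3) pixel array from the previous step.
pic_n = np.random.default_rng(0).random((50496, 3))

kmeans = KMeans(n_clusters=5, random_state=0, n_init=10).fit(pic_n)

# cluster_centers_ holds the 5 RGB centers; labels_ maps each pixel to one.
pic2show = kmeans.cluster_centers_[kmeans.labels_]
print(pic2show.shape)
```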

I have chosen 5 clusters for this article but you can play around with this number and check the results.

Now, let’s bring the clusters back to their original shape, i.e. a 3-dimensional image, and plot the results.
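A self-contained sketch of the full pipeline, ending with the reshape back to image dimensions (a random array again stands in for the real image):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

pic = np.random.default_rng(0).random((192, 263, 3))  # stand-in image
pic_n = pic.reshape(-1, 3)

kmeans = KMeans(n_clusters=5, random_state=0, n_init=10).fit(pic_n)

# Replace each pixel by its cluster center, then restore the image shape.
cluster_pic = kmeans.cluster_centers_[kmeans.labels_].reshape(pic.shape)

plt.imshow(cluster_pic)
plt.axis("off")
```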

Amazing, isn’t it? We are able to segment the image pretty well using just 5 clusters.

I’m sure you’ll be able to improve the segmentation by increasing the number of clusters.

k-means works really well when we have a small dataset.

It can segment the objects in the image and give impressive results.

But the algorithm hits a roadblock when applied to a large dataset (a greater number of images).

It looks at all the samples at every iteration, so the time taken is too high.

Hence, it’s also too expensive to implement.

And since k-means is a distance-based algorithm, it is only applicable to convex datasets and is not suitable for clustering non-convex clusters.

Finally, let’s look at a simple, flexible and general approach for image segmentation.

Mask R-CNN

Data scientists and researchers at Facebook AI Research (FAIR) pioneered a deep learning architecture, called Mask R-CNN, that can create a pixel-wise mask for each object in an image.

This is a really cool concept, so follow along closely! Mask R-CNN is an extension of the popular Faster R-CNN object detection architecture.

Mask R-CNN adds a branch to the already existing Faster R-CNN outputs.

The Faster R-CNN method generates two things for each object in the image:

- Its class
- The bounding box coordinates

Mask R-CNN adds a third branch to this, which outputs the object mask as well.

Take a look at the below image to get an intuition of how Mask R-CNN works on the inside (Source: arxiv.org):

- We take an image as input and pass it to the ConvNet, which returns the feature map for that image
- A region proposal network (RPN) is applied on these feature maps. This returns the object proposals along with their objectness score
- A RoI pooling layer is applied on these proposals to bring all the proposals down to the same size
- Finally, the proposals are passed to a fully connected layer to classify them and output the bounding boxes for objects. It also returns the mask for each proposal

Mask R-CNN is the current state-of-the-art for image segmentation and runs at 5 fps.

Summary of Image Segmentation Techniques

I have summarized the different image segmentation algorithms in the below table.

I suggest keeping this handy next time you’re working on an image segmentation challenge or problem!

Region-Based Segmentation
Description: Separates the objects into different regions based on some threshold value(s).
Advantages: (a) Simple calculations, (b) fast operation speed, (c) performs really well when the object and background have high contrast.
Limitations: When there is no significant grayscale difference, or an overlap of the grayscale pixel values, it becomes very difficult to get accurate segments.

Edge Detection Segmentation
Description: Makes use of discontinuous local features of an image to detect edges and hence define a boundary of the object.
Advantages: Good for images with better contrast between objects.
Limitations: Not suitable when there are too many edges in the image or when there is little contrast between objects.

Segmentation based on Clustering
Description: Divides the pixels of the image into homogeneous clusters.
Advantages: Works really well on small datasets and generates excellent clusters.
Limitations: (a) Computation time is too large and expensive, (b) k-means is a distance-based algorithm and is not suitable for clustering non-convex clusters.

Mask R-CNN
Description: Gives three outputs for each object in the image: its class, bounding box coordinates, and object mask.
Advantages: (a) Simple, flexible and general approach, (b) it is also the current state-of-the-art for image segmentation.
Limitations: High training time.

End Notes

This article is just the beginning of our journey to learn all about image segmentation.

In the next article of this series, we will deep dive into the implementation of Mask R-CNN.

So stay tuned! I have found image segmentation to be quite a useful tool in my deep learning career.

The level of granularity I get from these techniques is astounding.

It always amazes me how much detail we are able to extract with a few lines of code.

I’ve mentioned a couple of useful resources below to help you out in your computer vision journey:

- Computer Vision using Deep Learning 2.0 Course
- Certified Program: Computer Vision for Beginners

I always appreciate any feedback or suggestions on my articles, so please feel free to connect with me in the comments section below.
