Dress Segmentation with Autoencoder in Keras

Extract dresses from photographs

Marco Cerliani, Jun 1

The fashion industry is a very profitable field for Artificial Intelligence.

There are many areas where data scientists can develop interesting use cases and provide benefits.

I have already demonstrated my interest in this sector here, where I developed a solution for recommending and tagging dresses from the Zalando online store.

In this post I try to go further by developing a system that receives a raw image as input (taken from the web or with a smartphone) and tries to extract the dresses shown in it.

Segmentation challenges are infamous for the extreme noise present in the original images, so we try to develop a robust solution with clever preprocessing tricks that deal with this aspect.

At the end you can also try to merge this solution with the previous one.

This lets you develop a system for real-time recommendation and tagging of dresses from photographs you take while out and about.

THE DATASET

Recently, a Kaggle competition was also launched on visual analysis and segmentation of clothing.

It is a very interesting challenge, but it is not for us.

My objective is to extract dresses from photographs, so this dataset is not adequate due to its redundancy and fine-grained attributes.

We need images which contain mostly dresses, so the best choice was to build the data ourselves.

I collected from the web images of people wearing women's dresses of various types and in different scenarios.

The next step was to create masks: this is necessary in every object segmentation task if we want to train a model that focuses only on the points of real interest.

Below I report a sample of the data at our disposal.

I collected the original images from the internet and then had fun cutting them further, separating people from dresses.

Example of image segmentation

We make this distinction because we want to mark the separation among background, skin and dress.

Backgrounds and skin are the most relevant sources of noise in this kind of problem, so we try to suppress them.

With these cut-outs we are able to recreate our masks as shown below; this is done simply by binarizing the image.

The skin is obtained as the difference between person and dress.

Example of masks

As a final step we merge everything into a single image with three channels.

This picture encodes the relevant features of our original image which we are interested in.

Our purpose is to maintain the separation among background, skin and dress: this result is perfect for our scope!

Final mask

We iterated this process for every image in our dataset in order to have, for every original image, an associated three-channel mask.
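The mask construction described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not stated in the post: the function name `build_mask`, the `white_threshold` parameter, the channel order (background, skin, dress) and the idea that everything outside a cut-out is near-white are all hypothetical.

```python
import numpy as np

def build_mask(person_cut, dress_cut, white_threshold=250):
    """Build a 3-channel (background, skin, dress) mask from two cut-outs.

    person_cut, dress_cut: uint8 RGB arrays of shape (H, W, 3) in which
    everything outside the cut-out is assumed to be (near) white.
    """
    # Binarize: a pixel belongs to the cut-out if any channel is darker than white.
    person = (person_cut < white_threshold).any(axis=-1)
    dress = (dress_cut < white_threshold).any(axis=-1)

    # Skin is what remains of the person once the dress is removed.
    skin = person & ~dress
    # Background is everything that is neither person nor dress.
    background = ~(person | dress)

    # Merge the three binary maps into a single image, one channel per class.
    return np.stack([background, skin, dress], axis=-1).astype(np.uint8)
```

Note that a simple white threshold like this would also discard genuinely white parts of a dress, so a real pipeline may need a more careful binarization.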

THE MODEL

We have everything we need to create our model.

The workflow we have in mind is very simple:

We fit a model which receives a raw image as input and outputs a three-channel mask, i.e. it is able to recreate from the original images the desired separation among skin, background and dress.

In this way, when a new raw image comes in, we can separate it into three different parts: background, skin and dress.

We take into consideration only the channel of interest (dress), use it to create a mask from the input image and cut it to recreate the original dress.
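The last step of this workflow, cutting the dress out of the input image with the predicted dress channel, might look roughly like the sketch below. The function name `extract_dress`, the channel index, the 0.5 threshold and the white fill colour are assumptions made for illustration, not details given in the post.

```python
import numpy as np

def extract_dress(image, predicted_mask, dress_channel=2, threshold=0.5):
    """Keep only the pixels that the predicted mask marks as dress.

    image:          uint8 array of shape (H, W, 3)
    predicted_mask: float array of shape (H, W, 3) with values in [0, 1]
    """
    # Binarize the channel of interest (assumed here to be the last one).
    dress = predicted_mask[..., dress_channel] > threshold

    # Start from a white canvas and copy over only the dress pixels.
    result = np.full_like(image, 255)
    result[dress] = image[dress]
    return result
```

In use, `predicted_mask` would be the model output for a single image, for example `model.predict(batch)[0]`.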

All this magic is possible thanks to the power of UNet.

This deep convolutional autoencoder is often used in segmentation tasks like this.

It is easy to replicate in Keras, and we train it to recreate, pixel by pixel, each channel of our desired mask.
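To give an idea of what such a network looks like in Keras, here is a deliberately small U-Net-style sketch. It is not the architecture used in this post: the depth, the filter counts, the sigmoid output, the loss and the helper names `conv_block` and `build_unet` are all assumptions chosen to keep the example short. The 224x224 input size matches the resize used in the preprocessing function later on.

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions, the basic building block of a U-Net.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(224, 224, 3), n_channels=3, base_filters=16):
    inputs = layers.Input(shape=input_shape)

    # Encoder: convolve, then halve the resolution.
    e1 = conv_block(inputs, base_filters)
    p1 = layers.MaxPooling2D(2)(e1)
    e2 = conv_block(p1, base_filters * 2)
    p2 = layers.MaxPooling2D(2)(e2)

    # Bottleneck at a quarter of the input resolution.
    b = conv_block(p2, base_filters * 4)

    # Decoder: upsample and concatenate the matching encoder features
    # (the skip connections that distinguish a U-Net from a plain autoencoder).
    u2 = layers.UpSampling2D(2)(b)
    d2 = conv_block(layers.Concatenate()([u2, e2]), base_filters * 2)
    u1 = layers.UpSampling2D(2)(d2)
    d1 = conv_block(layers.Concatenate()([u1, e1]), base_filters)

    # One output channel per mask channel (background, skin, dress).
    outputs = layers.Conv2D(n_channels, 1, activation="sigmoid")(d1)
    return Model(inputs, outputs)

model = build_unet()
# Per-pixel, per-channel loss; the loss actually used is not stated in the post.
model.compile(optimizer="adam", loss="binary_crossentropy")
```

Training would then be a call such as `model.fit(images, masks, ...)`, with `masks` holding the three-channel targets scaled to [0, 1].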

Before starting training, we standardized all our original images with their RGB mean.
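One plausible reading of this step is to subtract the per-channel mean computed over the training images. The sketch below assumes exactly that; whether the original pipeline also divides by a standard deviation is not stated, and the function names `rgb_mean` and `subtract_rgb_mean` are hypothetical.

```python
import numpy as np

def rgb_mean(images):
    """Per-channel mean over a batch of images of shape (N, H, W, 3)."""
    return images.mean(axis=(0, 1, 2), dtype=np.float64).astype(np.float32)

def subtract_rgb_mean(images, mean):
    """Center the images by subtracting the per-channel mean."""
    # Broadcasting applies the 3-element mean to the last (channel) axis.
    return images.astype(np.float32) - mean
```

The same mean computed on the training set should be reused at prediction time, so that new images are shifted in the same way as the ones the model was trained on.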

RESULTS AND PREDICTIONS

We notice that during prediction, when we encounter an image with high noise (in terms of ambiguous background or skin), our model starts to struggle.

This inconvenience can be overcome simply by increasing the number of training images.

But we also developed a clever shortcut to avoid these mistakes.

We make use of the GrabCut algorithm provided by OpenCV.

This algorithm was designed to separate foreground from background using Gaussian Mixture Models.

This works for us because it helps to isolate the person in the foreground, denoising everything around.

Here is the simple function we implemented to make it possible.

We assume that the person of interest stands in the middle of the image.

    import cv2 as cv
    import numpy as np

    def cut(img):
        # Work at a fixed 224x224 resolution.
        img = cv.resize(img, (224, 224))

        # Buffers required by GrabCut: the label mask and the two GMM models.
        mask = np.zeros(img.shape[:2], np.uint8)
        bgdModel = np.zeros((1, 65), np.float64)
        fgdModel = np.zeros((1, 65), np.float64)

        # Rectangle (x, y, w, h) around the centre, where we assume the person stands.
        height, width = img.shape[:2]
        rect = (50, 10, width - 100, height - 20)
        cv.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv.GC_INIT_WITH_RECT)

        # Labels 0 and 2 are (probable) background; everything else is foreground.
        mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
        img2 = img * mask2[:, :, np.newaxis]
        img2[mask2 == 0] = (255, 255, 255)  # paint the background white

        final = np.ones(img.shape, np.uint8) * 0 + img2

        return mask, final

GrabCut in action

Now we apply UNet and are ready to see some results on new images!

Input, GrabCut + Prediction, Final Dress (five examples)

Our preprocessing step, combined with the power of UNet, is able to achieve great performance.

SUMMARY

In this post we developed an end-to-end solution for dress segmentation.

To achieve this, we made use of a powerful autoencoder combined with clever preprocessing techniques.

We designed this solution to be used in a realistic scenario with real photographs, with the possibility of building a visual recommendation system on top of it.

CHECK MY GITHUB REPO

Keep in touch: Linkedin.
