Deep Learning — German Traffic Sign dataset with Keras

Navin krishnakumar, Jan 12

The Deep Learning course offered by New York Data Science Academy is a great way to get started on your journey with deep learning, and it also encourages you to do a full-fledged deep learning project.

I decided to do an image recognition challenge using the German Traffic sign data set.

I had never worked on image recognition before, so this project was a great learning experience for me.

Problem Statement and Goal of the Project

The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011.

German Traffic Sign Benchmarks: J. Stallkamp, M. Schlipsing, J. Salmen, C. Igel, Man vs. computer: Benchmarking machine learning algorithms for traffic…

Traffic sign detection is a highly relevant computer vision problem and is the basis for many industrial applications, for example in the automotive sector.

Traffic signs can provide a wide range of variations between classes in terms of color, shape, and the presence of pictograms or text.

In this challenge, we will develop a deep learning algorithm that will train on German traffic sign images and then classify the unlabeled traffic signs.

The deep learning model will be built using Keras (a high-level API for TensorFlow). Along the way we will look at various ways to preprocess images using OpenCV and use a cloud GPU service provider.

Keras was chosen for building the model because it is easy to learn and use, and it integrates seamlessly with TensorFlow.

After TensorFlow, Keras seems to be the framework most widely used by the deep learning community.

The entire code for the project can be found on my GitHub account.

Navkrish04/German-Traffic-Sign-Classification (github.com): Deep Learning algorithm for Image recognition challenge

Algorithmic Process

As with any machine learning model building process, we will follow the same golden steps, defined below:

1. Understand the data
2. Preprocess the data
3. Build the architecture of the model
4. Test the model
5. Iterate the same process until you achieve the optimal results
6. Deploy the model (not considered for this exercise)

Data Understanding

The image dataset consists of 43 classes (unique traffic sign images).

The training set has 34799 images, the test set has 12630 images, and the validation set has 4410 images.

    # Understand the data
    # (X_train, y_train, X_valid and y_test are the arrays loaded from the dataset)
    import numpy as np

    print("Training Set:", len(X_train))
    print("Test Set:", len(y_test))
    print("Validation Set:", len(X_valid))
    print("Image Dimensions:", np.shape(X_train[1]))
    print("Number of classes:", len(np.unique(y_train)))
    n_classes = len(np.unique(y_train))

[Figure: Sample Images]

[Figure: Class distribution]

A couple of inferences from the data that we will tackle during the preprocessing stage:

a) Class bias: some classes seem to be underrepresented.
b) Image contrast seems to be low for a lot of images.

Establishing a score without any preprocessing

It’s always good practice to understand where your model stands without any preprocessing, as that gives you a baseline score that you can improve upon with each iteration.

The evaluation metric for our model is the accuracy score.

I had resource constraints and was running the tests on my Mac (8GB RAM), so I used a simple dense (fully connected) neural network architecture for the baseline score and other testing.

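Dense Network Architecture

    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(32*32*3,)))
    # 50% dropout after each hidden layer except the last, as described in the text below
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(n_classes, activation='softmax'))

The model = Sequential() statement creates an empty sequential model to which the layers are added one by one.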

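The input shape is 32*32*3 because each 32x32 image has 3 color channels and is flattened into a single vector of 3072 values.

With the Sequential API there is no separate input layer command here; the input_shape argument of the first layer acts as the implicit input layer.

The number of parameters in the first layer is 393344 ((32*32*3*128) + 128): one weight for every input and neuron pair, plus one bias per neuron.

We can calculate the number of parameters for the other layers in the same fashion.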
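To make the arithmetic concrete, here is a minimal sketch that applies the same formula to every layer of the dense network. The helper name dense_params is hypothetical (it is not part of Keras), and the layer sizes are taken from the architecture described above, with 43 output classes.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per
    input/unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# Flattened input, four hidden layers of 128 neurons, 43 output classes
layer_sizes = [32 * 32 * 3, 128, 128, 128, 128, 43]

# Pair each layer with the next one to get (inputs, units) for every Dense layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # first entry is the 393344 computed above
print(total)
```

Dropout layers add no trainable parameters, so they do not appear in this count.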

The activation function is “relu”.

During hyperparameter optimization we can check whether tanh, sigmoid, or other activation functions are better suited for the task.

For now we stick with “relu”.

There are 4 hidden layers of 128 neurons with relu activation, and a dropout (50%) layer is included after each hidden layer except the last one.

The output layer uses the softmax activation since we are dealing with multi-class classification with 43 classes.
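As a rough illustration of why softmax suits multi-class classification, the sketch below turns a vector of raw scores into probabilities that sum to 1. It uses plain NumPy rather than Keras, and the three example scores are made up for illustration (the real model would produce 43 of them).

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps np.exp numerically stable
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for three classes
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

# The predicted class is the one with the highest probability
predicted_class = int(np.argmax(probs))
print(probs, predicted_class)
```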

The model was able to achieve an accuracy score of 84% without any preprocessing.

Data Preprocessing

Now that we have a score at hand, let’s see whether preprocessing the images leads to a better accuracy score and helps our model.

Data augmentation is used to increase the size of the training set.

Augmenting the data basically means creating more images from the available images, each with a slight alteration.

As a rule of thumb, the amount of training data we need grows with the number of parameters in the neural network.

I found OpenCV to be excellent for image preprocessing.

Here’s the link to the general tutorials for using OpenCV with Python:

Introduction to OpenCV-Python Tutorials – OpenCV-Python Tutorials 1 documentation

Some of the techniques used in the process are rotation, translation, bilateral filtering, grayscaling, and local histogram equalization.

Slight Rotation of Images: I rotated the images by 10 degrees.

It would not make much sense to rotate the images more than that, as it might lead to wrong representations of the traffic signs.

Let’s view a few images after the slight rotation (it is not that noticeable in some of them).

    # 2x3 affine matrix: rotate by 10 degrees about the image center, scale 1
    M_rot = cv2.getRotationMatrix2D((cols/2, rows/2), 10, 1)

[Figure: Images after 10 degree rotation]

Image Translation: This is a technique by which you shift the location of the image.

In layman’s terms, if the image is located at position (x1,y1), after translation it is moved to position (x2,y2).

As you can see from the images below, the location is moved slightly downwards.
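Both rotation and translation can be written as 2x3 affine matrices. The sketch below builds them with plain NumPy so the effect on a single point is easy to check. The rotation matrix follows the form documented for cv2.getRotationMatrix2D; the helper names and the 3-pixel downward shift are assumptions made for illustration, not values from the project.

```python
import numpy as np

def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 affine matrix in the form documented for cv2.getRotationMatrix2D."""
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    alpha = scale * np.cos(theta)
    beta = scale * np.sin(theta)
    return np.array([
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])

def translation_matrix(tx, ty):
    """2x3 affine matrix that moves (x1, y1) to (x1 + tx, y1 + ty)."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty]])

def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return matrix @ np.array([x, y, 1.0])

rows, cols = 32, 32
M_rot = rotation_matrix((cols / 2, rows / 2), 10)
M_trans = translation_matrix(0, 3)  # hypothetical shift: 3 pixels down

# The rotation center stays where it is; other points move around it
center_after = apply_affine(M_rot, (cols / 2, rows / 2))
# In image coordinates y grows downwards, so a positive ty moves the point down
moved = apply_affine(M_trans, (10, 10))
print(center_after, moved)
```

In OpenCV, a matrix like this would typically be applied to a whole image with cv2.warpAffine(image, M, (cols, rows)).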

[Figure: Images after translation]

Bilateral Filtering: Bilateral filtering is a noise-reducing, edge-preserving smoothing of images.

Gray Scaling: Gray scaling reduces each pixel from three color channels to a single intensity value, which reduces the amount of information the model has to process and also reduces complexity.

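    import cv2

    def gray_scale(image):
        # Convert an RGB image to a single-channel grayscale image
        return cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

Local Histogram Equalization: This is done to increase the contrast of the images, since we identified during “Data Understanding” that the images might need an increase in contrast.

    import skimage.morphology as morp
    from skimage.filters import rank

    def local_histo_equalize(image):
        # Equalize each pixel against a disk-shaped neighborhood of radius 30
        kernel = morp.disk(30)
        # Newer scikit-image releases name this argument footprint instead of selem
        img_local = rank.equalize(image, selem=kernel)
        return img_local

Here are the images after all the preprocessing.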

[Figure: Images after preprocessing]

Fixing Class Bias with Data Augmentation: Since we are going to increase the number of training images with data augmentation anyway, it makes sense to address the class bias issue at the same time.

Hence, during augmentation, every class was brought up to 4000 images.

In the original dataset, Class 2 had the maximum number of training images, with 2010 records.

The number 4000 (max class records * ~2) is an arbitrary number I chose so that all classes have the same number of records.

We can definitely play around with this distribution further.
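Before oversampling, it can help to see how many extra images each class would need to reach the target. The following is a minimal sketch of that arithmetic; the helper name augmentation_deficits and the tiny label array are hypothetical, and the target of 4 stands in for the 4000 used in the project.

```python
import numpy as np

def augmentation_deficits(labels, target_per_class):
    """Number of augmented images each class needs to reach the target."""
    counts = np.bincount(labels)  # images currently available per class
    # Classes already at or above the target need nothing extra
    return np.maximum(target_per_class - counts, 0)

# Toy labels: class 0 has 3 images, class 1 has 1, class 2 has 2
y_toy = np.array([0, 0, 0, 1, 2, 2])
deficits = augmentation_deficits(y_toy, target_per_class=4)
print(deficits)
```

Changing target_per_class is one simple way to play around with the final class distribution.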

Here’s the code snippet that brings all the classes up to the same number of records.

    # X_train_final, y_train_final, X_aug_1 and Y_aug_1 are initialized before this loop
    for i in range(0, classes):
        # Number of training images in class i (assumed form of this expression)
        class_records = np.where(y_train == i)[0].size
        max_records = 4000
        if class_records != max_records:
            ovr_sample = max_records - class_records
            samples = X_train[np.where(y_train == i)[0]]
            X_aug = []
            Y_aug = [i] * ovr_sample
            # Cycle through the images of the class and augment each one
            for x in range(ovr_sample):
                img = samples[x % class_records]
                trans_img = data_augment(img)
                X_aug.append(trans_img)
            X_train_final = np.concatenate((X_train_final, X_aug), axis=0)
            y_train_final = np.concatenate((y_train_final, Y_aug))
            Y_aug_1 = Y_aug_1 + Y_aug
            X_aug_1 = X_aug_1 + X_aug

[Figure: Class distribution after fixing class bias]

Model Score after Data Augmentation and after Fixing Class Bias

The same dense neural network architecture as the one used above improved its accuracy score to 88.2% after data preprocessing, which suggests that preprocessing the images (augmenting the data) was worth the effort.

Convolutional Neural Networks

The next step in the model building journey is to use a more sophisticated architecture to boost our model performance.

Research in the field of computer vision has established that convolutional neural networks perform far better at image recognition challenges and hence should be the first choice.

Our goal for the project was to systematically build a deep learning model and understand how each step affects the model performance.

Hence a CNN was not used in the first place.

It’s also beyond the scope of this article to explain how CNNs work.

Here’s an intuitive article on the same.

An intuitive guide to Convolutional Neural Networks

Convolutional Neural Network Architecture

Here’s the convolutional neural network architecture for the model:

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model_conv = Sequential()
    # If you preprocessed with gray scaling and local histogram equalization,
    # then input_shape = (32, 32, 1), else (32, 32, 3)
    # padding='same' is an assumption added so that the code runs: with the default
    # 'valid' padding, the fourth 3x3 convolution would receive a 2x2 feature map
    model_conv.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same', input_shape=(32, 32, 1)))
    model_conv.add(MaxPooling2D(pool_size=(2, 2)))
    model_conv.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
    model_conv.add(MaxPooling2D(pool_size=(2, 2)))
    model_conv.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
    model_conv.add(MaxPooling2D(pool_size=(2, 2)))
    model_conv.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
    model_conv.add(MaxPooling2D(pool_size=(2, 2)))
    # Flatten the feature maps into a vector before the dense layers
    model_conv.add(Flatten())
    # The batch normalization and second fully connected layer mentioned in the text
    # are not shown in this listing
    model_conv.add(Dense(128, activation='relu'))
    model_conv.add(Dense(n_classes, activation='softmax'))

There are 4 convolutional layers, each followed by a max pooling layer.

The kernel size for the convolutional layers is (3,3).

The kernel size refers to the size of the filter.

Commonly used sizes are (5,5) and (3,3).

One thing to note here is that the input shape is (32,32,1).

In the dense network we had 32*32*3 inputs, as we had not done grayscaling.

Since we performed grayscaling on our images, the number of channels becomes one.

A max pooling layer with a pool size of (2,2) is added, along with batch normalization.

Max pooling layers are used to reduce the dimensionality, which helps shorten the training time and also helps reduce overfitting.
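To show what a (2,2) max pooling step does to a feature map, here is a small NumPy sketch. It is not the Keras implementation, only an illustration on a made-up 4x4 array: each non-overlapping 2x2 block is replaced by its largest value, so the height and width are halved.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value of every non-overlapping 2x2 block."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing odd row or column so the map splits evenly into blocks
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Axes 1 and 3 index the position inside each 2x2 block
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 feature map
fmap = np.array([
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
])
pooled = max_pool_2x2(fmap)
print(pooled)  # a 2x2 map holding the maximum of each block
```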

Then there are also two fully connected layers before the output layer.

Note that we need to flatten the output of the last pooling layer before these layers, as the input they expect is a one-dimensional vector.

Since this is a multi-class classification, the softmax activation is used in the output layer.
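It is worth tracing how the 32x32 input shrinks through four blocks of 3x3 convolution and 2x2 max pooling, because that determines the size of the flattened vector. The sketch below is pure Python and assumes stride 1 convolutions and non-overlapping pooling; the helper names are hypothetical. It compares 'valid' padding (no padding, the Keras default) with 'same' padding, which is one possible way to keep the fourth convolution feasible.

```python
def conv_out(size, kernel=3, padding="valid"):
    """Spatial size after a stride-1 convolution."""
    if padding == "same":
        return size
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling."""
    return size // pool

def trace_sizes(input_size, n_blocks, padding):
    """Record the size after every conv and pool step, stopping
    as soon as a convolution no longer fits."""
    sizes = [input_size]
    size = input_size
    for _ in range(n_blocks):
        size = conv_out(size, padding=padding)
        sizes.append(size)
        if size <= 0:
            break
        size = pool_out(size)
        sizes.append(size)
    return sizes

valid_sizes = trace_sizes(32, 4, "valid")
same_sizes = trace_sizes(32, 4, "same")

# With 128 filters in the last block, the flattened vector has this many values
flattened = same_sizes[-1] ** 2 * 128
print(valid_sizes)
print(same_sizes, flattened)
```

Under these assumptions, 'valid' padding leaves only a 2x2 map for the fourth convolution, which a 3x3 kernel cannot cover, while 'same' padding ends with a 2x2 map per filter.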

[Figure: CNN Architecture]

I ran the model on my computer for 100 epochs, and it took 4 days to complete (I was curious to know how long it would run).

The model score boosted to 97.


That kind of explains the hype around CNNs.

Now it made more sense to either buy some GPUs for faster processing or go to a cloud service provider to experiment with different architectures.

I found FloydHub to be excellent in that regard.

Using FloydHub is extremely easy.

We just need to upload the dataset and either import the Python code through GitHub or upload it manually.

The entire code now runs in approximately 15 minutes, so I can definitely test different architectures going forward.

Way Forward

Building a deep learning model from scratch, and following the process to build one, was a great learning experience.

I am constantly learning new things every day in this journey and trying new improvements.

The next few steps to implement would be:

1. Identify the best architecture along with the best hyperparameters.
2. Try AlexNet or VGGNet.
3. Use transfer learning.
