OCR for Scanned Numbers using Google’s AutoML Vision

Hussein Moghnieh, Ph.D.
May 18

Introduction

In a previous blog post (http://tiny.cc/rcit6y), I detailed the steps I used to create an OCR for scanned numbers, such as the ones shown in the figure below, by training a k-Nearest Neighbour (k-NN) machine learning model on a set of features extracted from the digit images using Histogram of Oriented Gradients (HOG).

Figure 1. Scanned numbers to OCR

To recap, the steps to achieving the OCR, and the share of effort involved in each stage, can be summarized as:

- Image processing to find the bounding box around the numbers (25%)
- Extract digits from numbers and create a train/test set (10%)
- Identify and apply suitable feature extraction (Histogram of Oriented Gradients) (35%)
- Experiment with and apply different Machine Learning algorithms (KNN vs SVM) (20%)
- Present the data in a meaningful way (10%)

AutoML Vision

In this blog, I’ll detail the steps to create an OCR for the same problem using AutoML.

As shown in the figure below, AutoML eliminated a tedious part of my work: finding and applying a suitable feature extraction and machine learning technique.

The model attained 98% accuracy, which was slightly better than the solution presented in the previous blog.

Figure 2. Stages replaced by AutoML

1 — Image Processing: Train/Test Dataset Preparation

Recapping from my previous blog, my train/test set included digits as shown below (examples from the 0, 2, 5, 6, and 8 datasets).

Those digits were extracted from the scanned numbers shown in Figure 1.

Figure 3. Splitting numbers into single digits

The train/test dataset is then organized in folders, where each folder’s name is the label of the digits stored in that folder.
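A folder-per-label layout like this is easy to check before uploading. The following is a minimal sketch, assuming the digit images are PNG or JPEG files sitting directly inside each label folder; the function names and the `IMAGE_SUFFIXES` set are illustrative and not part of the original workflow.

```python
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed image formats


def list_labelled_images(dataset_root):
    """Return (image_path, label) pairs, using each folder name as the label."""
    pairs = []
    for label_dir in sorted(Path(dataset_root).iterdir()):
        if not label_dir.is_dir():
            continue  # skip stray files next to the label folders
        for image_path in sorted(label_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((image_path, label_dir.name))
    return pairs


def count_per_label(pairs):
    """Count how many images each label has, to spot unbalanced classes."""
    return Counter(label for _, label in pairs)
```

Calling `count_per_label(list_labelled_images("digits"))` on a hypothetical `digits` folder gives a quick view of how many samples each digit has before any upload.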

Figure 4. Train/test dataset sorted in folders

2 — Train/Test Dataset Upload

Google AutoML provides two options to upload the train/test dataset.

Option 1: Upload the digits, store them on Google Cloud, and provide a CSV file listing the URL of each digit and its corresponding label.
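For Option 1, the CSV can be generated from the same folder layout. This is a rough sketch, assuming one `image_uri,label` row per image and assuming the images were uploaded under a Cloud Storage prefix with the folder structure preserved; the bucket prefix and the function name are hypothetical, and the exact CSV columns should be checked against the AutoML documentation.

```python
import csv
from pathlib import Path


def write_import_csv(dataset_root, bucket_prefix, csv_path):
    """Write one `image_uri,label` row per image found under dataset_root."""
    root = Path(dataset_root)
    rows = []
    for image_path in sorted(root.glob("*/*")):
        if image_path.is_file():
            label = image_path.parent.name  # folder name is the label
            relative = image_path.relative_to(root).as_posix()
            rows.append((f"{bucket_prefix.rstrip('/')}/{relative}", label))
    with open(csv_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

For example, `write_import_csv("digits", "gs://example-bucket/digits", "digits.csv")` would list `gs://example-bucket/digits/3/...` files with the label `3`, where `example-bucket` stands in for a real bucket name.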

Option 2 (the one used in this blog): Zip the folders containing the dataset and upload the zipped file.

AutoML treats each folder name as the label of the digits stored in that folder.
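The zip for Option 2 can be built by hand or with a few lines of code. Below is a minimal sketch, assuming the label folders should sit at the top level of the archive so that the folder names survive as labels; `zip_dataset` is an illustrative helper, not something provided by AutoML.

```python
import zipfile
from pathlib import Path


def zip_dataset(dataset_root, zip_path):
    """Zip every label folder so entries look like `<label>/<image file>`."""
    root = Path(dataset_root)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image_path in sorted(root.glob("*/*")):
            if image_path.is_file():
                # Store paths relative to the dataset root, keeping the label folder
                archive.write(image_path, image_path.relative_to(root).as_posix())
    return zip_path
```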

Figure 5. Uploading train/test dataset to Google ML

AutoML correctly identified the label of each digit (figure 6) and provided a basic dataset analysis (figure 7). It is then ready to create an ML model (or models?) by clicking “Start Training”.

Figure 6. Dataset uploaded and labels identified

Figure 7. Basic dataset analysis

4 — Model Accuracy

It took 15 minutes to train the model.

The ML model accuracy/precision and a confusion matrix are shown below.

Figure 7. Model accuracy

5 — Confusion Matrix

The confusion matrix reads as follows:

- The model is able to predict the digits 0, 1, 2, 4, 7, and 9 with 100% accuracy (based on the train/test dataset).
- There is an 8.7% chance that it will confuse an 8 for a 6.
- There is a […].8% chance that it will confuse a 5 for a 3.
- There is a […].9% chance that it will confuse a 3 for a 5.
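Percentages like these come from normalizing each row of the confusion matrix by the number of test samples with that true label. The sketch below illustrates the arithmetic on made-up counts; the numbers are not the results reported in this post, and `row_normalize` is an illustrative helper.

```python
def row_normalize(confusion_counts):
    """Turn raw counts into per-true-label rates (each non-empty row sums to 1)."""
    rates = {}
    for true_label, predicted_counts in confusion_counts.items():
        total = sum(predicted_counts.values())
        if total == 0:
            continue  # no test samples for this label
        rates[true_label] = {
            predicted: count / total
            for predicted, count in predicted_counts.items()
        }
    return rates


# Made-up counts for illustration only; NOT the results from this post.
example_counts = {
    "6": {"6": 20, "8": 0},
    "8": {"6": 2, "8": 18},
}
rates = row_normalize(example_counts)
print(f"8 read as 6: {rates['8']['6']:.1%}")       # 10.0%
print(f"8 read correctly: {rates['8']['8']:.1%}")  # 90.0%
```

The diagonal entries are the per-digit accuracy, and the off-diagonal entries in a row show which other digits that digit is mistaken for.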

Figure 8. Confusion matrix

6 — Predict

To test the model, I either uploaded a digit image directly to run the prediction on, or used the API provided by Google (figure 8).
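A call to the hosted model might be prepared roughly as follows. This is a sketch under several assumptions: the endpoint pattern and JSON body follow the v1beta1 REST interface of AutoML Vision as documented around the time of this post, and the project ID, model ID, and access token are placeholders. The current Google Cloud documentation should be checked before relying on any of it. The function only builds the request and does not send it.

```python
import base64
import json
import urllib.request

# Assumed endpoint pattern (AutoML Vision v1beta1); verify against current docs.
PREDICT_URL = (
    "https://automl.googleapis.com/v1beta1/"
    "projects/{project}/locations/us-central1/models/{model}:predict"
)


def build_predict_request(image_bytes, project_id, model_id, access_token):
    """Build (but do not send) a prediction request for one digit image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = {"payload": {"image": {"imageBytes": encoded}}}
    return urllib.request.Request(
        PREDICT_URL.format(project=project_id, model=model_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would then be a matter of passing the request to `urllib.request.urlopen` with a valid token and parsing the JSON response, which should contain the predicted label and a score.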

Figure 8. ML model is exposed as an API and hosted on Google Cloud

Correct prediction: example 1

Figure 9. Accurate prediction

Correct prediction: example 2

Figure 10. Accurate prediction

Wrong prediction: example 3

This is unexpected: the model confused a 6 for an 8, although the confusion matrix did show that it can predict a 6 with 100% accuracy.

As future work, I’d probably train one or more additional Machine Learning models, each tailored to a pair of digits that the first model confuses (i.e. 6 and 8).

I’d probably also provide partial images to train the models, for instance only the upper half of the 6 and 8 digits, since those digits mostly differ in their upper quadrants.
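One way this future work could fit together is sketched below. It is purely illustrative: images are represented as plain lists of pixel rows, and `general_model` and `specialist_model` are hypothetical callables standing in for trained models that return a digit label.

```python
def upper_half(image_rows):
    """Keep only the top half of an image given as a list of pixel rows."""
    return image_rows[: len(image_rows) // 2]


def predict_with_specialist(image_rows, general_model, specialist_model,
                            confusable=("6", "8")):
    """Ask a pair-specific model for a second opinion on confusable digits."""
    first_guess = general_model(image_rows)
    if first_guess in confusable:
        # The specialist only sees the region where the two digits differ most
        return specialist_model(upper_half(image_rows))
    return first_guess
```

The specialist would need to be trained on the same upper-half crops it receives at prediction time.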

Figure 11. Wrong prediction

It is interesting to see how Google ML can automate a tedious part of a data scientist’s work: finding a suitable feature extraction technique and tuning a Machine Learning model.

However, I believe that the hard part of a data scientist’s job is identifying the problems that can be solved using Machine Learning and, most importantly, being able to extract a dataset that is suitable for Machine Learning.

The latter requires solid programming skills and a deep understanding of how different Machine Learning techniques work.

