Check-mark State Recognition will take NLP projects to the next level!

Check-mark state detection from scanned images, and further analysis of the results, can add meaningful features to NLP projects.

Mataprasad Agrawal, Jan 26

There are numerous business transactions where paper documents are still used as the primary means of data exchange.

For example:
• paper questionnaires or surveys with objective-type questions to answer,
• insurance claim documents,
• international trade finance or bank documents, where carriers carry importers' and exporters' documents, which are later digitized with scanners as images or PDF files.

The scanned images are either loaded as is or transformed using OCR (Optical Character Recognition) techniques to get the text content.

However, form elements such as tables, check-boxes, logos, radio buttons and handwritten text are lost in OCR conversion.

Capturing these elements has some useful applications and there is a way to capture them.

There are many Optical Mark Recognition (OMR) tools, both open source and paid, which can extract such elements.

However, these tools require defining an area to capture in a form or template.

Of course, this is a design-time change, i.e. if the template or document structure changes (which is likely), then propagating the template change from the design stage all the way to production can be very lengthy and frustrating.

We faced a similar problem recently in our project, and this post describes the approach we took and the technical details.

“Check-mark State Recognition” from images using Deep Learning

Some basics first:
• Documents stored as images: many applications generate or store transaction documents or PDFs as images.
• A check-mark field is an element on a machine-readable form.
• It is usually rectangular or rounded-rectangular in shape and is often called a “check box”.
• A mark is made (a check/tick, an X, a large dot, inking over, etc.) and it needs to be captured.
• OCR engines can capture characters/text but not special zones such as checkboxes, tables etc.

[Figure: Sample bank form having many checkbox elements of different types]

[Figure: Check-mark states can take any form, including but not limited to the examples shown]

Solution Highlights

• Image types supported as input: jpeg, tiff
• It extracts check-marks in real time, using OCR clue words, e.g. “Collect interest” is a clue word for the checkbox opposite it in the sample form
• The extraction works in two steps: detection and recognition
• Uses a mix of OCR, image processing and machine learning techniques
• JSON configuration: defines the templates and the list of check-marks, along with clue words and image zone pixels for each checkbox in each template
• Template layout analysis and image pre-processing are prerequisites for filling in the configuration file
• The states of the check-marks are captured in a classification model: Checked (1), Unchecked (0), Noise/unable to read (-1)
• Delivers an accuracy rate of up to 94%*

Here is a sample configuration file.

It lists one template for a customer, with two check-boxes in that template, along with the clue words and the pixels (width and height) used to capture the image zone for each check-box.

{"templates": [
  {
    "templateID": "1",
    "templateName": "Customer_NAME_ABCD",
    "templateCode": "TEMPLATE_ABCD",
    "cb_size": [-100, 60],
    "keywords": [
      {
        "keywordID": "1",
        "entity": "Cable advice for NP",
        "code": "CABLEADVICENP",
        "keyword": "Cable advice of non-payment",
        "clue_string": "Cable advice of non-payment",
        "pixels": [-75, -20]
      },
      {
        "keywordID": "2",
        "entity": "Recover Charges",
        "code": "RECOVERYCHARGES",
        "keyword": "Recover Charges",
        "clue_string": "Recover Charges",
        "pixels": [-75, -20]
      },
      …
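To make the role of the configuration more concrete, here is a minimal sketch of how such a file might be read and turned into a crop box for one checkbox. It is not the project's actual code. The helper names `find_keyword` and `checkbox_box`, the trimmed-down config and the clue-word position (400, 300) are all hypothetical. The interpretation of `pixels` as an (x, y) offset from the clue word and of `cb_size` as a signed (width, height) extent is an assumption, since the post does not define them precisely.

```python
import json

# Hypothetical, trimmed-down config in the same shape as the sample above.
CONFIG_JSON = """
{"templates": [{
    "templateID": "1",
    "templateCode": "TEMPLATE_ABCD",
    "cb_size": [-100, 60],
    "keywords": [
        {"keywordID": "1", "code": "CABLEADVICENP",
         "clue_string": "Cable advice of non-payment", "pixels": [-75, -20]}
    ]
}]}
"""


def find_keyword(config, template_code, clue_string):
    """Return (template, keyword) for a clue string, or raise KeyError."""
    for template in config["templates"]:
        if template["templateCode"] != template_code:
            continue
        for keyword in template["keywords"]:
            if keyword["clue_string"] == clue_string:
                return template, keyword
    raise KeyError(f"{clue_string!r} not found in template {template_code!r}")


def checkbox_box(clue_x, clue_y, pixels, cb_size):
    """Compute a (left, top, right, bottom) crop box for one checkbox.

    ASSUMPTION: `pixels` is an (x, y) offset from the clue word's top-left
    corner to one corner of the zone, and `cb_size` is a signed
    (width, height) extent from that corner. A negative width extends left.
    """
    x0 = clue_x + pixels[0]
    y0 = clue_y + pixels[1]
    x1 = x0 + cb_size[0]
    y1 = y0 + cb_size[1]
    # Normalise so that left <= right and top <= bottom.
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


config = json.loads(CONFIG_JSON)
template, keyword = find_keyword(
    config, "TEMPLATE_ABCD", "Cable advice of non-payment")
# Pretend OCR reported the clue word's top-left corner at (400, 300).
box = checkbox_box(400, 300, keyword["pixels"], template["cb_size"])
print(box)  # → (225, 280, 325, 340)
```

Under these assumptions the resulting zone is 100 pixels wide and 60 pixels high, which matches the (60, 100, 3) input shape of the classifier. A tuple in this (left, top, right, bottom) order can be passed directly to Pillow's `Image.crop`.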

[Figure: Solution Diagram]

Solution Details

Check-mark extraction is performed in the steps below.

Step 1. Model Building

• Import the Keras Python library for deep learning and related Python packages

    from keras.models import Sequential
    from keras.layers import Conv2D
    from keras.layers import MaxPooling2D
    from keras.layers import Flatten
    from keras.layers import Dense
    from keras.callbacks import ModelCheckpoint
    from keras.callbacks import TensorBoard
    from keras.preprocessing.image import ImageDataGenerator
    import os
    import fnmatch
    import numpy as np
    import time
    from keras.preprocessing import image
    from keras.models import load_model
    import matplotlib.pyplot as plt

• Define the model: create the sequence and add layers

    # Initializing the CNN
    classifier = Sequential()
    # Step 1 - Convolution (input is 60 px high, 100 px wide, 3 channels)
    classifier.add(Conv2D(32, (3, 3), input_shape=(60, 100, 3), activation='relu'))
    # Step 2 - Pooling
    classifier.add(MaxPooling2D(pool_size=(2, 2)))
    # Adding a second convolutional layer
    classifier.add(Conv2D(32, (3, 3), activation='relu'))
    classifier.add(MaxPooling2D(pool_size=(2, 2)))
    # Step 3 - Flattening
    classifier.add(Flatten())
    # Step 4 - Full connection (3 output classes: checked, unchecked, noise)
    classifier.add(Dense(units=128, activation='relu'))
    classifier.add(Dense(units=3, activation='softmax'))

• Compile the model: specify the loss function and optimizer

    # Compiling the CNN
    classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    # Note: the checkpoint file name is truncated in the source ('.…hdf5')
    modelcheckpoint = ModelCheckpoint('.…hdf5', monitor='val_acc', verbose=0, save_best_only=True, save_weights_only=False, mode='auto', period=1)
    tbcheckpoint = TensorBoard(log_dir='./logs', histogram_freq=0, batch_size=32, write_graph=True, write_grads=False, write_images=False, embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None)

• Fit the model: execute the model using image data

    # Part 2 - Execute the model using image data
    train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
    test_datagen = ImageDataGenerator(rescale=1./255)
    training_set = train_datagen.flow_from_directory('cbimagestrn', target_size=(60, 100), batch_size=32, class_mode='categorical')
    test_set = test_datagen.flow_from_directory('cbimagestst', target_size=(60, 100), batch_size=32, class_mode='categorical')
    # Part 3 - Model training
    classifier.fit_generator(training_set, steps_per_epoch=1000, epochs=50, validation_data=test_set, validation_steps=500, callbacks=[modelcheckpoint, tbcheckpoint])
    …

• Make predictions: use the model to generate predictions on new or test data

    # Part 4 - Model accuracy testing (making new predictions)
    # use test data / images to predict [obvious and trivial code is eliminated]
    …
    result = model.predict_proba(test_image_arr)
    …
    # from the result, we can know how many files are mis-classified or not

Step 2. Detection

• The system detects rectangular regions that could potentially contain check-marks.

• In this approach, clue words from OCR are used to locate the pixel co-ordinates that identify the block/region to capture.
• If an existing template changes or new templates are added to the system, the only change required is in the config file.

Step 3. Recognition and Classification

• Once the regions have been detected, the image is cropped to each region so that it can be recognized as a check-mark object.
• A machine learning model is used to classify the cropped image as Checked, Unchecked or Noise.

Image Processing = CNN from TensorFlow; Classifier: optimizer = 'adam', loss = 'categorical_crossentropy'

*on a baseline data set

Conclusion

Applications using forms or template structures where OCR techniques are used to capture the text can benefit from capturing the check-mark states to enrich the data and reduce the analysis time.

If the velocity with which the forms change in any business transactions is moderate to high, then a blend of OCR and Deep Learning techniques, described above, can help you become more agile.
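As a small illustration of the recognition step (Step 3), the sketch below shows one way the three softmax outputs of the classifier could be mapped to the state codes Checked (1), Unchecked (0) and Noise (-1). It is only a sketch under stated assumptions. The function name `to_states`, the class order in `CLASS_NAMES` and the `min_confidence` threshold are hypothetical and do not come from the project. With `flow_from_directory` the real class order follows the alphanumeric order of the class sub-folder names, so it should be checked against `training_set.class_indices`.

```python
# Hypothetical class order; with flow_from_directory it follows the
# alphanumeric order of the class sub-folder names.
CLASS_NAMES = ["checked", "noise", "unchecked"]
STATE_CODES = {"checked": 1, "unchecked": 0, "noise": -1}


def to_states(probabilities, min_confidence=0.6):
    """Map rows of softmax probabilities to state codes 1, 0 or -1.

    ASSUMPTION: predictions whose top probability is below `min_confidence`
    are treated as noise/unreadable. The threshold value is illustrative.
    """
    states = []
    for row in probabilities:
        # Index of the most probable class for this cropped image.
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] < min_confidence:
            states.append(STATE_CODES["noise"])
        else:
            states.append(STATE_CODES[CLASS_NAMES[best]])
    return states


# Three made-up softmax outputs standing in for the classifier's predictions.
sample = [
    [0.90, 0.05, 0.05],  # confidently "checked"
    [0.10, 0.10, 0.80],  # confidently "unchecked"
    [0.40, 0.25, 0.35],  # too uncertain, falls back to noise
]
states = to_states(sample)
print(states)  # → [1, 0, -1]
```

Falling back to -1 for low-confidence predictions is one possible design choice: it lets downstream code send doubtful check-boxes for manual review instead of recording a guess.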

Thanks for reading.

Comments and suggestions are welcome, so please do share them.

