LSTM-based Handwriting Recognition by Google

This story discusses Fast Multi-language LSTM-based Online Handwriting Recognition (Carbune et al., 2019). The following topics are covered: Data, Architecture, and Experiment.

Data

Carbune et al. leverage both open and closed datasets to validate the model.

As usual, the IAM-OnDB dataset is used to train the model.

In the experiments, they use two input representations: Raw Touch Points and Bézier Curves.

Raw Touch Points

The data is converted to a sequence of 5-dimensional points: x coordinate, y coordinate, timestamp of the touch point since the first touch, pen-up or pen-down, and whether a new stroke starts.

Some preprocessing is necessary:

Because the size of the writing area varies, the x and y coordinates need to be normalized. If the writing area is unknown, a surrogate area 20% larger than the observed range of touch points is used.

The strokes are resampled linearly at equidistant points with a spacing of 0.05. In other words, a line of length 1 will have 20 points.
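The two preprocessing steps above can be sketched in a few lines of plain Python. This is only an illustration: the function names `normalize` and `resample`, the choice to scale by the y range, and the way the 20% surrogate margin is split evenly on both sides are assumptions made here, not details taken from the paper.

```python
import math

def normalize(points, y_range=None, margin=0.2):
    """Shift x so the first point is at 0 and scale so y falls in [0, 1].

    points:  list of (x, y) tuples.
    y_range: (y_min, y_max) of the writing area if known, else None.
    """
    ys = [y for _, y in points]
    if y_range is None:
        # Surrogate area: observed range enlarged by 20% (split evenly, an assumption).
        lo, hi = min(ys), max(ys)
        pad = margin * (hi - lo) / 2.0
        lo, hi = lo - pad, hi + pad
    else:
        lo, hi = y_range
    scale = (hi - lo) or 1.0  # avoid division by zero for flat input
    x0 = points[0][0]
    return [((x - x0) / scale, (y - lo) / scale) for x, y in points]

def resample(points, delta=0.05):
    """Resample a polyline of (x, y) tuples at equal arc-length spacing delta."""
    if len(points) < 2:
        return list(points)
    out = [points[0]]
    dist_to_next = delta  # arc length left until the next sample is due
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already travelled along this segment
        while seg - pos >= dist_to_next - 1e-12:
            pos += dist_to_next
            t = pos / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            dist_to_next = delta
        dist_to_next -= seg - pos
    return out

line = [(0.0, 0.0), (1.0, 0.0)]
print(len(resample(line)))  # 21: the start point plus 20 resampled points
```

Each stroke would be resampled separately, so that no points are invented in the gap between a pen-up and the next pen-down.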

Bézier Curves

The authors also evaluate whether Bézier Curves work better than Raw Touch Points. Bézier curves are a natural way to represent trajectories in space, and in the experiments they produce better results than Raw Touch Points.

The authors use cubic polynomials (cubic Bézier curves) to calculate the features.
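As a quick refresher, a cubic Bézier curve is defined by two endpoints and two control points, and every coordinate is a cubic polynomial of a parameter s between 0 and 1. The sketch below evaluates such a curve with the standard Bernstein form; the function name `cubic_bezier` and the sample points are made up for illustration.

```python
def cubic_bezier(p0, p1, p2, p3, s):
    """Evaluate a cubic Bézier curve at parameter s in [0, 1].

    p0 and p3 are the endpoints, p1 and p2 the control points.
    Each point is a tuple of coordinates, e.g. (x, y).
    """
    u = 1.0 - s
    # Bernstein basis polynomials of degree 3.
    b0, b1, b2, b3 = u ** 3, 3 * u * u * s, 3 * u * s * s, s ** 3
    return tuple(
        b0 * a + b1 * b + b2 * c + b3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

p0, p1, p2, p3 = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)
print(cubic_bezier(p0, p1, p2, p3, 0.0))  # (0.0, 0.0), the first endpoint
print(cubic_bezier(p0, p1, p2, p3, 0.5))  # (0.5, 0.75)
print(cubic_bezier(p0, p1, p2, p3, 1.0))  # (1.0, 0.0), the second endpoint
```

The curve always passes through the two endpoints, while the control points only pull it in their direction.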

The word “go” represented with the Bézier curve method (Google AI Blog)

You may check out this story if you want to refresh the concept of Bézier Curves.

After the curves are calculated, each one is turned into a 10-dimensional vector that is fed into the neural network. The features are:

Vector between the endpoints (red dots). It is the blue line in the figure.

Distances between the control points (green dots) and the endpoints. They are the green dashed lines in the figure.

Angles between each control point and the endpoints. They are the green arcs in the figure.

Three time coefficients (not shown in the figure).

A boolean indicator of whether this is pen-up or pen-down.

Bézier curve used to feed the network (Carbune et al., 2019)

Architecture

Handwriting recognition is a well-defined problem, and there are multiple approaches to it.

Segment-and-decode classifiers are one example. They split a word into sub-word segments and classify them one by one.

Another line of work uses Hidden Markov Models (HMM), which model the handwriting as a chain of states and return a string.
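Before moving on to the model of Carbune et al., it may help to see how the 10-dimensional curve features listed in the Data section could be computed for a single curve. This is a rough sketch, not the authors' code: the function names, the use of signed angles in radians, the absence of any normalization, and the assumption that the three time coefficients are already available are all choices made for this example.

```python
import math

def angle_between(u, v):
    """Signed angle from 2-D vector u to 2-D vector v, in radians."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.atan2(cross, dot)

def curve_features(p0, p1, p2, p3, time_coeffs, pen_up):
    """Build a 10-dimensional feature vector for one cubic Bézier curve.

    p0, p3:      endpoints (x, y).
    p1, p2:      control points (x, y).
    time_coeffs: three coefficients describing time along the curve
                 (assumed to be computed elsewhere).
    pen_up:      True if the pen is lifted after this curve.
    """
    d = (p3[0] - p0[0], p3[1] - p0[1])    # vector between the endpoints
    c1 = (p1[0] - p0[0], p1[1] - p0[1])   # first endpoint to first control point
    c2 = (p2[0] - p3[0], p2[1] - p3[1])   # second endpoint to second control point
    return [
        d[0], d[1],                        # 2 values: endpoint vector
        math.hypot(*c1), math.hypot(*c2),  # 2 values: control point distances
        angle_between(d, c1),              # 2 values: angles at each endpoint
        angle_between((-d[0], -d[1]), c2),
        *time_coeffs,                      # 3 values: time coefficients
        1.0 if pen_up else 0.0,            # 1 value: pen-up indicator
    ]

features = curve_features(
    (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0),
    time_coeffs=(0.1, 0.0, 0.0), pen_up=True,
)
print(len(features))  # 10
```

A stroke then becomes a short sequence of such vectors, one per fitted curve, which is the input to the network described next.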

Carbune et al. use Bézier curves as input features and feed them into a bidirectional LSTM. A softmax layer then produces a probability distribution over all possible characters at each step.

This is not the end of the model. The softmax layer generates a sequence of classes, and Carbune et al. use Connectionist Temporal Classification (CTC) to decode it into the final output.
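To make the decoding step more concrete, here is a minimal sketch of greedy (best-path) CTC decoding: pick the most likely class at every step, merge consecutive repeats, and drop the blank symbol. Greedy decoding is used here only because it is the simplest variant; it is not necessarily the decoder used in the paper, and the alphabet and probabilities below are invented.

```python
def ctc_greedy_decode(probs, alphabet, blank=0):
    """Greedy CTC decoding.

    probs:    list of per-step probability lists (one softmax output per step).
    alphabet: alphabet[i] is the character for class i; the blank entry is unused.
    blank:    index of the CTC blank class.
    """
    # Most likely class at every time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in probs]
    out = []
    prev = None
    for k in best:
        # Keep a class only if it is not blank and not a repeat of the previous step.
        if k != blank and k != prev:
            out.append(alphabet[k])
        prev = k
    return "".join(out)

alphabet = ["", "g", "o"]  # index 0 is the blank
probs = [
    [0.1, 0.8, 0.1],    # g
    [0.2, 0.7, 0.1],    # g (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # o
    [0.1, 0.2, 0.7],    # o (repeat, merged)
]
print(ctc_greedy_decode(probs, alphabet))  # go
```

The blank class is what lets CTC output doubled letters: "oo" needs a blank between the two runs of "o", otherwise they are merged into one.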

Experiment

The authors compare the character error rate (CER) of different models on IAM-OnDB. For CER, lower is better.
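CER is commonly computed as the edit distance between the recognized text and the reference text, divided by the number of reference characters. The sketch below follows that common definition; the function names are made up, and the exact evaluation protocol of the paper may differ in details such as text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]

def cer(references, hypotheses):
    """Character error rate over a list of (reference, hypothesis) pairs."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return errors / chars

print(cer(["hello"], ["hallo"]))  # 0.2, one wrong character out of five
```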

You may notice that deeper LSTMs perform better and that Bézier curves outperform raw touch points.

Comparison on IAM-OnDB for different input representations and layers (Carbune et al., 2019)

Comparison among different models (Carbune et al., 2019)

CER of the Bézier Curves model (Carbune et al., 2019)

Take Away

Unlike previous studies, this approach calculates the input features (i.e. Bézier Curves) rather than using a CNN or another neural network to learn them.

Bézier Curves seem a sensible choice, since they can represent the input trajectory with little loss of information.

About Me

I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics. Feel free to connect with me on LinkedIn or follow me on Medium or Github.

I am offering short advice on machine learning problems or data science platforms for a small fee.

Extension Reading

More about Bézier Curves

Reference

V. Carbune, P. Gonnet, T. Deselaers, H. A. Rowley, A. Daryin, M. Calvo, L. L. Wang, D. Keysers, S. Feuz, P. Gervais. Fast Multi-language LSTM-based Online Handwriting Recognition. 2019.
