So, we can just add the values of alpha to our scores.

And finally, we apply one more logsumexp operation at the end to obtain the final log-scores of reaching the end of the sentence — so we stay in log-space.
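As a small sketch of this termination step (the tensor names and shapes here are made up for illustration, not the post's actual code):

```python
import torch

# alphas: (batch_size, num_tags) log-scores of reaching the last timestep
# with each possible tag (hypothetical shapes for this sketch)
torch.manual_seed(0)
batch_size, num_tags = 2, 5
alphas = torch.randn(batch_size, num_tags)

# log Z(X): a logsumexp over the tag dimension keeps everything in
# log-space and avoids the overflow that exp -> sum -> log would cause
log_partition = torch.logsumexp(alphas, dim=1)  # shape: (batch_size,)

# sanity check: for moderate values, logsumexp matches the naive version
assert torch.allclose(log_partition, alphas.exp().sum(dim=1).log())
```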

Finding the best sequence of labels

Now that we have computed the partition function, we have almost everything done.

If we compute the backward algorithm as well — which is just traversing the sequence backwards — we could find the label that maximizes P(yk | X) at each timestep k.

Interestingly, this would be the optimal solution if we assume the CRF is the true distribution.

And it can be formulated like this: P(yk | X) = αk(yk) · βk(yk) / Z(X), where the α scores come from the forward algorithm and the β scores come from the backward algorithm.
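As a minimal sketch of computing these marginals (the tensor names are hypothetical; in a real forward-backward pass the α and β scores would come from the recursions, not from random placeholders):

```python
import torch

# hypothetical log-space scores for one sentence: 4 timesteps, 3 tags
torch.manual_seed(0)
seq_len, num_tags = 4, 3
alphas = torch.randn(seq_len, num_tags)  # forward log-scores, alpha_k(y)
betas = torch.randn(seq_len, num_tags)   # backward log-scores, beta_k(y)

# In a true forward-backward pass, logsumexp(alpha_k + beta_k) equals
# log Z(X) at every timestep k; here we normalize per timestep so the
# sketch stays consistent even with these random placeholder tensors.
log_z = torch.logsumexp(alphas + betas, dim=-1, keepdim=True)

# P(y_k | X) = exp(alpha_k(y_k) + beta_k(y_k) - log Z(X))
marginals = torch.exp(alphas + betas - log_z)  # (seq_len, num_tags)

# per-timestep argmax gives the label that maximizes P(y_k | X)
best_tags = marginals.argmax(dim=-1)  # (seq_len,)
```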

And in order to find the best sequence of labels y*, we take the argmax at each timestep: y*k = argmax P(yk | X).

Viterbi algorithm

But it turns out we don’t need to compute the backward algorithm in order to find the most probable sequence of labels.

Instead, we can simply keep track of the maximum scores at each timestep during the forward algorithm.

And after finishing it, we can follow the backward trace of the max operations (argmax) in order to decode the sequence that maximizes the scores.

This is exactly what the code below does. This algorithm is known as the Viterbi algorithm.

It is almost the same as the forward algorithm we used in the log_partition function, but instead of having regular scores for the whole sequence, we have maximum scores and the tags that maximize these scores.

In other words, we replaced the torch.logsumexp operation with torch.max, which returns the max and the argmax together.

All we need to do now is pick these final tags and follow the backward trace to find the whole sequence of “argmax” tags.
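To illustrate the idea, here is a minimal Viterbi-style forward pass for a single sentence (the function and tensor names are made up for this sketch; the repository's actual code differs):

```python
import torch

def viterbi_forward(emissions, transitions):
    """Max-score forward pass for one sentence (illustrative sketch).
    emissions: (seq_len, num_tags) log-scores per timestep.
    transitions[i, j]: score of moving from tag i to tag j."""
    seq_len, num_tags = emissions.shape
    scores = emissions[0]  # best log-score ending in each tag so far
    backtrace = []         # one (num_tags,) tensor of argmax tags per step
    for k in range(1, seq_len):
        # step[i, j] = scores[i] + transitions[i, j] + emissions[k, j];
        # torch.max over dim=0 returns the max AND the argmax together
        step = scores.unsqueeze(1) + transitions + emissions[k]
        scores, best_prev = torch.max(step, dim=0)
        backtrace.append(best_prev)
    return scores, backtrace

torch.manual_seed(0)
scores, backtrace = viterbi_forward(torch.randn(5, 3), torch.randn(3, 3))
```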

The continuation of the above code goes as follows: for each sample, we iterate over the backtrace of that sample, and at each timestep we insert the tag that maximizes the scores at the beginning of best_path.

So, at the end, we have a list where the first element corresponds to the first tag and the last element corresponds to the last valid tag (see line 15) of the sequence.
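A minimal sketch of this backtrace step, with hypothetical names and a tiny hand-built example (not the repository's actual find_best_path code):

```python
import torch

def find_best_path(final_scores, backtrace):
    """Follow the argmax pointers backwards to decode the best tag
    sequence (illustrative sketch)."""
    best_tag = int(torch.argmax(final_scores))  # best tag at the last timestep
    best_path = [best_tag]
    # walk the backtrace in reverse, inserting each tag at the front so
    # the list ends up ordered from the first to the last timestep
    for best_prev in reversed(backtrace):
        best_tag = int(best_prev[best_tag])
        best_path.insert(0, best_tag)
    return best_path

# toy example: 3 tags, a sentence of length 3 (so 2 backtrace steps)
final_scores = torch.tensor([0.1, 2.0, 0.3])
backtrace = [torch.tensor([1, 0, 2]), torch.tensor([2, 2, 0])]
path = find_best_path(final_scores, backtrace)  # -> [2, 2, 1]
```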

Finally, we compute the find_best_path operation for all samples in the batch, and bingo! We are done.

Putting it all together

The full code is available here: https://github.com/mtreviso/linear-chain-crf. Take a look at main.py and bilstm_crf.py to see the CRF in practice!

Conclusion

Well, this post got longer than I intended.

Feel free to make this code more efficient and leave a comment to show us how you managed to do it.

Finally, I think it is worth mentioning that if you’d like to use a CRF model in production, I strongly suggest using a well-tested and efficient implementation, like this great pytorch package or the one provided by the allennlp library.

In the next post we’ll see how we can vectorize our for-loops to calculate the partition function and the Viterbi decode.

More info

CRFs are only one among many sequential models available.

I highly recommend taking a look at this presentation on sequential models by Noah Smith or at this lecture by André Martins to see some visual examples of the algorithms presented in this post.

Or even better, you can attend the next Lisbon Machine Learning Summer School, which will cover sequential models and many more interesting topics about Machine Learning and Natural Language Processing!

References

Hugo Larochelle’s lectures on CRFs: http://www.dmi.usherb.ca/~larocheh/neural_networks/content.html
PyTorch tutorial — Advanced: Making Dynamic Decisions and the Bi-LSTM CRF: https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html
Michael Collins’s notes on Log-Linear Models, MEMMs, and CRFs: http://www.cs.columbia.edu/%7Emcollins/crf.pdf
Michael Collins’s notes on the Forward-Backward Algorithm: http://www.cs.columbia.edu/~mcollins/fb.pdf
Tutorial by Sutton and McCallum — An Introduction to Conditional Random Fields: https://homepages.inf.ed.ac.uk/csutton/publications/crftutv2.pdf
Edwin Chen’s blog post: http://blog.echen.me/2012/01/03/introduction-to-conditional-random-fields/
An Overview of Conditional Random Fields by Ravish Chawla: https://medium.com/ml2vec/overview-of-conditional-random-fields-68a2a20fa541

Special thanks to @flassantos31, @erickrfonseca and @thales.bertaglia for reviewing this post.
