Stemming? Lemmatization? What?

This allows it to do better resolutions (like resolving “is” and “are” to “be”).

Another thing to note about lemmatization is that it’s often harder to create a lemmatizer for a new language than it is to create a stemming algorithm. Because lemmatizers require a lot more knowledge about the structure of a language, building one is a much more intensive process than setting up a heuristic stemming algorithm.

Luckily, if you’re working in English, you can quickly use lemmatization through NLTK, just like you do with stemming. To get the best results, however, you’ll have to feed the part-of-speech tags to the lemmatizer; otherwise it might not reduce all the words to the lemmas you desire. More than that, it’s based on the WordNet database (which is kind of like a web of synonyms or a thesaurus), so if there isn’t a good link there, you won’t get the right lemma anyway.

Wrapping Up

One more thing before I wrap up here: if you choose to use either lemmatization or stemming in your NLP application, always be sure to test performance with that addition. In many applications, you may find that either one hurts performance just as often as it helps. Both of these techniques are really designed with recall in mind, but precision tends to suffer as a result. But if recall is what you’re aiming for (like with a search engine), then maybe that’s alright!

Also, this blog post mostly centers around the English language. Other languages, even if they seem somewhat related, can have drastically different results with stemming and lemmatization. The general concepts remain the same, but the specific implementations will be drastically different. Hopefully this blog at least helps with the high-level ideas if you’re planning on working with a different language entirely!

If you enjoyed this post and are hungry for more NLP readings, why not check out another blog post I wrote about how word embeddings work and the different types you can encounter? Or, if you like sentences more, why not check out my summary of a paper that analyzed how different sentence embeddings affect downstream and linguistic tasks!

Originally hosted at: hunterheidenreich.com.
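
As a small appendix to the point about feeding part-of-speech tags to the lemmatizer, here is a minimal sketch of how that might look with NLTK. It assumes NLTK is installed, that the WordNet corpus has already been fetched (for example with nltk.download("wordnet")), and that your tags are Penn Treebank style, like the ones nltk.pos_tag produces. The helper names penn_to_wordnet and lemmatize_tagged are hypothetical names made up for this illustration; they are not part of NLTK.

```python
from typing import Iterable, List, Tuple


def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBP') to a WordNet POS letter.

    Hypothetical helper for illustration; not part of NLTK.
    """
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"      # noun, which is also the lemmatizer's default


def lemmatize_tagged(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Lemmatize (word, Penn tag) pairs with NLTK's WordNetLemmatizer.

    Assumes NLTK is installed and the WordNet corpus is available locally.
    """
    # Imported here so the tag mapping above works without NLTK installed.
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    return [
        lemmatizer.lemmatize(word.lower(), pos=penn_to_wordnet(tag))
        for word, tag in tagged
    ]
```

With tagged input such as [("are", "VBP")], the verb tag is what should let the lemmatizer resolve the word to “be”. Without a tag, the lemmatizer treats every word as a noun, so a word like “are” would most likely come back unchanged.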
