Labeling Unstructured Text for Meaning to Achieve Predictive Lift

They often try to use standard NLP techniques (such as tokenization, stemming, part-of-speech tagging, parsing, and named entity recognition). But these almost always prove inadequate when trying to build a high-quality model.

Label the text for meaning

However, all is not lost. The key is to add structure and meaning to the raw data, and by doing so enable the machine to understand the text and thus begin to learn. By labeling the text for meaning, you are providing a map for the machine to navigate the text.

We recommend several advanced NLP techniques, including:

Labeling nouns and noun phrases

Much of the meaning in text is stored in nouns and noun phrases. Unfortunately, machines don’t know even the simplest of things, such as that lions and zebras are animals living in Africa and that one eats the other. However, the machine can learn these things via the creation of a semantic ontology of entities.

The purpose of the semantic ontology is to: a) encode commonalities between concepts in a specific domain (e.g. both “yellow fever” and “malaria” are “diseases spread by mosquitoes”), and b) encode how words relate to concepts, which may vary depending upon the context (e.g..

The entity ontology is a semantically unambiguous dictionary that allows the text to be labeled for meaning, and thus enables the machine to learn with more accuracy than it could by simply processing the ambiguous raw text.

A key problem with building ontologies is that to date most have been created by humans, which is both costly and time-consuming..

Advanced NLP systems can now create a semantic entity ontology in an automated way, simplifying and speeding up this critical step and reducing its cost.

A few notes about labeling nouns and noun phrases.

Overfit

Take full-text medical records involving heart disease at a specific hospital..

By labeling the data using an entity ontology, we can prevent overfit and help the machine to understand the difference between doctors and the factors that can cause heart attacks.

Ontologies

Some systems create what is known as an orthogonal ontology, where nouns and noun phrases are only placed in one concept..

(For example, “hand,” see above, has multiple meanings and the machine needs to know which one.) Carefully consider the type of ontology being created and how it will be used before you invest heavily in its creation.

While labeling the nouns and noun phrases for meaning is probably the most important step for improving your model, here are two other techniques to consider.

Labeling adverbs and adjectives for sentiment

When using sentiment analysis for AI, document-level sentiment is essentially useless..

Without labeling for meaning, the machine is very likely to misunderstand these sentences and many others, which will result in a poor model.

NLP tools that enable full text to be labeled for meaning will improve your model and provide you with predictive lift..
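
To make the idea of labeling nouns and noun phrases against an entity ontology more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy `ONTOLOGY` dictionary, the concept names, the two senses given for "hand", the `CONTEXT_CUES` word lists, and the `label_text` helper are all hypothetical and do not come from any particular NLP product. The ontology is deliberately non-orthogonal (one phrase can belong to more than one concept), and a simple count of context words stands in for real disambiguation.

```python
import re

# Hypothetical toy ontology: each noun or noun phrase maps to one or more concepts.
# A phrase with several concepts ("hand") makes the ontology non-orthogonal.
ONTOLOGY = {
    "yellow fever": ["DISEASE_SPREAD_BY_MOSQUITOES"],
    "malaria": ["DISEASE_SPREAD_BY_MOSQUITOES"],
    "lion": ["AFRICAN_ANIMAL"],
    "zebra": ["AFRICAN_ANIMAL"],
    "hand": ["BODY_PART", "WORKER"],  # assumed senses, for illustration only
}

# Hypothetical context cues used to choose a concept for ambiguous phrases.
CONTEXT_CUES = {
    "BODY_PART": {"injured", "finger", "wrist", "pain"},
    "WORKER": {"hired", "farm", "ranch", "crew"},
}


def label_text(text):
    """Return (phrase, concept) pairs for ontology phrases found in text."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    found = []
    for phrase, concepts in ONTOLOGY.items():
        # Allow a simple plural "s" so "lions" matches the entry "lion".
        pattern = r"\b" + re.escape(phrase) + r"s?\b"
        for match in re.finditer(pattern, lowered):
            # Pick the concept whose cue words overlap most with the text;
            # on a tie, the first concept listed in the ontology wins.
            best = max(
                concepts,
                key=lambda c: len(words & CONTEXT_CUES.get(c, set())),
            )
            found.append((match.start(), phrase, best))
    found.sort()  # report labels in the order they appear in the text
    return [(phrase, concept) for _, phrase, concept in found]


if __name__ == "__main__":
    print(label_text("Malaria and yellow fever are common here."))
    print(label_text("The ranch hired a new hand."))
    print(label_text("He injured his hand."))
```

In this sketch, "malaria" and "yellow fever" receive the same concept label, so a downstream model can treat them as related even though the raw strings share nothing. A production system would replace the cue-word count with a proper word-sense disambiguation step and a much larger ontology.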
