A Step-by-Step Must-Read NLP Guide to Learn ELMo for Extracting Features from Text

Take a moment to ponder the difference between these two.

The verb “read” in the first sentence is in the past tense.

And the same verb transforms into present tense in the second sentence.

This is a case of polysemy, wherein a word can have multiple meanings or senses.

Language is such a wonderfully complex thing.

Traditional word embeddings come up with the same vector for the word “read” in both the sentences.

Hence, the system would fail to distinguish between the different senses of a polysemous word.

These word embeddings just cannot grasp the context in which the word was used.

ELMo word vectors successfully address this issue.

ELMo word representations take the entire input sentence into account when calculating the word embeddings.

Hence, the term “read” would have different ELMo vectors in different contexts.
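To make the limitation of traditional embeddings concrete, here is a toy sketch of a static lookup table. The two example sentences and all the vector values are made up for illustration and do not come from any trained model.

```python
# Toy static embedding table: one fixed vector per word (values are made up).
static_embeddings = {
    "read": [0.7, 0.2],
    "book": [0.4, 0.9],
    "letter": [0.5, 0.8],
}

def embed_static(sentence):
    """Look up each token on its own; the surrounding words are never consulted."""
    unknown = [0.0, 0.0]  # fallback vector for words missing from the table
    return [static_embeddings.get(token, unknown) for token in sentence.lower().split()]

past = embed_static("I read the book yesterday")
present = embed_static("Can you read the letter now")

# "read" is token 1 in the first sentence and token 2 in the second,
# yet a static table returns exactly the same vector for both.
print(past[1] == present[2])  # prints True
```

A contextual model such as ELMo would instead compute the vector for “read” from the whole sentence, so the two vectors would differ.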

Implementation: ELMo for Text Classification in Python

And now the moment you have been waiting for: implementing ELMo in Python! Let’s take this step by step.


1. Understanding the Problem Statement

The first step towards dealing with any data science challenge is defining the problem statement.

It forms the base for our future actions.

For this article, we already have the problem statement in hand:

Sentiment analysis remains one of the key problems that has seen extensive application of natural language processing (NLP). This time around, given the tweets from customers about various tech firms who manufacture and sell mobiles, computers, laptops, etc., the task is to identify if the tweets have a negative sentiment towards such companies or products.

It is clearly a binary text classification task wherein we have to predict the sentiments from the extracted tweets.


2. About the Dataset

Here’s a breakdown of the dataset we have:

- The train set contains 7,920 tweets
- The test set contains 1,953 tweets

You can download the dataset from this page. Note that you will have to register or sign in to do so.

Caution: Most profane and vulgar terms in the tweets have been replaced with “$&@*#”.

However, please note that the dataset might still contain text that could be considered profane, vulgar, or offensive.

Alright, let’s fire up our favorite Python IDE and get coding!

3. Import Libraries

Import the libraries we’ll be using throughout our notebook.


4. Read and Inspect the Data

    # read data
    train = pd.read_csv("….csv")
    test = pd.read_csv("….csv")

    train.shape, test.shape

Output: ((7920, 3), (1953, 2))

The train set has 7,920 tweets while the test set has only 1,953.

Now let’s check the class distribution in the train set:

    train['label'].value_counts(normalize = True)

Output:

    0    0.744192
    1    0.255808
    Name: label, dtype: float64

Here, 1 represents a negative tweet while 0 represents a non-negative tweet.
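For readers who want to see what the normalized counts mean, here is a minimal pure-Python sketch of the same computation. The helper name class_distribution and the toy labels are my own, not part of the dataset.

```python
from collections import Counter

def class_distribution(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Toy labels: 1 = negative tweet, 0 = non-negative tweet.
toy_labels = [0, 0, 0, 1, 0, 0, 1, 0]
print(class_distribution(toy_labels))  # {0: 0.75, 1: 0.25}
```

With roughly three non-negative tweets for every negative one in the real train set, plain accuracy would flatter a model that always predicts 0, so it is worth keeping this imbalance in mind when choosing a metric.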

Let’s take a quick look at the first 5 rows in our train set:

    train.head()

We have three columns to work with.

The column ‘tweet’ is the independent variable while the column ‘label’ is the target variable.


5. Text Cleaning and Preprocessing

In an ideal world, we would have a clean and structured dataset to work with.

But things are not that simple in NLP (yet).

We need to spend a significant amount of time cleaning the data to make it ready for the model building stage.

With clean text, feature extraction becomes easier and the resulting features carry more information.

The better your data quality, the more your model’s performance tends to improve.

So let’s clean the text we’ve been given and explore it.

There seem to be quite a few URL links in the tweets.

They are not telling us much (if anything) about the sentiment of the tweet so let’s remove them.


We have used Regular Expressions (or RegEx) to remove the URLs.
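A minimal sketch of this step could look like the following. The pattern, the helper name remove_urls and the sample tweet are my assumptions, so treat this as one reasonable way to do it.

```python
import re

# Assumed pattern: anything starting with "http" up to the next whitespace.
URL_PATTERN = re.compile(r"http\S+")

def remove_urls(text):
    """Strip URL-like tokens from a piece of text."""
    return URL_PATTERN.sub("", text)

sample = "Loving my new phone https://example.com/pic #happy"
print(remove_urls(sample))

# With the DataFrames from this article, it could be applied as:
#   train['clean_tweet'] = train['tweet'].apply(remove_urls)
#   test['clean_tweet'] = test['tweet'].apply(remove_urls)
```

Note that removing a URL leaves the surrounding spaces behind; extra whitespace can be collapsed in a later cleaning step.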

Note: You can learn more about Regex in this article.

We’ll go ahead and do some routine text cleaning now.
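What counts as “routine” cleaning varies from project to project. The sketch below assumes it means dropping punctuation, lowercasing, removing digits and collapsing whitespace; the helper name clean_text is hypothetical.

```python
import re
import string

def clean_text(text):
    """Apply a few common cleaning steps to a single tweet."""
    # Drop ASCII punctuation characters such as ! # ' : )
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Lowercase everything so "NEW" and "new" are treated alike.
    text = text.lower()
    # Replace digits with spaces.
    text = re.sub(r"[0-9]", " ", text)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return " ".join(text.split())

print(clean_text("Got my NEW iPhone 7!!   It's #awesome :)"))
# prints: got my new iphone its awesome
```

Whether to drop digits or hashtags is a judgment call, since both can occasionally carry sentiment.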


I’d also like to perform text normalization, which reduces a word to its base form.

For example, the base form of the words ‘produces’, ‘produced’, and ‘producing’ is ‘produce’.

It happens quite often that multiple forms of the same word are not really that important and we only need to know the base form of that word.

We will lemmatize (normalize) the text by leveraging the popular spaCy library.
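Here is a hedged sketch of what such a lemmatization helper could look like. Unlike the one-argument lemmatization( ) call used in this article, this variant takes the pipeline as a second argument, which is my own design choice so that the function does not depend on a global object.

```python
def lemmatization(texts, nlp):
    """Return each text with every token replaced by its lemma.

    `nlp` is any callable that turns a string into an iterable of tokens
    exposing a `lemma_` attribute, for example a loaded spaCy pipeline.
    """
    return [" ".join(token.lemma_ for token in nlp(text)) for text in texts]

# Usage with spaCy (assumes spaCy and an English model are installed;
# the model name below is an assumption, not taken from this article):
#   import spacy
#   nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
#   train['clean_tweet'] = lemmatization(train['clean_tweet'], nlp)
```

Disabling the parser and the named entity recognizer is optional; it only skips pipeline components that lemmatization does not need.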


Lemmatize tweets in both the train and test sets:

    train['clean_tweet'] = lemmatization(train['clean_tweet'])
    test['clean_tweet'] = lemmatization(test['clean_tweet'])

Let’s have a quick look at the original tweets vs our cleaned ones:

    train.sample(10)

Check out the columns closely.

The tweets in the ‘clean_tweet’ column appear to be much more legible than the original tweets.

However, I feel there is still plenty of scope for cleaning the text.

I encourage you to explore the data as much as you can and find more insights or irregularities in the text.


6. Brief Intro to TensorFlow Hub

Wait, what does TensorFlow have to do with our tutorial? TensorFlow Hub is a library that enables transfer learning by allowing the use of many machine learning models for different tasks.

ELMo is one such example.

That’s why we will access ELMo via TensorFlow Hub in our implementation.

Before we do anything else though, we need to install TensorFlow Hub.

You must install or upgrade your TensorFlow package to at least 1.7 to use TensorFlow Hub:

    $ pip install "tensorflow>=1.7.0"
    $ pip install tensorflow-hub

7. Preparing ELMo Vectors

We will now import the pretrained ELMo model.

A note of caution: the model is over 350 MB in size, so it might take you a while to download.

    import tensorflow_hub as hub
    import tensorflow as tf

    elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

Note that hub.Module is the TensorFlow 1.x-style interface of TensorFlow Hub.

I will first show you how we can get ELMo vectors for a sentence.

All you have to do is pass a list of string(s) to the object elmo.

Output: TensorShape([Dimension(1), Dimension(8), Dimension(1024)])

The output is a 3-dimensional tensor of shape (1, 8, 1024):

- The first dimension of this tensor represents the number of training samples. This is 1 in our case
- The second dimension represents the length, in tokens, of the longest string in the input list of strings. Since we have only 1 string in our input list, the size of the 2nd dimension is equal to the length of that string: 8
- The third dimension is equal to the length of the ELMo vector

Hence, every word in the input sentence has an ELMo vector of size 1024.

Let’s go ahead and extract ELMo vectors for the cleaned tweets in the train and test datasets.

However, to arrive at the vector representation of an entire tweet, we will take the mean of the ELMo vectors of constituent terms or tokens of the tweet.

Let’s define a function, elmo_vectors( ), for doing this.
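The averaging step itself can be illustrated with NumPy on a stand-in array. This is only a sketch of the pooling idea, not the elmo_vectors( ) function from this article: the helper name mean_pool is hypothetical, the random array replaces real ELMo output, and masking out padded positions is my own addition (a plain mean over axis 1 is the simpler alternative).

```python
import numpy as np

def mean_pool(token_vectors, lengths):
    """Average token vectors per sentence, ignoring padded positions.

    token_vectors: array of shape (batch, max_len, dim)
    lengths: number of real tokens in each sentence, shape (batch,)
    """
    token_vectors = np.asarray(token_vectors, dtype=float)
    lengths = np.asarray(lengths)
    # mask[i, t] is True for real tokens and False for padding.
    mask = np.arange(token_vectors.shape[1])[None, :] < lengths[:, None]
    summed = (token_vectors * mask[:, :, None]).sum(axis=1)
    return summed / lengths[:, None]

# Stand-in for ELMo output: 2 sentences, up to 8 tokens, 1024 dimensions.
dummy = np.random.rand(2, 8, 1024)
pooled = mean_pool(dummy, lengths=[8, 5])
print(pooled.shape)  # (2, 1024)
```

Each tweet ends up as a single 1024-dimensional vector, which is what a downstream classifier needs.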

You might run out of computational resources (memory) if you use the above function to extract embeddings for the tweets in one go.

As a workaround, split both train and test set into batches of 100 samples each.

Then, pass these batches sequentially to the function elmo_vectors( ).

I will keep these batches in a list:

    list_train = [train[i:i+100] for i in range(0, train.shape[0], 100)]
    list_test = [test[i:i+100] for i in range(0, test.shape[0], 100)]

Now, we will iterate through these batches and extract the ELMo vectors. Let me warn you, this will take a long time.

    # Extract ELMo embeddings
    elmo_train = [elmo_vectors(x['clean_tweet']) for x in list_train]
    elmo_test = [elmo_vectors(x['clean_tweet']) for x in list_test]

Once we have all the vectors, we can concatenate them back to a single array:

    elmo_train_new = np.concatenate(elmo_train, axis = 0)
    elmo_test_new = np.concatenate(elmo_test, axis = 0)

I would advise you to save these arrays as it took us a long time to get the ELMo vectors for them.

We will save them as pickle files so that they can be loaded back later without recomputing the embeddings.
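A minimal sketch of saving and loading with Python’s built-in pickle module follows. The file name and the small list standing in for the ELMo arrays are placeholders of mine.

```python
import os
import pickle
import tempfile

def save_pickle(obj, path):
    """Serialize obj to path in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(path):
    """Read back an object written by save_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Small stand-in for elmo_train_new; NumPy arrays can be pickled the same way.
vectors = [[0.1, 0.2], [0.3, 0.4]]
path = os.path.join(tempfile.gettempdir(), "elmo_train_demo.pickle")

save_pickle(vectors, path)
restored = load_pickle(path)
print(restored == vectors)  # prints True
```

Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.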


8. Model Building and Evaluation

Let’s build our NLP model with ELMo! We will use the ELMo vectors of the train dataset to build a classification model.

Then, we will use the model to make predictions on the test set.

But before all of that, split elmo_train_new into training and validation sets to evaluate our model prior to the testing phase.

Since our objective is to set a baseline score, we will build a simple logistic regression model using ELMo vectors as features.

Prediction time! First, on the validation set:

    preds_valid = lreg.predict(xvalid)

We will evaluate our model by the F1 score metric since this is the official evaluation metric of the contest.

    f1_score(yvalid, preds_valid)

Output: 0.789976

The F1 score on the validation set is pretty impressive.
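If you are wondering what the F1 score measures, it is the harmonic mean of precision and recall for the positive class (here, label 1). The pure-Python sketch below spells that out; the helper name f1_binary and the toy labels are mine, and I am assuming the f1_score call above is scikit-learn’s, which by default computes the same quantity for binary labels.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 score for the positive class: 2 * P * R / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(f1_binary(y_true, y_pred))  # prints 0.75
```

In this toy case there are 3 true positives, 1 false positive and 1 false negative, so precision and recall are both 0.75.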

Now let’s proceed and make predictions on the test set:

    # make predictions on test set
    preds_test = lreg.predict(elmo_test_new)

Prepare the submission file which we will upload on the contest page.

These predictions give us a score of 0.875672 on the public leaderboard.

That is frankly pretty impressive given that we only did fairly basic text preprocessing and used a very simple model.

Imagine what the score could be with more advanced techniques.

Try them out on your end and let me know the results!

What Else Can We Do with ELMo?

We just saw firsthand how effective ELMo can be for text classification.

If coupled with a more sophisticated model, it would likely give even better performance.

The application of ELMo is not limited just to the task of text classification.

You can use it whenever you have to vectorize text data.

Below are a few more NLP tasks where we can utilize ELMo:

- Machine Translation
- Language Modeling
- Text Summarization
- Named Entity Recognition
- Question-Answering Systems

End Notes

ELMo is undoubtedly a significant advance in NLP and is here to stay.

Given the sheer pace at which research in NLP is progressing, other new state-of-the-art word embeddings have also emerged in the last few months, like Google’s BERT and Zalando’s Flair.

Exciting times ahead for NLP practitioners! I strongly encourage you to use ELMo on other datasets and experience the performance boost yourself.

If you have any questions or want to share your experience with me and the community, please do so in the comments section below.

You should also check out the below NLP related resources if you’re starting out in this field:

- Natural Language Processing (NLP) course
- Certified Program: Natural Language Processing (NLP) for Beginners
