Identify Top Topics using Word Cloud

Add an exclamation mark (!) before the command and it’ll work as it would on a command line.

I am using it to get the wordcloud package.

!pip install wordcloud

I now have all the libraries that I need, so I import all of them.

We import numpy, pandas and matplotlib, along with collections (for Counter) and wordcloud (to create our Word Cloud).

Working with dataset

To begin with, I first import the dataset file into a pandas DataFrame.

Note that the file must be read with the latin-1 encoding to be parsed properly.

Then, I output the column names to identify which one holds the headlines.
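A minimal sketch of this loading step might look like the following. The sample rows are made up and held in memory so that the snippet runs on its own; only the column names come from the dataset described here. With the real file, the path would be passed to read_csv in place of the buffer.

```python
import io

import pandas as pd

# Made-up row standing in for the real file; only the column names are from the article.
sample_csv = (
    "author,date,headlines,read_more,text,ctext\n"
    "A. Writer,01 Jan 2017,Café owners in Delhi protest,"
    "http://example.com/1,short text,full text\n"
)

# Encode as latin-1 so the bytes mimic a file that is not valid UTF-8.
raw_bytes = sample_csv.encode("latin-1")

# With the real dataset, pass the file path instead of the BytesIO buffer.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")

# List the column names to find the one that holds the headlines.
print(df.columns.tolist())
```

Reading the same bytes with the default UTF-8 encoding would raise a UnicodeDecodeError, which is why the encoding argument matters here.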

We can see that there are 6 columns: author, date, headlines, read_more, text and ctext.

However, in this project I will be working with headlines.

So, I convert all the headlines to lower case using the lower() method and combine them into a single variable, all_headlines.
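As a rough illustration, the lowercasing and combining step could be written like this. The list below is a made-up stand-in for the headlines column, and joining with a single space is an assumption about how the text is combined.

```python
# Hypothetical stand-in for the headlines column of the DataFrame.
headlines = [
    "Delhi Metro to add new line",
    "India will host summit in Delhi",
]

# Lowercase each headline and join everything into one string.
all_headlines = " ".join(headline.lower() for headline in headlines)

print(all_headlines)
```

Lowercasing first means that "Delhi" and "delhi" are later counted as the same word.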

Word Cloud

Now, we’re ready to create our Word Cloud.

After one round of analysis, I found that one of the top words was "will".

However, it does not provide any useful information on the topic.

Thus, I added it to the set of stopwords so that it is ignored when identifying the top words from the headlines.

I then create a WordCloud object with these stopwords, set the background of the output image to white and cap the number of words at 1000.

The image is saved as wordcloud.

I use rcParams to define the size of the figure and set the axis as off.

I then use imshow to draw the image and show to render the figure.

From the image, we can clearly see that the top two topics are India and Delhi.

One can clearly see how useful a word cloud is to identify the top words in a collection of text.

We can even verify the top words using a bar chart.

I first get filtered_words by splitting the combined headlines into words and dropping the stopwords.

Then, I use Counter to count the frequency of each word.

I then extract the top 10 words and their count.
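A small sketch of the filtering and counting steps is shown below. Both the text and the tiny stopword set are made up so that the snippet runs with the standard library alone; in the project itself the stopwords would be the same set used for the Word Cloud. The names counted_words, top_words, words and counts are illustrative.

```python
from collections import Counter

# Made-up text and a tiny stopword set, both stand-ins for illustration.
all_headlines = (
    "delhi metro will add new line india will host summit in delhi "
    "india wins delhi match"
)
stopwords = {"will", "in", "to", "the"}

# Keep only the words that are not stopwords.
filtered_words = [word for word in all_headlines.split() if word not in stopwords]

# Count occurrences and take the 10 most frequent (word, count) pairs.
counted_words = Counter(filtered_words)
top_words = counted_words.most_common(10)

# Split the pairs into two parallel lists, ready for plotting.
words = [word for word, _ in top_words]
counts = [count for _, count in top_words]

print(top_words)
```

most_common returns the pairs sorted from the highest count to the lowest, so the first entry is the most frequent word.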

Next, I plot the data, label the axes and give the chart a title.

I use barh to display a horizontal bar chart.
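One possible way to draw such a chart is sketched here. The helper name plot_top_words, the axis labels, the title and the sample data are all hypothetical; barh is the matplotlib call named above.

```python
import matplotlib.pyplot as plt


def plot_top_words(words, counts):
    """Draw a horizontal bar chart of word counts and return the axes."""
    fig, ax = plt.subplots()
    ax.barh(words, counts)
    ax.set_xlabel("Count")
    ax.set_ylabel("Words")
    ax.set_title("Top words in headlines")
    return ax


# Made-up counts for illustration only.
ax = plot_top_words(["delhi", "india", "metro"], [3, 2, 1])
```

Calling plt.show() afterwards displays the figure. Note that barh places the first entry at the bottom of the chart.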

The chart agrees with the results from the Word Cloud.

Moreover, as Delhi has a higher count, it is bolder and bigger than India in the Word Cloud.

Conclusion

In this article, I discussed what Word Clouds are, their potential application areas and a project that I worked on to understand them.

As always, please feel free to share your views and opinions.
