Text Mining 101: A Stepwise Introduction to Topic Modeling using Latent Semantic Analysis (using Python)

Source: confessionsofabookgeek.com Have a look at the below text snippet: As you might gather from the highlighted text, there are three topics (or concepts) – Topic 1, Topic 2, and Topic 3..A good topic model will identify similar words and put them under one group or topic..In this article, we will learn about a text mining approach called Topic Modeling..We have a paid NLP course as well with a dedicated module for Topic Modeling..Overview of Latent Semantic Analysis (LSA) Implementation of LSA in Python Data Reading and Inspection Data Preprocessing Document-Term Matrix Topic Modeling Topics Visualization Pros and Cons of LSA Other Techniques for Topic Modeling What is a topic model?.A Topic Model can be defined as an unsupervised technique to discover topics across various text documents..For the time being, let’s understand a topic model as a black box, as illustrated in the below figure: This black box (topic model) forms clusters of similar and related words which are called topics..But what happens when there’s an impossible number of these digital text documents?.Source: topix.io/tutorial/tutorial.html Topic modeling helps in exploring large amounts of text data, finding clusters of words, similarity between documents, and discovering abstract topics..  Steps involved in the implementation of LSA Let’s say we have m number of text documents with n number of total unique terms (words)..We wish to extract k topics from all the text data in the documents..Then, we will reduce the dimensions of the above matrix to k (no. of desired topics) dimensions, using singular-value decomposition (SVD)..Vector representation for the terms in our data can be found in the matrix Vk (term-topic matrix).. More details

Leave a Reply