Machine Learning as a Service: Part 1
Sentiment analysis: 10 applications and 4 services

Sebastian Kwiatkowski
Jun 9

She loves me, she loves me not …

Table of Contents
Part 1: Sentiment Analysis
Part 2: Speech Synthesis

What is sentiment analysis?

The explosive growth in user-generated content and the digitization of archive material have created massive data sets containing opinions expressed by large numbers of people on just about every single topic.

In some cases, the generation of this data is structured through the user interface. […] It does not contain a standardized summary saying “This content expresses a positive, negative, mixed or neutral view.”

WordPress.com, for example, reports that bloggers using their platform have published more than 87 million posts just in May of 2018. According to YouTube CEO Susan Wojcicki, more than 400 hours of content are uploaded to the video-sharing site every minute. Meanwhile, the Google Books project has digitized at least 25 million volumes in 400 languages.

Whenever a user types into a free text field or speaks into a microphone, an inference is required to categorize the sentiment.

Sentiment analysis is the field that focuses on exactly this task. It is a branch of natural language processing that studies functions designed to map a text document to a representation of a sentiment.

With the advent of accurate speech and text recognition, the reach of sentiment analysis extends beyond readily accessible digital text data and covers an increasing number of media.

What can I do with sentiment analysis?

Sentiment analysis helps us understand the past, predict the future and take appropriate measures in the present.

Suppose you had access to an analysis of the opinions expressed by your customers, competitors, students or other subjects of interest. What would you do with this knowledge?

Here are ten ideas:

Box office revenues: Asur & Huberman (2010) include a ratio of positive to negative sentiments in a model
trained to predict box office revenues of movies in advance of their release.

Brand monitoring: Ghiassi et al. (2013) describe a system designed to monitor tweets that express sentiments about brands and celebrities.

Computational history: Acerbi et al. (2013) generate a time series of positive and negative mood using an archive of books published in the 20th century.

Customer feedback: Gamon (2005) explores sentiment analysis in the context of customer surveys and feedback provided in knowledge bases.

Drop out rates: Wen et al. […]

[…] (2016) use sentiment mining to identify the basic shapes of emotional trajectories in Project Gutenberg’s fiction collection.

Sentiment analysis as a sub-task: Pang & Lee (2008) mention the use of sentiment analysis as a component in higher-order systems.

And what do I get in return?

We’ve said that sentiment analysis takes a text document as input and returns a representation of a sentiment as output.

There is little to say about the input. It is simply the text content of the book, comment, customer survey, email, news article, product review, tweet or other type of document that you would like to analyze.

Now, let’s turn to the output.

Binary sentiment analysis

Binary sentiment analysis, the simplest case, asks the following question: “Is the opinion expressed in the text document positive or negative?”

Here, the output is either a probability or a score.

Let’s consider probabilities first.

A high probability indicates that the given text is likely to express a positive opinion. For example, an output of 0.9 indicates a 90% probability that the expressed opinion is positive.

Conversely, a low probability indicates that the given text is likely to express a negative view. For example, an output of 0.1 indicates a 10% probability that the opinion is positive or, put differently, a 90% probability of a negative opinion.

Alternatively, the prediction of the sentiment can be expressed as a score. […]

The three probabilities could be ordered as follows: negative
probability, neutral probability and positive probability.

The prediction for a review that focuses on technical details could, for example, have a distribution similar to this one: [ 0.1, 0.85, 0.05 ].

Now, suppose that a customer publishes a mixed review, listing both positive and negative aspects of a product. […]

For requests above the 50 million mark, the price is set to $0.025.

Given a credentials provider, a text and a language code, a prediction of the sentiment can be requested as follows:

[Code listing: Sentiment analysis with Amazon Comprehend]

The API supports batch requests with up to 25 documents (each with at most 5,000 characters) and generates a probability distribution over four classes: negative, mixed, neutral and positive.

Unsurprisingly, Comprehend achieved the best performance on the 1,000 Amazon product reviews. Combined with accuracy rates of close to 90% on the other two data sets, this makes Amazon’s API the runner-up in the benchmark.

Google Cloud Natural Language API

Google’s Cloud Natural Language API supports nine languages and generates two sentiment analysis values: score and magnitude.

The score of a document’s sentiment indicates the overall emotion of a document.

The magnitude indicates how much emotional content is present within a document and is often proportional to the length of the document.

Documents that express few emotions or mixed emotions have a neutral score around 0.0.

Performance-wise, the Cloud Natural Language API is the clear winner of our competition.

Microsoft Text Analytics API

Microsoft’s sentiment analyzer performs binary classification and, consequently, assigns a probability to every document.
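To make the two output formats discussed in this article more concrete, here is a minimal Python sketch that turns a binary probability (such as the 0.9 and 0.1 examples) or a three-class distribution (such as [ 0.1, 0.85, 0.05 ]) into a label. The function names, the 0.5 cut-off and the tolerance check are illustrative assumptions; they are not part of any of the services' APIs.

```python
from typing import Sequence

# Class order as described in the article: negative, neutral, positive.
CLASSES = ("negative", "neutral", "positive")


def label_from_probability(p_positive: float, threshold: float = 0.5) -> str:
    """Binary case: map P(positive) to a label.

    The 0.5 threshold is an assumption made for this sketch.
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "positive" if p_positive >= threshold else "negative"


def label_from_distribution(probs: Sequence[float]) -> str:
    """Three-class case: return the class with the highest probability."""
    if len(probs) != len(CLASSES):
        raise ValueError("expected one probability per class")
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities should sum to 1")
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best]


if __name__ == "__main__":
    print(label_from_probability(0.9))
    print(label_from_probability(0.1))
    print(label_from_distribution([0.1, 0.85, 0.05]))
```

Picking the most probable class discards information: a distribution such as the one above also tells you how confident the prediction is, which can matter when deciding whether to act on it.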