Perfume Recommendations using Natural Language Processing

Doc2Vec, Latent Semantic Analysis, and Sentiment Analysis come together to make relevant recommendations in a chatbot interface.

Claire Longo
Feb 6

Photo by Jessica Weiller on Unsplash

Introduction

Natural Language Processing (NLP) has many intriguing applications to Recommender Systems and Information Retrieval.

As a perfume lover and a Data Scientist, I was inspired by the unusual and highly descriptive language used in the niche perfume community to use NLP to create a model that helps me discover perfumes I might want to purchase.

“Niche” perfumes are rare perfumes created by small, boutique perfume houses.

Similar to wine, there is a whole subculture surrounding niche perfumes that comes with its own poetic vocabulary, perfect for NLP!

There are two things I wanted this model to do:

1. I want to be able to describe a perfume and get relevant recommendations based on my description. Because of the modeling approach used, and because the language of perfume is so rich, this model can recommend perfumes that match a description of a mood, a feeling, a personality, or an event like a vacation.

2. Take sentiment into account. I want to be able to describe what I don’t like as well as what I do like, and still receive relevant recommendations.

Data

I wrote a Python script to scrape data from a popular niche perfume website.

They didn’t seem to mind. ;-)

I scraped three sources of text data and concatenated these into one document per perfume:

- Description
- Reviews
- List of Notes in the Perfume

Here is an example of all three text data sources for my personal favorite perfume, Delma.

Some Fun Results!

I created a chatbot interface in a Python notebook using a model that ensembles Doc2Vec and Latent Semantic Analysis (LSA).

Doc2Vec and LSA each represent the perfumes and the text query in a latent space, and cosine similarity is then used to match the perfumes to the text query.

The top five most relevant perfumes are returned as recommendations.
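As a rough illustration of the matching step, here is a minimal sketch of cosine similarity followed by a top-k ranking. The function names, the perfume names, and the 3-dimensional vectors are made up for the example; the real embeddings come from the trained Doc2Vec and LSA models.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_matches(query_vec, perfume_vecs, k=5):
    """Return the k perfume names whose vectors are closest to the query."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in perfume_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]

# Toy embeddings (hypothetical values, for illustration only).
perfume_vecs = {
    "perfume_a": [0.9, 0.1, 0.0],
    "perfume_b": [0.1, 0.9, 0.0],
    "perfume_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

print(top_matches(query_vec, perfume_vecs, k=2))
```

With k=5 and the full set of perfume embeddings, the same ranking would give the five recommendations the chatbot returns.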

Here is an example interaction with the chatbot.

A simple query for a Christmas scent returns five seasonally-appropriate perfumes.

The first perfume for this request has the very on-topic note of myrrh!

(Figure: Christmas perfume recommendations)

Here are a few more fun examples:

The query “Looking for my signature beach scent. I plan to wear it to the beach or pool” returns perfumes with notes of sea salt, coconut, and seaweed. My own beach scent, Eau de Soleil Blanc by Tom Ford, appears second on the list! I can attest that this perfume does smell like a day at the beach on a tropical vacation!

The query “I’m looking for a perfume for a vacation in Italy to a small Italian island.” returns perfumes with Sicilian orange and lemon, and a perfume called Capri from the perfume house Carthusia.

Why is sentiment so important?

Consider this chatbot message.

“I like peaches and pears. Boozy vanilla sweet smelling gourmands.”

Notice the fourth recommended perfume has notes of coconut and tobacco.

What if I hate those notes? I updated my query to include this information, and got an updated list of recommendations.

“I like peaches and pears. Boozy vanilla sweet smelling gourmands. I don’t like tobacco, or coconut.”

The fourth perfume disappears from the recommendations!

The Model

The first step in the model is to identify the sentiment of each sentence from the chatbot message.

I used VADER to do this. (It was super easy to use, and gave me great results. I highly recommend trying it out if you have a project where you’d like to use sentiment analysis.)

I concatenate all positive and neutral sentences into one string, and all negative sentiment sentences into another string.

I now have two documents I can use to find similar perfumes.
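The sentence-splitting step could look roughly like the sketch below. To keep it runnable without extra packages, the scorer is passed in as a function and the demo uses a crude stand-in (`toy_score`, which is not VADER). The regex sentence splitter, the function names, and the -0.05 cutoff (the conventional VADER threshold for a negative compound score) are assumptions for this example, not details taken from the project.

```python
import re

def split_by_sentiment(message, score_sentence, threshold=-0.05):
    """Split a message into (positive_or_neutral_text, negative_text).

    score_sentence is any callable returning a score in [-1, 1];
    sentences scoring at or below `threshold` count as negative.
    """
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", message)
                 if s.strip()]
    liked, disliked = [], []
    for sentence in sentences:
        if score_sentence(sentence) <= threshold:
            disliked.append(sentence)
        else:
            liked.append(sentence)
    return " ".join(liked), " ".join(disliked)

# Stand-in scorer so the sketch runs on its own (NOT VADER).
# With the vaderSentiment package installed, one could instead use:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   analyzer = SentimentIntensityAnalyzer()
#   score_sentence = lambda s: analyzer.polarity_scores(s)["compound"]
def toy_score(sentence):
    lowered = sentence.lower()
    if "don't like" in lowered or "hate" in lowered:
        return -0.5
    return 0.5 if "like" in lowered else 0.0

message = ("I like peaches and pears. Boozy vanilla sweet smelling gourmands. "
           "I don't like tobacco, or coconut.")
liked, disliked = split_by_sentiment(message, toy_score)
print(liked)
print(disliked)
```

The two returned strings play the role of the "liked" and "disliked" documents that are embedded and compared against the perfumes.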

The perfumes have text descriptions, reviews, and a list of notes.

The model consists of two document embeddings, one from LSA and the other from Doc2Vec.

To train the LSA and Doc2Vec models, I concatenated perfume descriptions, reviews, and notes into one document per perfume.

I then use cosine similarity to find perfumes that are similar to the positive and neutral sentences from the chatbot message query.

I remove recommendations of perfumes that are similar to the negative sentences.
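The exact removal rule is not spelled out here, so the following is only one possible reading: drop any candidate whose similarity to the negative text reaches some cutoff. The function name, the perfume names, the scores, and the 0.5 cutoff are all hypothetical.

```python
def drop_disliked(candidates, negative_scores, threshold=0.5):
    """Keep candidates whose similarity to the negative text is below threshold.

    candidates: perfume names ranked by similarity to the liked text.
    negative_scores: perfume name -> similarity to the disliked text.
    """
    return [name for name in candidates
            if negative_scores.get(name, 0.0) < threshold]

# Hypothetical ranking and similarity scores, for illustration only.
ranked = ["perfume_a", "perfume_b", "perfume_c", "perfume_d"]
negative_scores = {
    "perfume_a": 0.10,
    "perfume_b": 0.20,
    "perfume_c": 0.15,
    "perfume_d": 0.80,  # close to the "I don't like ..." sentences
}

print(drop_disliked(ranked, negative_scores))
```

Another reasonable variant would be to remove whichever perfumes rank in the top few matches for the negative text, rather than using a fixed cutoff.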

To score a perfume against the chatbot message, I calculate cosine similarity from the LSA embeddings and the Doc2Vec embeddings separately, and then average the two scores to get a final score.
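A minimal sketch of that averaging step, assuming the per-perfume cosine similarities from each model have already been computed. The function name, perfume names, and score values are made up for illustration.

```python
def ensemble_scores(lsa_scores, doc2vec_scores):
    """Average the two cosine similarities for each perfume present in both."""
    shared = lsa_scores.keys() & doc2vec_scores.keys()
    return {name: (lsa_scores[name] + doc2vec_scores[name]) / 2
            for name in shared}

# Hypothetical similarity scores from each model.
lsa_scores = {"perfume_a": 0.8, "perfume_b": 0.4, "perfume_c": 0.6}
doc2vec_scores = {"perfume_a": 0.6, "perfume_b": 0.8, "perfume_c": 0.2}

final = ensemble_scores(lsa_scores, doc2vec_scores)
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)
```

A plain average weights both models equally; a weighted average would be a natural variation if one model proved more reliable.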

LSA is a Bag of Words (BoW) approach, meaning that the order (context) of the words is not taken into account.

The documents are tokenized, weighted with TF-IDF, and then compressed into embeddings with SVD.

Similar words like “bike” and “bicycle” are treated as unrelated terms at the TF-IDF stage, when in fact these two words should be interpreted as having similar meaning; SVD can recover such a link only indirectly, from the words they co-occur with.

This is the drawback of BoW approaches.

However, I have seen many BoW approaches outperform more complex deep learning methods in practice, so LSA should still be tested and considered as a viable approach.
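The bag-of-words limitation can be seen directly at the TF-IDF stage. The sketch below builds TF-IDF vectors by hand (one common smoothed IDF variant, chosen for this example) and compares toy documents. It omits the SVD step, so it illustrates the BoW representation rather than full LSA. The function names and documents are made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenized document."""
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    # Smoothed IDF so that no term gets zero weight (one common variant).
    idf = {word: math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
           for word in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[word] * idf[word] for word in vocab])
    return vocab, vectors

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

docs = [["red", "bike"], ["blue", "bicycle"], ["red", "bicycle"]]
vocab, vectors = tfidf_vectors(docs)

print(cosine(vectors[0], vectors[1]))  # "bike" vs "bicycle": no shared terms
print(cosine(vectors[1], vectors[2]))  # both contain "bicycle"
```

The first pair scores exactly zero even though the documents are about the same object, because "bike" and "bicycle" occupy separate dimensions.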

Doc2Vec is a neural network approach to learning embeddings from a text document.

Because of its architecture, this model considers context and semantics within the document.

The context of the document and relationships between words are preserved in the learned embedding.

By ensembling Doc2Vec with LSA, I was able to get great one-to-one matches, such as returning rose perfumes when I ask for them, and I was also able to leverage the complexities of the language and return relevant results when I describe something more abstract, such as a mood or event.

Conclusion

Because this is an unsupervised model, it is difficult to quantify how well it works.

I inspected results carefully and was delighted by how relevant some of the recommendations were! But to truly test a model like this, I would deploy it in an A/B test while monitoring whether customers purchase the recommended items.

This could give me an estimate of how much revenue a model like this could bring to the business.

If you want to try this model for yourself and get some fun recommendations, you can clone the repo and run the chatbot in the run_model.ipynb notebook.

