Build your own Recommender System within 5 minutes!

Meet Nandu, May 27

One of the most successful and widespread applications of machine learning technologies in business is the recommendation system.

You are browsing through Spotify to listen to a song but cannot decide which one.

You are surfing through YouTube to watch some videos but are not able to decide which video to look at.

There are so many other instances like these where we have abundant data but we can’t decide what we would like.

This is where recommender systems come to our help.

Recommender systems are ubiquitous in today’s marketplace and have great commercial importance, as evidenced by the large number of companies that sell recommender systems solutions.

Recommendation systems changed the way inanimate websites communicate with their users.

Rather than providing a static experience in which users search for and potentially buy products, recommender systems increase interaction to provide a richer experience.

Recommender systems identify recommendations autonomously for individual users based on past purchases and searches, and on other users’ behavior.

Today, we will focus on providing a basic recommendation system by suggesting items that are most similar to a particular item, in this case, movies.

Keep in mind that this is not a truly robust recommendation system. To describe it more accurately, it just tells you which movies/items are most similar to your movie choice.

Let’s get started!

Import Libraries

    import numpy as np
    import pandas as pd

Get the Data

You can get the data-set by clicking here.

The columns in the data-set, a file named ‘u.data’, represent user ID, item ID, rating and timestamp, as we have defined in the code below. The file is tab-separated and has no header row, so we pass sep='\t' and supply the column names ourselves.

    column_names = ['user_id', 'item_id', 'rating', 'timestamp']
    df = pd.read_csv('u.data', sep='\t', names=column_names)
    df.head()

df.head() shows the first five rows, one rating per row.
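As a quick check of the loading step, here is a minimal sketch that parses a few rows from an in-memory string instead of the real file. It assumes the file is tab-separated with no header row; the three sample rows and the names `sample`, `sample_df` and `rated_at` are made up for illustration, and treating the timestamp as Unix seconds is also an assumption.

```python
import io

import pandas as pd

# Made-up sample rows in the assumed layout:
# user_id, item_id, rating, timestamp, separated by tabs, no header row.
sample = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n196\t242\t3\t881250949\n"

column_names = ['user_id', 'item_id', 'rating', 'timestamp']

# names= supplies the header that the file itself lacks.
sample_df = pd.read_csv(io.StringIO(sample), sep='\t', names=column_names)

# Assumption: the timestamp is Unix seconds; convert it for readability.
sample_df['rated_at'] = pd.to_datetime(sample_df['timestamp'], unit='s')

print(sample_df)
```

If the separator is wrong, pandas typically reads each line into a single column and leaves the others empty, so a glance at the first rows is a cheap sanity check.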

Now let’s get the movie titles:

    movie_titles = pd.read_csv("Movie_Id_Titles")
    movie_titles.head()

We can merge them together:

    df = pd.merge(df, movie_titles, on='item_id')
    df.head()

EDA

Let’s explore the data a bit and get a look at some of the best rated movies.

Visualization Imports

    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.set_style('white')
    %matplotlib inline

Let’s create a ratings dataframe with average rating and number of ratings:

    df.groupby('title')['rating'].mean().sort_values(ascending=False).head()

    title
    Marlene Dietrich: Shadow and Light (1996)    5.0
    Prefontaine (1997)                           5.0
    Santa with Muscles (1996)                    5.0
    Star Kid (1997)                              5.0
    Someone Else's America (1995)                5.0
    Name: rating, dtype: float64

    df.groupby('title')['rating'].count().sort_values(ascending=False).head()

    title
    Star Wars (1977)             584
    Contact (1997)               509
    Fargo (1996)                 508
    Return of the Jedi (1983)    507
    Liar Liar (1997)             485
    Name: rating, dtype: int64

    ratings = pd.DataFrame(df.groupby('title')['rating'].mean())
    ratings.head()

Now set the number of ratings column:

    ratings['num of ratings'] = pd.DataFrame(df.groupby('title')['rating'].count())
    ratings.head()

    sns.jointplot(x='rating', y='num of ratings', data=ratings, alpha=0.5)

Okay! Now that we have a general idea of what the data looks like, let’s move on to creating a simple recommendation system.

Recommending Similar Movies

Now let’s create a matrix that has the user IDs on one axis and the movie titles on the other axis.

Each cell will then consist of the rating the user gave to that movie.

Note there will be a lot of NaN values, because most people have not seen most of the movies.
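To make the shape of this matrix concrete, here is a small sketch on made-up ratings (the frame `toy` and the titles 'A', 'B', 'C' are hypothetical). It uses the same `pivot_table` call as the walkthrough and measures how many cells end up empty.

```python
import pandas as pd

# Made-up ratings: three users, three titles, not every pair present.
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3],
    'title':   ['A', 'B', 'A', 'B', 'C'],
    'rating':  [5, 3, 4, 2, 5],
})

# One row per user, one column per title; unrated pairs become NaN.
# If a user rated the same title twice, pivot_table would average
# the duplicates (its default aggfunc is the mean).
toy_mat = toy.pivot_table(index='user_id', columns='title', values='rating')
print(toy_mat)

# Share of empty cells, i.e. how sparse the matrix is.
sparsity = toy_mat.isna().to_numpy().mean()
print(f"share of missing cells: {sparsity:.2f}")
```

Even in this tiny example almost half of the cells are missing; with many users and many titles the share is far higher.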

    moviemat = df.pivot_table(index='user_id', columns='title', values='rating')
    moviemat.head()

Most rated movies:

    ratings.sort_values('num of ratings', ascending=False).head(10)

Let’s choose two movies: Star Wars, a sci-fi movie, and Liar Liar, a comedy.

Now let’s grab the user ratings for those two movies:

    starwars_user_ratings = moviemat['Star Wars (1977)']
    liarliar_user_ratings = moviemat['Liar Liar (1997)']
    starwars_user_ratings.head()

    user_id
    0    5.0
    1    5.0
    2    5.0
    3    NaN
    4    5.0
    Name: Star Wars (1977), dtype: float64

We can then use the corrwith() method to correlate every column of the matrix with one movie’s ratings:

    similar_to_starwars = moviemat.corrwith(starwars_user_ratings)
    similar_to_liarliar = moviemat.corrwith(liarliar_user_ratings)

Executing these commands issues a warning that looks something like this:

    /Users/marci/anaconda/lib/python3.5/site-packages/numpy/lib/function_base.py:2487: RuntimeWarning: Degrees of freedom <= 0 for slice
      warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)

Let’s clean this by removing NaN values and using a DataFrame instead of a series:

    corr_starwars = pd.DataFrame(similar_to_starwars, columns=['Correlation'])
    corr_starwars.dropna(inplace=True)
    corr_starwars.head()

Now if we sort the dataframe by correlation, we should get the most similar movies. Note, however, that we get some results that don’t really make sense.

This is because a lot of movies were rated by only a handful of users who also rated Star Wars (it was the most popular movie), and a correlation computed from so few shared ratings is unreliable.
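The effect is easy to reproduce on a toy matrix. The sketch below uses made-up ratings (the column names 'Popular', 'Similar', 'TwoRaters' and 'OneRater' are hypothetical) and the same `corrwith` call as above. A title that shares only two ratings with the reference movie gets a perfect correlation, because two points always lie on a straight line, and a title that shares only one rating gets NaN, which may also trigger the kind of degrees-of-freedom warning shown earlier.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Made-up user x title matrix; rows are users, NaN means "not rated".
toy_mat = pd.DataFrame({
    'Popular':   [5, 4, 1, 2, 5, 3],
    'Similar':   [5, 5, 1, 1, 4, 3],
    'TwoRaters': [nan, nan, 1, nan, 5, nan],
    'OneRater':  [nan, 4, nan, nan, nan, nan],
}, index=pd.Index(range(6), name='user_id'))

# Pearson correlation of every column with 'Popular', computed only
# over the users who rated both titles.
corr = toy_mat.corrwith(toy_mat['Popular'])

# How many users rated both the title and 'Popular'.
rated_popular = toy_mat['Popular'].notna()
overlap = toy_mat.loc[rated_popular].notna().sum()

print(pd.DataFrame({'Correlation': corr, 'shared raters': overlap}))
```

'TwoRaters' ranks level with the reference movie itself, ahead of 'Similar', even though the evidence for it is just two ratings. That is why a minimum number of ratings is a sensible filter.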

    corr_starwars.sort_values('Correlation', ascending=False).head(10)

Let’s fix this by filtering out movies that have fewer than 100 reviews (this value was chosen based on the distribution of rating counts in the plot from earlier).

    corr_starwars = corr_starwars.join(ratings['num of ratings'])
    corr_starwars.head()

Now sort the values and notice how the titles make a lot more sense:

    corr_starwars[corr_starwars['num of ratings']>100].sort_values('Correlation', ascending=False).head()

Now the same for the comedy Liar Liar:

    corr_liarliar = pd.DataFrame(similar_to_liarliar, columns=['Correlation'])
    corr_liarliar.dropna(inplace=True)
    corr_liarliar = corr_liarliar.join(ratings['num of ratings'])
    corr_liarliar[corr_liarliar['num of ratings']>100].sort_values('Correlation', ascending=False).head()

Great job!

You do not need market research to find out whether a customer is willing to buy at a shop where they get maximum help in finding the right product.

They’re also much more likely to return to such a shop in the future.

To get an idea of the business value of recommender systems: a few months ago, Netflix estimated that its recommendation engine is worth $1 billion a year.

There are also more advanced and non-traditional methods to power your recommendation process.

These techniques, namely deep learning, social learning, and tensor factorization, are based on machine learning and neural networks.

Such cognitive computing methods can take the quality of your recommenders to the next level.

It’s safe to say that product recommendation engines will improve with the use of machine learning and create a much better process for customer satisfaction and retention.
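As a recap of the item-similarity walkthrough, the steps can be folded into one function. This is only a sketch: `similar_titles`, its parameters and the toy data are hypothetical names, not part of the original code, and unlike the walkthrough it also drops the query title from its own results.

```python
import pandas as pd


def similar_titles(df, title, min_ratings=100, top_n=5):
    """Rank titles by rating correlation with `title` (hypothetical helper).

    Expects columns 'user_id', 'title' and 'rating', as in the merged
    dataframe used in the walkthrough.
    """
    # User x title matrix; unrated pairs are NaN.
    moviemat = df.pivot_table(index='user_id', columns='title', values='rating')

    # Number of ratings per title, used to filter out unreliable correlations.
    counts = df.groupby('title')['rating'].count().rename('num of ratings')

    # Correlate every title with the chosen one and drop undefined results.
    corr = moviemat.corrwith(moviemat[title]).rename('Correlation').dropna()

    result = corr.to_frame().join(counts)
    result = result[result['num of ratings'] > min_ratings]

    # Design choice of this sketch: do not recommend the title to itself.
    result = result.drop(index=title, errors='ignore')
    return result.sort_values('Correlation', ascending=False).head(top_n)


# Made-up ratings: four users rate A, B and C; only one user rates D.
toy = pd.DataFrame({
    'user_id': [1, 2, 3, 4] * 3 + [1],
    'title':   ['A'] * 4 + ['B'] * 4 + ['C'] * 4 + ['D'],
    'rating':  [5, 4, 2, 1,   5, 5, 1, 2,   1, 2, 4, 5,   3],
})

res = similar_titles(toy, 'A', min_ratings=2)
print(res)
```

On the toy data, 'B' ranks first because its ratings move with those of 'A', 'C' ranks last because they move in the opposite direction, and 'D' is excluded because a single rating cannot define a correlation. With the real data, the call would look like `similar_titles(df, 'Star Wars (1977)')`.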
