If we had additional features beyond just movie series, there might be some other similarities between the Star Wars movies and one of the other movies user B has seen.
Collaborative Recommendations

As we briefly touched on above, collaborative recommendation systems focus on similarities between users.
Looking at our Netflix example with its four users, it is realistic to assume that different users will score each movie on a different scale.
For example, someone may only score movies between 3 and 5 and another person may score primarily between 1 and 3.
To account for this user variation we want to normalize the utility matrix when doing collaborative recommendations.
This normalization will put everyone’s scoring system on the same scale.
Figure 3 (Mining of Massive Datasets — pg 324)

To normalize the data we first find the average movie rating for each user.
Then we create a new matrix where, for each user, we subtract the user’s average rating from each provided movie rating.
A movie with a negative score is a movie that had a below average score for that particular user, while a positive score indicates the user rated the movie above their average score.
The chart above shows the normalized ratings for each of our four users.
User A had an average score of 3.33 (the average of 4, 5, and 1). When we subtract 3.33 from each of the three provided scores, we get 0.67 (2/3), 1.67 (5/3), and -2.33 (-7/3).
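The normalization walkthrough above can be sketched in a few lines of Python. Only some of these ratings appear in the prose; the full set of values for users B, C, and D is taken from the utility matrix figure in the cited text, so treat those exact numbers as assumptions from that figure:

```python
# Normalize a utility matrix: subtract each user's average rating from
# their provided ratings. Missing ratings are simply absent from each dict.
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
    "D": {"HP2": 3, "SW2": 3},
}

def normalize(user_ratings):
    """Return the user's ratings with their average rating subtracted."""
    avg = sum(user_ratings.values()) / len(user_ratings)
    return {movie: r - avg for movie, r in user_ratings.items()}

normalized = {user: normalize(r) for user, r in ratings.items()}
print(round(normalized["A"]["HP1"], 2))  # 0.67, i.e. 4 - 3.33
print(round(normalized["C"]["SW1"], 2))  # 0.33, i.e. 4 - 3.67
```

A negative value in `normalized` marks a below-average rating for that user, matching the chart above.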
I want to highlight what we did here.
In the original matrix, both users A and C rated movies at a 4.
User A rated HP 1 at a 4 while user C rated SW1 at 4.
However, in our normalized matrix, we can see that these values are no longer the same.
User A has HP1 at 2/3 while user C has SW1 at 1/3. This is because user C had a higher average score than user A, so the same raw rating of 4 represents less enthusiasm from user C than it does from user A.
Now that everyone’s rating is on the same scale we can accurately compare the similarities between users.
The way we do this is by calculating the cosine of the angle between any two users' rating vectors.
A large cosine means that the angle we are measuring is small, and vice versa.
Since the angle represents the similarity between users, a large cosine (small angle) implies that there is a small difference between the two users being measured.
Figure 4 — A and C — (Mining of Massive Datasets — pg 324)

Let's walk through the formula for calculating the cosine.
First, we select two users to compare.
As an example we will compare users A and C.
Next, we find all of the movies where both users have submitted a rating.
From our matrix, we can see that this happens to be two movies: TW and SW1.
We then multiply the movie ratings from each user together and add the products to get our numerator.
The denominator squares each of user A's ratings, sums the squares, and takes the square root of that sum. It does the same for user C and then multiplies the two results together.
We can see our cosine for users A and C comes to -0.559, meaning they have opposite tastes in movies.
Figure 5 — A and B — (Mining of Massive Datasets — pg 323)

Let's see if user A is more similar to user B or user C.
We know the cosine between user A and C.
So we just repeat the steps above, but this time between A and B.
There is only one movie that both users A and B have rated (HP1); therefore, our numerator has only one product. We see that the cosine between users A and B is 0.092.
Remember, the higher the cosine the more similar the users.
Based on this we can safely conclude that user A is more similar to B than they are to C.
If we compare A to a new user X and get a cosine of 0.099, user A would be even more similar to user X than to user B.
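As a rough sketch, the cosine comparison can be coded directly from the normalized ratings (the values follow the figures in the cited text; a movie a user has not rated contributes 0 to the dot product, so only co-rated movies affect the numerator):

```python
import math

# Normalized ratings from the figures above (original rating minus the
# user's average rating); unrated movies are omitted and treated as 0.
movies = ["HP1", "HP2", "HP3", "TW", "SW1", "SW2"]
normalized = {
    "A": {"HP1": 2/3, "TW": 5/3, "SW1": -7/3},
    "B": {"HP1": 1/3, "HP2": 1/3, "HP3": -2/3},
    "C": {"TW": -5/3, "SW1": 1/3, "SW2": 4/3},
}

def cosine(u, v):
    """Cosine of the angle between two users' rating vectors."""
    dot = sum(u.get(m, 0) * v.get(m, 0) for m in movies)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

print(round(cosine(normalized["A"], normalized["C"]), 3))  # -0.559
print(round(cosine(normalized["A"], normalized["B"]), 3))  # 0.092
```

Both results match the figures: A is more similar to B (larger cosine) than to C.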
UV-Decomposition

One of the most widely accepted forms of estimating user ratings is the UV-Decomposition method.
To show how this method works, let’s view the utility matrix in Figure 6 below.
We can still assume that the rows are various users and the columns various movies, but this time there are only two missing rating values.
Figure 6 (Mining of Massive Datasets — pg 328)

The first step in UV-Decomposition is determining the number of dimensions that can similarly categorize our users and movies.
In our example we will set our number of dimensions to two, but this could be larger.
We then take our matrix above and create two new matrices: a user matrix (U) and an item matrix (V).
The U matrix will be 5 by 2 (for 5 users and 2 dimensions) while the V matrix will be 2 by 5 (for 2 dimensions and 5 movies).
These two new matrices (U and V) are populated with starting values (for now we will use all 1s) and then multiplied together to create another matrix that is the same size as our original utility matrix (5 by 5). The goal is to incrementally change the numbers in our U and V matrices to get this new matrix as close as possible to our original utility matrix.
Figure 7 (Mining of Massive Datasets — pg 329)

To measure the success of each incremental adjustment of U and V, we calculate the RMSE (root mean squared error) between the two matrices. The smaller the RMSE, the closer (or more similar) the two matrices are. Therefore, we want to find the values for U and V that result in the lowest RMSE between the two 5 by 5 matrices.
See the Appendix section below to get a detailed walkthrough of both the calculation for RMSE as well as how Matrix Multiplication works.
Once we have found the combination of values for our U and V matrix that minimize the RMSE, we can find the estimated values for the missing ratings in our original utility matrix.
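A minimal sketch of the whole procedure, assuming the utility matrix from Figure 6 of the cited text (None marks the two missing ratings). The book adjusts one entry of U or V at a time; here we use plain gradient descent over the known entries, which serves the same goal of driving the RMSE down:

```python
import math
import random

# Utility matrix assumed from Figure 6 of the cited text; None = missing.
M = [[5, 2, 4, 4, 3],
     [3, 1, 2, 4, 1],
     [2, None, 3, 1, 4],
     [2, 5, 4, 3, 5],
     [4, 4, 5, 4, None]]
d = 2  # number of hidden dimensions
random.seed(0)
U = [[random.random() for _ in range(d)] for _ in range(5)]  # 5 x 2
V = [[random.random() for _ in range(5)] for _ in range(d)]  # 2 x 5

def rmse(M, U, V):
    """Root mean squared error over the known entries of M."""
    se, n = 0.0, 0
    for i in range(5):
        for j in range(5):
            if M[i][j] is not None:
                pred = sum(U[i][k] * V[k][j] for k in range(d))
                se += (M[i][j] - pred) ** 2
                n += 1
    return math.sqrt(se / n)

lr = 0.01  # step size for each incremental adjustment
for _ in range(5000):
    for i in range(5):
        for j in range(5):
            if M[i][j] is None:
                continue
            err = M[i][j] - sum(U[i][k] * V[k][j] for k in range(d))
            for k in range(d):  # nudge U and V to shrink this error
                U[i][k], V[k][j] = (U[i][k] + lr * err * V[k][j],
                                    V[k][j] + lr * err * U[i][k])

print(round(rmse(M, U, V), 2))
# U x V now also provides estimates for the two blanks, e.g. row 3, col 2:
print(round(sum(U[2][k] * V[k][1] for k in range(d)), 2))
```

Once trained, the product U x V is a complete 5 by 5 matrix, so the entries at the two blank positions serve as the estimated ratings.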
Summary

Leskovec, Rajaraman, and Ullman cover more than what is discussed above, but I wanted to cover the three main topics of recommender systems.
Content-Based Recommender systems focus only on the similarity across items while Collaborative Recommender systems focus primarily on the similarity across users.
UV-Decomposition is a great method that uses a U and V matrix to find the best estimate for missing values in a matrix.
Appendix

Matrix Multiplication

Before walking through the RMSE calculation, let's first go over how matrix multiplication works.
When we multiply U (5 x 2) by V (2 x 5), what is actually happening? How do we get the resulting 5 x 5 matrix that is filled with 2s? To start off, order matters in matrix multiplication.
The number of columns in the first matrix must equal the number of rows in the second matrix.
In our case, U has 2 columns and V has 2 rows, so we are good.
The next trick is that the output matrix will have the same number of rows as the first matrix, and the same number of columns as the second matrix.
In our case, U has 5 rows and V has 5 columns, so the resulting matrix is going to be 5 by 5.
Now, how do we populate the 5 by 5 matrix? Let's take a different example that will make explaining matrix multiplication easier. Here are our U and V matrices:

Figure 8

To calculate the upper-left value of the 5 by 5 matrix, we take the upper-left value of U (3) and multiply it by the upper-left value of V (5).
Then we take the upper right value of U (4) and multiply that by the bottom left value of V (7).
So the formula looks like: (3 x 5) + (4 x 7) = 43.
Here is our matrix:

Figure 9

To populate the first row of our 5 by 5 matrix, we continue to use the first row of our U matrix, but for our V matrix we move one column over each time.
So the number to the right of 43 will be: (3 x 8) + (4 x 4) = 40, and so on.
Here is our matrix again:

Figure 10

To populate the remaining rows we repeat the process above, but we move down the rows of U.
So the value below 43 will be: (2 x 5) + (8 x 7) = 66.
The value below 40 will be (2 x 8) + (8 x 4) = 48, and so on.
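The worked example above can be reproduced in code. Only the four entries of U and V shown in the figures appear in the text, so the matrices here are trimmed to 2 x 2:

```python
# The 2 x 2 corner of U and V from the worked example above.
U = [[3, 4],
     [2, 8]]
V = [[5, 8],
     [7, 4]]

def matmul(A, B):
    """Multiply A (m x n) by B (n x p): each entry is a row-of-A
    dotted with a column-of-B."""
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

print(matmul(U, V))  # [[43, 40], [66, 48]]
```

The four outputs match the values computed step by step above: 43, 40, 66, and 48.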
Here is the final result of our 5 by 5 matrix:

Figure 11

RMSE Calculation

Just to briefly recap, RMSE is used in UV-Decomposition to measure the difference between the original matrix (with the actual user ratings) and the newly calculated matrix (from U x V).
Here is what the initial setup looks like:

Figure 12 (Mining of Massive Datasets — pg 328–329)

The question is, how do we measure how similar these matrices are? We do this with the RMSE. To calculate the RMSE between the two matrices above, the first step is to subtract each value of the calculated matrix from the original.
The result looks like:

Figure 13

All we did here was subtract the values in our calculated matrix (the matrix of 2s) from our original matrix.
So 3, in the upper left-hand corner, came from 5 minus 2.
The next step is to square each value in this new matrix to get the following:

Figure 14

Once we have squared each value in the matrix, we then add each row together (for example, we sum 9 + 0 + 4 + 4 + 1 to get 18, and so on).
The result looks like the following:

Figure 15

The final steps are to: 1) sum all values in this 5 by 1 matrix, 2) divide by the total number of provided ratings in our original utility matrix (in our case 5 x 5 – 2 = 23, because we had two blanks), and 3) take the square root of the result. That value is the RMSE.
So, here is a visual for the final 3 steps of our example:

Figure 16 (23 = 5 x 5 – 2)

The square root of 3.26 is 1.81, so the RMSE between our original utility matrix and the matrix of 2s is 1.81.
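Putting the appendix steps together in code: the full original matrix below is assumed from Figure 12 of the cited text (only its first row, whose squared differences are 9, 0, 4, 4, 1, is spelled out in the prose), and the calculated matrix is all 2s:

```python
import math

# Original utility matrix assumed from Figure 12; None marks the two blanks.
original = [[5, 2, 4, 4, 3],
            [3, 1, 2, 4, 1],
            [2, None, 3, 1, 4],
            [2, 5, 4, 3, 5],
            [4, 4, 5, 4, None]]

# Subtract the calculated value (2) from each provided rating and square it.
squared_errors = [(r - 2) ** 2 for row in original for r in row if r is not None]

# Divide by the number of provided ratings (23) and take the square root.
rmse = math.sqrt(sum(squared_errors) / len(squared_errors))  # sqrt(75 / 23)
print(round(rmse, 2))  # 1.81
```

This reproduces the 1.81 result above: 23 provided ratings, squared differences summing to 75, and sqrt(75 / 23) ≈ 1.81.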
Works Cited

Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman. Mining of Massive Datasets. 2014. http://infolab.stanford.edu/~ullman/mmds/book.pdf