Predicting Ratings with Matrix Factorization Methods

Héctor Lira · Feb 19

TL;DR

Matrix Factorization methods approximate a matrix of ratings, R, by the product of two matrices, P and Q.

Predictions for users that were not used to estimate P and Q need to be made using Ordinary Least Squares estimators.

Matrix Factorization Methods

The idea of Matrix Factorization methods is to ‘decompose’ the ratings matrix, R, into the product of two lower-dimensional matrices, P and Q: the former captures the affinity of users with a number of dimensions, k, and the latter captures the similarity of each item to those dimensions.

Formally, if we have data for n users over m items, the matrix R has dimensions n×m, P has dimensions k×n, and Q has dimensions k×m.

The estimation is stated as

R ≈ P’Q

The k dimensions are called latent factors and represent intrinsic interactions between users and items.

The algorithm tries to describe these latent features by creating item and user profiles.

With these profiles, we can predict the rating a user would give an item and recommend items that are predicted to receive high ratings by the user.
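To make the dimensions concrete, here is a toy sketch with made-up numbers (this is only an illustration of the shapes involved, not recosystem’s estimation procedure):

```r
# Toy illustration: with k = 2 latent factors, n = 4 users and m = 5 items,
# the matrix of predicted ratings is the n x m product t(P) %*% Q.
set.seed(1)
k <- 2; n <- 4; m <- 5
P <- matrix(rnorm(k * n), nrow = k)  # k x n user-factor matrix
Q <- matrix(rnorm(k * m), nrow = k)  # k x m item-factor matrix
R_hat <- t(P) %*% Q                  # n x m matrix of predicted ratings
dim(R_hat)                           # 4 5
```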

In this blog post, we will use the recosystem library in R to make recommendations to users.

The recosystem library provides functions to easily train models using Matrix Factorization methods.

If you’d like to learn more about how to use this library and how the algorithm finds the estimated matrices, I suggest you read the documentation: recosystem: Recommender System Using Parallel Matrix Factorization.

Predictions for Out-of-Sample Users

Taking the example from the recosystem documentation and the training and test data provided, we will focus on making predictions for users in the test set that were not used to train the model.

The data used here is in a (user_index, item_index, rating) format.

Ratings take values in the {1, 2, 3, 4, 5} set.
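If you want to inspect the raw triplets yourself, you can peek at the training file bundled with the recosystem package (a quick sketch):

```r
# Each row of the bundled file is one (user_index, item_index, rating) triplet
head(read.table(system.file("dat", "smalltrain.txt", package = "recosystem")))
```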

First, train the model:

```r
# Import the necessary libraries
library(recosystem)
library(dplyr)

# Reading the data
train_file <- data_file(system.file("dat", "smalltrain.txt", package = "recosystem"))
test_file <- data_file(system.file("dat", "smalltest.txt", package = "recosystem"))

# Create the model object
r = Reco()

# Train the model
# Grid search to find the optimum parameters
set.seed(123)  # This is a randomized algorithm
opts_tune <- r$tune(train_file)$min

# Train the model and store it locally
r$train(train_file, opts = opts_tune, out_model = file.path("/your/file/path", "model.txt"))
```

Next, make predictions for all users and store them in a vector:

```r
pred <- r$predict(test_file, out_memory())
```

Find the users in the test set that were not used to train the model:

```r
# First, read the training and test sets as tables
train_file <- read.table(system.file("dat", "smalltrain.txt", package = "recosystem"))
test_file <- read.table(system.file("dat", "smalltest.txt", package = "recosystem"))

setdiff(unique(test_file$V1), unique(train_file$V1))
```

Out:

```
[1] 219
```

The user with index 219 belongs to the test set and was not used to train the model.

How do the predictions for this user look? Append the predictions to the test_file data frame and find the predictions for this user:

```r
test_file_pred <- data.frame(test_file, pred = pred)
test_file_pred %>% filter(V1 == 219)
```

Out:

```
    V1  V2  pred
1  219 316 3.007
2  219 640 3.007
3  219 340 3.007
4  219 198 3.007
5  219 543 3.007
6  219 572 3.007
7  219 461 3.007
8  219 932 3.007
9  219 900 3.007
10 219 289 3.007
11 219 226 3.007
```

All of the predicted ratings for this user coincide with the average rating across all users in the training set:

```r
mean(train_file$V3)
```

Out:

```
[1] 3.007
```

Inserting the average rating as the predicted rating for every user-item pair is equivalent to not making recommendations at all: there is no way to rank items for a user this way. This means we could not make personalized recommendations for new users (users not used to train the model), even though such a user might have already provided some ratings or other evidence of liking certain items.

OLS Estimators Derivation

The correct way to make predictions on ratings for a specific user is to consider two cases:

- when the user was used to train the model, and
- when the user was not used to train the model.

When the user u was used to train the model, predict the rating of item i using the inner product

r̂ᵤᵢ = pᵤ’qᵢ

When the user u was not used to train the model, we need to estimate the vector pᵤ.

To do this, pose the following problem: we want to approximate the matrix Rᵤ of dimension 1×m (the user’s row of ratings) with two lower-dimensional matrices, a matrix Pᵤ of dimension k×1 and the matrix Q of dimension k×m.

Formally,

Rᵤ ≈ Pᵤ’Q

If we consider a user not used to train the model for whom we know a certain number of ratings of the m items used to train the model, we have a problem similar to estimating a linear regression: estimate β in

y = Xβ + ε

Then, the way to make predictions is to estimate the matrix Pᵤ the same way we estimate β in linear regression: using Ordinary Least Squares.

The vector to estimate in the linear regression problem appears on the right-hand side of the matrix product Xβ, while the matrix to estimate in our original problem appears on the left-hand side of the product Pᵤ’Q.

However, the derivation is as simple as in linear regression. Define the residual sum of squares as

RSS = (R − P’Q)(R − P’Q)’ = RR’ − RQ’P − P’QR’ + P’QQ’P

Note that RQ’P and P’QR’ are the same scalar and that RR’ is independent of P. Take the derivative of RSS with respect to P and set it equal to zero:

∂RSS/∂P = −2QR’ + 2QQ’P = 0

Then,

QQ’P = QR’

Finally,

P̂ᵤ = (QQ’)⁻¹QR’, or equivalently P̂ᵤ’ = RQ’(QQ’)⁻¹

where P̂ᵤ is our estimated matrix of Pᵤ. (The transposed form, RQ’(QQ’)⁻¹, is the one we compute in the example below.)

As in linear regression, we need to have more observations than variables to train the model.

In our case, we need at least k ratings from the user on items among the m items used to train the model; otherwise the matrix QQ’ is not invertible.
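In R, the closed form is a one-liner. Here is a minimal sketch (estimate_user_vector is our own helper name, not part of recosystem):

```r
# Minimal sketch of the closed form derived above.
# r: 1 x n_u matrix of the user's known ratings
# q: k x n_u matrix of the factor vectors of the items she rated
# Returns the 1 x k estimated user vector: R Q'(QQ')^(-1).
# Requires n_u >= k so that q %*% t(q) is invertible.
estimate_user_vector <- function(r, q) {
  r %*% t(q) %*% solve(tcrossprod(q))
}
```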

Having this matrix, we predict the ratings of a user not included in training exactly as if she had been: take the inner product

r̂ᵤᵢ = p̂ᵤ’qᵢ

Example

Let’s create some example data for the user 219 we found earlier in the data provided by the recosystem library.

Define nᵤ as the number of items the user 219 rated from the m items used to train the model.

```r
set.seed(123)
n_u <- 12
items_u <- sample(train_file$V2, n_u)
ratings_u <- sample(c(1, 2, 3, 4, 5), n_u, replace = TRUE)
p_u <- data.frame(V1 = 219, V2 = items_u, V3 = ratings_u)
```

Read the model into your memory:

```r
model <- read.table("/your/file/path/model.txt", skip = 5)
```

Extract the item vectors for the items the user 219 rated. This should be a matrix of size k×nᵤ:

```r
q_u <- model %>%
  filter(V1 %in% paste0("q", items_u)) %>%
  select(-c(V1, V2)) %>%
  as.matrix %>%
  t
```

Calculate (QQ’)⁻¹:

```r
first_prod <- tcrossprod(q_u, q_u) %>% solve
```

Calculate Q’(QQ’)⁻¹:

```r
second_prod <- crossprod(q_u, first_prod)
```

Finally, calculate RQ’(QQ’)⁻¹:

```r
p_u_est <- crossprod(as.matrix(p_u$V3), second_prod)
p_u_est
```

Out:

```
           V3        V4     V5        V6      V7        V8         V9       V10        V11      V12
[1,] 7.470921 -1.211137 2.3026 0.4800567 2.62823 -4.101964 -0.6423028 -7.016185 0.05477537 4.571988
```

The estimators in p_u_est should match the coefficients of a linear regression model without the bias parameter:

```r
data_for_lm <- data.frame(y = p_u$V3, t(q_u))
lm_model <- lm(y ~ . - 1, data = data_for_lm)
lm_model$coefficients
```

Out:

```
         V3          V4          V5          V6          V7          V8          V9         V10         V11         V12
 7.47092111 -1.21113661  2.30260034  0.48005674  2.62822953 -4.10196436 -0.64230284 -7.01618547  0.05477537  4.57198777
```

To make predictions for this user, take the product of the matrix P̂ᵤ’ and Q:

```r
# Create a matrix Q_u by taking the columns of Q for the items you want predictions for
items_to_predict <- (test_file_pred %>% filter(V1 == 219))$V2
Q_u <- model %>%
  filter(V1 %in% paste0("q", items_to_predict)) %>%
  select(-c(V1, V2)) %>%
  as.matrix %>%
  t

# Make predictions
predicted_ratings <- crossprod(t(p_u_est), Q_u)
```

We obtain very different predicted ratings from the ones calculated before:

```r
test_file_pred %>%
  filter(V1 == 219) %>%
  mutate(new_pred = as.numeric(predicted_ratings))
```

Out:

```
    V1  V2  pred new_pred
1  219 316 3.007 5.893066
2  219 640 3.007 2.481716
3  219 340 3.007 1.810961
4  219 198 3.007 6.997799
5  219 543 3.007 2.793121
6  219 572 3.007 6.603511
7  219 461 3.007 4.314775
8  219 932 3.007 4.390416
9  219 900 3.007 3.756519
10 219 289 3.007 3.782306
11 219 226 3.007 4.033049
```

Conclusion

Training recommender system models has been made easy by libraries such as recosystem.

When taking one of these algorithms into production, make sure that you are using the model correctly.

We provided a way to make recommendations for users that were not used to train a recommender system model. Ordinary Least Squares estimation is needed in these cases.
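Putting the pieces together, a minimal end-to-end sketch might look like the following (predict_new_user and item_factors are our own helper names, not part of recosystem; the sketch assumes model was read from the model file as above). It uses match() rather than filter() with %in%, so the item-factor columns stay aligned with the order of the ratings:

```r
# Extract the k x length(items) matrix of item-factor vectors, in the
# same order as `items` (filter() would return them in the model's order)
item_factors <- function(model, items) {
  idx <- match(paste0("q", items), model$V1)
  t(as.matrix(model[idx, -c(1, 2)]))
}

# Given a new user's (item, rating) pairs, estimate her latent vector by
# OLS and predict her ratings for a set of target items
predict_new_user <- function(model, rated_items, ratings, target_items) {
  q_u <- item_factors(model, rated_items)           # k x n_u
  r <- matrix(ratings, nrow = 1)                    # 1 x n_u
  p_hat <- r %*% t(q_u) %*% solve(tcrossprod(q_u))  # 1 x k, i.e. R Q'(QQ')^(-1)
  as.numeric(p_hat %*% item_factors(model, target_items))
}

# Usage, with the example data from above:
# predict_new_user(model, items_u, ratings_u, items_to_predict)
```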

If you want to learn more about making recommendations for users in Matrix Factorization algorithms, I suggest you read this Quora answer: https://www.quora.com/How-do-I-predict-values-with-Matrix-Factorization-method-in-a-recommender-system
