From Content-Based Recommendations to Personalization: A Tutorial

This allows customers to view similar hotel options that they might not otherwise have the chance to explore. Once the customer selects a particular hotel, which we define as the “anchor,” we can recommend similar hotels in three steps:

1. Normalize the feature space by converting each feature (i.e., each column) into a standard score.
2. Compute the Euclidean distance between the anchor hotel and every other hotel in the inventory.
3. Sort the hotel inventory in ascending order of Euclidean distance.

Below is some code that sorts the hotels by similarity_distance given an anchor hotel_id.

A few things to note here:

Normalizing the feature space is very important because the features have different units. Price and rating, for example, live on very different numeric scales, and without normalization the feature with the largest range would dominate the Euclidean distance. The choice of normalization scheme depends on the problem and the data. In this case we use a standard score because the data is normally distributed, but min/max scaling, or TF-IDF when comparing documents, may be more useful in other applications.

Choosing the right distance or similarity score can have a big impact on the quality of recommendations. We use Euclidean distance because geo-coordinates are embedded in our feature space. Cosine similarity would be a serious problem here: two hotels that lie on the same heading but far apart have the same angle in [lat, lng] space, so cosine similarity would score them as if they were in the same place (shoutout to Gilbert Watson for pointing this out). Other similarity scores can be explored with SciPy’s convenient cdist function.

It is important to backtest the recommendation algorithm to pick the best normalization scheme and similarity score and to tune any other parameters. For example, looking at recent customer searches and purchases and measuring precision and recall at k can help tune the hyperparameters and find the best algorithm configuration.

Let’s see what happens when we make the anchor hotel_id = 10:

Table 2. The 5 most similar hotels to hotel_id = 10

These results look promising! The algorithm found hotels of a similar price and rating to the anchor that are also nearby, as measured by the distance_from_anchor field (in miles).

Next, let’s look at a second example where we set the anchor to hotel_id = 21, which is much cheaper and has lower ratings than hotel_id = 10:

Table 3. The 5 most similar hotels to hotel_id = 21

Again, the algorithm finds similar, nearby hotels, and they are very different from the results in Table 2.

Regardless of the anchor, it is important to measure the quality of the recommendations by tracking customer views, clicks, and conversions, and then to iterate on the algorithm.

Personalized Recommendations

Building upon the content-based similarity algorithm, we can extend this methodology to create personalized recommendations for each customer. Instead of using the hotel the customer clicked on as the anchor and finding similar hotels, we can use purchase history to compute normalized features for each customer. This personalized anchor becomes a “virtual hotel” that we use to sort inventory in new markets by similarity score. The approach is akin to a lookalike model: we try to recommend new hotels that are most similar to what the customer has previously purchased.

Here is some example code of how we can modify the get_hotel_recommendations function from the previous section to provide personalized recommendations given a dictionary of pre-computed user features. One lever we use here to bias the model is to artificially set the distance feature equal to the minimum of the normalized distances across the set of available hotels, which pulls the ranking toward the hotels with the smallest distance values.
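The three content-based steps (standardize, compute Euclidean distances, sort ascending) can be sketched as follows. This is a minimal, standard-library-only sketch, not the tutorial’s own code: the toy inventory, the column order [price, rating, lat, lng], and the function name recommend_similar_hotels are hypothetical. On a real dataset, pandas plus SciPy’s cdist would do the same work more efficiently.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy inventory: hotel_id -> [price, rating, lat, lng].
# Values are invented for illustration; they are not the tutorial's data.
HOTELS = {
    10: [250.0, 4.5, 37.79, -122.40],
    11: [240.0, 4.4, 37.78, -122.41],
    12: [90.0, 3.1, 37.70, -122.47],
    13: [260.0, 4.6, 37.80, -122.42],
    21: [80.0, 3.0, 37.71, -122.46],
}


def standardize(inventory):
    """Step 1: convert every column to a standard score, (x - mean) / std."""
    columns = list(zip(*inventory.values()))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return {
        hotel_id: [
            (x - mu) / sd if sd > 0 else 0.0  # a constant column carries no signal
            for x, (mu, sd) in zip(row, stats)
        ]
        for hotel_id, row in inventory.items()
    }


def recommend_similar_hotels(anchor_id, inventory, k=5):
    """Return up to k (hotel_id, similarity_distance) pairs, most similar first."""
    z = standardize(inventory)
    anchor = z[anchor_id]
    # Step 2: Euclidean distance from the anchor to every other hotel.
    distances = [
        (hotel_id, math.dist(anchor, row))
        for hotel_id, row in z.items()
        if hotel_id != anchor_id
    ]
    # Step 3: ascending order, so the smallest distance (most similar) comes first.
    return sorted(distances, key=lambda pair: pair[1])[:k]


for hotel_id, d in recommend_similar_hotels(10, HOTELS):
    print(hotel_id, round(d, 3))
```

With this toy data, hotel 11 (similar price, rating, and location) ranks first for anchor 10, and the two cheap hotels, 21 and 12, rank last. Replacing math.dist with SciPy’s cdist and a different metric argument is one way to try other similarity measures.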
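To make the cosine-similarity pitfall concrete, here is a tiny illustration with two hypothetical hotels in a standardized [lat, lng] space that lie on the same ray from the origin. The coordinates are invented, and the cosine_similarity helper is written out by hand rather than taken from a library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical standardized [lat, lng] vectors on the same heading from the origin:
# hotel_b is four times farther out along exactly the same direction as hotel_a.
hotel_a = [0.5, 0.5]
hotel_b = [2.0, 2.0]

print("cosine similarity:", round(cosine_similarity(hotel_a, hotel_b), 6))
print("euclidean distance:", round(math.dist(hotel_a, hotel_b), 6))
```

Cosine similarity reports the two hotels as perfectly similar because only the angle matters, while the Euclidean distance (about 2.12 in standardized units) correctly shows that they are far apart.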
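For the backtesting step, precision and recall at k can be computed from a ranked recommendation list and the set of hotels a customer actually booked. This is a minimal sketch that assumes one ranked list and one set of purchased hotel_ids per customer. The ids below are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that the customer actually purchased."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of the customer's purchases that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)


# Hypothetical backtest record: the recommender's ranked list and the
# hotels this customer later booked.
ranked = [11, 13, 21, 12]
booked = {13, 42}

for k in (1, 2, 4):
    print(k, precision_at_k(ranked, booked, k), recall_at_k(ranked, booked, k))
```

Averaging these metrics over many customers, once for each candidate normalization scheme and distance metric, gives a concrete basis for choosing between configurations.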
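The personalized variant can be sketched in the same style. This is a minimal sketch under assumptions the text does not spell out. Each hotel is assumed to have numeric price, rating, and distance features, and what the distance is measured from is left open. The customer’s pre-computed features are assumed to be the average of the standardized features of the hotels they purchased. The new market is standardized on its own inventory. All function names and data are hypothetical, and this is not the tutorial’s modified get_hotel_recommendations. The line that pins the anchor’s distance to the smallest normalized distance in the market corresponds to the biasing lever described above.

```python
import math
from statistics import mean, pstdev

FEATURES = ["price", "rating", "distance"]


def standardize(market):
    """Z-score each feature within a single market's inventory."""
    stats = {}
    for f in FEATURES:
        values = [hotel[f] for hotel in market.values()]
        stats[f] = (mean(values), pstdev(values))
    return {
        hotel_id: {
            f: (hotel[f] - stats[f][0]) / stats[f][1] if stats[f][1] > 0 else 0.0
            for f in FEATURES
        }
        for hotel_id, hotel in market.items()
    }


def user_features_from_purchases(purchased_z):
    """Assumption: the 'virtual hotel' is the mean of past purchases' z-scores."""
    return {f: mean(hotel[f] for hotel in purchased_z) for f in FEATURES}


def personalized_recommendations(user_features, market, k=5):
    """Rank a new market's hotels by Euclidean distance to the user's virtual hotel."""
    z = standardize(market)
    anchor = dict(user_features)  # copy so the caller's dict is not modified
    # Biasing lever: set the anchor's distance to the smallest normalized distance on offer.
    anchor["distance"] = min(hotel["distance"] for hotel in z.values())
    anchor_vec = [anchor[f] for f in FEATURES]
    scored = [
        (hotel_id, math.dist(anchor_vec, [hotel[f] for f in FEATURES]))
        for hotel_id, hotel in z.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical standardized features of two hotels this customer booked elsewhere.
past_purchases = [
    {"price": 1.2, "rating": 1.0, "distance": 0.8},
    {"price": 0.8, "rating": 1.2, "distance": -0.2},
]
# Hypothetical inventory in a market the customer has never visited.
new_market = {
    101: {"price": 300.0, "rating": 4.7, "distance": 0.5},
    102: {"price": 120.0, "rating": 3.5, "distance": 0.2},
    103: {"price": 280.0, "rating": 4.6, "distance": 3.0},
    104: {"price": 90.0, "rating": 3.0, "distance": 1.5},
}

user = user_features_from_purchases(past_purchases)
print(personalized_recommendations(user, new_market, k=3))
```

With this toy data, hotel 101, which is both upscale and close, ranks first for a customer whose past purchases were above-average in price and rating.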
