A Simple Introduction to Collaborative Filtering
These days, whether you watch a video on YouTube, a movie on Netflix or look at a product on Amazon, you're going to get recommendations for more things to view, like or buy. You can thank machine learning algorithms and recommender systems for this development.
Recommender systems are far-reaching in scope, so we're going to zero in on an important approach called collaborative filtering, which filters information by using the interactions and data collected by the system from other users. It's based on the idea that people who agreed in their evaluation of certain items are likely to agree again in the future.
A Quick Primer On Recommender Systems
A recommender system is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give an item, such as a product, movie or song.
Recommender systems provide personalized information by learning the user’s interests through traces of interaction with that user. Much like machine learning algorithms, a recommender system makes a prediction based on a user's past behaviors. Specifically, it’s designed to predict user preference for a set of items based on experience.
Mathematically, a recommendation task consists of:
- A set of users (U)
- A set of items (I) that are to be recommended to the users in U
- A function, learned from the users' past interaction data, that predicts how likely a user in U is to prefer an item in I
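The three ingredients above can be sketched in a few lines of Python. This is only an illustrative baseline, not a real recommender: the interaction data, the names fit_item_mean_predictor and predict, and the choice of "item mean rating" as the learned function are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical interaction log: (user, item, rating) triples.
interactions = [
    ("ann", "item_a", 5.0),
    ("ann", "item_b", 3.0),
    ("bob", "item_a", 4.0),
    ("bob", "item_c", 2.0),
]

users = {u for u, _, _ in interactions}  # the set U
items = {i for _, i, _ in interactions}  # the set I


def fit_item_mean_predictor(triples):
    """Learn f(user, item) -> predicted rating from past interactions.

    Assumed baseline: ignore the user and return the item's mean rating,
    falling back to the global mean for items nobody has rated.
    """
    by_item = defaultdict(list)
    for _, item, rating in triples:
        by_item[item].append(rating)
    global_mean = sum(r for _, _, r in triples) / len(triples)

    def predict(user, item):
        item_ratings = by_item.get(item)
        if item_ratings:
            return sum(item_ratings) / len(item_ratings)
        return global_mean

    return predict


predict = fit_item_mean_predictor(interactions)
```

Because this baseline never looks at the user, it cannot personalize anything. Collaborative filtering replaces it with a function that does depend on who is asking.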
Recommender systems are broadly classified into two types based on the data being used to make inferences:
- Content-based filtering, which uses item attributes.
- Collaborative filtering, which uses user behavior (interactions) and can work without item attributes.
Some key examples of recommender systems at work include:
- Product recommendations on Amazon and other shopping sites
- Movie and TV show recommendations on Netflix
- Article recommendations on news sites
What is Collaborative Filtering?
Collaborative filtering filters information by using the interactions and data collected by the system from other users. It's based on the idea that people who agreed in their evaluation of certain items are likely to agree again in the future.
The concept is simple: when we want to find a new movie to watch we'll often ask our friends for recommendations. Naturally, we have greater trust in the recommendations from friends who share tastes similar to our own.
Most collaborative filtering systems apply the so-called similarity index-based technique. In the neighborhood-based approach, a number of users are selected based on their similarity to the active user. Inference for the active user is made by calculating a weighted average of the ratings of the selected users.
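The neighborhood-based idea can be sketched as follows. The ratings, the user names and the function names are hypothetical, and the sketch assumes that similarities are non-negative (true here because all ratings are positive) and that the similarity score itself is used as the weight.

```python
import math

# Hypothetical ratings: user -> {movie: rating}.
ratings = {
    "active": {"m1": 5.0, "m2": 1.0},
    "u1": {"m1": 5.0, "m2": 1.0, "m3": 4.0},
    "u2": {"m1": 1.0, "m2": 5.0, "m3": 2.0},
}


def cosine(a, b):
    """Cosine similarity over the movies both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, active, movie, k=2):
    """Similarity-weighted average of the k most similar users' ratings."""
    neighbors = [
        (cosine(ratings[active], theirs), theirs[movie])
        for user, theirs in ratings.items()
        if user != active and movie in theirs
    ]
    # Keep only the k users most similar to the active user.
    top = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:k]
    total_weight = sum(sim for sim, _ in top)
    if total_weight == 0:
        return None  # no usable neighbor rated this movie
    return sum(sim * rating for sim, rating in top) / total_weight


predicted = predict_rating(ratings, "active", "m3")
```

Here u1 agrees with the active user on both shared movies and u2 disagrees, so the prediction for m3 lands closer to u1's rating of 4 than to u2's rating of 2.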
Collaborative-filtering systems focus on the relationship between users and items. The similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.
There are two classes of collaborative filtering:
- User-based, which measures the similarity between target users and other users.
- Item-based, which measures the similarity between the items that target users rate or interact with and other items.
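One way to see the difference between the two classes is that the same similarity function is applied either to the rows (users) or to the columns (items) of the rating data. The sketch below uses made-up ratings and hypothetical helper names, and uses cosine similarity as one possible measure.

```python
import math

# Hypothetical ratings stored per user: user -> {movie: rating}.
by_user = {
    "u1": {"m1": 5.0, "m2": 4.0, "m3": 1.0},
    "u2": {"m1": 4.0, "m2": 5.0},
    "u3": {"m1": 1.0, "m3": 5.0},
}


def transpose(by_row):
    """Flip user -> {movie: rating} into movie -> {user: rating}."""
    by_col = {}
    for row, cells in by_row.items():
        for col, value in cells.items():
            by_col.setdefault(col, {})[row] = value
    return by_col


def cosine(a, b):
    """Cosine similarity restricted to the keys both vectors share."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[key] * b[key] for key in common)
    norm_a = math.sqrt(sum(a[key] ** 2 for key in common))
    norm_b = math.sqrt(sum(b[key] ** 2 for key in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


by_movie = transpose(by_user)

# User-based: compare two users through the movies they both rated.
user_sim = cosine(by_user["u1"], by_user["u2"])
# Item-based: compare two movies through the users who rated both.
item_sim = cosine(by_movie["m1"], by_movie["m3"])
```

In this toy data u1 and u2 rate their shared movies similarly, so user_sim is high, while m1 and m3 are rated in opposite ways by the users who saw both, so item_sim is low.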
Collaborative Filtering Using Python
Collaborative methods are typically worked out using a utility matrix. The task of the recommender model is to learn a function that predicts the utility (the fit) of each item to each user. The utility matrix is typically huge and very sparse, with most of its values missing.
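As a rough illustration, a small utility matrix can be built from a list of observed ratings, with None marking the cells that have no rating yet. The data and variable names below are invented for the example. Real systems would normally use a sparse matrix structure rather than nested lists.

```python
# Hypothetical observed ratings: (user, movie, rating) triples.
triples = [
    ("ann", "m1", 5.0),
    ("ann", "m3", 2.0),
    ("bob", "m2", 4.0),
    ("cat", "m1", 3.0),
]

users = sorted({user for user, _, _ in triples})     # row labels
movies = sorted({movie for _, movie, _ in triples})  # column labels
observed = {(user, movie): rating for user, movie, rating in triples}

# One row per user, one column per movie; None means "not rated yet".
utility = [[observed.get((user, movie)) for movie in movies] for user in users]

# Fraction of cells that are missing.
missing = sum(cell is None for row in utility for cell in row)
sparsity = missing / (len(users) * len(movies))
```

Even in this tiny example more than half of the cells are empty. The missing cells are exactly what the recommender has to predict.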
In the example matrices, each row represents a user, while the columns correspond to different movies (here, films by Pixar), except the last column, which records the similarity between that user and the target user. Each cell represents the rating that the user gives to that movie. Cosine similarity is the simplest way to measure how similar two rating vectors are. The utility matrix contains only partial data, and the missing cells are the ratings that need to be predicted for the target ("root") user.
cosine_similarity(p, q) = (p · q) / (‖p‖ ‖q‖)
cosine_similarity(joe, beck) = (joe · beck) / (‖joe‖ ‖beck‖)
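Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch in plain Python follows. The two rating vectors are made up for illustration and are not the ratings from the example above.

```python
import math


def cosine_similarity(p, q):
    """Dot product of p and q divided by the product of their norms."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_p * norm_q)


# Hypothetical rating vectors for two users over the same two movies.
user_a = [3.0, 4.0]
user_b = [4.0, 3.0]
similarity = cosine_similarity(user_a, user_b)
```

For these vectors the dot product is 24 and both norms are 5, so the similarity is 24 / 25 = 0.96. Vectors pointing in the same direction score 1, and orthogonal vectors score 0.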
When a new user joins the platform and starts rating items, the simplest approach is to compute the cosine or correlation similarity between rows (users) or columns (movies) and recommend items drawn from the k nearest neighbors.
There are many other similarity measures to choose from. A few include:
- Pearson similarity
- Jaccard similarity
- Spearman rank correlation
- Mean squared differences
- Proximity–impact–popularity similarity
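To make a few of these measures concrete, here is a small sketch of Pearson similarity, Jaccard similarity and mean squared difference in plain Python. The function names and inputs are chosen for illustration. Pearson and mean squared difference assume two equal-length rating vectors over the same items, while Jaccard compares the sets of items each user interacted with.

```python
import math


def pearson(p, q):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    dev_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    dev_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    if dev_p == 0 or dev_q == 0:
        return 0.0  # convention chosen here for constant vectors
    return cov / (dev_p * dev_q)


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_squared_difference(p, q):
    """Average squared gap between ratings; lower means more similar."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)
```

Unlike cosine similarity, Pearson subtracts each user's mean rating first, so a user who rates everything high and a user who rates everything low can still come out as similar if their preferences move together. Mean squared difference is a distance rather than a similarity, so it has to be inverted in some way before it can be used as a weight.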