The job of a top-N recommender system is to produce a finite list of the best things to present to a given person. In other words, the goal is to put the best content we can find in front of each user.
Our success depends on our ability to find the best top recommendations for people. So it makes sense to focus on finding things people will love, rather than on predicting which items they will hate.
There are many ways to build such a system. Here is one of them:
DISTRIBUTED DATA STORE: We start with some data store representing the individual interests of each user. For example, their ratings for movies that they’ve seen, or implicit ratings such as the things they’ve bought in the past. In practice, this is usually a big, distributed NoSQL data store such as Cassandra, MongoDB, or Memcached. It has to vend lots of data, but with very simple queries.
NORMALIZATION: Ideally, this interest data is normalized using techniques such as mean centering or z-scores, so that ratings are comparable between users. In the real world, however, our data is often too sparse to normalize effectively.
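To make the two normalization techniques concrete, here is a minimal Python sketch. It assumes each user's ratings arrive as a plain dictionary mapping item to rating. The function names `mean_center` and `z_scores` and the sample data are made up for illustration and are not part of any particular library.

```python
from statistics import mean, pstdev


def mean_center(ratings):
    """Subtract the user's average rating from each of their ratings."""
    values = list(ratings.values())
    avg = mean(values)
    return {item: rating - avg for item, rating in ratings.items()}


def z_scores(ratings):
    """Mean-center, then divide by the user's (population) standard deviation."""
    values = list(ratings.values())
    avg = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # Only one rating, or all ratings identical: there is no spread
        # to scale by, so every score collapses to zero.
        return {item: 0.0 for item in ratings}
    return {item: (rating - avg) / spread for item, rating in ratings.items()}


# Hypothetical users: one with a few ratings, one with a single rating.
alice = {"Movie A": 5, "Movie B": 3, "Movie C": 4}
bob = {"Movie A": 5}

alice_centered = mean_center(alice)
alice_z = z_scores(alice)
bob_z = z_scores(bob)
```

Notice what happens to the second user: with a single rating, the normalized score is always zero, no matter how much they loved the movie. That is the sparsity problem in miniature, and one reason normalization often gets skipped in practice.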
CANDIDATE GENERATION PHASE: The first step is to generate recommendation candidates, items we think might be interesting to the user based on their past behavior. So, the candidate generation phase…