How does Netflix predict which movies you’ll like best?

The simplest way to predict your rating for a movie is to average everyone else’s ratings of it.  (For example, Netflix could just show you the 10 movies with the highest average rating.)  Of course, it can get much more complex than that, especially once Netflix offered a million dollars to anyone who could improve the accuracy of its rating predictions by 10%!  The real crux of the problem is to find the people who are most like you, and then use their collective ratings of movies you haven’t seen yet.
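
As a rough illustration of that simplest approach, here is a minimal Python sketch.  The sample ratings and the function name top_by_average are made up for this example; nothing here reflects how Netflix actually stores or ranks its data.

```python
from collections import defaultdict

# Hypothetical sample data: (user, movie, stars on a 1-5 scale).
ratings = [
    ("ann", "Rocky", 5), ("bob", "Rocky", 4),
    ("ann", "Toxic Avenger", 2), ("cat", "Toxic Avenger", 3),
    ("bob", "Alien", 5), ("cat", "Alien", 5),
]

def top_by_average(ratings, n=10):
    """Return the n movie titles with the highest mean rating."""
    by_movie = defaultdict(list)
    for _user, movie, stars in ratings:
        by_movie[movie].append(stars)
    # Mean rating per movie, ignoring who gave each rating.
    averages = {movie: sum(v) / len(v) for movie, v in by_movie.items()}
    return sorted(averages, key=averages.get, reverse=True)[:n]

print(top_by_average(ratings, n=2))
```

Everyone gets the same list with this method, which is exactly why the contest entries moved on to personalized models.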

Neighborhood-based model (k-NN): The general idea is “other people who rated X similarly to you also liked Y”.  To predict whether John will like “Toxic Avenger”, take each movie John has already rated (e.g. “Rocky”) and find the people who rated both “Rocky” and “Toxic Avenger”.  Compare the ratings these people gave to the two movies, and calculate how correlated the two movies are.  If the correlation is strong, then “Rocky” is a strong neighbor for predicting John’s rating of “Toxic Avenger”, and the average rating that “Rocky” raters gave to “Toxic Avenger” gets a high weight.  Repeat this for every movie John has already rated, identify the strongest neighbors, and calculate a predicted “Toxic Avenger” rating from each one.  A weighted average of all these predictions is your final prediction for John’s rating of “Toxic Avenger”.  Lastly, if you do this for every movie in the database that John hasn’t rated yet, you can build a “Top 10 suggestions” list for John.
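
The neighborhood idea can be sketched in a few lines of Python.  This is a simplified illustration under stated assumptions, not the method any contest team used: the ratings are invented, the names pearson and predict are hypothetical, only positively correlated neighbors are kept, and each neighbor's "prediction" is simply John's own rating of that neighbor movie (a common item-based shortcut) rather than the average given by its raters.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: stars on a 1-5 scale}.
ratings = {
    "john": {"Rocky": 5, "Alien": 2, "Jaws": 4},
    "ann":  {"Rocky": 5, "Alien": 1, "Jaws": 4, "Toxic Avenger": 4},
    "bob":  {"Rocky": 4, "Alien": 2, "Jaws": 3, "Toxic Avenger": 4},
    "cat":  {"Rocky": 2, "Alien": 5, "Jaws": 3, "Toxic Avenger": 1},
    "dan":  {"Rocky": 1, "Alien": 4, "Jaws": 2, "Toxic Avenger": 2},
}

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def predict(ratings, user, target):
    """Predict user's rating of target from movies the user already rated."""
    weighted_sum = weight_total = 0.0
    for neighbor, own_rating in ratings[user].items():
        # People (other than the user) who rated both movies.
        both = [r for u, r in ratings.items()
                if u != user and neighbor in r and target in r]
        weight = pearson([r[neighbor] for r in both],
                         [r[target] for r in both])
        if weight <= 0:
            continue  # assumption: ignore weak or negative neighbors
        weighted_sum += weight * own_rating
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None

print(predict(ratings, "john", "Toxic Avenger"))
```

With this toy data, “Rocky” and “Jaws” both correlate positively with “Toxic Avenger” while “Alien” does not, so the prediction lands between John’s ratings for “Rocky” and “Jaws”, closer to the more strongly correlated one.  Running predict over every movie John hasn’t rated and sorting the results would give his suggestion list.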


Here is some general reading on the contest:

The BellKor solution to the Netflix Prize

This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize

The Netflix Prize: 300 Days Later

The Greater Collaborative Filtering Groupthink: KNN



How They Check if a Credit Card Is Valid

Before the era of Ethernet, TCP/IP, and packet verification, electronic data was transmitted over telephone wires.  (Think back to the days of AOL and modems, and that screeching noise when you connected.)  A big problem with this method was the risk of external interference.  What if a bird flew into the phone wire?  Or someone picked up the other line?  Or it started raining?  Any of these could garble whatever information was being sent at that moment.  So, if you were transmitting something like a credit card number, how would the receiver know that it hadn’t been garbled?  (For example, what if a 3 was garbled into a 4?)  This problem was solved at IBM back in the 1950s.  Of course, the same issue arises with human error.  (What if you are reading the credit card number aloud over the phone, and the other person types one of the digits incorrectly?)

For the rest of this post, I am merely introducing a lecture I attended called “Identification Numbers and Check Digit Schemes”, by Joe Kirtland.  Thanks to Joe for sharing his full PPT slides with me (link below).  The lecture covers the check digit systems used to validate credit cards, bar codes, ISBNs, currency serial numbers, etc.  It’s a great real-world application of algebra, geometry, and algorithms.

Here is a very crude example that illustrates the concept:  Let’s say you want to transmit the number “34515”.  One validation algorithm requires that the sum of the individual digits be divisible by 10 (sum mod 10 = 0).  Currently, the sum of the digits is 18 (3+4+5+1+5=18).  So, to make the sum divisible by 10, you just tack a 2 onto the end and transmit “345152”.  That last digit is called a check digit; it is not part of the data itself.  The receiver adds up all the digits, check digit included, and sees if the sum is divisible by 10.  If any single digit was changed along the way, the check fails*.  When that happens, something went wrong, and the number needs to be resent.
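
The crude scheme above is easy to try out in code.  Here is a minimal Python sketch of it; the function names check_digit, append_check_digit, and is_valid are made up for this illustration, and this toy scheme is not the algorithm real credit cards use.

```python
def check_digit(payload: str) -> int:
    """Digit that makes the total digit sum of the message divisible by 10."""
    digit_sum = sum(int(d) for d in payload)
    return (10 - digit_sum % 10) % 10

def append_check_digit(payload: str) -> str:
    """Sender side: tack the check digit onto the end of the data."""
    return payload + str(check_digit(payload))

def is_valid(message: str) -> bool:
    """Receiver side: add up every digit, check digit included."""
    return sum(int(d) for d in message) % 10 == 0

sent = append_check_digit("34515")
print(sent)                  # 345152
print(is_valid(sent))        # True
print(is_valid("345162"))    # False: the 5 was garbled into a 6
```

The outer “% 10” in check_digit handles the case where the digits already sum to a multiple of 10, so the check digit is 0 rather than 10.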

*Question:  This method is not foolproof.  Can you think of some reasons why?

If this topic interests you, check out the following slides, where Joe explored check digits as they relate to UPC codes, ISBN numbers, credit cards, and serial numbers on various currency.

Click here to view Joe’s full PPT lecture