Categories
Computer Programming Puzzles

Quantitative Interview Questions

He returned for another round of Goldman’s grilling, which ended in the office of one of the high-frequency traders, another Russian, named Alexander Davidovich. A managing director, he had just two final questions for Serge, both designed to test his ability to solve problems.

The first: Is 3,599 a prime number?

Serge quickly saw there was something strange about 3,599: it was very close to 3,600. He jotted down the following equations: 3599 = (3600 – 1) = (60– 12) = (60 – 1) (60 + 1) = 59 times 61. Not a prime number.

The problem wasn’t that difficult, but, as he put it, “it was harder to solve the problem when you are anticipated to solve it quickly.” It might have taken him as long as two minutes to finish. The second question the Goldman managing director asked him was more involved—and involving. He described for Serge a room, a rectangular box, and gave him its three dimensions. “He says there is a spider on the floor and gives me its coordinates. There is also a fly on the ceiling, and he gives me its coordinates as well. Then he asked the question: Calculate the shortest distance the spider can take to reach the fly.” The spider can’t fly or swing; it can only walk on surfaces. The shortest path between two points was a straight line, and so, Serge figured, it was a matter of unfolding the box, turning a three-dimensional object into a one-dimensional surface, then using the Pythagorean theorem to calculate the distances. It took him several minutes to work it all out; when he was done, Davidovich offered him a job at Goldman Sachs. His starting salary plus bonus came to $270,000.

Michael Lewis: Did Goldman Sachs Overstep in Criminally Charging Its Ex-Programmer?
Categories
Computer Programming Life

The Math of Y2K

Airplanes crashing! ATM machines stopping! Computers exploding!? What was the Y2K doomsday all about?

Most computer programs written prior to the mid 1990s used only 2 characters to keep track of the year. Back in the 1970s and 1980s, computing power was expensive, so it made sense not to waste 2 extra digits storing a bunch of redundant “19__” prefixes. The computer would simply store a 55 or 97, instead of 1955 or 1997.

The problem with Y2K was that any date calculation written this way would fail once the year rolled over to 2000. Why? Let’s look at a very simple example. Let’s say a computer program calculates how old you are in order to determine if you’re eligible for certain medical benefits. The code could look something like this:

birth_year = 29;
current_year = 95;
age = current_year – birth_year;
If (age >= 65) then medicare_eligible = TRUE;

Normally, this works fine. In the above example, the age = 95 – 29 = 66, and he can get medicare benefits. But, notice what happens when this same code runs in the year 2000 !

birth_year = 29;
current_year = 00;
age = current_year – birth_year;
If (age >= 65) then medicare_eligible = TRUE;

Now age = 00 – 29. That’s negative 29 years old. Clearly, -29 is not greater than 65. So, the computer thinks you’re not old enough to get medicare benefits! The logic goes haywire unless all the date codes are expanded to 4 digits. Once you do that, 2000 – 1929 = 71.

Apply this simple calculation error to anything that needed to compare dates, or determine how old something is, or how much time has passed. This is why people were expecting a computing catastrophe when the date rolled over.

Categories
Algorithms Computer Programming Statistics

How does Netflix predict which movies you’ll like best?

The simplest way to predict your rating for a movie is simply to average everyone else’s rating of the movie.  (ie:  They can just give you the 10 movies with the highest average rating)  Of course, it can get much more complex that than, especially when NFLX was giving away a million dollars to anyone who could improve their rating algorithm!  The real meta of this problem is to determine other people who are most like you, and then use their collective ratings on movies you haven’t seen yet.

Neighborhood-based model (k-NN): The general idea is “other people who rated X similarly to you… also liked Y”.  To predict if John will like “Toxic Avenger”, first you take each of John’s existing movie ratings, and for each one (eg: “Rocky”), find the people who rated both “Rocky” & “Toxic Avenger”.  You then compare the ratings given to both movies by these people, and calculate how correlated these 2 movies are.  If it’s a strong correlation between their ratings, then “Rocky” is a strong neighbor in predicting John’s rating for “Toxic Avenger”.  You’ll weigh in the average rating given (by “Rocky raters”) to “Toxic Avenger” highly.  You do this for all the movies that John has already rated, and find each one’s strongest neighbor(s), and calculated a predicted “Rocky” rating from each movie John has already rated.  You then calculate a weighted average of all these predictions to come up with your ultimate prediction for John’s rating of “Rocky”.  Lastly, if you do this for every movie in the entire database, you can determine a “Top 10 suggestions” list for John.

 

Here is some general reading on the contest:

The BellKor solution to the Netflix Prize

This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize

The Netflix Prize: 300 Days Later

The Greater Collaborative Filtering Groupthink: KNN