Netflix Prize May Have Been Achieved
MadAnalyst writes "The long-running $1,000,000 competition to improve on the Netflix Cinematch recommendation system by 10% (in terms of the RMSE) may have finally been won. Recent results show a 10.05% improvement from the team called BellKor's Pragmatic Chaos, a merger between some of the teams who were getting close to the contest's goal. We've discussed this competition in the past."
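The improvement figure is the relative reduction in RMSE versus the Cinematch baseline. A minimal sketch of the arithmetic in Python, using the widely reported Cinematch quiz-set RMSE of 0.9514; treat both numbers as illustrative rather than official:

# Percent improvement = relative reduction in RMSE vs. the baseline.
baseline_rmse = 0.9514  # widely reported Cinematch quiz-set RMSE (illustrative)
new_rmse = 0.8558       # a submission just past the 10% target (illustrative)
improvement = (baseline_rmse - new_rmse) / baseline_rmse * 100
print(f"{improvement:.2f}% improvement")  # -> 10.05%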
No info about the Netflix prize (Score:2, Insightful)
C'mon, the Netflix prize isn't THAT well known. At least you could have given some basic info about it.
Re:No info about the Netflix prize (Score:5, Informative)
Re: (Score:2)
the new york times had a great story about it in november
slashdot story below includes links to that nyt article, and other slashdot stories about the netflix prize:
http://science.slashdot.org/article.pl?sid=08/11/22/0526216 [slashdot.org]
Re:No info about the Netflix prize (Score:5, Informative)
What, you didn't even read the /summary/?
I know, this is Slashdot, but 'some basic info about it' is /right there/.
Re: (Score:1)
Re: (Score:3, Informative)
Re: (Score:2, Informative)
Explanation of what RMSE is. [lmgtfy.com]
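Or, saving you the search: RMSE is just the square root of the average squared difference between predicted and actual ratings. A minimal sketch in Python (the ratings below are made up):

import math

def rmse(predicted, actual):
    # Root-mean-square error: square each miss, average, take the root.
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up predicted vs. actual star ratings on Netflix's 1-5 scale
predictions = [3.8, 2.1, 4.5, 1.9]
actuals = [4, 2, 5, 3]
print(rmse(predictions, actuals))  # ~0.61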
Re:No info about the Netflix prize (Score:5, Informative)
Re: (Score:1)
1 Million split 7 ways (Score:4, Funny)
most of them did it (Score:5, Insightful)
for simple intellectual satisfaction, like a giant puzzle or a game of chess
money is not the motivation for everything in this world
Re: (Score:2)
alas, basementman (Score:2)
batman even has a better sense of humor than you ;-)
Re: (Score:1, Funny)
Re: (Score:2, Insightful)
Well, it was for AT&T. No, they don't want the prize money; they're donating it to charity. But what they do have now is an algorithm that can be turned into a commercial product or service. The individual researchers may not have had money as their primary motivator, but their employer sure as hell did.
Re: (Score:1)
money is not the motivation for everything in this world
But it's waaaay ahead of what's in second place.
Re: (Score:1)
Re:1 Million split 7 ways (Score:5, Insightful)
Well, just like the Ansari X Prize didn't cover the costs of developing and launching a suborbital rocket, the Netflix Prize isn't really meant to be a large enough prize to fully fund the development of a new recommendation algorithm. The purpose of the prize is to stimulate interest and get people started. The real reward will come when they turn their algorithm into commercial software - the payoff from making such a thing applicable outside of Netflix could be large indeed.
Re: (Score:2)
The X-Prize was designed to encourage the creation of a vehicle that would demonstrate the feasibility of a new market. It was backed up with a whole lot of market research which showed that people would happily sign up for a flight on such a vehicle. The anticipated business plan that it was trying to encourage was:
1. Build a vehicle that is very reusable and can put passengers into space.
2. Win the prize and get the PR.
3. Take bookings and reservation fees to fund the next flight.
4. Fly the first passen
Re: (Score:2)
And my bet is it will continue to be "well underway" for the next 5 years with no one having flown in it.
Re:1 Million split 7 ways (Score:5, Insightful)
Pretty sure having it on their CV means they can effectively write their own pay cheque in terms of job opportunities.
Re: (Score:2)
Re: (Score:2)
If this was your only job, you would have to pay rent/utilities on the building where you work in addition to rent/utilities on the building where you live. You also have to pay your own health insurance, dental/vision, and retirement/401K.
The above should cut it down to a bit above (if not below, once you're paying for an office) the average starting salary of an engineer ($55K at last check).
$1M ain't what it used to be.
Re: (Score:2)
Unless one of them kills all of their partners like in The Dark Knight that ain't much of a prize.
Yeah, but if they deliver him to the competition in a body bag they get another 500 grand.
A million if alive, so they can teach him some manners first.
Re: (Score:2)
only $71,428.57 each ... that ain't much of a prize.
That'd pay off my house, with a little to spare for new windows. I call that much of a prize.
Do they keep the prize money? (Score:2)
Re:Do they keep the prize money? (Score:4, Informative)
AT&T has committed to giving all the money to charity. The person at Yahoo developed his entry while working at AT&T, so I will be surprised if Yahoo gets any of it.
Well done! (Score:5, Informative)
Well done Bellkor.
But now the real race begins.
Now that the 10% barrier has been reached, people have 30 days to submit their final results. At the end of the 30 days, whoever has the best result wins.
This is going to be a great month!
Re: (Score:3, Informative)
Now that the 10% barrier has been reached, people have 30 days to submit their final results. At the end of the 30 days, whoever has the best result wins.
That's true, but like the story title indicates, the prize may have been achieved. From the contest rules:
Re: (Score:3, Interesting)
Actually, this email has been sent out
"As of the submission by team "BellKor's Pragmatic Chaos" on June 26, 2009 18:42:37 UTC, the Netflix Prize competition entered the "last call" period for the Grand Prize. In accord with the Rules, teams have thirty (30) days, until July 26, 2009 18:42:37 UTC, to make submissions that will be considered for this Prize. Good luck and thank you for participating!"
they were able to get the extra 0.5% over the top (Score:5, Funny)
by simply ignoring data from anyone who ever rented SuperBabies: Baby Geniuses 2, Gigli, From Justin to Kelly, Disaster Movie, any movie by Uwe Boll and any movie starring Paris Hilton
suddenly, everything made sense
i assume you're joking, but.. (Score:2)
i was joking, however (Score:5, Interesting)
from the excellent nyt article about the competition in november:
http://science.slashdot.org/article.pl?sid=08/11/22/0526216 [slashdot.org]
it isn't bad movies that are the problem, taste in bad movies can still be uniform
the real problem is extremely controversial movies, most notably Napoleon Dynamite
http://www.imdb.com/title/tt0374900/ [imdb.com]
not controversial in terms of dealing with abortion or gun control, but controversial in the sense that some people found the movie totally stupid, while others found it really funny
movies like napoleon dynamite are genre edge cases: people who apparently agree on everything else about movies in general encounter a movie like this one and suddenly differ dramatically in their opinion of it, in completely unpredictable ways
Re:i was joking, however (Score:4, Interesting)
Yeah, the recommendation systems everywhere I've bought or rented movies, and most of my friends, all said I needed to see 'Fight Club', so I did, and ... meh.
Consider this list of movies I've bought/rated highly:
12 Monkeys
V for Vendetta
Lost in Translation
Donnie Darko
A Beautiful Mind
Dogma
I might be grouped with folks who enjoy flicks about identity, man vs. man, those who aren't easily offended, etc. But there doesn't seem to be as clear a way to find a group of people who find aggression offensive, which is basically the driving theme of Fight Club. Perhaps given enough negative ratings it could be possible, but even though I've clicked 'Not Interested' on all the Disney movies, they keep suggesting I want their latest direct-to-DVD crapfest, so I'm left to assume they're rating mostly based on positive ratings.
Re:i was joking, however (Score:4, Interesting)
Perhaps given enough negative ratings it could be possible, but even though I've clicked 'Not Interested' on all the Disney movies, they keep suggesting I want their latest direct-to-DVD crapfest, so I'm left to assume they're rating mostly based on positive ratings.
A co-worker gets almost no recommendations at all from Netflix, and customer service told him that they generate recommendations based on ratings of 4 or 5 (though you'd think that the recommendations that they do generate would have to filter through similar movies that you've rated at 0). He was told to rate the movies that he likes higher in order to fix it, but that's never really accomplished anything as he has several hundred movies in the 4-to-5 range and maybe a dozen recommendations total.
I'm pretty sure that the Disney/children's movie recommendation flood that most everyone seems to be getting is driven by parents who don't actually love those movies, but are rating them on behalf of their children. That creates a weird connection to the movies the parents themselves enjoy, and makes it seem like the same audience is enjoying both types of movie. They need to have an "I'm a parent" flag somewhere to help them sort that out.
Re: (Score:2)
Useful, thanks.
> They need to have an "I'm a parent" flag somewhere to help them sort that out
I'd love to have family member tags in the queue. Associating an age with them would be fine. They have a thing where you can have separate queues for different family members, but it's a separate login IIRC, and a real pain to manage cohesively. I'd much rather just have one queue, and have it send discs in a round-robin for thing1, thing2, and the folks.
Re: (Score:2)
Examining the standard deviation of the ratings received by each film should provide a reasonable (albeit imperfect) indicator of the "controversialness" of any given film.
Of course, there are a few ways to improve upon this, and I'm sure that the winning teams have taken this into account.
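A minimal sketch of that idea in Python, with a hypothetical film-to-ratings table (the data here is invented):

from statistics import pstdev

# Hypothetical data: film -> star ratings received (1-5 scale)
ratings = {
    "Napoleon Dynamite": [1, 5, 1, 5, 2, 5, 1, 4],  # polarizing
    "Titanic": [4, 4, 5, 3, 4, 4, 5, 4],            # broadly agreed on
}

# A higher standard deviation suggests a more "controversial" film.
for film, stars in ratings.items():
    print(f"{film}: std dev = {pstdev(stars):.2f}")  # ~1.80 vs ~0.60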
Re: (Score:1)
it isn't bad movies that are the problem, taste in bad movies can still be uniform
also there are people like my wife and me who share the netflix account, since we watch most of the movies together but have different tastes: I HATE '30s and '40s dramas and she loves them, so she gets those sometimes, while I love B-grade sci-fi and horror movies and she isn't the biggest fan of them
Re: (Score:3, Funny)
by simply ignoring data from anyone who ever rented SuperBabies: Baby Geniuses 2, Gigli, From Justin to Kelly, Disaster Movie, any movie by Uwe Boll and any movie starring Paris Hilton
Hey, I (along with the rest of my frat, our school hockey team, and most of the town) was in a movie starring Paris Hilton, you insensitive clod!
Re: (Score:2)
Hey, I (along with the rest of my frat, our school hockey team, and most of the town) was in Paris Hilton, you insensitive clod!
Fixed
Re: (Score:2)
Kind of what I was implying...
Re: (Score:2)
You can generalize that to any movie with an action hero, and a baby. Any movie with a pseudo-star (Hilton, Spears, Madonna, etc.). And any movie with Uwe Boll or similar people.
On a more serious note: I think the best way to improve recommendations is to first relate the IMDB rating to the IQ of the rater. I've found that more intelligent people do not like movies with a simple plot, because it bores them, and less intelligent people do not like movies with a complex, subtle plot, because they don't get it.
Re: (Score:1, Insightful)
Re: (Score:1)
by simply ignoring data from anyone who ever rented SuperBabies: Baby Geniuses 2, Gigli, From Justin to Kelly, Disaster Movie, any movie by Uwe Boll and any movie starring Paris Hilton
suddenly, everything made sense
Ok, From Justin to Kelly wasn't really that bad. Now, Ishtar...and Battlefield Earth; those were baaaad.
Interesting (Score:5, Informative)
I published a paper using Netflix data. (Yeah, that group [slashdot.org].)
It's certainly cool that they beat the 10% improvement, and it's a hell of a deal for Netflix, since hiring the researchers would have cost them more than the prize money paid out. The interesting question is whether this really advances the field of recommender systems.
The initial work definitely did, but I wonder how much of the quest for the 10% threshold moved the science forward, as opposed to just tweaking an application. Recommender systems still don't bring up rare items, and they still have problems with diversity. None of the Netflix Prize work addresses these problems.
Still, I look forward to their paper.
Re: (Score:1)
Here's the problem with the terms "rare" and "diversity":
1.) Rare could also be defined as unpopular. Trying to recommend unpopular movies is problematic: is the computer program going to be able to discern the under-rated (Glengarry Glen Ross) from the just plain crap (Ishtar)?
2.) Suggesting "My Little Pony" when I usually rent black and white Samurai movies could be called diversity. Do you want a program that recommends things that are different or things that are similar?
Re:Interesting (Score:4, Insightful)
Trying to recommend unpopular movies is problematic: is the computer program going to be able to discern the under-rated (Glengarry Glen Ross) from the just plain crap (Ishtar)?
That is indeed an interesting question, and I think it's what the grandparent meant when he pointed out Netflix's contest didn't really address it. The performance measure Netflix used was root-mean-square error, so every prediction counts equally in determining your error. Since the vast majority of predictions in the data set are for frequently-watched films, the prize was effectively focused on optimizing the common case: correctly predicting whether someone will like one of the very popular films. Of course, getting the unpopular films right helps too, but all else being equal, it's better to make even tiny improvements to your predictions for films that appear tons of times in the data set than to make considerable improvements to less popular films' predictions, because the importance of getting a prediction right is in effect weighted by the film's popularity.
You could look at error from a movie-centric perspective, though, asking something like, "how good are your recommender algorithm's predictions for the average film?" That causes you to focus on different things, if an error of 1 star on Obscure Film predictions and an error of 1 star on Titanic predictions count the same.
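To make the contrast concrete, a hedged sketch in Python (the records are invented; "Titanic" stands in for a popular film, and the squared errors are made up):

import math
from collections import defaultdict

# Invented (movie, squared_error) records: the popular film contributes
# 1000 predictions, the obscure one only 10.
records = [("Titanic", 0.25)] * 1000 + [("Obscure Film", 4.0)] * 10

# Prediction-centric (the contest's view): every prediction counts once,
# so the popular film dominates the score.
overall_rmse = math.sqrt(sum(e for _, e in records) / len(records))

# Movie-centric view: compute each film's RMSE, then average them equally.
by_movie = defaultdict(list)
for movie, err in records:
    by_movie[movie].append(err)
movie_rmses = [math.sqrt(sum(errs) / len(errs)) for errs in by_movie.values()]
movie_centric = sum(movie_rmses) / len(movie_rmses)

print(f"overall RMSE: {overall_rmse:.2f}")    # ~0.54, close to Titanic's 0.50
print(f"movie-centric: {movie_centric:.2f}")  # 1.25, Obscure Film counts fully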
Re: (Score:2)
so every prediction counts equally in determining your error
True, but not true. If we had talked about plain mean error I would have agreed. But as it stands, not every error counts the same. Making a serious error (guessing 1 instead of 5) costs 16 times what making a small error (guessing 4 instead of 5) costs.
With popular movies you usually have enough data to come decently close with your guesses. Sure, you can optimize more and get even closer, but that's a minor gain. On the other hand, you don't have much data on the less popular movies, so
Re: (Score:3, Insightful)
That's true, but since there's not a huge range in possible ratings, the squaring doesn't have nearly as big an effect as the many-orders-of-magnitude difference in popularity. I don't recall the exact numbers offhand, but I think the top 10 movies, out of 17,500, account for fully half the weight.
Re: (Score:2)
Awesome combo. I'd pay $100 for a Zatoichi vs My Little Pony movie.
Re: (Score:2, Insightful)
You know what? I actually like Ishtar. I really do. The blind camel and the line "We're not singers! We're songwriters!" get me every time.
So really, the even harder problem is knowing when to buck your friends and go with the outlier. It's hard, because kNN methods work pretty well, and t
Re: (Score:1, Interesting)
You think using a Genetic Algorithm could help?! Are you kidding??!! :-)
The search space is far too great, and what would you actually be searching for with this technique? (just curious)
I would hope to see techniques evolved from an energy variant of Markov Decision Processes! Now that 'would' be a nice direction.
"recommendations" (Score:3)
Who listens to this sort of thing anyway?
real world (Score:2)
So... What does this mean in real-world analysis? What does the score represent? Since the score shown seems to be smaller-is-better, does this mean that 85+% of the movies recommended won't be attractive to the target, and less than 15% would be found interesting?
That doesn't seem very accurate...
Re: (Score:3, Interesting)
~0.85 points (on a five-point scale)
Actually the per-star penalties aren't 0-1-2-3-4 but 0-1-4-9-16, since they use root-mean-square error: each miss is squared before averaging. Just thought it was worth pointing out.
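A quick sketch makes the squared penalties visible (Python):

# On a 1-5 star scale, a miss of k stars costs k^2 before the root is taken.
for miss in range(5):
    print(f"off by {miss}: squared error = {miss ** 2}")  # 0, 1, 4, 9, 16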
Film recommendations (Score:3, Interesting)
Re: (Score:3, Interesting)
Reasonably good, actually. I often add 4 star movies to my queue, and rarely regret it.
The problem is the bell curve. There aren't a lot of 5 star movies out there, and I've seen them. There are a lot of 3 star films, but my life is short and I don't want to spend a lot of time on movies I merely "like".
In fact, it's not really a bell curve. I rarely provide 1-star or 2-star ratings simply because it's not at all difficult for me to identify a film I'm going to truly hate. I don't have to waste two hou
Re: (Score:1)
Re: (Score:2)
The key advantage Netflix has over other services is that it's right there. They know what you watch and you don't have to go searching.
Of course you still have to go through the back catalog and rate all the things you've already seen, which runs into the hundreds for somebody who likes movies. That pain is essentially the same with any service.
But going forward, Netflix can present you the opportunity to rate a movie more easily. It's a small user-interface thing, but significant.
Re: (Score:3, Informative)
I believe that Netflix is still using Cinematch. You could look into movielens [movielens.org]. It's from the GroupLens group at U Minn.
You do know that Netflix said at the outset, "You're competing with 15 years of really smart people banging away at the problem," and it was beaten in less than a week [slashdot.org].
That's not meant as a knock against Netflix's engineers, but more that they didn't really build a state-of-the-art recommender system. Sim [sifter.org]