Computer System Makes Best Sports Bets
schliz writes to tell us that a new computer system using the "Logistic Regression Markov Chain" (LRMC) has proven to be the most efficient system at predicting sporting event outcomes. The system was tested on the 2008 US NCAA basketball season and picked all four of the finalists. "Similar to other rankings systems, LRMC uses the quality of each NCAA team's results and the strength of each team's schedule to rank teams. The method has been designed to use only basic scoreboard data, including which teams played, which team had home court advantage and the margin of victory."
For the first time in a while... (Score:5, Insightful)
Why not test it for the past 10 years? (Score:1, Interesting)
Re: (Score:2, Insightful)
Re: (Score:2)
www2.isye.gatech.edu/people/faculty/Joel_Sokol/ncaa.pdf
And what bothers me are things like the following:
"For the LRMC model, we used all of the game data (home team, visiting team,
margin of victory) from the beginning of the season until just before the start of the
tournament; we obtained this data, as well as tournament results, on line from Yahoo!
daily scoreboards [24]. We note that neutral-site non-tournament games were unknown
in our data set; the team listed as "home" on t
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
From TFA:
Yeah well... (Score:2)
The real test would be to look at the rest of the computer's bracket.
Re: (Score:2)
As for the Final Four, it wouldn't surprise me if for the next several years only #1 and #2 seeds make it; the rest of the field is a joke. There is no real competition for the top one and two seeded teams.
Re: (Score:2)
Making sports bets (Score:5, Interesting)
The amount of noise involved depends strongly on the sport. Basketball is a sport where a lot of points are scored, which means the noise is relatively low, while football (what Americans call soccer for some strange reason; what Americans call football is more like rugby) has a lot of noise, since scoring a goal there depends a great deal on luck.
This essentially means that counting points is a good way to rate a basketball team, while counting goals won't give much clue to how good a given football team is. You must look at other factors for a football team instead, and not all of those factors can be measured as easily. Of course, the other factors matter for a basketball team too: the composition of players, individual player mood/health/inspiration, recent matches, history between the teams, referee behavior, weather, spectators, location, time zone, etc. Add to this the element of randomness caused by how the ball bounces, player positions at certain points of the game, and so on.
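A quick simulation makes the point concrete: give the stronger side a fixed edge on every scoring chance and vary how many scoring events a game contains. With only a handful of goals the better team loses surprisingly often; with basketball-sized scoring it almost always wins. This is only an illustrative sketch - the 55% per-chance edge and the event counts are numbers I made up, not anything from TFA.

    import java.util.Random;

    public class ScoringNoise {
        // Estimated probability that the side with a 55% edge on each scoring
        // chance actually wins a game decided by `events` scoring chances.
        static double winRate(int events, double edge, int trials, Random rng) {
            int wins = 0;
            for (int t = 0; t < trials; t++) {
                int a = 0, b = 0;
                for (int e = 0; e < events; e++) {
                    if (rng.nextDouble() < edge) a++; else b++;
                }
                if (a > b || (a == b && rng.nextBoolean())) wins++; // split ties randomly
            }
            return (double) wins / trials;
        }

        public static void main(String[] args) {
            Random rng = new Random(42);
            // ~3 goals per game (football-like) vs ~150 scored possessions (basketball-like)
            System.out.printf("  3 scoring events: better team wins %.0f%% of games%n",
                    100 * winRate(3, 0.55, 100_000, rng));
            System.out.printf("150 scoring events: better team wins %.0f%% of games%n",
                    100 * winRate(150, 0.55, 100_000, rng));
        }
    }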
Re: (Score:2)
Re:Making sports bets (Score:5, Insightful)
It is always a question of statistics with random noise involved.
The amount of noise involved depends strongly on the sport. Basketball is a sport where a lot of points are scored, which means the noise is relatively low, while football (what Americans call soccer for some strange reason; what Americans call football is more like rugby) has a lot of noise, since scoring a goal there depends a great deal on luck.
American football, over the course of a full game, has coarse scoring jumps (7pts for a touchdown) but luck plays a surprisingly small role. This is why good teams have very high winning percentages and poor ones have such low winning percentages. Not sure how that dynamic works in futbol, but the luck factor isn't as large as you'd think.
The reason the LRMC method is well-suited to NCAA basketball is that A) there are a lot of games, and B) the good conferences don't play the bad ones much. That means that a high-order Markov model is a good way to determine who would beat whom through a game of "I beat a team that beat a team that beat a team that beat you" sort of thing.
I came up with a version of this independently before I stumbled over these guys last year. It's pretty fun and works quite well. It's certainly much better than the polls, and in most cases last year my system was within a point or two of the Vegas spread. It's also pretty good at recognizing underdogs early - mine had Davidson and Drake before they were in the polls.
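For anyone who wants to play with the idea, here is a minimal sketch of the "beat a team that beat a team" machinery: treat each team as a state in a Markov chain, have losers hand probability mass to the teams that beat them, and rank teams by the stationary distribution. The 0.9/0.1 split and the toy schedule are my own assumptions for illustration - the actual LRMC model in the paper also uses logistic regression on margin of victory and home-court advantage to set the transition probabilities.

    import java.util.Arrays;

    public class MarkovRanking {
        // games[i] = {winnerIndex, loserIndex}. A hypothetical "voter" sitting on a
        // team mostly jumps to a team that beat it; teams are ranked by how much
        // long-run probability mass they end up holding.
        static double[] rank(int nTeams, int[][] games) {
            double[][] trans = new double[nTeams][nTeams];
            int[] played = new int[nTeams];
            for (int[] g : games) { played[g[0]]++; played[g[1]]++; }

            for (int[] g : games) {
                int winner = g[0], loser = g[1];
                trans[loser][winner]  += 0.9 / played[loser];  // loser defers to winner
                trans[loser][loser]   += 0.1 / played[loser];
                trans[winner][winner] += 0.9 / played[winner]; // winner mostly keeps its vote
                trans[winner][loser]  += 0.1 / played[winner];
            }

            double[] p = new double[nTeams];
            Arrays.fill(p, 1.0 / nTeams);
            for (int iter = 0; iter < 1000; iter++) {          // power iteration
                double[] next = new double[nTeams];
                for (int i = 0; i < nTeams; i++) {
                    for (int j = 0; j < nTeams; j++) {
                        next[j] += p[i] * trans[i][j];
                    }
                }
                p = next;
            }
            return p;   // higher value = stronger team
        }

        public static void main(String[] args) {
            // Toy schedule: 0 beat 1, 1 beat 2, 2 beat 3, 0 beat 2. Team 0 should rank first.
            int[][] games = { {0, 1}, {1, 2}, {2, 3}, {0, 2} };
            System.out.println(Arrays.toString(rank(4, games)));
        }
    }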
Re: (Score:1)
Re: (Score:2)
Football is as random as basketball (or any other sport). You can win by 6 points as easily as lose by 6 points *if* the two teams are equally matched (on that given day).
Statistically, that statement is invalid. You can't determine randomness from a single trial, and the fact is that in football the better team wins far more often than in sports such as baseball. Good teams in football can have a winning percentage of over 0.800, which is far better than the best baseball teams. So the outcomes of Amer
Re: (Score:3, Interesting)
It also implies that a
My NCAA predictor code had the same result! (Score:2, Funny)
List<Team> pickFinalFour(Tournament tourney) {
    List<Team> finalFour = new ArrayList<>();
    for (Division d : tourney) {
        Team bestTeam = null;
        int minSeed = Integer.MAX_VALUE;
        for (Team t : d) {
            if (t.getSeed() < minSeed) { // lower seed number = higher-ranked team
                minSeed = t.getSeed(); bestTeam = t;
            }
        }
        finalFour.add(bestTeam);
    }
    return finalFour;
}
Re: (Score:3, Funny)
pick
pick = map minimum
Best bet is not to bet... (Score:5, Interesting)
Now, any statistical model (such as this LRMC thing, or the techniques m'colleague used) will only give estimates of the odds. It might say that the probability of team A winning is 0.6. Whether that's worth a bet depends on the bookies' price: if their odds imply a probability of only 0.5 (even money), the expected return is positive and it's worth a bet, but if their odds imply 0.7, it isn't.
The trouble is that any statistical model worth its salt is going to produce probabilities that add up to 1.0, whereas the implied probabilities in the bookies' odds can add up to 1.2 or so. That's how they play the game and make their profits.
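A concrete, made-up illustration of both points: convert the bookie's decimal odds to implied probabilities, note that they sum to more than 1 (that 1.2-ish overround), and only back a team when your model's probability beats the implied one. The 1.80 and 2.10 prices below are invented for the example.

    public class ValueBet {
        // Expected profit per unit staked, given your model's win probability
        // and the bookie's decimal odds (total return per unit staked).
        static double expectedProfit(double modelProb, double decimalOdds) {
            return modelProb * decimalOdds - 1.0;
        }

        public static void main(String[] args) {
            double modelProb = 0.6;            // model: team A wins 60% of the time
            double oddsA = 1.80, oddsB = 2.10; // bookie's decimal odds for A and B (invented)

            double impliedA = 1 / oddsA, impliedB = 1 / oddsB;
            System.out.printf("Implied probabilities: A=%.3f, B=%.3f, sum=%.3f%n",
                    impliedA, impliedB, impliedA + impliedB);  // sum > 1 is the bookie's margin

            // Worth a bet only if your probability beats the implied one (positive EV).
            System.out.printf("EV of backing A at 1.80: %+.3f per unit staked%n",
                    expectedProfit(modelProb, oddsA));
        }
    }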
So after a season where we made a few pennies profit, and got some press interest (including a team from BBC Tomorrow's World filming us playing football), my friend realised the best thing to do was not to bet at all.
And instead he went into the business of supplying odds to bookmakers. From where he now sits at the top of a rather large business empire!
I might pop him an email to see what his current techniques are, but back in the day it was something similar to this LRMC thing.
Re: (Score:2)
Probabilities of all possible outcomes will add up to 1, but odds are p/(1-p), where p = probability of a given event. Odds can vary between 0 and infinity.
Logistic regression predicts the log-odds of a given event (which can be exponentiated to predict odds, or converted to a probability.)
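Put as code, those conversions look like this (just the standard definitions, nothing specific to the paper):

    public class OddsDemo {
        static double odds(double p)        { return p / (1 - p); }             // probability -> odds
        static double logOdds(double p)     { return Math.log(odds(p)); }       // the logit
        static double fromLogOdds(double z) { return 1 / (1 + Math.exp(-z)); }  // logistic inverse

        public static void main(String[] args) {
            double p = 0.75;
            System.out.println(odds(p));                 // 3.0 (3-to-1 in favour)
            System.out.println(logOdds(p));              // ~1.099
            System.out.println(fromLogOdds(logOdds(p))); // back to 0.75
        }
    }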
Re: (Score:2)
Re: (Score:2)
All bookies' odds can be converted to a probability between 0 and 1, which makes it easier to see whether the probabilities really do add up to more than 1 (and also whether 100-30 is better than 6-4).
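For example (my own arithmetic, assuming standard fractional odds, where 100-30 means staking 30 to win 100):

    public class FractionalOdds {
        // Implied probability of fractional odds "win-stake", e.g. 100-30 or 6-4.
        static double implied(double win, double stake) {
            return stake / (win + stake);
        }

        public static void main(String[] args) {
            System.out.printf("100-30 implies p = %.3f%n", implied(100, 30)); // ~0.231
            System.out.printf("6-4    implies p = %.3f%n", implied(6, 4));    // 0.400
            // The lower implied probability (100-30) is the bigger payout for the punter.
        }
    }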
Of course some would argue (and this bei
Re: (Score:1)
I'll bite :P.
Yes, 'cos lord knows that all those people in other countries stuck on metric for the past hundred years just cannot do math.
Damn Anglos.
Re: (Score:2, Funny)
Jesus H Christ, it must suck to be a scientist. Imagine working 10 years at one place and still being a mere "assistant".
Re: (Score:1)
Re: (Score:2)
You mean because you never know whether the link will have expired when you next open the bookmark?
Re: (Score:2)
It's sad that most people don't realize that bookmarking is like roulette - you will lose on average no matter how good (statistical) information about the winner you have.
This isn't strictly true - by definition, if you have perfect statistical information, you win every bet and cannot possibly lose on average.
Extending this down to worse statistics, to win in the long run all you need to do is have sufficiently better information than the bookie to ensure that you can overcome the extra padding they give to their chosen set of odds, which is not impossible in principle.
Of course, doing such a thing in practice is an entirely different kettle of fish, which is why it's s
Re: (Score:1)
Re: (Score:2)
"James Simons's $29 billion Renaissance Institutional Equities Fund has fallen 8.7 percent so far in August when his computer models used to buy and sell stocks were overwhelmed by securities' price swings. The two-year-old quantitative, or 'quant,' hedge fund now has declined 7.4 percent for the year. Simons said other hedge funds have been forced to sell positions, short-circuiting statistical models based on the relationships among securities."
BTW I use Quant met
Re: (Score:2)
Have you been watching WarGames?
only one question (Score:1, Troll)
Re: (Score:2, Funny)
Re: (Score:1)
Re: (Score:1)
Re: (Score:1)
Excellent... (Score:1)
how does it compare to a playstation? (Score:1)
Wait a minute! (Score:2, Funny)
Oh wait, sorry, it was patented years ago, and multiple times with minute variations, such as going back to strength of opponents' opponents, and margin of victory of opponents against common opponents, and strength of opponents' opponents' opponents, and
Re: (Score:2)
When substantial progress is made on a long-standing problem, generally there are three situations: the new approach was never tried before (because all the losers were looking under the same wrong rock), or the approach requires deep theoretical insight and skill (which losers
Now Hear This! (Score:2)
Re: (Score:3, Funny)
But I can predict which team the machine will predict to win: Team #42
I'm not convinced (Score:4, Insightful)
Re: (Score:2)
RTFA (Score:5, Informative)
The linked article didn't mention it, but the GA Tech web site says the system correctly identified several overrated teams that lost early on (like Georgetown) and underrated teams that went farther than expected (like WVU). The program picks Kansas to win this year.
Re: (Score:2)
Re: (Score:2)
Data mining (Score:3, Insightful)
Doesn't say whether the test was done on in-sample or out-of-sample data. That is, did they test using the same data that was used during development?
If so, the results are worthless. You can make a "system" that says anything you want given enough tweaking. (This is often the problem with apparently successful computer trading models).
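The distinction is easy to keep honest in code: fit or tune on one slice of the seasons and report accuracy only on seasons the model never saw. A schematic sketch, with a hypothetical Predictor interface and Game class standing in for whatever the real pipeline uses:

    import java.util.ArrayList;
    import java.util.List;

    public class Backtest {
        // Hypothetical stand-ins for whatever the real pipeline uses.
        static class Game { int season, teamA, teamB, winner; }

        interface Predictor {
            void fit(List<Game> trainingGames); // all tuning happens here, on training data only
            int predictWinner(Game g);          // id of the predicted winning team
        }

        // Out-of-sample test: fit on seasons before cutoffSeason, score on the rest.
        static double outOfSampleAccuracy(Predictor model, List<Game> allGames, int cutoffSeason) {
            List<Game> train = new ArrayList<>(), test = new ArrayList<>();
            for (Game g : allGames) {
                if (g.season < cutoffSeason) train.add(g); else test.add(g);
            }
            model.fit(train);
            int correct = 0;
            for (Game g : test) {
                if (model.predictWinner(g) == g.winner) correct++;
            }
            return (double) correct / test.size();
        }

        public static void main(String[] args) {
            List<Game> games = new ArrayList<>();
            for (int s = 2000; s <= 2008; s++) {       // toy data: one game per season
                Game g = new Game();
                g.season = s; g.teamA = 1; g.teamB = 2; g.winner = (s % 2 == 0) ? 1 : 2;
                games.add(g);
            }
            Predictor naive = new Predictor() {        // toy model: always picks team A
                public void fit(List<Game> train) {}
                public int predictWinner(Game g) { return g.teamA; }
            };
            // Accuracy is measured on 2006-2008 only, which the model never "saw" during fitting.
            System.out.println(outOfSampleAccuracy(naive, games, 2006));
        }
    }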
Re: (Score:2)
Money whoring... (Score:2)
Depends on what that paper was for.
If the paper they published is meant to test and prove methods that produce good-quality predictions, they'll probably use out-of-sample data.
If the paper was published so they can ask for grants, they'll probably use in-sample data and any other possible trick just to make their system look more efficient. Special bonus if they managed to cram in a few grant-attracting phrases like "could be used by DHS to pred
Re: (Score:2)
What I was more concerned about is whether the prediction task they've taken on has low intrinsic difficulty. The fact that others have done it badly doesn't prove much. Worse, those other predictions might have been made with a different immediate purpose, for which they were closer to optimal than as interpreted by this paper for the predict
Great sample (Score:5, Insightful)
Re: (Score:1)
Great sample... They should test the algorithm on maybe 80 historical seasons and maybe we will be able to see something.
Well, the problem with that is the NCAA tournament hasn't always used seeds, and it only expanded to 64 teams in 1985. So you might want to see results from the past 23 seasons. That's why I don't trust most of the stats analysts provide unless they preface them with "since the tournament expanded to 64 teams..." For example, UCLA has eleven or twelve national championships, but most of those were back when you only had to win two or three games to claim the title. Six games is a marathon.
I'm using it (Score:2, Interesting)
Re: (Score:1)
Link to the paper (Score:2, Informative)
But in the end... (Score:2)
Only then can it be true to life.
Just great. (Score:2)