## Computer System Makes Best Sports Bets

schliz writes to tell us that a new computer system using the "Logistic Regression Markov Chain" (LRMC) has proven to be the most efficient system at predicting sporting event outcomes. The system was tested on the 2008 US NCAA basketball season and picked all four of the finalists. "Similar to other rankings systems, LRMC uses the quality of each NCAA team's results and the strength of each team's schedule to rank teams. The method has been designed to use only basic scoreboard data, including which teams played, which team had home court advantage and the margin of victory."
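The summary only says that LRMC turns basic scoreboard data into a ranking. As a rough illustration of how a Markov-chain ranking can work, here is a minimal sketch. It assumes a simple voting scheme: after each game, each team's "vote" either stays with that team or moves to its opponent, depending on the probability that the home team is the better team. In the spirit of the name, that probability is a logistic function of the home margin. The teams are then ranked by the chain's stationary distribution. The coefficients `A` and `B`, the helper names, and the exact voting rule are hypothetical and are not the values or details fitted in the Georgia Tech paper.

```java
import java.util.Arrays;

class MarkovRank {
    // Hypothetical logistic curve: probability that the home team is the better team,
    // given its margin (negative if it lost). A and B are illustrative, NOT fitted values.
    static final double A = 0.1, B = -0.3;

    static double homeBetterProb(int homeMargin) {
        return 1.0 / (1.0 + Math.exp(-(A * homeMargin + B)));
    }

    /** games[g] = {homeTeam, awayTeam, homeMargin}; returns the stationary distribution. */
    static double[] rank(int nTeams, int[][] games) {
        double[][] t = new double[nTeams][nTeams];
        int[] played = new int[nTeams];
        for (int[] g : games) {
            int h = g[0], v = g[1];
            double p = homeBetterProb(g[2]);
            t[h][h] += p; t[h][v] += 1 - p;  // home team's vote: stay, or move to visitor
            t[v][h] += p; t[v][v] += 1 - p;  // visitor's vote: move to home team, or stay
            played[h]++;
            played[v]++;
        }
        // Average over games so every row sums to 1 (assumes each team played at least once).
        for (int i = 0; i < nTeams; i++)
            for (int j = 0; j < nTeams; j++)
                t[i][j] /= played[i];
        double[] pi = new double[nTeams];
        Arrays.fill(pi, 1.0 / nTeams);
        for (int iter = 0; iter < 1000; iter++) {  // power iteration: pi <- pi * T
            double[] next = new double[nTeams];
            for (int i = 0; i < nTeams; i++)
                for (int j = 0; j < nTeams; j++)
                    next[j] += pi[i] * t[i][j];
            pi = next;
        }
        return pi;  // larger stationary probability = higher ranking
    }

    public static void main(String[] args) {
        // Placeholder games: team 0 wins big at home, team 1 beats team 2 at home.
        int[][] games = {{0, 1, 20}, {0, 2, 15}, {1, 2, 10}};
        System.out.println(Arrays.toString(rank(3, games)));
    }
}
```

In this toy example, team 0 ends up with the largest stationary probability and team 2 with the smallest. Strength of schedule enters automatically, because beating a team that itself attracts many votes passes more of those votes along.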

• #### For the first time in a while... (Score:5, Insightful)

on Saturday April 05, 2008 @02:49AM (#22971390)
The Final Four were also all #1 seeds in their regions. Coincidence? I believe this has never happened before, and if the computer calculates odds the way the teams are ranked, it may not always be so reliable.
• #### Why not test it for the past 10 years? (Score:1, Interesting)

by Anonymous Coward
That was my first thought as well. The four #1 seeds are theoretically the most likely to be in the Final Four, assuming they were seeded correctly, but of course unexpected things usually happen in sports, so this is the first time that's occurred. But if I had to bet my life on picking the Final Four, I'd probably pick the four #1 seeds in any given year, because even though the odds of that occurring are low, the odds of me choosing which #2 or #4 seeds displace a couple of #1s as usually happens so that I p
• #### Re: (Score:2, Insightful)

by Anonymous Coward
Why would 10 years be so much better than the 9 years they analyzed?
• #### Re: (Score:2)

www2.isye.gatech.edu/people/faculty/Joel_Sokol/ncaa.pdf

And what bothers me are things like the following:

"For the LRMC model, we used all of the game data (home team, visiting team, margin of victory) from the beginning of the season until just before the start of the tournament; we obtained this data, as well as tournament results, on line from Yahoo! daily scoreboards [24]. We note that neutral-site non-tournament games were unknown in our data set; the team listed as “home” on t
• #### Re: (Score:1)

I wonder if the computer picked Davidson going to the Elite 8 ...
• #### Re: (Score:2)

From TFA:

By identifying 30 of the 36 Final Four participants in the past nine years of tournaments, the method has achieved 83 percent accuracy. In comparison, the seedings and polls have correctly identified only 23 of the 36 NCAA Final Four participants in the same nine-year stretch, and the currently used Ratings Percentage Index (RPI) formula identified 21.
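For reference, the three hit rates quoted above work out as follows (36 = 4 Final Four slots × 9 tournaments):

$$
\frac{30}{36} \approx 83.3\%, \qquad \frac{23}{36} \approx 63.9\%, \qquad \frac{21}{36} \approx 58.3\%
$$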
• #### Yeah well... (Score:2)

I picked all 4 #1 seeds as well. Sure, it's never happened before, but the odds have got to be better than trying to pick which numbered seed is going to get in from each division.

The real test would be to look at the rest of the computer's bracket.

• #### Re: (Score:2)

Exactly, the whole bracket should be looked at to see just how good the system is.

As for the Final Four, it wouldn't surprise me if only #1 and #2 seeds made up the Final Four for the next several years; the rest of the field is a joke. There is no real competition for the top #1 and #2 seeds.
• #### Re: (Score:2)

No real competition? Sure, all of the #1s made it to the Final Four for the first time ever. But two #2s lost in the second round (Duke and Georgetown), one in the third (Tennessee), and one in the Elite 8. So at most, if you give a little leeway in the 2-3 game that's supposed to happen in the Sweet 16, half of the #2s performed to expectation. The other half didn't even make it to the second weekend.
• #### Making sports bets (Score:5, Interesting)

on Saturday April 05, 2008 @03:02AM (#22971426)
Is always a question of statistics with random noise involved.

The amount of noise strongly depends on the sport. Basketball is a sport where a lot of points are scored, which in turn means that the noise is relatively low, while football (what Americans call soccer for some strange reason; what Americans call football is more like rugby) has a lot of noise, since scoring a goal there depends a lot on luck.

This essentially means that counting points is a good way to rate a basketball team, while counting goals won't give much of a clue to how good a given football team is. You must look at other factors for a football team instead, and not all of those factors can be measured as easily. Of course, the other factors are also important for a basketball team. They include the composition of players, individual players' mood/health/inspiration, recent matches, history between the teams, referee behavior, weather, spectators, location, time zone, etc. Add to this the element of randomness caused by the ball bouncing off a surface, player positions at certain points of the game, etc.
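To illustrate the scoring-rate argument, here is a sketch that models each team's scoring events as independent Poisson counts. This is a simplifying assumption; real games are not that simple. The stronger side gets the same 10% edge in scoring rate in a low-scoring, soccer-like setting and in a high-scoring, basketball-like setting. The rates are hypothetical, ties are counted as half a win (also an assumption, since draws are a real result in soccer), and the helper names are made up.

```java
class ScoringNoise {
    /** P(X > Y) + 0.5 * P(X == Y) for independent X ~ Poisson(la), Y ~ Poisson(lb). */
    static double winProb(double la, double lb) {
        // Truncate the sums well past both means; the tail mass beyond this is negligible here.
        int max = (int) Math.ceil(Math.max(la, lb) * 3 + 50);
        double[] pa = pmf(la, max), pb = pmf(lb, max);
        double win = 0.0, tie = 0.0;
        for (int x = 0; x <= max; x++) {
            for (int y = 0; y <= max; y++) {
                double p = pa[x] * pb[y];
                if (x > y) win += p;
                else if (x == y) tie += p;
            }
        }
        return win + 0.5 * tie;  // ties counted as half a win (assumption)
    }

    /** Poisson probabilities P(K = 0..max), built iteratively to avoid factorial overflow. */
    static double[] pmf(double lambda, int max) {
        double[] p = new double[max + 1];
        p[0] = Math.exp(-lambda);
        for (int k = 1; k <= max; k++) p[k] = p[k - 1] * lambda / k;
        return p;
    }

    public static void main(String[] args) {
        // Same 10% edge in scoring rate, very different numbers of scoring events (hypothetical).
        System.out.printf("low-scoring  (1.32 vs 1.2): %.3f%n", winProb(1.32, 1.2));
        System.out.printf("high-scoring (38.5 vs 35):  %.3f%n", winProb(38.5, 35.0));
    }
}
```

With more scoring events, the expected gap between the teams grows faster than its random spread. That is why the same per-event edge makes the stronger team a clearer favourite in the high-scoring case.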

• #### Re:Making sports bets (Score:5, Insightful)

on Saturday April 05, 2008 @06:01AM (#22971868)

Is always a question of statistics with random noise involved.
The amount of noise strongly depends on the sport. Basketball is a sport where a lot of points are scored, which in turn means that the noise is relatively low, while football (what Americans call soccer for some strange reason; what Americans call football is more like rugby) has a lot of noise, since scoring a goal there depends a lot on luck.

American football, over the course of a full game, has coarse scoring jumps (7 points for a touchdown), but luck plays a surprisingly small role. This is why good teams have very high winning percentages and poor ones have such low winning percentages. Not sure how that dynamic works in futbol, but the luck factor isn't as large as you'd think.

The reason the LRMC method is well-suited to NCAA basketball is that A) there are a lot of games, and B) the good conferences don't play the bad ones much. That means that a high-order Markov model is a good way to determine who would beat whom through a game of "I beat a team that beat a team that beat a team that beat you" sort of thing.
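The "I beat a team that beat a team that beat you" idea can be illustrated with a plain breadth-first search over a graph of results. This is only meant to make the transitive-chain intuition concrete. It is not the Markov model itself, and the team names and helper names are placeholders.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BeatChain {
    /**
     * Length of the shortest "from beat X, X beat Y, ..., beat to" chain in a results graph,
     * or -1 if no such chain exists. beat.get(t) lists the teams that t has beaten.
     */
    static int chainLength(Map<String, List<String>> beat, String from, String to) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String team = queue.poll();
            if (team.equals(to)) return dist.get(team);
            for (String loser : beat.getOrDefault(team, List.of())) {
                if (!dist.containsKey(loser)) {  // first visit in BFS = shortest chain
                    dist.put(loser, dist.get(team) + 1);
                    queue.add(loser);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Placeholder results: A beat B, B beat C, C beat D.
        Map<String, List<String>> beat = Map.of(
                "A", List.of("B"),
                "B", List.of("C"),
                "C", List.of("D"));
        System.out.println(chainLength(beat, "A", "D"));  // 3
        System.out.println(chainLength(beat, "D", "A"));  // -1
    }
}
```

A single chain ignores margins and how many different paths connect two teams. A Markov-chain ranking instead aggregates over all such paths, weighted by probabilities. When conferences rarely play each other, only a few games link the two parts of the graph.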

I came up with a version of this independently before I stumbled over these guys last year. It's pretty fun and works quite well. It's certainly much better than the polls, and in most cases last year my system was within a point or two of the Vegas spread. It's also pretty good at recognizing underdogs early - mine had Davidson and Drake before they were in the polls.

• #### Re: (Score:1)

Football is as random as basketball (or any other sport). You can win by 6 points as easily as lose by 6 points *if* the two teams are equally matched (on that given day). When teams are not equally matched, then randomness does not play much of a role, which is why good teams over the course of a season have good records and bad teams have bad records *in any sport*.
• #### Re: (Score:2)

Football is as random as basketball (or any other sport). You can win by 6 points as easily as lose by 6 points *if* the two teams are equally matched (on that given day).

Statistically, that statement is invalid. You can't determine randomness from a single trial, and the fact is that in football the better team wins far more often than in sports such as baseball. Good teams in football can have a winning percentage of over 0.800, which is far better than the best baseball teams. So the outcomes of Amer

• #### Re: (Score:3, Interesting)

Several years ago I was playing with some iterative and least squares approaches to predicting (American) football scores and rating teams. It worked pretty well, but one thing stood out: When you use only the scores from previous games and home/visiting status as inputs to the model, you hit a pretty hard floor of about 2 touchdowns (13 or 14 points) for your standard error. That error includes the "hidden variables" that you mention, as well as the fundamental randomness of the game.

It also implies that a
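A least-squares rating of the kind described above can be sketched as follows. This is a Massey-style model assumed for illustration, not the commenter's code: each game's home margin is modeled as the home team's rating minus the visitor's rating plus a single home-court term. Because ratings are only defined up to a constant, one normal equation is replaced by the constraint that the ratings sum to zero. The names `fit` and `solve` are hypothetical. On real data, the spread of the residuals from such a fit is the kind of error "floor" the comment describes.

```java
import java.util.Arrays;

class LeastSquaresRatings {
    /**
     * Least-squares fit of homeMargin = r[home] - r[away] + hca (Massey-style model, assumed).
     * games[g] = {home, away, homeMargin}. Returns {r[0], ..., r[n-1], hca}, pinned by sum(r) = 0.
     */
    static double[] fit(int n, int[][] games) {
        int m = n + 1;  // n team ratings plus one home-court advantage term
        double[][] ata = new double[m][m];
        double[] atb = new double[m];
        for (int[] g : games) {
            double[] row = new double[m];
            row[g[0]] = 1;
            row[g[1]] = -1;
            row[n] = 1;
            for (int i = 0; i < m; i++) {
                atb[i] += row[i] * g[2];
                for (int j = 0; j < m; j++) ata[i][j] += row[i] * row[j];
            }
        }
        // The rating rows of the normal equations sum to zero (one is redundant),
        // so replace the last rating row with the constraint sum(r) = 0.
        Arrays.fill(ata[n - 1], 0.0);
        for (int j = 0; j < n; j++) ata[n - 1][j] = 1.0;
        atb[n - 1] = 0.0;
        return solve(ata, atb);
    }

    /** Gaussian elimination with partial pivoting; overwrites a and b. */
    static double[] solve(double[][] a, double[] b) {
        int m = b.length;
        for (int col = 0; col < m; col++) {
            int piv = col;
            for (int r = col + 1; r < m; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            double[] rowTmp = a[col]; a[col] = a[piv]; a[piv] = rowTmp;
            double bTmp = b[col]; b[col] = b[piv]; b[piv] = bTmp;
            for (int r = col + 1; r < m; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c < m; c++) a[r][c] -= f * a[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[m];
        for (int r = m - 1; r >= 0; r--) {  // back substitution
            double s = b[r];
            for (int c = r + 1; c < m; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    public static void main(String[] args) {
        // Hypothetical games generated exactly from ratings {3, 0, -3} and a 2-point home edge.
        int[][] games = {{0, 1, 5}, {1, 2, 5}, {2, 0, -4}, {1, 0, -1}};
        System.out.println(Arrays.toString(fit(3, games)));
    }
}
```

Because the toy margins are generated without noise, the fit recovers the ratings and home edge exactly. Real scores leave residuals, and their standard deviation is the quantity that bottoms out around two touchdowns in the comment.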
• #### My NCAA predictor code had the same result! (Score:2, Funny)

Here's the code I used

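    List<Team> pickFinalFour(Tournament tourney) {
        List<Team> finalFour = new ArrayList<>();
        for (Division d : tourney) {
            Team bestTeam = null;
            int minSeed = Integer.MAX_VALUE;
            for (Team t : d) {
                // Lowest seed number is the top seed in the division.
                if (t.getSeed() < minSeed) { minSeed = t.getSeed(); bestTeam = t; }
            }
            finalFour.add(bestTeam);
        }
        return finalFour;
    }
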
• #### Re: (Score:3, Funny)

by Anonymous Coward
Really? Here's mine:

pick :: Ord a => [[a]] -> [a]
pick = map minimum
• #### Best bet is not to bet... (Score:5, Interesting)

on Saturday April 05, 2008 @03:06AM (#22971442)
One of our research assistants started doing something like this about ten years ago, fitting a statistical model to previous soccer match results and the home/away effect. He rounded some of us up to chip in a few pounds each week and off he went to the bookies to bet on the outcome of his model.

Now, any statistical model (such as this LRMC thing, or the techniques m'colleague used) will only give estimates of the odds. It might say that the probability of team A winning is 0.6. Now, if the bookies are offering you a return of 0.7 then it's worth a bet. If the bookies rate it 50-50 then it's not worth a bet.

The trouble is that any statistical model worth its salt is going to produce probabilities that add up to 1.0, whereas the bookies' odds can add up to 1.2 or so. That's how they play the game and make their profits.
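To make the bookmaker's-margin point concrete, here is a sketch assuming decimal (European-style) odds, where odds o pay o units per unit staked, stake included. The implied probability of each price is 1/o. Summed over all outcomes, these come to more than 1 (the "1.2 or so" above), and a bet only has positive expected value when the model's probability p satisfies p · o > 1. The helper names and the prices in the example are hypothetical.

```java
class BetValue {
    /** Implied probability of decimal odds (total return per unit stake, stake included). */
    static double impliedProb(double decimalOdds) {
        return 1.0 / decimalOdds;
    }

    /** Bookmaker's margin: how far the implied probabilities of all outcomes sum above 1. */
    static double overround(double... decimalOdds) {
        double sum = 0.0;
        for (double o : decimalOdds) sum += impliedProb(o);
        return sum - 1.0;
    }

    /** Expected profit per unit staked if the model's win probability p is correct. */
    static double expectedProfit(double p, double decimalOdds) {
        return p * decimalOdds - 1.0;
    }

    public static void main(String[] args) {
        // Hypothetical two-way market priced at 1.80 on each side.
        System.out.printf("overround: %.3f%n", overround(1.80, 1.80));
        System.out.printf("EV at p = 0.60: %+.3f%n", expectedProfit(0.60, 1.80));
        System.out.printf("EV at p = 0.50: %+.3f%n", expectedProfit(0.50, 1.80));
    }
}
```

With both sides priced at 1.80, the implied probabilities sum to about 1.11. A model therefore has to put a side above roughly 0.556 before backing it pays on average, so the margin eats any edge smaller than that.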

So after a season where we made a few pennies profit, and got some press interest (including a team from BBC Tomorrow's World filming us playing football), my friend realised the best thing to do was not to bet at all.

And instead he went into the business of supplying odds to bookmakers. From where he now sits at the top of a rather large business empire!

I might pop him an email to see what his current techniques are, but back in the day it was something similar to this LRMC thing.

• #### Re: (Score:2)

Are you mixing up probability and odds?

Probabilities of all possible outcomes will add up to 1, but odds are p/(1-p), where p = probability of a given event. Odds can vary between 0 and infinity.

Logistic regression predicts the log-odds of a given event (which can be exponentiated to predict odds, or converted to a probability.)
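A small sketch of the conversions described above, using hypothetical helper names: probability to odds, odds to log-odds, and the logistic (sigmoid) function that maps log-odds back to a probability.

```java
class LogOdds {
    /** Odds in favour: p / (1 - p), ranging from 0 to infinity. */
    static double odds(double p) {
        return p / (1.0 - p);
    }

    /** Log-odds (logit): the quantity a logistic regression models as linear in its inputs. */
    static double logit(double p) {
        return Math.log(odds(p));
    }

    /** Inverse of logit: the logistic (sigmoid) function maps log-odds back to a probability. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double p = 0.75;
        System.out.println(odds(p));            // 3.0
        System.out.println(logit(p));           // log(3)
        System.out.println(sigmoid(logit(p)));  // back to (about) 0.75
    }
}
```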

• #### Re: (Score:2)

Not at all, he's saying that a bookie always makes his cut. He's saying that if you bet based on probability (if team A is 10%, you bet 10% of your money on him, and 90% on the other guy) on both sports teams in a head-to-head contest with no chance of a tie, it will cost you about 20% of your bet. With such a heavy cut (common in risky black markets), even a highly effective predictive algorithm can lose money.
• #### Re: (Score:2)

I'm mixing up the terminology perhaps, but only because people are used to getting 'odds' from a bookie expressed as 'X to Y'. And in stupid units too (mathematically speaking). "100-30"? "6-4 on"? Jeez they're not even in their lowest terms! No wonder mathematical numeracy is declining!

All bookies' odds can be converted to a probability between 0 and 1, which makes it easier to see whether the probabilities add up to more than 1 (and also whether 100-30 is better than 6-4).
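Assuming the usual British convention, "a-b" against means a stake of b wins a, so the implied probability is b/(a+b), and "a-b on" reverses that. Here is a sketch with hypothetical helper names that also reduces prices to lowest terms. By this convention, 100-30 is 10-3 and implies about 0.23, while 6-4 on is 3-2 on and implies 0.6.

```java
class FractionalOdds {
    /** Implied probability of "a-b" against odds: stake b to win a. */
    static double againstProb(int a, int b) {
        return (double) b / (a + b);
    }

    /** Implied probability of "a-b on" (odds-on): stake a to win b. */
    static double onProb(int a, int b) {
        return (double) a / (a + b);
    }

    /** Reduce a fractional price to lowest terms, e.g. "100-30" -> "10-3". */
    static String lowestTerms(int a, int b) {
        int g = gcd(a, b);
        return (a / g) + "-" + (b / g);
    }

    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    public static void main(String[] args) {
        System.out.println(lowestTerms(100, 30) + " -> " + againstProb(100, 30));
        System.out.println(lowestTerms(6, 4) + " on -> " + onProb(6, 4));
    }
}
```

Whether one price is "better" than another still depends on your own probability estimate. A lower implied probability means a bigger payout, but it is only worth taking if you think the outcome is more likely than the price implies.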

Of course some would argue (and this bei
• #### Re: (Score:1)

Of course some would argue (and this being slashdot, some will) that the real reason for the decline in numeracy is because we no longer have to work out weights in pounds and ounces, or distances in feet and miles, or money in pounds shillings and pence. Err yeah maybe I dunno. Discuss.

I'll bite :P.
Yes, 'cos lord knows that all those people in other countries stuck on metric for the past hundred years just cannot do math.
Damn Anglos.

• #### Re: (Score:2, Funny)

One of our research assistants started doing something like this about ten years ago

Jesus H Christ, it must suck to be a scientist. Imagine working 10 years at one place and still being a mere "assistant".
• #### Re: (Score:1)

Exactly! I came to a similar conclusion (but theoretically, without computing the probabilities) when a friend, also a student of mathematics, came to me with a similar idea. We then checked the bookmakers' odds, and they all have this property (their inverses add up to more than 1). There is nothing more to add to your post really, except maybe that bookmakers can also add any amount of uncertainty (coming from the statistical model of the data) into their odds (by making it more than one by higher or lesser mar
• #### Re: (Score:2)

It's sad that most people don't realize that bookmarking is like roulette

You mean because you never know whether the link will have expired when you next open the bookmark? :-)
• #### Re: (Score:2)

It's sad that most people don't realize that bookmarking is like roulette - you will lose on average no matter how good (statistical) information about the winner you have.

This isn't strictly true - by definition, if you have perfect statistical information, you win every bet and cannot possibly lose on average.

Extending this down to worse statistics, to win in the long run all you need to do is have sufficiently better information than the bookie to ensure that you can overcome the extra padding they give to their chosen set of odds, which is not impossible in principle.

Of course, doing such a thing in practice is an entirely different kettle of fish, which is why it's s

• #### Re: (Score:1)

Read the original Baum and Welch [jstor.org] paper. Ever wonder why Baum, Gaines, Petrie and Simons, "Probabilistic models for stock market behavior" (listed as "To appear"), never appeared? Check out James Simons [wikipedia.org].
• #### Re: (Score:2)

Did you also read that the fund is down....

"James Simons's $29 billion Renaissance Institutional Equities Fund has fallen 8.7 percent so far in August when his computer models used to buy and sell stocks were overwhelmed by securities' price swings. The two-year-old quantitative, or 'quant,' hedge fund now has declined 7.4 percent for the year. Simons said other hedge funds have been forced to sell positions, short-circuiting statistical models based on the relationships among securities."

BTW I use Quant met
• #### Re: (Score:2)

my friend realised the best thing to do was not to bet at all.

Have you been watching WarGames?
• #### only one question (Score:1, Troll)

Who's going to win the 'National today? If it can't tell me that, then no matter how technical-sounding its algorithm is, it's not a lot of use to me.
• #### Re: (Score:2, Funny)

That's OK, because I don't think they created the algorithm with you in mind. You're just a negligible quantity.

Simon.

Oh well
• #### Re: (Score:1)

I predict it will be "Comply or Die".

• #### Excellent... (Score:1)

So I need a copy, preferably of the source, and a bookie.
• #### how does it compare to a playstation? (Score:1)

3 years ago a friend of mine ran the Super Bowl through a football video game and ended up 2 points off. Does anyone know how accurate those games are compared to a "real" system like this?
• #### Wait a minute! (Score:2, Funny)

Are you telling me that somebody actually looked at win/loss records and margin of victory and strength of opponents to figure out which team might win? How can this be? Why did nobody ever figure out this simple algorithm before? [slaps forehead with hand] DOH!

Oh wait, sorry, it was patented years ago, and multiple times, with minute variations such as going back to strength of opponents' opponents, margin of victory of opponents against common opponents, strength of opponents' opponents' opponents, and
• #### Re: (Score:2)

This particular strain of nihilism gets on my nerves after a while. There is no clear inference from "it's been tried before". Not even if you multiply by a million times, or a million wannabe losers all with the same wannabe dream.

When substantial progress is made on a long-standing problem, generally there are three situations: the new approach was never tried before (because all the losers were looking under the same wrong rock), or the approach requires deep theoretical insight and skill (which losers
• #### Now Hear This! (Score:2)

I will now be taking bets on how long before mob goons put an axe through the computer, tie it to a chair, and throw it into a river.....
• #### Re: (Score:3, Funny)

"We want this machine off, and we want it off now!"

But I can predict which team the machine will predict to win: Team #42
• #### I'm not convinced (Score:4, Insightful)

on Saturday April 05, 2008 @04:21AM (#22971620)
If I had a computer that could predict sports results, I wouldn't tell anyone about it. I'd take a briefcase full of cash down to the bookmakers.
• #### Re: (Score:2)

Bookies (online ones at least) will blackball you if you win too much of the time.
• #### RTFA (Score:5, Informative)

on Saturday April 05, 2008 @04:22AM (#22971622)
I know this is Slashdot, but why can't people RTFA before commenting? They aren't using the seeds or rankings in the program - only game stats, home quart advantage, etc. They ran it on the last 9 years of data and it picked final four teams 30% more often than analysts. (30/36 vs 23/36).
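The "30% more often" figure is a relative comparison. In absolute terms, the gap is about 19 percentage points:

$$
\frac{30/36}{23/36} = \frac{30}{23} \approx 1.30, \qquad \frac{30}{36} - \frac{23}{36} = \frac{7}{36} \approx 19.4 \text{ percentage points}
$$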

The linked article didn't mention it, but from the GA Tech web site, it said that it correctly identified several overrated teams that lost early on (like Georgetown), and underrated teams that went farther than expected (like WVU). The program picks Kansas to win this year.
• #### Re: (Score:2)

lol. home quart advantage. Sorry. I should have converted that to liters before posting.
• #### Re: (Score:2)

The summary could at least link to the paper [gatech.edu]. But then again, if we can't expect people to RTFA, I highly doubt anyone is going to RTFP...
• #### Data mining (Score:3, Insightful)

on Saturday April 05, 2008 @04:27AM (#22971638)

Doesn't say whether the test was done on in-sample or out-of-sample data. That is, did they test using the same data that was used during development?

If so, the results are worthless. You can make a "system" that says anything you want given enough tweaking. (This is often the problem with apparently successful computer trading models).
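One standard way to keep testing out-of-sample is a walk-forward (chronological) split, where each season is predicted only by a model fitted on strictly earlier seasons. Here is a minimal sketch, assuming the seasons are listed in chronological order. `folds` and `Fold` are hypothetical names, and the model fitting itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

class WalkForward {
    /** One evaluation step: fit on `train`, then predict only `test`. */
    static final class Fold {
        final List<Integer> train;
        final int test;

        Fold(List<Integer> train, int test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Builds folds where every test season comes strictly after all of its training seasons. */
    static List<Fold> folds(List<Integer> seasons, int minTrain) {
        List<Fold> out = new ArrayList<>();
        for (int i = minTrain; i < seasons.size(); i++) {
            out.add(new Fold(new ArrayList<>(seasons.subList(0, i)), seasons.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Fold f : folds(List.of(1999, 2000, 2001, 2002, 2003), 2)) {
            System.out.println("train " + f.train + " -> test " + f.test);
        }
    }
}
```

Any tuning of the model's parameters must also happen inside the training window. Otherwise, information from the test season leaks back in.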

• #### Re: (Score:2)

Captain Obvious, is that you? Isn't this one of the first lessons in the second year class for undergrad statistics? I hope we can assume that PhD statisticians are not going to use in-sample data and call it a 'prediction'. That's like 'predicting' last week's score.
• #### Money whoring... (Score:2)

I hope we can assume that PhD statisticians are not going to use in-sample data

Depends on what the paper was for.
If the paper they published is to test and prove methods to produce good quality predictions, they'll probably use out-of-sample data.

If the paper was published so they can ask for grants, they'll probably use in-sample data and any other possible trick just to make their system look more efficient. Special bonus if they managed to cram in a few money-producing grants like "could be used by DHS to pred

• #### Re: (Score:2)

If the conceptual approach is sufficiently restrictive (extreme paucity of tunable parameters), it still amounts to something to successfully predict in-sample data.

What I was more concerned about is whether the prediction task they've taken on has low intrinsic difficulty. The fact that others have done it badly doesn't prove much. Worse, those other predictions might have been made with a different immediate purpose, for which they were closer to optimal than as interpreted by this paper for the predict
• #### Great sample (Score:5, Insightful)

on Saturday April 05, 2008 @05:02AM (#22971736)
Great sample... They should test the algorithm on maybe 80 historical seasons and maybe we will be able to see something.
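The sample-size worry can be quantified. Here is a quick sanity check, assuming each of the 36 Final Four slots is an independent trial. That assumption is generous, since the four picks in a year are correlated. The check computes a 95% Wilson score interval for the reported hit rates, and `wilson` is a hypothetical helper.

```java
class WilsonInterval {
    /** 95% Wilson score interval for k successes in n independent trials. */
    static double[] wilson(int k, int n) {
        double z = 1.96;
        double p = (double) k / n;
        double z2n = z * z / n;
        double center = (p + z2n / 2.0) / (1.0 + z2n);
        double half = z * Math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / (1.0 + z2n);
        return new double[] {center - half, center + half};
    }

    public static void main(String[] args) {
        double[] lrmc = wilson(30, 36), seeds = wilson(23, 36);
        System.out.printf("LRMC  30/36: [%.2f, %.2f]%n", lrmc[0], lrmc[1]);
        System.out.printf("Seeds 23/36: [%.2f, %.2f]%n", seeds[0], seeds[1]);
    }
}
```

For 30/36 this gives roughly 0.68 to 0.92, and for the seeds' 23/36 roughly 0.48 to 0.78, so the intervals overlap. A proper comparison would also account for both methods being scored on the same tournaments.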
• #### Re: (Score:1)

Great sample... They should test the algorithm on maybe 80 historical seasons and maybe we will be able to see something.

Well, the problem with that is the NCAA tournament hasn't always used seeds, and only opened up to 64 teams in 1985. So you might want to see results from the past 23 seasons. That's why I don't trust most of the stats analysts provide unless they preface them with "since the tournament expanded to 64 teams..." For example, UCLA has eleven or twelve national championships, but most of those were back when you only had to win two or three games to claim the title. Six games is a marathon.

• #### I'm using it (Score:2, Interesting)

I heard about this last year and used their picks for this year's bracket. I'm tied for first in my pool, and in the 93.5th percentile nationally in ESPN's bracket game, just for comparison of how good their choices are. They went 100% on day one of the first round.
• #### Re: (Score:1)

They picked yesterday's games right too. So that makes their picks better than *99.1%* of the others in ESPN's pool, based on that being my percentile in ESPN's pool, since I played the GT researchers' picks.
• #### Link to the paper (Score:2, Informative)

Here is the paper describing the method: http://www2.isye.gatech.edu/people/faculty/Joel_Sokol/ncaa.pdf [gatech.edu]
• #### But in the end... (Score:2)

...the program will have a special function designed to find something nasty to say about Kansas and the computer will begin making sounds like Dick Vitale on amphetamines screaming about North Carolina, and just for good measure, Duke, even though they aren't playing.

Only then can it be true to life.

• #### Just great. (Score:2)

Who taught Biff Tannen [ytmnd.com] how to program?

The Wright Brothers weren't the first to fly. They were just the first not to crash.