
The Crawfish Boxes

Avoiding the Gambler's Fallacy with the Astros

My favorite book that I've read in the last year is The Drunkard's Walk. It's essentially a history and explanation of how mathematicians have tried to observe and measure tendencies in our lives with statistics. The part I particularly enjoyed was the discussion of the Gambler's Fallacy because, as Kahneman and Tversky first discovered, humans aren't wired to intuit what the law of averages really gets at: that short-run streaks don't change the balance of long-run probabilities (I hope I did that justice).

So I was particularly excited when BtB suggested a way to put this concept to use in the early part of the season.

Sky provided a spreadsheet that lets you input your team's original projected win total from your projection source and then the actual record. It then spits out the newly expected win total for the season, taking into account the true talent level and the available sample size.

Here's what happened when I used the Community Projection Project:

Team     pW   pL   aW   aL   nW     nL     delta
Astros   83   79   4    8    80.9   81.1   -2.1

p = predicted, a = actual, n = new prediction; W = wins, L = losses; delta = nW minus pW
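For anyone curious where those numbers could come from, here is a minimal sketch. It is an assumption about the arithmetic, not a copy of Sky's spreadsheet: it banks the actual record and assumes the team plays the remaining games at its preseason projected winning percentage. The function name `updated_projection` is made up for illustration. The real spreadsheet may do something more elaborate, but this simple version does reproduce the Astros row above.

```python
def updated_projection(proj_wins, proj_losses, actual_wins, actual_losses):
    """Bank the actual record, then play out the rest at the projected pace.

    Assumption: the preseason projection is still the best estimate of
    true talent for every remaining game.
    """
    season_games = proj_wins + proj_losses          # 162 for a full season
    games_played = actual_wins + actual_losses
    remaining = season_games - games_played
    proj_pct = proj_wins / season_games             # projected winning percentage
    new_wins = actual_wins + remaining * proj_pct
    new_losses = season_games - new_wins
    delta = new_wins - proj_wins                    # change from the original projection
    return new_wins, new_losses, delta


new_w, new_l, delta = updated_projection(83, 79, 4, 8)
print(f"nW={new_w:.1f} nL={new_l:.1f} delta={delta:+.1f}")
# prints: nW=80.9 nL=81.1 delta=-2.1
```

The point of the exercise is that a 4-8 start costs only about two wins off the projection, because the 150 games still to be played are assumed to follow the original estimate.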

The caveat I'd like to bring to the idea of trying to account for the Gambler's Fallacy is that the Astros offense, by its very construction, is itself streaky. So while there might not be predictive power in a short-run streak, I'm not sure the fallacy is entirely applicable in our case, but maybe someone with a better understanding of the concept can chime in.


Comments

I don't think that is correct.

Technically, the odds of a baseball team winning a particular game are not random. Among other things, they depend on the opposing team, the opposing pitcher, and whether the game is home or away. Also, the Gambler’s Fallacy isn’t totally applicable, because the probabilities at particular points in the season are not independent. For example, each game played against the Cubs reduces the proportion of future games that will be played against the Cubs, which changes the probabilities for wins and losses in future games. (This is like removing cards from a deck after the cards are selected.) The same can be said about home/road games. Modeling these effects more completely would seem to be more complex than what is done at BtB. And, even then, if I recall from our discussions of BP’s playoff odds calculations, there is disagreement over how to handle strength of schedule.

I agree with your comment that the Astros’ team construction results in a streaky team. I don’t have an answer on how to address that. Since we have no idea about the distribution of streaks across the season, it strikes me that a bigger sample size of games played would give us more comfort... but really I’m not sure.

The Gambler's Fallacy applies only to

events governed by a known probability distribution, such as a fairly balanced coin. When trying to project the outcomes of things like team performances, we do not know the actual probability distribution of their wins – it is that very thing we are trying to guess. Therefore, since we start with a guess, we should modify our guess as actual data comes in.

There is a useful mathematical technique for doing this, and it is both simple and effective: exponential smoothing. Using that technique, the initial projection is modified after every game. For an experiment with 162 trials (games), the normal modification is 1 divided by half the number of trials, or 1/81, which is about .0123 per game.

As I write this, the Astros are .357 (5-9). Had you thought the Astros were going to be a .450 team at the beginning of the year, the current projection would be .435. Had your original guess been .500, the current guess would be .477, and had your original projection been .550, the current guess would be .519. Interestingly enough, after about 130 games, it would not matter what your original projection was as the current projection would be largely unaffected by it (which makes logical sense) as the evidence from the 130 actual trials overwhelms the original guess.
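The updating described in this comment can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the commenter's actual worksheet: the function name `smooth_projection` is made up, the smoothing weight is the 1/81 figure quoted in the comment, and the game-by-game order of the 5-9 start is a hypothetical one, since the comment only gives the record. Exponential smoothing depends slightly on that order, so the results should land close to the .435, .477, and .519 figures rather than match them to the last digit.

```python
def smooth_projection(prior_pct, results, alpha=1 / 81):
    """Update a winning-percentage estimate one game at a time.

    prior_pct: the preseason guess (e.g. 0.500)
    results:   game outcomes in order, 1 for a win and 0 for a loss
    alpha:     weight given to each new game (1/81 is about .0123)
    """
    estimate = prior_pct
    for won in results:
        # Move the estimate a small step toward the latest outcome.
        estimate = (1 - alpha) * estimate + alpha * won
    return estimate


# Hypothetical ordering of a 5-9 start; the real sequence is not given.
results = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

for prior in (0.450, 0.500, 0.550):
    print(f"prior {prior:.3f} -> current {smooth_projection(prior, results):.3f}")
```

After n games the original guess still carries a weight of (1 - alpha) raised to the power n, so its influence fades steadily as the season goes on while the actual results count for more and more.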

All of this is exclusive, as clack pointed out, of other variables such as strength of schedule, home-away bias, ballpark factors, weather, rotation, etc. Fun, eh?
