19 Sep 2007
Matt Runnels: I had a stat question that wasn't in the book but came up on another website regarding survival football pick'em. Are upsets/close games more likely in intradivision games? (Some people want to take New England over Buffalo as a sure pick, while others are saying "NO, THEY'RE BOTH AFC EAST!")
So if New England played two teams that were roughly equal, but one was in the AFC East, would the AFC East team have a better shot due to playing each other twice a year/knowing each other better/hating each other more/etc...?
Sure seems like it, right? We all can remember significant upsets where a bad team beat a much better division rival. Just a few examples: It sure seems like the Broncos and Raiders always play close games, right? Remember when the Dolphins were awful and they beat the Patriots on Monday Night Football right before the Pats won their third Super Bowl? The Colts struggle more with Houston and Tennessee than with any other opponents, right?
I could research this using all sorts of complex variables, but I wanted to just run a quick regression to get a general idea of whether this axiom is true. I took every game going back to the 1995 expansion and ran a regression with the dependent variable as binary (win or loss) and the following independent variables:
Some readers may have noticed that this XP was up a couple hours ago, then disappeared. That's because I mistakenly used every game for every team, and it came out that "division game" was completely, totally meaningless. Of course it did. I was looking at every game twice. Duh.
So I re-did the regression, only from the point of view of the team with more Pythagorean wins.
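For anyone who wants to try this at home, here is a minimal sketch of the "one row per game, from the better team's point of view" setup. The field names, the sample games, and the tie-break rule are all my own assumptions for illustration, not the actual data set or code behind the regression.

```python
# Hypothetical game records; teams and numbers are made up for illustration.
games = [
    {"home": "A", "away": "B", "home_pyth": 11.2, "away_pyth": 6.5,
     "home_pts": 24, "away_pts": 17, "same_division": True},
    {"home": "C", "away": "D", "home_pyth": 5.1, "away_pyth": 9.8,
     "home_pts": 20, "away_pts": 13, "same_division": False},
]


def favorite_row(game):
    """Build one regression row, seen from the team with more Pythagorean wins."""
    # Assumption: if the two teams are tied in Pythagorean wins,
    # treat the home team as the "better" team.
    fav_is_home = game["home_pyth"] >= game["away_pyth"]
    if fav_is_home:
        better, worse = game["home_pyth"], game["away_pyth"]
        margin = game["home_pts"] - game["away_pts"]
    else:
        better, worse = game["away_pyth"], game["home_pyth"]
        margin = game["away_pts"] - game["home_pts"]
    return {
        "better_pyth": better,
        "worse_pyth": worse,
        "favorite_home": int(fav_is_home),
        "division_game": int(game["same_division"]),
        "favorite_won": int(margin > 0),   # binary dependent variable
        "favorite_margin": margin,         # margin-of-victory dependent variable
    }


# Exactly one row per game, so no game is counted twice.
rows = [favorite_row(g) for g in games]
```

Building the rows this way is what avoids the double counting: each game shows up once, and the win/loss and margin columns always describe the same team.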
As in the original, faulty analysis, the variable for "division game" was completely, totally meaningless. Regressions produce a number called a "P-value" for each variable, which is roughly the chance of seeing an effect at least that large if the variable really had no effect at all. The usual cutoff for calling a variable significant is a P-value below 0.05; I'm using a looser cutoff of 0.1 here. Here are the P-values for the four variables:
That's less significant than an album of Stone Temple Pilots covers by the Olsen Twins. However, the results were slightly different when I used margin of victory as the dependent variable, not just win/loss. The P-value for the division-game variable was .11, not significant but close. The coefficient was -.68. In other words, in a division game, if you feel daring, you could give the underdog an extra seven-tenths of a point.
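To see how shaky that seven-tenths of a point is, you can back out a rough standard error and confidence interval from the coefficient and P-value above. This is only a sketch: it assumes a two-sided test and a normal approximation to the test statistic, which the regression output may not match exactly.

```python
from statistics import NormalDist

coef = -0.68     # division-game coefficient from the margin regression (points)
p_value = 0.11   # its P-value, assumed here to be two-sided

# Assumption: the test statistic is approximately standard normal.
z = NormalDist().inv_cdf(1 - p_value / 2)   # |coef| / std_err implied by the P-value
std_err = abs(coef) / z

# Rough 95% confidence interval for the division-game effect.
z95 = NormalDist().inv_cdf(0.975)
ci_low = coef - z95 * std_err
ci_high = coef + z95 * std_err

print(f"implied standard error: {std_err:.2f} points")
print(f"rough 95% interval: {ci_low:.2f} to {ci_high:.2f} points")
```

Under those assumptions the interval straddles zero, which is another way of saying "not significant but close."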
I also tried the regression looking at games where one team was clearly better than the other. In games where one team was two or more Pythagorean wins better, the result is virtually the same -- the coefficient is -.70 and the P-value is .20. In games where one team was four or more Pythagorean wins better -- the really surprising upsets, the Dolphins over Patriots on Monday Night Football stuff -- the coefficient is +.13 and the P-value is .86. In other words, when there is a huge difference between the teams, the fact that they play in the same division does not mean anything.
The verdict: If our readers in Nevada or England feel like buying an extra half-point when betting on a division game, sure, feel free. But it doesn't mean anything for your survivor pool.
One more note: The regression gives the home team an extra 2.7 points, pretty close to the three-point standard for home-field advantage used in Vegas.
14 comments, Last at 20 Sep 2007, 9:22am by Pat on the back
Comments
#1 said it, but...
* Better team Pythagorean: 1.24 x 10 to the power 33
* Worse team Pythagorean: 2.82 x 10 to the power of 33
* Home or Away: 1.77 x 10 to the power of 28
Don't p-values go from 0 to 1?
The powers should be negative, presumably.
Any STP tribute (at least on the Weiland Era) is welcome!...c'mon, Twins...Leaviiiiiiiin' on a Souuuuuthern Train...Only Yesterday...You Lieeeeeeed...
Bring on the lighters!!
these P-values go to 11
This looks like the basis of a good article in PFP 2008 (or perhaps an expanded version on this site). Either way, as a Colts fan, this article is very relevant, with many other fans trying to explain away the team's mistake-filled game against the Titans by saying 'division foes always play us tough.' Funny; when the Colts were blowing out Tennessee by 30+ points two years ago, nobody was saying that...
Whoops on the signs.
Linear regressions on football scores, especially final score differentials, are severely flawed. One of the basic requirements is that the distribution of the dependent variables is approximately normal.
The football scoring system (7 pt TD, 3 pt FG, etc.) makes score distributions highly irregular. For example, 3-point score differences are far more common than 2- or 4- point differences. 6-pt differences are far more common than 5-, and so on.
The binary (logistic, I assume) regression would not suffer from this flaw however.
Excellent point Brian (#7)...I'd be quite interested to hear an FO response to that
One of the basic requirements is that the distribution of the dependent variables is approximately normal.
The point-differential distribution is approximately normal. You just need to rebin slightly (by about 2 or 3) to smooth over the features. Effectively all football scores have a "+/- 7 points" error in them.
It's not really "highly" irregular - depends on your definition of highly, I guess. The fact that games are highly biased to not end in a tie is actually more of a pain. It's actually better if you just consider all OT games to end in a tie score.
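The rebinning described above can be sketched in a few lines. The margins below are made up for illustration (they are not the 2000-2005 data), and the three-point bin width is just one of the values suggested in the comment.

```python
from collections import Counter

# Hypothetical final margins (home minus away); made up for illustration.
margins = [3, 3, 7, -3, 10, 7, 3, -7, 14, 6, 1, -3, 4, 17, 3, -10, 7, 0]


def rebin(values, width=3):
    """Count values in bins of `width` points, keyed by each bin's lower edge."""
    return Counter((v // width) * width for v in values)


raw = Counter(margins)            # one bin per exact margin: spiky at 3 and 7
binned = rebin(margins, width=3)  # wider bins smooth over the 3s and 7s

width = 3
for low in sorted(binned):
    print(f"{low:>4} to {low + width - 1:>4}: {'#' * binned[low]}")
```

With one bin per point the spikes at 3 and 7 dominate; with wider bins they get absorbed into their neighbors, which is the smoothing being argued for.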
So much for hoping for an excuse for Cincy booting me from my survival league!
Pat's histogram of the 2000-2005 point differential distribution is linked in my name. If you mentally balance it to make it symmetric (it isn't because of the bin endpoints) it looks close enough to normal.
If you try to predict exact differentials, then non-normality is a problem, but that's a mug's game anyway.
Next up, we look at "teams that play good teams close then play down to the level of lesser opponents."
I predict a high correlation of fan frustration with these teams.
I took a look at the same issue from a different angle (link in the name). What I found was that home field advantage has been slightly but consistently weaker for intradivision games than interdivision or interconference games. Interconference games were the most unstable in terms of percent of games won by home team and average margin of victory. In 2006, only half of interconference games were won by the home team, but in 2005, the number was 65.625%.
A lot of prediction models I've seen have an overly large bias towards the home team that's insensitive to these types of matchups. This includes the spread. So in 2005, the models were unusually accurate, and in 2006, the models were unusually inaccurate. I even saw this with simple "Team with better DVOA wins" predictions.
If familiarity or lack thereof doesn't play a role in it, then this might be a result of the way we adjust stats for league (when maybe by conference would be more suitable).
I would have been willing to bet the phenomenon was that whole "We remember the exceptions more than the rule" thing. That is, we don't notice the upsets that don't happen. Now I'm pretty convinced.