10,000 Seasons Revisited
by Aaron Schatz
Once upon a time, the NFL playoffs were fairly predictable. There were upsets, sure. The Super Bowl didn't always pit one No. 1 seed against the other No. 1 seed. However, if you made a list of the top five teams in the league by conventional wisdom, usually one of those teams would end the season with the Lombardi trophy.
You can look at a number of different stats to see that this was the case, ranging from the simple to the complex:
- From 1978 through 2006, not counting the strike-shortened 1982 season, the Super Bowl champion averaged 12.7 wins. Every Super Bowl champion during that time had at least 11 regular-season wins except for the 1988 San Francisco 49ers. Even every Super Bowl loser during that time had at least 11 regular-season wins except for the 1979 Los Angeles Rams (9-7) and the 1987 Denver Broncos (10-4-1).
- From 1978 through 2006, 23 of 27 Super Bowl champions finished among the top five teams in Pythagorean wins. No Super Bowl champion finished out of the top ten. From 1989 to 2000, there was a 12-year span where every single champion was either first or second in Pythagorean wins during the regular season.
- From 1978 through 2006, 25 of 27 Super Bowl champions finished among the top five teams in pro-football-reference's Simple Rating System. The exceptions both ranked seventh: the 1980 Oakland Raiders and the 2001 New England Patriots.
- DVOA picks up in 1991, but from 1991 through 2006, only four teams won the Super Bowl without ranking in the top three in DVOA. The 2003 Patriots and 2005 Steelers were fourth, the 2006 Colts were seventh, and the 2001 Patriots were 11th.
Of course, you know what's happened since.
- Four of the last six champions have been either 9-7 or 10-6.
- Only three of the last six champions have ranked in the top ten in Pythagorean wins.
- Only three of the last six champions have ranked in the top ten in the Simple Rating System.
- Only four of the last six champions have ranked in the top ten in DVOA, and none of them have ranked in the top three.
Add this to the 9-7 Arizona Cardinals coming within a minute of winning the title in 2008, and it certainly looks like a trend. For some reason, it seems like regular-season performance has become useless in predicting which team will win a championship. All you have to do is get into the tournament, and you can toss everything else out.
But, as I brought up in the Super Bowl Audibles conversation, what if there is no reason for the trend? What if there is no trend? What if we're just being fooled by a combination of randomness and small sample size?
First of all, the very idea of "four surprise champions in six years" is itself mistaken. It's really just three surprise champions. The 2010 Packers were clearly one of the league's best teams and simply had some poor luck in close games. They ranked second in Pythagorean wins, second in SRS, and fourth in DVOA.
Take out the Packers, and now maybe you don't have a six-year trend where 2009 was the one exception. Maybe you have two different two-year trends, the 2007 Giants/2008 Cardinals and the 2011 Giants/2012 Ravens. Perhaps they have nothing to do with each other. In the latter two-year span, improved health on defense does a lot to explain improved performance. And in the earlier two-year span, one of the surprise Super Bowl teams didn't actually win the Super Bowl. Isolate just those two years, and it looks a lot like 1979-1980. In one year, a 9-7 team went on a run that ended in the Super Bowl; in the other year, a wild card team got hot in the playoffs and beat the best team in the Super Bowl, although obviously the 1980 Eagles were not the 2007 Patriots.
The question of small sample size is particularly important in football, where we determine our champion through single-game elimination instead of a seven-game series. You don't need a degree in statistics to understand that an underdog is more likely to win one game than seven. And yet, as I wrote in the Giants chapter of Football Outsiders Almanac 2012, football isn't the only sport where a team that was slightly above-average in the regular season recently won the title. Allow me to quote myself:
If baseball can play three seven-game series and find that the last team standing was the worst of the eight over a 162-game sample [2011 St. Louis Cardinals], and hockey can play four seven-game series and find that the last team standing was ranked 13th out of those 16 over an 82-game sample [2011-12 Los Angeles Kings], how ridiculous is it to think that a team that was one of the worst out of a dozen teams in a 16-game sample can be the best out of those same dozen teams in a four-game sample? Especially when that team actually wasn't the worst of that group in the regular season; they may have gone 9-7, but the Giants were better than the Bengals or the Broncos, and about as good as the Lions.
This same comment applies to the 2012 Ravens. While it was surprising that one of the DVOA "big five" did not win the Super Bowl this season, the Ravens had the highest DVOA of the seven other teams that made the playoffs.
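The earlier point that an underdog is more likely to win one game than a seven-game series can be checked with a little binomial arithmetic. The sketch below is only an illustration: the function name `series_win_prob` is my own, the 40 percent per-game figure is made up, and it assumes every game in a series is independent with the same win probability (no home field, no injuries, no adjustments).

```python
from math import comb


def series_win_prob(p: float, games: int = 7) -> float:
    """Chance of winning a best-of-`games` series given per-game win probability p.

    Assumes independent games with a constant p. The series is treated as if
    all games were played; the team wins the series if it wins a majority.
    """
    need = games // 2 + 1
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(need, games + 1)
    )


underdog = 0.40  # made-up per-game win probability for the weaker team
print(round(series_win_prob(underdog, 1), 3))  # single elimination: 0.4
print(round(series_win_prob(underdog, 7), 3))  # best of seven: 0.29
```

Under those assumptions, a 40 percent underdog wins a single game 40 percent of the time but a best-of-seven only about 29 percent of the time. That gap is real, but it is not so large that a seven-game format guarantees the better team advances.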
Small sample size isn't just an issue with the postseason; it's also an issue with the regular season. DVOA tries to mitigate the issue of small sample size by including each individual play during the season as a sample rather than looking at 16 binary (well, almost-binary) win/loss decisions. But it's still a fairly small sample. Our old friend Bill Barnwell did a great job of exploring this issue in his post-Super Bowl column on Grantland yesterday. We want to believe that we know how good teams are going into the playoffs, but the fact is that even the most intricate advanced metric is still just an approximate measurement of a team's true quality, based on the information we have available to us. We accept this and do analysis with an admittedly limited sample because a) it's more interesting than not doing analysis; b) it's more accurate than not doing analysis; and c) I have a mortgage.
The 16-game sample problem is compounded by the fact that a team is not actually the same team over all four months, or in the fifth month of the postseason. I'm not talking about the idea of end-season momentum, whether it's good or bad for a team to go into the playoffs having lost X games in a row or whatever. Instead, I'm talking about the effects of concrete changes like health, playing time, and scheme. It's legitimate to say that the Ravens were a better team in the playoffs than they were for most of the regular season because their defense was healthier. It's legitimate to say that the Patriots and 49ers were not as good as they were during the regular season because they were stuck playing without Rob Gronkowski or with a subpar Justin Smith. (People tend to see comments like this as an excuse, but they are meant to be an explanation.) Baltimore shuffled around its offensive line. San Francisco introduced the pistol after hardly using it at all during the regular season.
So even if we knew the "platonic ideal" measurement of how good a team truly is as of the end of the regular season, we still wouldn't necessarily know the platonic ideal measurement of how good a team was four weeks later when it's time for the Super Bowl. And even if we knew the "platonic ideal" measurement of how good a team was on Super Bowl Sunday, we still wouldn't be able to precisely predict the quality of their play because there is always variation in performance. If we knew that San Francisco was 30 percent better than an average team -- I'm not saying DVOA suggested this, I'm saying we absolutely knew it thanks to omniscience -- their performance in the next game could still be anything from average to 60 percent better than average. (Those numbers are totally made up.) On top of this, we add the issue of matchups, the fact that a football game doesn't just pit "fourth-best team" against "eighth-best team" but instead involves a lot of smaller battles at all the various positions so that sometimes, "eighth-best team" will be in a better position to beat "fourth-best team" than "second-best team" would be. And on top of all that, we add random events: the bounce of a fumble, an unexpected in-game injury, or a large-scale power outage.
When we combine the issue of randomness with the issue of small sample size, we get at my favorite series of articles that Doug Drinen wrote at the old pro-football-reference blog: "10,000 Seasons." The goal was to answer the question: If we were omniscient and knew the true quality of all 32 teams, how often would the best team actually win the Super Bowl? Drinen simulated 10,000 seasons to find out. Each season assigned a value to all 32 teams based on a normal distribution, then built a schedule and played it out.
The answer: The best team won the Super Bowl roughly 24 percent of the time. To some of you, that may seem low. To a few of you, that may even seem high. But it certainly suggests that more often than not, the title of "NFL champion" does not necessarily imply the title of "best team in the league that season."
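To give a feel for how an experiment like this works, here is a minimal Python sketch. It is not Drinen's code, and it makes several simplifying assumptions that are mine rather than his: opponents are paired at random each week instead of following a real NFL schedule, a logistic curve turns the gap in "true" strength into a win probability, there are no divisions or conferences, and the playoff field is a plain eight-team bracket seeded by wins. All function names are hypothetical, and the numbers it produces should not be expected to match the ones quoted here.

```python
import math
import random

N_TEAMS = 32     # league size, as in the text
N_GAMES = 16     # regular-season games per team
N_PLAYOFF = 8    # assumption: simple 8-team bracket, not the real 12-team format


def win_prob(a, b):
    """Assumed logistic model: chance that a team of strength a beats strength b."""
    return 1.0 / (1.0 + math.exp(-(a - b)))


def play(rng, strengths, i, j):
    """Play one game between teams i and j and return the winner's index."""
    return i if rng.random() < win_prob(strengths[i], strengths[j]) else j


def simulate_season(rng):
    """Play one season; return the champion's rank by true strength (1 = best)."""
    # True quality of each team, drawn from a normal distribution.
    strengths = [rng.gauss(0.0, 1.0) for _ in range(N_TEAMS)]
    wins = [0] * N_TEAMS
    order = list(range(N_TEAMS))
    for _ in range(N_GAMES):
        rng.shuffle(order)  # random weekly pairings (assumption)
        for k in range(0, N_TEAMS, 2):
            wins[play(rng, strengths, order[k], order[k + 1])] += 1
    # Seed by wins, breaking ties at random.
    tiebreak = [rng.random() for _ in range(N_TEAMS)]
    seeded = sorted(range(N_TEAMS), key=lambda t: (wins[t], tiebreak[t]), reverse=True)
    field = seeded[:N_PLAYOFF]
    # Fixed single-elimination bracket: 1 v 8, 2 v 7, 3 v 6, 4 v 5, then winners meet.
    while len(field) > 1:
        half = len(field) // 2
        field = [play(rng, strengths, field[k], field[-1 - k]) for k in range(half)]
    by_strength = sorted(range(N_TEAMS), key=lambda t: strengths[t], reverse=True)
    return by_strength.index(field[0]) + 1


def champion_ranks(n_seasons=10_000, seed=0):
    """True-strength rank of the champion in each simulated season."""
    rng = random.Random(seed)
    return [simulate_season(rng) for _ in range(n_seasons)]


def share_in_top(ranks, n):
    """Fraction of seasons in which the champion was among the n best teams."""
    return sum(r <= n for r in ranks) / len(ranks)


if __name__ == "__main__":
    ranks = champion_ranks()
    for n in (1, 3, 5, 10):
        print(f"champion among true top {n}: {share_in_top(ranks, n):.1%}")
```

The useful part of a sketch like this is less the exact percentages than the shape of the answer: even when the simulation "knows" who the best team is, that team has to survive a noisy 16-game season and then several coin-flip-ish elimination games.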
The surprising result of this experiment may not be that the best single team doesn't win the Super Bowl that often, but that the best teams do not win the Super Bowl as often as you would expect. In more than half the simulations, the Super Bowl champion was not one of the top three teams in the league. In more than one out of three seasons (36 percent, to be exact), the Super Bowl champion was not one of the five best teams in the league. And three times out of every 20 seasons, the Super Bowl champion wasn't even one of the top ten teams in the league.
Drinen made a number of posts about this experiment, which generally revolved around the idea that a lot of things which might seem ridiculous are still far from impossible. The best team in the league didn't even make the playoffs about one in ten seasons. The worst team in football made the playoffs roughly 2.4 percent of the time. In one of the 10,000 simulations, the worst team in the league won the Super Bowl. Chicago in that simulation went 8-8 despite their "true value" being so low, upset a 15-1 Seattle team in the NFC Championship, and then beat the champion of a relatively weak AFC.
Of course, in real life we don't know the "true value" of every NFL team, but the simulation had a number of interesting results even if we only looked at the effects of random chance in the postseason. A sub-.500 team won its division every 11 or 12 years. In 14 of the 10,000 simulations, a sub-.500 division champion went on to actually win the Super Bowl. That seems absurd, but the simulation suggests there's a 13 percent chance of it happening in the next 100 years, so it isn't completely ridiculous.
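The jump from "14 in 10,000" to "13 percent in 100 years" is just the chance of an event happening at least once across many tries. Here is that arithmetic as a short sketch, assuming each season is independent and carries the same per-season probability (the function name is my own):

```python
def chance_at_least_once(per_season: float, seasons: int) -> float:
    """Probability an event occurs at least once over `seasons` independent seasons."""
    return 1.0 - (1.0 - per_season) ** seasons


# 14 sub-.500 division champions won the title in 10,000 simulated seasons.
p = 14 / 10_000
print(round(chance_at_least_once(p, 100), 3))  # 0.131
```

A 0.14 percent chance per season looks like nothing, but compounded over a century it works out to roughly 13 percent.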
Teams went undefeated in the regular season 115 times out of 10,000 simulations, but only 40 of those teams actually won the Super Bowl. I believe that our playoff odds report back in 2007 said something similar; going into the playoffs, if I remember correctly, we had a team other than the Patriots winning the Super Bowl a majority of the time. The unexpected part of 2007 was less the Patriots losing, and more the Patriots losing to the Giants instead of Green Bay or Indianapolis.
Inspired by Drinen's posts, I decided to run my own "10,000 seasons" experiment. Obviously, we don't know the "platonic ideal" value of each team the way Drinen could set things up in his simulations. The closest thing we have is DVOA. So we ran our playoff odds simulation to see what it would have looked like if our preseason DVOA projections had been 100 percent accurate. If we knew for sure that Baltimore really was the eighth-best team in the league, 9.8 percent better than an average team, how often would we expect them to win the Super Bowl?
The results of our simulation look a lot like Drinen's simulations, tweaked because of the specifics of this season's DVOA ratings. The top three teams have a combined chance of greater than 50 percent to win the Super Bowl because their ratings were so high. Denver and New England are higher than Seattle because of the imbalance between the AFC and NFC. Our playoff odds report spits out percentages rather than totals, so I can't tell you if Kansas City or Jacksonville ever managed to win the Super Bowl, but I can tell you that Brandon Weeden was a Super Bowl champion quarterback in at least five of our simulations. So was Mark Sanchez (or maybe it was Tim Tebow?).
And how often does Baltimore win the Super Bowl? Three percent of the time. That's a small probability, but again, not an impossible one.
Of course, it's possible (in fact, likely) that DVOA is not 100 percent accurate when it comes to approximating the true quality of the Baltimore Ravens, especially the Baltimore Ravens of the postseason. They were healthier than they were in the regular season. They were better. Maybe they were 20 percent better than an average team, instead of 10 percent. Heck, maybe they were 40 percent better than an average team, instead of 10 percent. But both our simulation and Drinen's simulation suggest that even if Baltimore was four times better than measured by either DVOA or SRS, their Super Bowl championship still beat odds of roughly 3:1.
The idea that there's been a significant change in the relationship between the regular season and the postseason has one other problem: There aren't a lot of good explanations for it. That doesn't mean that it isn't a trend, but it doesn't do a lot to support the idea that it is a trend, either. What are some of the common explanations given for the recent rise of "surprise" Super Bowl teams?
Parity: There's more parity in the NFL in recent seasons, so of course there's more parity in the postseason. The problem with this theory is that it is simply not true. Take a look at the standard deviation of DVOA each year since the salary cap began in 1994. The thin black lines represent a linear trend. You can see that standard deviation has been steadily rising, which is part of why eight of the top ten DVOA ratings since 1994 belong to teams that played in 2004 or later, while the ten worst teams by DVOA include one pre-salary cap team and nine teams that played in 2002 or later. Standard deviation in defense has risen a lot less than offense or overall. I have no idea what that means.
If you don't like DVOA, perhaps you want something a bit simpler? Here's the standard deviation of wins since 1994:
The trend is the same. The idea that parity has gone down in the NFL sounds ridiculous when we think back to the early '90s, when we were still in the era of NFC Super Bowl blowouts and Sports Illustrated put a Dallas-San Francisco NFC Championship on its cover as "The Real Super Bowl." If you actually go back and look at the standings for those years, you'll be shocked how much better the win-loss records are for the best teams of today compared to back then. From 1994-2007, there were 11 teams that went 2-14 or worse. That's less than one per season. In the past five years, there have been ten. From 1993-1995, only two teams went 13-3 or better. In the past three seasons, there have been eight.
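For anyone who wants to run this kind of parity check on their own numbers, the calculation is simple: take the standard deviation of team ratings (or win totals) within each season, then fit a straight line through those yearly values. The sketch below shows one way to do it. The function names are my own, and the input is a tiny toy example with invented win totals for a four-team league, not real NFL data.

```python
from statistics import pstdev


def yearly_spread(values_by_year):
    """Map each season to the population standard deviation of its team values."""
    return {year: pstdev(vals) for year, vals in sorted(values_by_year.items())}


def trend_slope(series):
    """Ordinary least-squares slope of the yearly values against the season."""
    years = list(series)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(series.values()) / n
    num = sum((x - mean_x) * (series[x] - mean_y) for x in years)
    den = sum((x - mean_x) ** 2 for x in years)
    return num / den


# Toy input (invented): win totals for a four-team league over three seasons.
toy_wins = {
    1: [8, 8, 8, 8],    # perfect parity
    2: [7, 9, 7, 9],
    3: [6, 10, 6, 10],  # widest gap between good and bad teams
}
spread = yearly_spread(toy_wins)
print(spread)               # {1: 0.0, 2: 1.0, 3: 2.0}
print(trend_slope(spread))  # 1.0
```

A positive slope means the gap between good and bad teams is widening over time, which is the opposite of increasing parity.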
More teams in the playoffs: Drinen actually did a simulation to experiment with this idea. First, he did one with no wild cards, only division champions. Then he did one where all 32 teams made the postseason. Neither simulation actually changed things very much. The top team still won the Super Bowl about once every four years. The only difference was a slight increase or decrease in how often teams won the Super Bowl with records such as 8-8 or 9-7.
The four-division format: I believe that under the old three-division format, all of our "surprise" Super Bowl teams would have made the playoffs anyway. That includes the 9-7 Giants of 2011. I don't feel like doing the tiebreakers to see if the 2008 Cardinals would have made it in or not. The big difference would be that these teams wouldn't get home games in the first round. Mike Harris is actually building a simulation that will try to test out the old three-division format to see if it makes any difference.
Randomness of turnovers: I remember reading somewhere the idea that there have been more upsets in the NFL playoffs lately because the best regular-season teams have been more dependent on turnovers, and turnovers are naturally more inconsistent than yardage. I don't think this is true. If someone else wants to take the time to check it, please do. I know that the Ravens were very low in offensive turnovers, and average in defensive turnovers, so they certainly weren't underrated because their regular-season turnover rate was likely to regress towards the mean.
Teams getting healthier for the playoffs: That's not a reason for a trend of surprise Super Bowl champions. That's a reason for each of these surprise Super Bowl champions individually. I don't think there's any link between the health of the 2012 Ravens, the health of the 2011 Giants, and the health of the 2006 Colts.
The salary cap leads to parity: That sounds reasonable, except for two problems. First, as noted above, parity has actually been decreasing. Second, if the salary cap changed things, why did things not change until the salary cap had already been around for more than ten years?
The salary cap makes health more important: That's even more reasonable. This also brings up the question of why things didn't change until the salary cap had already been around for more than ten years, but there's no doubt that before the salary cap, the best teams were able to do more to stock up on depth and thus were affected less by injuries both during the regular season and during the postseason. Anecdotally, it sure does seem like specific players getting healthy has had a bigger effect on the postseason than it did in the past. The same goes for specific players getting injured. Statistically, I have no idea if this is true or not.
It seems to be happening in all sports: Scott Kacsmar wrote about this a few months ago. This I have no answer for.
To sum things up: Right now, my best guess is that the current "surprise Super Bowl champions" trend is a myth. It's not unrealistic to think that in a ten-year period, a couple of Super Bowls will be won by teams that finished the regular season 10-6 or even 9-7. Four out of six years is a bit more improbable, but not outrageously so. That 1989-2000 streak where every single champion was either first or second in Pythagorean wins is probably just as much an improbable aberration as the past six Super Bowls are.
This certainly isn't going to stop Football Outsiders from trying to improve its various advanced stats. It's not going to stop us from trying to be more accurate in forecasting the postseason. And it won't stop us from looking for reasons why the "surprise Super Bowl" trend might exist, if it does exist. But I don't think we're actually going to find them, because they probably aren't there.