05 Feb 2013

10,000 Seasons Revisited

by Aaron Schatz

Once upon a time, the NFL playoffs were fairly predictable. There were upsets, sure. The Super Bowl didn't always pit one No. 1 seed against the other No. 1 seed. However, if you made a list of the top five teams in the league by conventional wisdom, usually one of those teams would end the season with the Lombardi trophy.

You can look at a number of different stats to see that this was the case, ranging from the simple to the complex:

  • From 1978 through 2006, not counting the strike-shortened 1982 season, the Super Bowl champion averaged 12.7 wins. Every Super Bowl champion during that time had at least 11 regular-season wins except for the 1988 San Francisco 49ers. Every Super Bowl loser during that time also had at least 11 regular-season wins, except for the 1979 Los Angeles Rams (9-7) and the 1987 Denver Broncos (10-4-1).
  • From 1978 through 2006, 23 of 27 Super Bowl champions finished among the top five teams in Pythagorean wins. No Super Bowl champion finished out of the top ten. From 1989 to 2000, there was a 12-year span where every single champion was either first or second in Pythagorean wins during the regular season.
  • From 1978 through 2006, 25 of 27 Super Bowl champions finished among the top five teams in pro-football-reference's Simple Rating System. The exceptions both ranked seventh: the 1980 Oakland Raiders and the 2001 New England Patriots.
  • DVOA picks up in 1991, but from 1991 through 2006, only four teams won the Super Bowl without ranking in the top three in DVOA. The 2003 Patriots and 2005 Steelers were fourth, the 2006 Colts were seventh, and the 2001 Patriots were 11th.

Of course, you know what's happened since.

  • Four of the last six champions have been either 9-7 or 10-6.
  • Only three of the last six champions have ranked in the top ten in Pythagorean wins.
  • Only three of the last six champions have ranked in the top ten in the Simple Rating System.
  • Only four of the last six champions have ranked in the top ten in DVOA, and none of them have ranked in the top three.

Add this to the 9-7 Arizona Cardinals coming within a minute of winning the title in 2008, and it certainly looks like a trend. For some reason, it seems like regular-season performance has become useless in predicting which team will win a championship. All you have to do is get into the tournament, and you can toss everything else out.

But, as I brought up in the Super Bowl Audibles conversation, what if there is no reason for the trend? What if there is no trend? What if we're just being fooled by a combination of randomness and small sample size?

First of all, the very idea of "four surprise champions in six years" is itself mistaken. It's really just three surprise champions. The 2010 Packers were clearly one of the league's best teams and simply had some poor luck in close games. They ranked second in Pythagorean wins, second in SRS, and fourth in DVOA.

Take out the Packers, and now maybe you don't have a six-year trend where 2009 was the one exception. Maybe you have two different two-year trends, the 2007 Giants/2008 Cardinals and the 2011 Giants/2012 Ravens. Perhaps they have nothing to do with each other. In the latter two-year span, improved health on defense does a lot to explain improved performance. And in the earlier two-year span, one of the surprise Super Bowl teams didn't actually win the Super Bowl. Isolate just those two years, and it looks a lot like 1979-1980. In one year, a 9-7 team went on a run that ended in the Super Bowl; in the other year, a wild card team got hot in the playoffs and beat the best team in the Super Bowl, although obviously the 1980 Eagles were not the 2007 Patriots.

The question of small sample size is particularly important in football, where we determine our champion through single-game elimination instead of a seven-game series. You don't need a degree in statistics to understand that an underdog is more likely to win one game than seven. And yet, as I wrote in the Giants chapter of Football Outsiders Almanac 2012, football isn't the only sport where a team that was slightly above-average in the regular season recently won the title. Allow me to quote myself:

If baseball can play three seven-game series and find that the last team standing was the worst of the eight over a 162-game sample [2011 St. Louis Cardinals], and hockey can play four seven-game series and find that the last team standing was ranked 13th out of those 16 over an 82-game sample [2011-12 Los Angeles Kings], how ridiculous is it to think that a team that was one of the worst out of a dozen teams in a 16-game sample can be the best out of those same dozen teams in a four-game sample? Especially when that team actually wasn't the worst of that group in the regular season; they may have gone 9-7, but the [2011] Giants were better than the Bengals or the Broncos, and about as good as the Lions.

This same comment applies to the 2012 Ravens. While it was surprising that one of the DVOA "big five" did not win the Super Bowl this season, the Ravens had the highest DVOA of the seven other teams that made the playoffs.
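The point about one game versus a seven-game series can be put in numbers. What follows is a minimal sketch, assuming each game is independent and the underdog has the same win probability every game; the 40 percent figure and the function name are illustrative assumptions, not measurements.

```python
from math import comb

def series_win_prob(p, games=7):
    """Chance of winning a best-of-`games` series given per-game win probability p.

    Assumes independent games with a constant p (an illustrative simplification).
    Playing out all `games` and needing a majority gives the same answer as
    stopping once the series is clinched.
    """
    need = games // 2 + 1
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(need, games + 1))

underdog = 0.40  # hypothetical per-game win probability
print(f"one game:    {series_win_prob(underdog, games=1):.3f}")
print(f"best of 7:   {series_win_prob(underdog, games=7):.3f}")
```

Under those assumptions, the underdog's chance drops from 40 percent in a single game to about 29 percent over seven games, which is smaller but nowhere near zero.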

Small sample size isn't just an issue with the postseason; it's also an issue with the regular season. DVOA tries to mitigate the issue of small sample size by including each individual play during the season as a sample rather than looking at 16 binary (well, almost-binary) win/loss decisions. But it's still a fairly small sample. Our old friend Bill Barnwell did a great job of exploring this issue in his post-Super Bowl column on Grantland yesterday. We want to believe that we know how good teams are going into the playoffs, but the fact is that even the most intricate advanced metric is still just an approximate measurement of a team's true quality, based on the information we have available to us. We accept this and do analysis with an admittedly limited sample because a) it's more interesting than not doing analysis; b) it's more accurate than not doing analysis; and c) I have a mortgage.

The 16-game sample problem is compounded by the fact that a team is not actually the same team over all four months, or in the fifth month of the postseason. I'm not talking about the idea of end-season momentum, whether it's good or bad for a team to go into the playoffs having lost X games in a row or whatever. Instead, I'm talking about the effects of concrete changes like health, playing time, and scheme. It's legitimate to say that the Ravens were a better team in the playoffs than they were for most of the regular season because their defense was healthier. It's legitimate to say that the Patriots and 49ers were not as good as they were during the regular season because they were stuck playing without Rob Gronkowski or with a subpar Justin Smith. (People tend to see comments like this as an excuse, but they are meant to be an explanation.) Baltimore shuffled around its offensive line. San Francisco introduced the pistol after hardly using it at all during the regular season.

So even if we knew the "platonic ideal" measurement of how good a team truly is as of the end of the regular season, we still wouldn't necessarily know the platonic ideal measurement of how good a team was four weeks later when it's time for the Super Bowl. And even if we knew the "platonic ideal" measurement of how good a team was on Super Bowl Sunday, we still wouldn't be able to precisely predict the quality of their play because there is always variation in performance. If we knew that San Francisco was 30 percent better than an average team -- I'm not saying DVOA suggested this, I'm saying we absolutely knew it thanks to omniscience -- their performance in the next game could still be anything from average to 60 percent better than average. (Those numbers are totally made up.) On top of this, we add the issue of matchups, the fact that a football game doesn't just pit "fourth-best team" against "eighth-best team" but instead involves a lot of smaller battles at all the various positions so that sometimes, "eighth-best team" will be in a better position to beat "fourth-best team" than "second-best team" would be. And on top of all that, we add random events: the bounce of a fumble, an unexpected in-game injury, or a large-scale power outage.

When we combine the issue of randomness with the issue of small sample size, we get at my favorite series of articles that Doug Drinen wrote at the old pro-football-reference blog: "10,000 Seasons." The goal was to answer the question: If we were omniscient and knew the true quality of all 32 teams, how often would the best team actually win the Super Bowl? Drinen simulated 10,000 seasons to find out. Each season assigned a value to all 32 teams based on a normal distribution, then built a schedule and played it out.

The answer: The best team won the Super Bowl roughly 24 percent of the time. To some of you, that may seem low. To a few of you, that may even seem high. But it certainly suggests that more often than not, the title of "NFL champion" does not necessarily imply the title of "best team in the league that season."
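For readers who want to play with the idea, here is a rough sketch of this kind of experiment. It is not Drinen's code or method: the random weekly pairings, the single league-wide 12-team bracket with byes for the top seeds, the size of the game-day noise, and every function name are assumptions made for illustration, so the percentage it produces will not match his.

```python
import random

def run_bracket(seeds, play):
    """Single elimination; if the field is not a power of two, top seeds get byes."""
    field = list(seeds)  # best seed first
    while len(field) > 1:
        size = 1
        while size < len(field):
            size *= 2
        byes = size - len(field)
        playing = field[byes:]
        half = len(playing) // 2
        # Pair the best remaining seed with the worst, and so on inward.
        winners = [play(playing[i], playing[-1 - i]) for i in range(half)]
        field = field[:byes] + winners
    return field[0]

def simulate_season(rng, n_teams=32, n_games=16, n_playoff=12, noise=1.0):
    """Return (champion, best_team) for one simulated season."""
    # "True" quality of each team, drawn from a normal distribution.
    strength = [rng.gauss(0.0, 1.0) for _ in range(n_teams)]

    def play(a, b):
        # Team a wins if its edge in true quality survives game-day noise.
        return a if strength[a] - strength[b] + rng.gauss(0.0, noise) > 0 else b

    wins = [0] * n_teams
    for _ in range(n_games):
        # Assumed schedule: random pairings each week, ignoring divisions.
        order = list(range(n_teams))
        rng.shuffle(order)
        for i in range(0, n_teams, 2):
            wins[play(order[i], order[i + 1])] += 1

    # Seed the playoff field by record, breaking ties at random.
    ranked = sorted(range(n_teams), key=lambda t: (wins[t], rng.random()), reverse=True)
    champion = run_bracket(ranked[:n_playoff], play)
    best_team = max(range(n_teams), key=lambda t: strength[t])
    return champion, best_team

def best_team_title_rate(n_seasons=2000, seed=0):
    """Fraction of simulated seasons in which the truly best team wins the title."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_seasons)
               if len(set(simulate_season(rng))) == 1)
    return hits / n_seasons

if __name__ == "__main__":
    print(f"best team wins the title in {best_team_title_rate():.1%} of seasons")
```

Changing `n_playoff` or `noise` is a quick way to see how sensitive the result is to playoff format and to single-game randomness.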

The surprising result of this experiment may not be that the best single team doesn't win the Super Bowl that often, but that the best teams do not win the Super Bowl as often as you would expect. In more than half the simulations, the Super Bowl champion was not one of the top three teams in the league. In more than one out of three seasons (36 percent, to be exact), the Super Bowl champion was not one of the five best teams in the league. And three times out of every 20 seasons, the Super Bowl champion wasn't even one of the top ten teams in the league.

Drinen made a number of posts about this experiment, which generally revolved around the idea that a lot of things which might seem ridiculous are still far from impossible. The best team in the league didn't even make the playoffs about one in ten seasons. The worst team in football made the playoffs roughly 2.4 percent of the time. In one of the 10,000 simulations, the worst team in the league won the Super Bowl. Chicago in that simulation went 8-8 despite their "true value" being so low, upset a 15-1 Seattle team in the NFC Championship, and then beat the champion of a relatively weak AFC.

Of course, in real life we don't know the "true value" of every NFL team, but the simulation had a number of interesting results even if we only looked at the effects of random chance in the postseason. A sub-.500 team won its division every 11 or 12 years. In 14 of the 10,000 simulations, a sub-.500 division champion went on to actually win the Super Bowl. That seems absurd, but the simulation suggests there's a 13 percent chance of it happening in the next 100 years, so it isn't completely ridiculous.
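The 13 percent figure follows from the 14-in-10,000 rate. A short worked check, assuming each season is independent and has the same chance (the function name is just for illustration):

```python
def chance_within(years, per_season):
    """Probability an event happens at least once in `years` independent seasons."""
    return 1 - (1 - per_season) ** years

per_season = 14 / 10_000  # sub-.500 division champion wins the Super Bowl
print(f"{chance_within(100, per_season):.1%}")  # about 13 percent
```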

Teams went undefeated in the regular season 115 times out of 10,000 simulations, but only 40 of those teams actually won the Super Bowl. I believe that our playoff odds report back in 2007 said something similar; going into the playoffs, if I remember correctly, we had a team other than the Patriots winning the Super Bowl a majority of the time. The unexpected part of 2007 was less the Patriots losing, and more the Patriots losing to the Giants instead of Green Bay or Indianapolis.

Inspired by Drinen's posts, I decided to run my own "10,000 seasons" experiment. Obviously, we don't know the "platonic ideal" value of each team the way Drinen could set things up in his simulations. The closest thing we have is DVOA. So we ran our playoff odds simulation to see what it would have looked like if our preseason DVOA projections had been 100 percent accurate. If we knew for sure that Baltimore really was the eighth-best team in the league, 9.8 percent better than an average team, how often would we expect them to win the Super Bowl?

Team   Conf App   Conf Win   SB Win
DEN    62.6%      38.1%      22.2%
NE     59.1%      33.7%      19.1%
SEA    44.6%      27.5%      15.3%
GB     33.1%      17.6%       9.1%
SF     25.8%      14.5%       7.8%
CHI    19.7%       9.4%       4.4%
NYG    17.9%       7.5%       3.2%
BAL    17.8%       7.6%       3.0%
HOU    21.2%       8.0%       3.0%
ATL    16.2%       6.7%       2.8%
WAS    16.0%       6.9%       2.5%
CIN    16.5%       6.1%       2.2%
CAR     9.2%       3.5%       1.3%
PIT     8.7%       3.0%       1.1%
MIN     3.8%       1.5%       0.5%
DET     2.8%       1.1%       0.4%
DAL     3.2%       1.2%       0.4%
NO      2.4%       0.8%       0.3%
STL     2.3%       0.9%       0.3%
MIA     3.1%       0.8%       0.3%
SD      3.1%       1.0%       0.2%
TB      2.9%       0.9%       0.2%
BUF     2.2%       0.6%       0.2%
IND     2.1%       0.5%       0.1%
CLE     1.7%       0.4%       0.1%
NYJ     0.9%       0.2%       0.1%
TEN     0.4%       0.1%       0.0%
PHI     0.2%       0.0%       0.0%
KC      0.1%       0.0%       0.0%
JAC     0.2%       0.0%       0.0%
OAK     0.3%       0.0%       0.0%
ARI     0.1%       0.0%       0.0%

The results of our simulation look a lot like Drinen's simulations, tweaked because of the specifics of this season's DVOA ratings. The top three teams have a combined chance of greater than 50 percent to win the Super Bowl because their ratings were so high. Denver and New England are higher than Seattle because of the imbalance between the AFC and NFC. Our playoff odds report spits out percentages rather than totals, so I can't tell you if Kansas City or Jacksonville ever managed to win the Super Bowl, but I can tell you that Brandon Weeden was a Super Bowl champion quarterback in at least five of our simulations. So was Mark Sanchez (or maybe it was Tim Tebow?).

And how often does Baltimore win the Super Bowl? Three percent of the time. That's a small probability, but again, not an impossible one.
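To put those table figures in more familiar terms, here is a small sketch that adds up the "SB Win" column for the top three teams and converts Baltimore's 3.0 percent into odds against. The numbers come straight from the table; the helper name is an assumption for illustration.

```python
# "SB Win" percentages copied from the table above (top five teams plus Baltimore).
sb_win = {"DEN": 22.2, "NE": 19.1, "SEA": 15.3, "GB": 9.1, "SF": 7.8, "BAL": 3.0}

def odds_against(pct):
    """Convert a win percentage into 'X-to-1 against' odds."""
    p = pct / 100
    return (1 - p) / p

top_three = sb_win["DEN"] + sb_win["NE"] + sb_win["SEA"]
print(f"top three combined: {top_three:.1f}%")
print(f"Baltimore: about {odds_against(sb_win['BAL']):.0f}-to-1 against")
```

The top three add up to 56.6 percent, and a 3.0 percent chance works out to roughly 32-to-1 against.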

Of course, it's possible (in fact, likely) that DVOA is not 100 percent accurate when it comes to approximating the true quality of the Baltimore Ravens, especially the Baltimore Ravens of the postseason. They were healthier than they were in the regular season. They were better. Maybe they were 20 percent better than an average team, instead of 10 percent. Heck, maybe they were 40 percent better than an average team, instead of 10 percent. But both our simulation and Drinen's simulation suggest that even if Baltimore was four times better than measured by either DVOA or SRS, their Super Bowl championship still beat odds of roughly 3:1.

The idea that there's been a significant change in the relationship between the regular season and the postseason has one other problem: There aren't a lot of good explanations for it. That doesn't mean that it isn't a trend, but it doesn't do a lot to support the idea that it is a trend, either. What are some of the common explanations given for the recent rise of "surprise" Super Bowl teams?

Parity: There's more parity in the NFL in recent seasons, so of course there's more parity in the postseason. The problem with this theory is that it is simply not true. Take a look at the standard deviation of DVOA each year since the salary cap began in 1994. The thin black lines represent a linear trend. You can see that standard deviation has been steadily rising, which is part of why eight of the top ten DVOA ratings since 1994 belong to teams that played in 2004 or later, while the ten worst teams by DVOA include one pre-salary cap team and nine teams that played in 2002 or later. Standard deviation in defense has risen a lot less than offense or overall. I have no idea what that means.

If you don't like DVOA, perhaps you want something a bit simpler? Here's the standard deviation of wins since 1994:

The trend is the same. The idea that parity has gone down in the NFL sounds ridiculous when we think back to the early '90s, when we were still in the era of NFC Super Bowl blowouts and Sports Illustrated put a Dallas-San Francisco NFC Championship on their cover as "The Real Super Bowl." If you actually go back and look at the standings for those years, you'll be shocked how much better the win-loss records are for the best teams of today compared to back then. From 1994-2007, there were 11 teams that went 2-14 or worse. That's fewer than one per season. In the past five years, there have been ten. From 1993-1995, only two teams went 13-3 or better. In the past three seasons, there have been eight.
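For anyone who wants to run the same check on real standings, this is a minimal sketch of how standard deviation of wins measures parity. The two 32-team leagues below are invented for illustration and are not actual NFL data.

```python
from statistics import pstdev

def win_spread(wins):
    """Standard deviation of season win totals; lower means more parity."""
    return pstdev(wins)

# Hypothetical leagues, made up to show the two extremes.
perfect_parity = [8] * 32                     # every team goes 8-8
polarized = [13] * 8 + [8] * 16 + [3] * 8     # lots of 13-3 and 3-13 teams

print(f"perfect parity: {win_spread(perfect_parity):.2f}")
print(f"polarized:      {win_spread(polarized):.2f}")
```

A rising number from year to year means the gap between the best and worst records is widening, which is the opposite of parity.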

More teams in the playoffs: Drinen actually did a simulation to experiment with this idea. First, he did one with no wild cards, only division champions. Then he did one where all 32 teams made the postseason. Neither simulation actually changed things very much. The top team still won the Super Bowl about once every four years. The only difference was a slight increase or decrease in how often teams won the Super Bowl with records such as 8-8 or 9-7.

The four-division format: I believe that under the old three-division format, all of our "surprise" Super Bowl teams would have made the playoffs anyway. That includes the 9-7 Giants of 2011. I don't feel like doing the tiebreakers to see if the 2008 Cardinals would have made it in or not. The big difference would be that these teams wouldn't get home games in the first round. Mike Harris is actually building a simulation that will try to test out the old three-division format to see if it makes any difference.

Randomness of turnovers: I remember reading somewhere the idea that there have been more upsets in the NFL playoffs lately because the best regular-season teams have been more dependent on turnovers, and turnovers are naturally more inconsistent than yardage. I don't think this is true. If someone else wants to take the time to check it, please do. I know that the Ravens were very low in offensive turnovers, and average in defensive turnovers, so they certainly weren't underrated because their regular-season turnover rate was likely to regress towards the mean.

Teams getting healthier for the playoffs: That's not a reason for a trend of surprise Super Bowl champions. That's a reason for each of these surprise Super Bowl champions individually. I don't think there's any link between the health of the 2012 Ravens, the health of the 2011 Giants, and the health of the 2006 Colts.

The salary cap leads to parity: That sounds reasonable, except for two problems. First, as noted above, parity has actually been decreasing. Second, if the salary cap changed things, why did things not change until the salary cap had already been around for more than ten years?

The salary cap makes health more important: That's even more reasonable. This also brings up the question of why things didn't change until the salary cap had already been around for more than ten years, but there's no doubt that before the salary cap, the best teams were able to do more to stock up on depth and thus were affected less by injuries both during the regular season and during the postseason. Anecdotally, it sure does seem like specific players getting healthy has had a bigger effect on the postseason than it did in the past. The same goes for specific players getting injured. Statistically, I have no idea if this is true or not.

It seems to be happening in all sports: Scott Kacsmar wrote about this a few months ago. This I have no answer for.

To sum things up: Right now, my best guess is that the current "surprise Super Bowl champions" trend is a myth. It's not unrealistic to think that in a ten-year period, a couple of Super Bowls will be won by teams that finished the regular season 10-6 or even 9-7. Four out of six years is a bit more improbable, but not outrageously so. That 1989-2000 streak where every single champion was either first or second in Pythagorean wins is probably just as much an improbable aberration as the past six Super Bowls are.

This certainly isn't going to stop Football Outsiders from trying to improve its various advanced stats. It's not going to stop us from trying to be more accurate in forecasting the postseason. And it won't stop us from looking for reasons why the "surprise Super Bowl" trend might exist, if it does exist. But I don't think we're actually going to find them, because they probably aren't there.

93 comments, Last at 13 Feb 2013, 3:17pm by Furious Llama

Comments

1
by Chase (not verified) :: Tue, 02/05/2013 - 5:02pm

Well done.

20
by kilfara (not verified) :: Tue, 02/05/2013 - 8:14pm

Ditto - really excellent work here which the mainstream media should absorb (but of course won't).

2
by Anonymousse (not verified) :: Tue, 02/05/2013 - 5:11pm

The entirety of this "trend" seems to be the Patriots losing to teams that they're 10+ point favorites over.

Also, when people talk about Parity in the NFL, they're not talking about everyone going 8-8. They're talking about teams that win 4 games making the playoffs 2 years later.

5
by IB (not verified) :: Tue, 02/05/2013 - 5:33pm

Or teams that go 1-15 making the playoffs one year later.

10
by commissionerleaf :: Tue, 02/05/2013 - 6:09pm

"The statistics don't know that Tom Brady is a no-good choke artist of a choketastic choker in the Super Bowl."

Conveniently ignoring that he set a consecutive completions record in his last Super Bowl loss.

But yeah, the fact that Tom Coughlin appears to basically own Bill Belichick is not irrelevant to this "trend".

36
by RickD :: Wed, 02/06/2013 - 1:58am

"The entirety of this "trend" seems to be the Patriots losing to teams that they're 10+ point favorites over."

That's a weird way to say "Super Bowl XLII". The Pats weren't 10+ point favorites in XLVI.

3
by Jon Goldman (not verified) :: Tue, 02/05/2013 - 5:13pm

"Teams getting healthier for the playoffs: That's not a reason for a trend of surprise Super Bowl champions. That's a reason for each of these surprise Super Bowl champions individually. I don't think there's any link between the health of the 2012 Ravens, the health of the 2011 Giants, and the health of the 2006 Colts."

Well, it is a legitimate point, though in a completely different way: injuries can drastically change the relative "strength" of a team. It's possible that the "surprise" teams were actually as good as they played in the postseason, but injuries artificially deflated their "strength." In other words, good teams can become okay teams when good players are injured, and become good again when they get the players back.

I feel like you made this point in the article.

4
by JIPanick :: Tue, 02/05/2013 - 5:22pm

I'd believe it for the Colts (who were Super Bowl caliber in '03, '04, '05, '07, and '09) and the Ravens (who were Super Bowl caliber in '06, '08, '09, '10, and '11) but not the Giants.

14
by Anonymousse (not verified) :: Tue, 02/05/2013 - 6:56pm

The Giants had a whole bunch of defensive players come back from injury in both 2007 and 2011.

In 2007 the Patriots lost several offensive linemen during the playoffs. In 2011, the Patriots lost the best tight end in football the week before.

Injuries are a big deal.

19
by JIPanick :: Tue, 02/05/2013 - 7:27pm

I'll believe they had some injuries and played better in the playoffs than the regular season. I will not believe they were as good as they played in the playoffs.

23
by Danny Tuccitto :: Tue, 02/05/2013 - 8:50pm

Agree with this. This year, BAL got healthy, and SF got hurt (e.g., Manningham, Smiths; people don't seem to appreciate that Aldon had a shoulder injury to go along with Justin's triceps/elbow injury). Last year, NYG got healthy, and both SF and NE got hurt.

Like Aaron said, not excuses, just possible explanations.

25
by dmstorm22 :: Tue, 02/05/2013 - 9:02pm

Who are these o-linemen? All five of their regulars (Light, Mankins, Koppen, Neal, Kaczur) started the Super Bowl. Neal got hurt during and was replaced by uber-sub Russ Hochstein.

59
by Harry (not verified) :: Wed, 02/06/2013 - 12:25pm

Logan Mankins was apparently playing on a torn ACL in the Superbowl and was nowhere near full strength.

60
by dmstorm22 :: Wed, 02/06/2013 - 12:46pm

He was hurt in Super Bowl XLII also?

6
by Will Allen :: Tue, 02/05/2013 - 5:37pm

Excellent stuff. We tend to massively overrate how much information we have, in judging the quality of teams. We also tend to forget how frequently weird crap happens, for the sophisticated reason that the universe is filled with weird crap happening, because the universe is so freakin' huge.

Hang out in a big casino on a busy night, and you'll see plenty of weird stuff. In general, the casino management doesn't care, because they are compiling thousands of trials per second, they know the odds favor them, so as long as they don't let anybody cheat, or play in a way that tilts the odds the other way, their count room will be filled sufficiently. Even so, there are plenty of stories of publicly traded companies that operate casinos taking quarterly earnings beatings, when some weird crap happened, and the whales at the Baccarat table got fat for three weeks.

I'm really looking forward to seeing the DVOA data stretching back to 1978, when the modern passing game began, to get a better sense of what was going on pre-salary cap, compared to post salary cap. Then, I'll look forward to seeing the DVOA numbers (if the data is sufficient) for the rest of the 70's, to get a better sense of how the game was different, when passing was much, much, more difficult. Even with that data, however, we'll still be working on hunches, if better informed hunches. Really knowing things for a fact is really, really, really, hard. Really.

9
by Anonymous1 (not verified) :: Tue, 02/05/2013 - 6:02pm

I am sure losing bets is a much smaller part of a casino company's earnings, and declining earnings are more likely due to costs outpacing profits because of less business than expected, or that new wing costing a great deal but not doing the expected business.

12
by Will Allen :: Tue, 02/05/2013 - 6:31pm

When they file their reports, they frequently include explanations for the numbers. It is not terribly infrequent that the explanation is that the suckers got lucky for a few weeks. A five second search turned up this example....

http://finance.yahoo.com/news/nevada-casino-revenue-down-10-154947913.ht...

.....which specifically mentions the whales at the baccarat table.

22
by jebmak :: Tue, 02/05/2013 - 8:38pm

LOL

45
by justanothersteve :: Wed, 02/06/2013 - 9:30am

I like your comment about weird crap happening. It seems most years, especially recently, there is a play in the playoffs that you wouldn't expect to happen that sends a team to the next round or helps a team win the Super Bowl. Sometimes the plays are so unique, like the Immaculate Reception, it only takes a short phrase to recall it. The Catch. Red Right 88. The Tuck Rule. Fourth-and-26. The helmet catch. Last year, it was Tebow-to-Thomas. This year, it was Denver's lapse on defense with 31 seconds left, allowing Jacoby Jones to score on a 70-yard pass, though I don't know if that play has a special name yet.

When I say weird, I don't mean to denigrate any team or player. Just saying these were unexpected things that happened.

48
by Will Allen :: Wed, 02/06/2013 - 9:38am

It's why we watch the games.

7
by @felixpotvin (not verified) :: Tue, 02/05/2013 - 5:43pm

"hockey can play four seven-game series and find that the last team standing was ranked 13th out of those 16 over an 82-game sample [2011-12 Los Angeles Kings]"

While I agree with the premise of the article, this isn't quite right.

On the team level (ie. not individual hockey shooters, but a team as a whole) there doesn't seem to be team shooting talent. Most teams score at about the same rate on a number of shots.

LA scored at about half that rate in the regular season in some sort of comical 82 game cold streak.

From the NY Times: http://slapshot.blogs.nytimes.com/2012/06/10/luck-and-shooting-accuracy-...

Their excellence in the playoffs would have come as no surprise — and indeed, the Kings are striking on 9.3 percent of their postseason shots while skating five-on-five. That’s the third-best shooting percentage among the 16 playoff teams, and much better than the median even-strength strike rate of 6.5 percent.

One might say that as unlucky as the Kings were in their regular-season shooting, they have been quite lucky in their playoff shooting.

Meanwhile if you look at Fenwick events (shots on net plus shots at the net that missed completely) teams with more Fenwick events than their opponents control the puck more which leads to more shots on net which leads to more goals which leads to more wins.

LA was #4 in Fenwick events (with the score tied, another important adjustment) all year, the puck just didn't go in for them. In the playoffs their shooting percentage normalized in a hurry, they got hot goaltending and they absolutely buzzsawed everything in their path on the way to the Cup.

Fenwick by score: http://behindthenet.ca/fenwick_2011.php?sort=6&section=close

15
by IAmJoe :: Tue, 02/05/2013 - 7:01pm

Holy shit, it's Chemmy!

I'm disappointed that you didn't use Battle of California to try to make a point about the Kings.

Last year's Kings were a very strange team. While you're generally right about "most teams score at about the same rate on a number of shots", the Kings being an exception to that for 82 games last year (and for the start of this year) just doesn't feel right to me as some sort of colossal cold streak. I don't know how to explain it on a team with Brown, Kopitar, Richards, and Carter, but maybe they just aren't as good as most teams with the shots that they do get. They're making "sample size" and "cold streak" look less and less likely, though it is certainly possible. I feel like Terry Murray and/or Daryl Sutter has to figure in somehow to that issue as well, though.

82
by @felixpotvin (not verified) :: Wed, 02/06/2013 - 6:42pm

Hello.

8
by DomM (not verified) :: Tue, 02/05/2013 - 6:01pm

What about variance? Would a seventh or eighth ranked team that mixes crushing victories with miserable failures be more likely to put together a Super Bowl run than a seventh or eighth ranked team that turns in roughly the same quality of performance week after week? If variance is increasing (and I have no idea whether it is or not) might surprise winners become more common?

28
by nottom :: Tue, 02/05/2013 - 9:55pm

I've wondered about this as well. It certainly seems to fit the Ravens well this season and helps explain the 2009 Cardinals, although I don't think the Giants fit that mold as much as just matching up well against NE. I certainly feel like a high variance team is a more dangerous underdog than a team like Cincinnati, Houston, or Atlanta that were just "above average" teams that went out each week and performed "above average"-ly.

11
by Salur (not verified) :: Tue, 02/05/2013 - 6:18pm

Being relatively young, I don't have a great grasp on how things were pre-cap, but is it possible the increasing importance of the quarterback position is contributing to the variance in team quality? If the variance between teams at one position is greater than the variance between teams of the entire team taken as a whole (I assume that this is true, but I don't have any explicit backing for it), and that one position becomes more and more important, could that explain increased difference between good and bad teams?

Put in a different way, randint(0,10)*0.5+randint(3,7)*0.5 will have less variance than randint(0,10)*0.8+randint(3,7)*0.2, where the 0-10 value is QB skill and the 3-7 value is team skill. This also assumes that QB skill varies more than team skill (does anyone have any thoughts on whether or not this is true?) and that QB importance has been growing over time (which I take to be true, but, again, I don't have a great feel for the pre-cap years).
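That comparison can be checked exactly rather than by sampling. The sketch below assumes `randint(a, b)` means a uniform integer from a to b inclusive (as in Python's `random.randint`) and that the two draws are independent; the helper names are illustrative.

```python
def randint_variance(lo, hi):
    """Variance of a uniform integer on [lo, hi] inclusive: (n^2 - 1) / 12."""
    n = hi - lo + 1
    return (n * n - 1) / 12

def blend_variance(w_qb, w_team):
    """Variance of w_qb * randint(0, 10) + w_team * randint(3, 7), draws independent."""
    return w_qb**2 * randint_variance(0, 10) + w_team**2 * randint_variance(3, 7)

print(f"50/50 weighting: {blend_variance(0.5, 0.5):.2f}")
print(f"80/20 weighting: {blend_variance(0.8, 0.2):.2f}")
```

Under those assumptions the variance is 3.00 with equal weights and 6.48 with the 80/20 weights, so shifting weight toward the more variable component does widen the spread.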

I know the recent XP on "Are Super Bowl QBs getting worse" touched on this general area, but the trend on standard deviation surprised me, and I'm looking for an explanation.

41
by Eggwasp (not verified) :: Wed, 02/06/2013 - 8:50am

How can superbowl QBs be getting worse when Jim Plunkett has 2 rings?

49
by WesM :: Wed, 02/06/2013 - 11:02am

Now, now. In today's passing friendly offense you'd see a much improved Jim Plunkett, in fact I'd say he'd be ... ... your starting Arizona Cardinals QB?

61
by This Guy (not verified) :: Wed, 02/06/2013 - 1:01pm

I was thinking something similar as I was reading this as well.

The reason parity doesn't seem to be increasing is because it is aligned with an increase in the importance of the quarterback position.

Because there isn't really a free market for the top-tier quarterbacks, the few teams that do have them will be consistently good for a while, while the other teams are forced to live in the squalor of parity.

13
by iron_greg :: Tue, 02/05/2013 - 6:47pm

The simulation doesn't account for how bad Cam Cameron was at offensive game planning and how elated all of us were when Jim Caldwell took over. We'll get next year to get a bigger sample on the Ravens under JC vs. CC, so I acknowledge the smallness of the sample, but in 2012 it's an understatement to say our offense performed hilariously better (albeit with the admission that our offensive line shuffle had an impact too, and that is not necessarily on the coordinator). Until proven otherwise next year, I for one am like many Baltimoreans in believing Cam Cameron was a weighted anvil on our offense.

17
by Hurt Bones :: Tue, 02/05/2013 - 7:25pm

Flacco was very polite when asked about the change to Caldwell, not willing to say anything bad about Cameron. But he did say what he felt was one of the biggest differences: the plays come into the huddle a lot quicker, which gives him more time to set, read the defense, and audible if necessary.

The playbook was the same. The efficiency of using it just increased when Caldwell took over. The results started in the Giants game and went on from there.

Throw in the improving health and a new offensive line (3 new starters in effect) and the Ravens are not the same team that looked so mediocre in the middle of the season. And they certainly were mediocre, but there really isn't any way to reliably predict this improvement.

79
by mrh :: Wed, 02/06/2013 - 6:00pm

I think it is reasonable to speculate that the change to Caldwell, while it had some potential negatives, had a significant positive effect regardless of whether he is a "better" OC or play-caller than Cameron. It changed the Ravens' tendencies and made them harder for opposing teams to prepare for, especially because the Bengals game was meaningless and the Broncos regular season game was in the first week of his tenure while he was still working the bugs out. Even the 49ers only had 1 regular season and 3 playoff games worth of data to go on, and only the 3 with the revised o-line in place.

90
by DGL :: Thu, 02/07/2013 - 11:38am

Or possibly, the Hawthorne Effect.

27
by Jerry :: Tue, 02/05/2013 - 9:38pm

The point of the article is that no simulation, however detailed, is going to be able to capture everything and reliably predict a champion. Aaron, as well as others, is trying his hardest to come up with a number that describes how good a team is, but team quality is a moving target, and some changes are impossible to evaluate until after the fact. Everything worked out for the Ravens, and they earned their championship, but if Rahim Moore takes a better angle, some other city is celebrating right now.

16
by IAmJoe :: Tue, 02/05/2013 - 7:17pm

Drinen's 10,000 seasons pieces are some of my favorite sports reading. Fantastic stuff, and it's so awesome for helping demonstrate the point to people who don't get it.

I feel like part of the thing that's happening in sports is that the range between the top and the bottom of the league is growing, and I think it's doing it in most leagues. I'm totally pulling this out of my ass, but my personal guess is that this is the impact of the information age and the differences between front offices and the way they pursue marginal advantages. A 12 win team with a few marginal advantages turns into a 13 or 14 win team, and those wins necessarily have to come from somewhere else, turning 4 win teams into 2 or 3 win teams. You end up with teams that would've previously been the best of a very good group (a 12 win team) looking instead like they're on this whole other tier (a 14 win team). That makes it look more shocking when they get upset by the teams that weren't supposed to be on the same level.

I also think that this effect will largely be temporary, as teams figure out the other team's marginal advantages, and the truly horrible teams (and their truly horrible front offices) are finally pushed out of the league because "all he does is win" is no longer the valid strategy it was when they established their career 20 years ago.

33
by zlionsfan :: Tue, 02/05/2013 - 11:53pm

I don't think it'll be temporary at the bottom of the league.

You can get rid of a coach or a general manager, but you can't get rid of an owner. (Oh, what an unfortunate truth that is.) Incompetent owners have been present in all sports and, barring changes in how franchises can change hands, will always be present ... leagues can attempt to limit the abilities of the best franchises to remain excellent, but they can't force terrible franchises to be decent, and some owners won't let anyone stop them from doing what they do best: producing terrible teams.

For example, looking at worst winning percentage over three consecutive seasons, the 10 worst teams come from the '30s, '30s-'40s (two teams, the Eagles and Steelers, had bad runs from 1939 to 1941), '50s-'60s, '60s, '70s, '80s, and '00s. Change it to worst five-year periods, and it's '30s-'40s, '40s, '60s-'70s, '80s, and '00s-'10s.

Even if you look at seven-year periods, while specific franchises do bubble to the top (the pre-WWII and wartime Cardinals and Eagles, plus the Bucs from the '80s), there are teams from other eras right behind them (Millen Lions, the Rams through 2011, the early-AFL Broncos, WWII-era and pre-merger Steelers, early-'70s Saints).

There will always be owners who know much less than they think about football, and as long as they get to keep their teams, those teams will continue to win fewer games than they ought to.

68
by AceRothstein (not verified) :: Wed, 02/06/2013 - 3:53pm

I'm unclear why Aaron is using a metric that he finds unreliable, wins/losses, to illustrate diminished parity. For a site that uses Pythagorean and DVOA to evaluate and forecast teams, I would have thought these would be the preferred measurements.

Do DVOA and Pythagorean support the concept of diminished parity as well?

85
by Duff Soviet Union :: Wed, 02/06/2013 - 8:09pm

Um, he specifically mentioned that according to DVOA, parity is diminished recently. He just mentioned wins to show that DVOA isn't alone in saying that.

18
by gregg (not verified) :: Tue, 02/05/2013 - 7:27pm

How about the increased emphasis on the passing game in recent years? There is more variance on passing plays (i.e., more weird stuff can happen) than on running plays. This is closely related to the increased turnover theory, but it's more than turnovers. When the NFL was a running league, it was more about brute athletic skill, and fewer vagaries (e.g., tipped balls, marginal holding or DPI calls) were involved.

The exact opposite is happening in tennis, where the top 3-4 guys always win, because it's become more about brute athletic skill, particularly movement. Back in the '80s, there was more variance because it was more about finesse, touch, luck, and strategy, and the conditions (weather or court) could swing a match. Now Djokovic, Federer, Nadal, and Murray are just physically superior in a game where that's now all that matters.

29
by IAmJoe :: Tue, 02/05/2013 - 10:49pm

I would say that this probably also contributes significantly. I feel like the dependence on passing leads to more volatility. If I throw the ball to a receiver 5 yards downfield, my results for the play are pretty much 0, 5, or more than 5 (you don't often lose yards after a catch, but you do frequently gain some). The result of that might be, on average, similar to a running play, but that's including a lot of 10's and a lot of 0's. That's volatile, compared to a running play which may be 0, 1, 2, 3, 4, 5 yards.

Obviously that's grossly simplified, but I feel like the concept is real, and worth investigating. With a league more invested in passing, volatility would increase. This allows for teams performing at the far ends of the spectrum more often (14 wins and 2 wins instead of 12 and 4), and over the course of the playoffs, with single eliminations, you could see more volatility in game-to-game results, causing more upsets.
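To put rough numbers on the short-pass vs. run comparison, here is a minimal sketch. The probabilities are invented purely for illustration (they are not real play-by-play data), chosen so both play types average the same 2.5 yards.

```python
def mean_and_variance(dist):
    """dist maps yards gained -> probability; returns (mean, variance)."""
    mean = sum(yards * p for yards, p in dist.items())
    variance = sum(p * (yards - mean) ** 2 for yards, p in dist.items())
    return mean, variance

# Hypothetical short pass: incomplete (0), caught and tackled (5), catch and run (10).
pass_play = {0: 0.6, 5: 0.3, 10: 0.1}
# Hypothetical run: equally likely to gain 0, 1, 2, 3, 4, or 5 yards.
run_play = {yards: 1 / 6 for yards in range(6)}

for name, dist in (("pass", pass_play), ("run", run_play)):
    mean, variance = mean_and_variance(dist)
    print(f"{name}: mean {mean:.2f} yards, variance {variance:.2f}")
```

With these made-up inputs the two plays have the same average gain, but the pass has several times the variance. That is the whole point of the toy example; real distributions would need actual play-by-play data.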

21
by ammek :: Tue, 02/05/2013 - 8:27pm

I agree that the consistent success of the top-rated teams in the 1980s and 1990s is more surprising than the current situation. Curiously, between 1994 and 2007, fully 13 of the 14 Superbowls pitted the top seed from one conference against a non-top-seed from the other. What would be the probability of that?

Nine of those top seeds were from the NFC, versus only four from the AFC. But since 2007, the NFC has become unpredictable. The top seed has lost its opening playoff game four times in six years. This may well be related to parity, since the NFC has now had 11 different champions in 12 seasons. Three-quarters of NFC teams have appeared in either one or two conference championship games during that period: that's 12 NFC teams compared with only five AFC teams.

My theory about parity is to do with the amount of information available to coaches, scouts and players. Technology has made it easier to sift through vast quantities of film relatively quickly, and I probably don't need to mention the advances in analytics. Trends, strategies, flaws and wrinkles buzz around the league in a matter of days rather than years. Coupled with a great deal more openness to new ideas, that makes it harder to get a start on other teams in terms of strategy alone. Remember how long it took the west coast offense to spread around the league? As late as the mid-1980s, the Packers' offense was still being designed to beat the Butkus Bears.

42
by Eggwasp (not verified) :: Wed, 02/06/2013 - 9:01am

The mid-80s Packers offense was designed?

24
by Anonymous Jones :: Tue, 02/05/2013 - 8:53pm

Really nicely done. I tried to make a similar point about "true" ability of quarterbacks a week ago, but I like your "platonic ideal" explication much better.

I also love that you make the point that the data can never get us certainty, but it's still better than not looking at the data. I often notice that many people can find problems with one method or another but never stop to think that the alternative is worse. No matter how many problems data-driven accentuations of analysis will create, the alternatives are almost certainly worse.

26
by Danny Tuccitto :: Tue, 02/05/2013 - 9:22pm

Or, “It has been said that democracy is the worst form of government except all the others that have been tried.”

30
by Paul R :: Tue, 02/05/2013 - 11:31pm

Where do you guys keep the Football Outsiders Supercomputer? Is it in one of those rooms like the HAL-9000? Do you have to wear a clean-suit and shoe covers when you enter the chamber?

I'll bet it's in an underground bunker...

38
by Karl Cuba :: Wed, 02/06/2013 - 2:04am

I always pictured an Atari 520 ST in a corner of a basement with Aaron feeding it a succession of floppy disks and beating it with a broom when it crashes.

40
by RickD :: Wed, 02/06/2013 - 2:14am

Please can it have a card reader?

43
by Eggwasp (not verified) :: Wed, 02/06/2013 - 9:03am

It's on Salisbury Plain - there is a car park and a gift shop.

52
by Independent George :: Wed, 02/06/2013 - 11:25am

I think everything is a lie; I think the "computer" is in fact a 90 year-old Indian mystic that Aaron keeps chained up in his basement with nothing but piles of graph paper, #2 pencils, and back issues of The Sporting News to keep him company.

Also, Raiderjoe is his long-lost grandson, who is secretly trying to find and release him from captivity.

31
by nat :: Tue, 02/05/2013 - 11:45pm

There are simply a lot of upsets no matter what system you use to rank NFL teams. Just look at divisional games, say, in this year's AFC. There were eleven split series, meaning at least eleven of those 48 games have to be an upset in the best possible ranking system. Add in the likelihood that at least one series sweep was due to upsets against the platonic ideal, and you get an upset rate of more than 25%.
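The counting argument above can be written out as a short sketch. It assumes a fixed full-season ranking with no ties, so a split two-game series contains exactly one upset and a sweep by the lower-ranked team contains two; the function name is hypothetical.

```python
def min_upset_rate(games, split_series, sweeps_by_lower_ranked=0):
    """Lower bound on the share of games that are upsets under any fixed ranking.

    Each split series contributes one upset; each sweep by the
    lower-ranked team contributes two.
    """
    upsets = split_series + 2 * sweeps_by_lower_ranked
    return upsets / games

# AFC divisional games: 48 games, 11 split series.
print(f"{min_upset_rate(48, 11):.1%}")     # splits alone
print(f"{min_upset_rate(48, 11, 1):.1%}")  # plus one sweep by the lower-ranked team
```

The splits alone give a floor a little under 23%. It takes the additional assumption of at least one "upset sweep" to get past 25%.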

If 25% of games are upsets due to factors not accounted for in the rankings, another 25% probably have the favored team winning by those same unaccounted for factors. So 50% (or less) of games are decided by factors we could hope to include in an ideal ranking.

That's it. Simply being the better team will give you the win at most 50% of the time. For the other half, go with match ups, swagger, momentum, dramatic irony, FOMB curses, Roger Goodell conspiracy theories, choking chokemasters, divine intervention, spy cameras, and wanting it more.

39
by RickD :: Wed, 02/06/2013 - 2:11am

"There were eleven split series, meaning at least eleven of those 48 games have to be an upset in the best possible ranking system."

Only if you
a) think teams are fixed in quality level all season long
b) think home field advantage doesn't exist
c) think weather effects are negligible
d) think injuries are negligible
e) think variation in time between games is unimportant
f) think that it's impossible that any two teams be at exactly the same quality level

There's no reason to think that the best possible ranking system consists of assigning a single, unique, number to each team which is the sole determinant of which team should be favored on a given week.

"Simply being the better team will give you the win at most 50% of the time."

Let's look at a logical implication of your statement.

"Simply being the worse team will give you the win at least 50% of the time."

Apparently you're better off being the worse team than the better team.

You might want to reconsider this argument.

47
by Jeremy Billones :: Wed, 02/06/2013 - 9:36am

... as RickD gives a textbook demonstration of the excluded middle.

66
by RickD :: Wed, 02/06/2013 - 3:25pm

Please elaborate.

Did you find the original argument convincing?

50
by nat :: Wed, 02/06/2013 - 11:07am

You have misunderstood the 50% mark. It's not that the underdog wins 50% of the games. It's that you could usefully think of a game as consisting of two coin flips (very simplistically, as a rule of thumb).

The first decides whether this game will be determined by the teams' overall quality ranking. If not, the second one determines which team will win due to other factors, such as home field advantage, weather, changes in scheme, players suddenly getting better or injured, fumble-luck, unusual plays, bad clock management, specific match-ups, odd officiating, etc. Those are factors that are not, should not and cannot be included in a single, simple, full-season ranking of the teams.

In a more complex analysis, we would see that the greater the difference in overall team quality, the lower the chance of some other factor determining the result of the game. The first "coin-flip" is about fifty-fifty only on average.

That fifty-fifty mark is not too surprising, really. A fair number of games are played between teams that are close in quality, when we would expect most games to be decided by specific match-ups, fumble-luck and similar factors that don't apply to general rankings of quality.

Interestingly, because the playoffs exclude the worst teams, we should expect the overall quality of the teams to determine the winner of a playoff game less than 50% of the time. More than half of playoff games will be decided by all those other factors, which is one reason playoff games are better entertainment.

67
by RickD :: Wed, 02/06/2013 - 3:38pm

Sorry, but I didn't "get" that you were using a "two coin-flip" model. It hardly seems like the best way to model these kinds of matchups.

I don't see how you're going to, from observations, decide whether a team won "because it was better" or "due to other factors." Also, you still have the problem of treating team performance as a constant, when all evidence suggests otherwise.

It seems more useful or natural to model performance level as a distribution centered around a mean level, with variation that is team-specific, and then compare the actual performance levels generated for any given game. This modeling approach would include all of the extra variables in each team's distribution, and would produce a smoother, simpler model.

Of course, the mean level for each team should be moving through time.

72
by nat :: Wed, 02/06/2013 - 4:48pm

It's a simplistic model. But it makes this point: you don't need to use complex (or even simple) statistical models to show that ANY ranking system that applies to the whole season and considers a higher ranked team to be favored over a lower ranked team will get the result of the game wrong at least 25% of the time.

You just need to look at the actual games. Any time you get a "beat loop" then at least one of the games must be an "upset" against the ranking system. No stats required.

It's easiest to see this in divisional games, all consisting of two game series, which is what I did. You could do the same for the entire schedule, looking for longer beat loops. But there's no reason to think that upsets are more or less likely in divisional games.

I did this same check a few years ago and got about the same result. Based on actual results, NFL games MUST be upsets against the best possible static full-season ranking system at least 20-25% of the time. (The actual percentage depends on which season you look at. Around 25% seems typical, and makes for easier discussions.)

Certainly, we could use a ranking system that changed from game to game. In that case, the best theoretical model gets it right 100% of the time. We simply rank the winner of each game as better than the loser. Boring, and a bit silly. We could use a system that wasn't a ranking system, but a who-beats-whom system, rock/paper/scissors style. That would let us define the "favorite" as whoever won, eliminating all non-divisional "upsets". Again, boring and silly.

Somewhere in between are ranking systems that adjust themselves during the season based on facts available prior to each weekend, and systems that consider non-ranking factors such as home field advantage to predict outcomes. The first is very hard to do objectively, and fails at giving you a ranking for the whole season. The second is changing the usual definition of an upset: when a weaker team beats a stronger one. It's not wrong to change the definition; it's just playing semantics. I doubt it changes the numbers that much anyway.

Generally, I think you've missed the point of my exercise: the "better" team (in the full-season ranking sense) doesn't always win the game. They lose about 25% of the time, maybe more.

This isn't just my belief or pet theory or an aphorism. It's about as close to a mathematical truth as we get around here.

32
by MJK :: Tue, 02/05/2013 - 11:52pm

I haven't read Drinen's work yet, but one question that stands out to me is this: He may assume a normal distribution of team ratings, but what does he consider the variance to be on a given day? I.e., if a team that is a "32" plays a team that is a "54", what is the probability that the "32" will pull off the upset? If the standard deviation in play on a given day is, say, +/-3, it's almost zero. If the standard deviation is +/-50,000, then it's roughly 50% and the game is essentially a coin toss.
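A minimal sketch of that question, under assumptions that are mine rather than Drinen's: each team's game-day performance is an independent normal draw centered on its rating, both teams share the same standard deviation, and the higher draw wins.

```python
from math import sqrt
from statistics import NormalDist

def upset_probability(underdog, favorite, game_day_sd):
    """P(underdog's draw beats favorite's), assuming independent normal draws
    with the same standard deviation for both teams."""
    # The difference of two independent normals has SD = sd * sqrt(2).
    diff = NormalDist(underdog - favorite, game_day_sd * sqrt(2))
    return 1 - diff.cdf(0)

for sd in (3, 10, 30, 50_000):
    print(f"SD {sd:>6}: P(32 beats 54) = {upset_probability(32, 54, sd):.4f}")
```

The two extremes behave as described: essentially zero at an SD of 3 and essentially a coin flip at 50,000. The interesting part is the middle of the range, which is exactly what a rating alone doesn't tell you.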

That's the thing about trying to use any statistical method to predict game outcomes...you're not just trying to infer how good a team is on average from a small sample size...you're also trying to infer how variable that team is.

I'm starting to think more and more that a statistical football analysis tool should have some concept of a "confidence interval" built into it...

37
by Danny Tuccitto :: Wed, 02/06/2013 - 1:59am

Well, yeah. That's at the heart of the problem here. SF's DVOA variance during the regular season was 21.7%, which means their standard deviation was 46.6%. Using their average game DVOA (i.e., a game-level metric) instead of their full-season DVOA (i.e., a play-level metric), and assuming a normal distribution for game DVOAs, then 95% of their games should fall between 29.6% ± 93.2% = (-63.6%, 122.8%). Same method for BAL produces 9.8% ± 79.0% = (-69.2%, 88.8%). Not helpful at all.
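For anyone who wants to redo the arithmetic, here is a small sketch of the game-level ranges. It assumes the "95%" band is taken as mean ± 2 standard deviations (which is what the ±93.2% half-width implies), and it backs BAL's standard deviation out of the quoted ±79.0% rather than from a stated variance.

```python
import math

def two_sd_range(mean_pct, sd_pct):
    """Approximate 95% range for a normal variable: mean +/- 2 SD, in percent."""
    return mean_pct - 2 * sd_pct, mean_pct + 2 * sd_pct

# SF: a variance of 21.7% (0.217 as a fraction) gives an SD of about 46.6%.
sf_sd = round(math.sqrt(0.217) * 100, 1)
# BAL: a half-width of 79.0% implies an SD of 39.5%.
bal_sd = 79.0 / 2

for team, mean, sd in (("SF", 29.6, sf_sd), ("BAL", 9.8, bal_sd)):
    low, high = two_sd_range(mean, sd)
    print(f"{team}: SD {sd:.1f}%, range ({low:.1f}%, {high:.1f}%)")
```

Either way the two ranges overlap almost completely, which is why they say so little about a single game.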

If we're looking for the 95% CIs for DVOA as an estimate of "true ability" for the two teams, then SF's is 29.6% ± 10.8% = (18.8%, 40.4%) and BAL's is 9.8% ± 7.8% = (2.0%, 17.6%). But this is obviously saying something different than above. Here, we're saying (crudely) that SF is "truly" a better team than BAL. Above, we were saying (again crudely) that a game between SF and BAL is a coin flip.

I don't know what the solution is, but this is definitely one of the main problems that need to be solved.

64
by spenczar :: Wed, 02/06/2013 - 2:53pm

All this is saying is that DVOA is not good at predicting individual games - it is designed to predict the mean number of wins for a team. That shows up in your statistics here: the confidence intervals about a team's performance in just one game are quite large, but the CIs for the quality of the team are much better. That's partly an outcome of the law of large numbers (season-long DVOA is easier to estimate than a single game since it is an average of 16 games, so regression to the mean helps you out), but it is probably also a result of the design of DVOA, which seems to be based on fitting to number of wins and pythagorean wins.

To get around it, you could start by fitting your models to predict the outcomes of specific games. In this case, you'd build a 32x32 table of each team's probability of defeating each other team. Each probability would depend on all other probabilities. A Markov chain Monte Carlo approach would probably be the way to try to fit that, although it might be rough due to sample size, so you might have to include the previous season's data.

edit to follow up: googling quickly reveals someone who has done this sort of thing for NCAA basketball, apparently pretty successfully: http://www2.isye.gatech.edu/~jsokol/ncaa.pdf

34
by MJK :: Wed, 02/06/2013 - 12:14am

Here's another thought about how variance fits in. Say teams have a platonic ideal of "goodness" on a scale of 1 to 100. Team A is a good, very consistent team, with an average rating of 60, and a standard deviation of +/-5. Team B is a below average and highly variable team with an average of 45, and a standard deviation of +/-30.

Who is the better team? Obviously, the answer is A, right? If A plays B, assuming the performance on a given day is normally distributed, there is roughly a 68% chance that A will win.

But what if they both play team N, a very good and consistent team with an average of 72 and a standard deviation of +/-4? A has just a 3% chance of beating N. B has a 19% chance of beating N. B is more likely to get an upset of an elite team than A, even though A is the better team on average.
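Those three percentages can be reproduced with a short sketch, assuming each team's game-day rating is an independent normal draw and the higher draw wins (the function name is just for illustration):

```python
from math import sqrt
from statistics import NormalDist

def win_probability(mean_a, sd_a, mean_b, sd_b):
    """P(team A's game-day rating exceeds team B's), assuming independent normals."""
    diff = NormalDist(mean_a - mean_b, sqrt(sd_a ** 2 + sd_b ** 2))
    return 1 - diff.cdf(0)

team_a = (60, 5)   # good, very consistent
team_b = (45, 30)  # below average, highly variable
team_n = (72, 4)   # very good, consistent

print(f"A beats B: {win_probability(*team_a, *team_b):.1%}")
print(f"A beats N: {win_probability(*team_a, *team_n):.1%}")
print(f"B beats N: {win_probability(*team_b, *team_n):.1%}")
```

Under that model the results land within a point or so of the figures quoted above, and the ordering is the key part: the worse but more variable team has the better shot at the elite one.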

In the regular season, you'd rather be A (maybe it's a Marty Schottenheimer coached team?). In the post season, if you have to get by N to win the SB, you'd rather be B.

Maybe what happened this year is that Baltimore was more of a "B" team. For whatever reason, they showed a much greater swing of ability between regular season games and the playoffs...they were a high variance team who got three (well, at least two) of their "good" games in a row.

35
by Jhaeman :: Wed, 02/06/2013 - 1:53am

I am *stoked* to see that my Browns finally won the Super Bowl, even though it took 10,000 seasons . . .

44
by Eggwasp (not verified) :: Wed, 02/06/2013 - 9:07am

Had the simulation been run in 1994, your Browns won their 2nd SB last Sunday.

46
by Jim in Pgh (not verified) :: Wed, 02/06/2013 - 9:32am

I'd love to have the time to track the movements of training staff members from team to team, and try to correlate that with injury rates.

51
by Paul M (not verified) :: Wed, 02/06/2013 - 11:15am

Let's place the playoff teams into two pools. If every game were a coin flip then the first pool-- the 4 teams with a bye-- would each have a 12.5% chance of winning the SB and the collective chances would be 50%. The 8 teams without byes would have a 6.25% chance and their collective chances would also be 50%. So the odds that three teams from the second pool would win the championship in consecutive years would be 1 in 8-- hardly earth-shattering. We would further estimate that it would have most often happened three times in the 20+ years since FO has been measuring DVOA.

But of course home field advantage and quality of teams over the regular season plus the extra week's rest must mean something. Let's say, for argument's sake, that the actual odds of the bye winners are 15% each, or a collective 60%, as opposed to 5% on average for the first round teams and a collective 40%. Now the odds have changed and it is roughly a 1 in 16 chance of the 2010-12 phenomenon occurring-- and the run was overdue since it should happen at least once and probably twice in this time frame. But still within the expected parameters. (The outlier would have been the previous 20 years, of course-- see below.)

Yet if you used the actual playoff results since, say, 1980, and gave some extra weight to real vs. theoretical-- maybe the odds we all would have placed as of the beginning of the 2005 season would have been more like 20% for each of the 4 bye teams, for a collective 80%; and only 2.5% for each of the other 8, for a collective 20%. After all, at that point only the Broncos in 1997 and the Ravens in 2000 had won 3 games in the current 12 team format on their way to a SB victory. Now you are dealing with a 1 in 125 event for three teams in a row (or 6 in 8 years-- Steelers, Colts, Giants twice, Packers and Ravens) to have won from the lower rated pool-- a veritable hundred year flood.
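The three scenarios above boil down to one calculation, sketched here under the assumption that seasons are independent and that the non-bye pool's collective title chance is the same every year:

```python
def one_in(pool_share, years=3):
    """Return N such that a champion coming from the pool in `years`
    consecutive seasons is a 1-in-N event, assuming independent seasons."""
    return 1 / pool_share ** years

# Pure coin flips: bye teams need 3 wins, non-bye teams need 4.
bye_each, non_bye_each = 0.5 ** 3, 0.5 ** 4
print(f"coin flip: bye {bye_each:.2%} each, non-bye {non_bye_each:.2%} each")

for label, share in (("coin flip", 0.5), ("60/40 split", 0.4), ("80/20 split", 0.2)):
    print(f"{label}: three straight non-bye champions is about 1 in {one_in(share):.1f}")
```

The answer swings from 1 in 8 to 1 in 125 purely on the assumed share for the non-bye pool, so how surprising the recent run looks depends almost entirely on which prior you pick.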

Well hundred year floods and other extreme weather events are happening with stunning regularity these days, largely because conditions changed. The metrics may have been flawed as well, but the biggest change is the impact human beings are having on the planet. We have changed the odds bigtime. In a much less dire case, the fact that 6 of the last 8 SB winners-- after only 2 of the previous 20+-- have come from the "non-bye" pool and have had to win 3 games to reach the final contest tells me that conditions have changed. To say that the previous 20+ years were the flukes, and now normalcy has arrived is spitting in the face of a pretty powerful wind. The why is less a story of what an advanced metric system such as DVOA is missing and more a story of certain factors rising in importance: QB play, internet film study, pro-offense rule changes, the salary cap, etc... That's the mystery.

57
by cjfarls :: Wed, 02/06/2013 - 12:17pm

The simplest answer is still that there is no actual change. Even at your made-up odds of 1 in 125, that's not a completely absurd case. There are plenty of reasons why the best team shouldn't always win in the NFL (20-year trend results notwithstanding), so it's not unreasonable to think that the 20-year trend is more likely the aberration rather than the shorter recent trend... particularly since the 30-year data overall looks about like what we'd expect if there were simply no change in an underlying randomness.

Post-hoc fishing the data such as you do to make claims of changes/causality, without an underlying theory, is bad science. If climate change scientists simply were looking at the trend data without the underlying theory/knowledge of GHG forcing, we all should laugh them out of the building. The power of climate science is that the observations match a well-established theory... and that no other theory has ever been developed that better explains the data.

The strength of science is having a hypothesis, and then testing it. We can't simply fish the data, see weird trends, and then claim there are big changes in causality due to factors we can't identify. Weird trends happen.

Obviously the data needs to be re-evaluated over time as more observations are collected... and folks can/should/will continue to posit new theories (like the 3 vs. 4 division change, or the increase in passing/variability) that we can test over time. When we can match a theory to data, then we have something... but until then, making big claims about changes based on small samples is unjustified.

58
by Will Allen :: Wed, 02/06/2013 - 12:20pm

And you really need a lot of observations. A couple hundred data points just doesn't cut it, when dealing with complex phenomena.

65
by Paul M (not verified) :: Wed, 02/06/2013 - 3:04pm

"simplest answer?" Maybe. But believe me 32 GMs, head coaches, and coordinators don't get paid to do simple or to wait on a big enough sample size. In the real world people have to make decisions-- often, perhaps mainly, on incomplete information-- intuiting from experience and facts on hand a proper course of action. If I am a key decision-maker in a NFL franchise, I am going to pay a lot of attention to the fact that 6 out of the last 8 SB winners came from Week One playoff participants, after it had basically almost never happened in the first 25 years of the three week format. And I am going to try to determine if there are any common traits of the teams that failed or succeeded in order to guide my team's future. Which will/do they think more likely: that we go back to high-seed dominance for 25 years, or that Week One teams keep winning? I think that answer is self-evident.....

69
by Will Allen :: Wed, 02/06/2013 - 3:58pm

Based on that reasoning, you'd end up banning all your high stakes baccarat suckers, because they got hot for a few weeks, and end up losing a very profitable business. No, you can't make a statement of confidence, as to what is more likely, based on the last 25 or so data points, out of a grand total of 225 or so data points. You keep writing of 25 years of playoffs as if it is a large amount. It's nothing, or close to nothing, really.

76
by Paul M (not verified) :: Wed, 02/06/2013 - 5:47pm

"You know nothing, Jon Snow". In the great scheme of the cosmos, of course that answer is correct. All I am saying is that I find it unsatisfying to throw up one's hands and say "not enough data points"-- and the people getting paid major coin to figure out this sport don't have the luxury of doing same. They must act-- they must intuit-- they must adapt.... we might as well throw all the numbers out then....

87
by Will Allen :: Thu, 02/07/2013 - 12:34am

You are simply putting a sheen on what is more accurately called a wild-assed guess. Fine. Just don't label it as if it is something informed by actual knowledge. Personally, I think 5 million a year plus benefits, is a kinda' steep price for throwing darts.

(edit) That was a little more harsh than I intended, but, again, as frustrating as you may find it to be in the dark, that is the norm for any complex inquiry where the variables are large in number and can't be controlled, regarding a phenomenon we have observed only a few times, relatively speaking. We do ourselves no favor when we fool ourselves into thinking we can, with any confidence, determine what is the likely cause of outcomes which comprise an even smaller subset of what is a small set of outcomes to begin with. Seriously, you'd likely go broke pretty quickly with that approach in any complex enterprise, absent some phenomenal luck.

Anyways, suppose you are right in your hunches. What would it change in terms of constructing a roster that deviates from conventional wisdom employed currently throughout the league? Just drawing on the set of tournaments that you say indicates a real change, what have most concluded, with regards to what is needed to win the last game?

1. You need to make the playoffs, and since shooting for a wild card berth with 9 or 10 wins may, with some run of the mill bad luck, leave you with 8 or fewer, you really need to plan on winning your division.

2. You better take the field in the vast majority of your games with the confidence that your qb won't be badly outperformed by the opposing qb, given the centrality of passing to winning.

3. You better add pieces which complement that qb performance, either receivers that get open quickly, or an o-line which can give time for the receivers to get open.

4. You need a defense which doesn't put your qb in the position where he has to become very predictable.

5. You need special teams which don't screw up too badly.

No, these rules aren't foolproof; the Giants won a title in 2007 by having a defensive front which made opposing qbs non-productive. However, this is widely accepted wisdom already, and there is nothing that you suppose that has happened recently which, if proven beyond a shadow of a doubt, would do much to change this conventional wisdom.

84
by Jerry :: Wed, 02/06/2013 - 7:57pm

Front offices that chase the latest trend are the ones that have trouble. By the time they install their Wildcat, or zone-read option, defenses have figured it out, and it's time for the next brilliant idea. Whereas teams with reasonably consistent philosophies, and competent front offices, do better at staying atop the standings. If something continues to succeed over time, everyone in the league will adopt it, or parts of it.

53
by Will Allen :: Wed, 02/06/2013 - 11:35am

You are pulling the "actual odds" out of thin air (I don't mean this as criticism), and with a tiny sample, you have no way to know if any number of unquantifiable factors, like, for instance, a 27% turnover of offensive personnel on the eve of the playoffs, or the home field advantage being negated by a cold snap, which renders the home field qb with nerve damage far less effective than the younger road qb with the big gun, are more important than any theories pertaining to the nature of the game, or roster construction, changing.

55
by Paul M (not verified) :: Wed, 02/06/2013 - 11:51am

Sure, or the offensive coordinator's son dying the week before the first playoff game, or Manningham and Tyree making improbable catches, or Welker dropping a key pass, or Brady throwing a bad pass, or a punt returner fumbling two punts, or the official swallowing a whistle, or a defensive line matchup that favors the "inferior" team-- or, for that matter, Santonio Holmes getting his feet down a half inch short of the sideline, etc, etc...

But those things are always true-- and were true when the "best" teams were winning all those years as well. I am simply positing that when one trend lasts 20+ years, and another kicks in for a much smaller period-- the chances that A was the outlier are much smaller than that B is the outlier or that C is now operative: conditions have changed in a way that negates the prior trend. You can choose to believe as Aaron hypothesizes that this isn't a trend-- but to do so means you are taking the least likely road. You're Robert Frost-- and you've stopped in the woods on a snowy evening-- and you just decided to thrash through the trees.

56
by Will Allen :: Wed, 02/06/2013 - 12:10pm

You are writing as if 20 years of playoff games can yield enough data to give meaning to the word "trend", when we are looking at won loss results as the data point, and the data point is affected by hundreds, if not thousands, of factors. To draw on your analogy, it would be as if I took the temperature readings from 200 days over the last 20 years to establish that global warming was not occurring. Or, to draw on my analogy, when the suckers get lucky, over several thousand deals, for a few weeks at the baccarat table, the managers of the casino, wisely, do not start suspecting that a meaningful trend has been established, and start banning the players, or firing the dealers.

Could your theory be correct? Sure, and I wouldn't be shocked if it was. We don't have anywhere close to enough information, however, to have any meaningful confidence in the theory.

62
by Andrew Potter :: Wed, 02/06/2013 - 1:04pm

Also, looking solely at Super Bowl winners is at best an incomplete way to look at the playoffs. I remember looking quickly through the numbers after the Conference Championships - at that point, the superior DVOA team was something like 9-3 in this year's playoffs, with the three "upsets" being two road wins for Baltimore and the NFC #1 seed beating the #5 seed at home on a last-minute field goal. So yes, Baltimore's an unexpected winner, but the totality of the playoffs saw the team with superior DVOA coming into the game win most of the time. 9-4 in picking outright winners isn't a bad record, even if the overall winner isn't the team we'd expect, and that's before we look at further information like personnel, health, weather, playoff officiating, and specific matchups.

Note that I'm drawing that from memory so may have missed something somewhere, and I haven't looked at other years to see how this worked out in head-to-head matchups in those years.

54
by AZ Wranglers (not verified) :: Wed, 02/06/2013 - 11:45am

The four-division format: I believe that under the old three-division format, all of our "surprise" Super Bowl teams would have made the playoffs anyway. That includes the 9-7 Giants of 2011. I don't feel like doing the tiebreakers to see if the 2008 Cardinals would have made it in or not. The big difference would be that these teams wouldn't get home games in the first round. Mike Harris is actually building a simulation that will try to test out the old three-division format to see if it makes any difference.

I have been thinking about this issue a lot the past few months, and I believe that realignment has a lot to do with whatever we are seeing the past few years. My hypothesis is that realignment has skewed strength of schedule across the league, which impacts who makes the playoffs, and hence who even has a shot to win the Super Bowl. But saying it's just "realignment" doesn't go far enough. In my view it is actually a number of related factors that got us to realignment, such as:

1. Expansion. Going from 30 to 32 teams is what necessitated realignment in the first place, and my guess is that adding the Browns and Texans created more need for starting players than talent to fill them. I think a case can be made that there was not enough starting level talent to support 30 teams, let alone 32. Interestingly, this could have the effect of amplifying the importance of scouting and development, so that good teams get better while bad teams get worse, and we get more of both and less of a "middle class."

2. Realignment and Reduced Importance of Divisional Games. In the old setup of 3 divisions with 5 teams each, every team had to play 8 games against division rivals, half the season. With realignment that number dropped to 6. Now, in theory you can go winless in the division and still finish 10-6 by winning every other game you play (and maybe even win the division). Before, the best that would get you is 8-8, and while a division title there would still be possible, it would be harder. Also, this increases the importance of non-divisional games between conference opponents to win the head-to-head tie breaker, especially since you only get one game against those foes. In today's NFL, if you want to make the playoffs you are better off winning all of your non-division conference games to win the tie-breaker.

3. Formulaic Schedules. Combined with realignment comes the new scheduling format where you play one non-conference division, one in conference division, and the remaining two games are in conference against teams that finished in the same spot you did in your conference. In some cases this can give pretty big advantages as far as wins go: if you're in the AFC and play the NFC West in any year but this one that might be good for 4 wins, while a team playing the NFC North may only get 2. Plus, if you finished last in your division the prior year and made modest improvements and you can beat the other 3 last place teams in your conference, the playoffs may be in reach (assuming those teams did not improve much).
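
The arithmetic in points 2 and 3 can be checked with a short sketch. It uses only the numbers given in the comment (8 division games in the old setup, 6 now, 16 games per season); the function name and dictionary keys are made up for illustration.

```python
# Post-realignment schedule as described in the comment: 6 division games,
# one full division in conference, one full division out of conference,
# and 2 games against same-place finishers in the conference.
NEW_FORMAT = {
    "division": 6,
    "in_conference_division": 4,
    "non_conference_division": 4,
    "same_place_finishers": 2,
}


def best_record_if_winless_in_division(division_games, season_games=16):
    """Best possible (wins, losses) for a team that loses every division game."""
    wins = season_games - division_games
    losses = division_games
    return wins, losses


# The four pieces of the formula add up to a 16-game season.
print(sum(NEW_FORMAT.values()))

# Old setup (8 division games) versus new setup (6 division games).
print(best_record_if_winless_in_division(8))
print(best_record_if_winless_in_division(6))
```

Under these assumptions the ceiling for a team that is winless in its division moves from 8-8 to 10-6, which is the gap the comment describes.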

I think these factors have combined to significantly alter the NFL landscape. Division games are not as meaningful because there are fewer of them, there are more teams who are not competitive due to expansion, and the vagaries of the schedule formula could give one team several games against weak teams. Add these effects together and I'd expect to see more weaker teams make the playoffs, and then, as the article says, maybe one or two of them will catch fire at the right time.

In other words, I think realignment contributes to this trend because it has changed the landscape for teams to make the playoffs, and what we are seeing is teams making the playoffs that probably would not have in a 3-division, 30 team league.

Curiously, the AFC seems largely immune to this development, since until this year the AFC has only one time been represented by a team that was not the Patriots, Colts or Steelers (3 of the league's model franchises). Rather, it is the NFC that has seen more variation, especially with 8-8 teams winning divisions and hosting playoff games.

One final comment that I think supports this. I was a Cardinals season ticket holder in 2008. That year they won the division by beating up on weak division foes, and went 3-7 outside the division. In the regular season they lost to the Giants, Eagles, Vikings and Redskins. If the league in that year was still a 30 team league and the Cardinals played in the NFC East, I don't think there's any way the Cards would have won 9 games. I don't even see them winning 6 of 8 against the Giants, Eagles, Cowboys and Redskins. They would have finished somewhere between 6-10 and 8-8 and everyone would have wondered what happened to Kurt Warner.

63
by Revenge of the NURBS (not verified) :: Wed, 02/06/2013 - 1:44pm

I found your point #3 interesting, mostly because I'm a Colts fan and I'll concede that their schedule contributed to their turnaround this year (not all of it, but some).

I didn't look at every year, but I couldn't see any sort of clear demarcation between pre-realignment and post-realignment gaps in strength of schedule. All I looked at was the highest and lowest SOS (by DVOA) and took the difference between them. Interestingly, the gap does fluctuate by large amounts (sometimes 10%) from year to year. So there are definitely years where one team had a much much easier (>25%) schedule than another. But the fluctuation didn't start in 2002; it goes back as far as DVOA goes back. In fact, the largest gap I saw was in 1991, when there were only 28 teams. Of course, this is obviously not an exhaustive look at things.
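
For anyone who wants to repeat this check, here is a rough sketch of the calculation described above: take the highest and lowest strength of schedule in a season and subtract. The function names are hypothetical, and the numbers in the example are made-up placeholders, not real DVOA figures.

```python
def sos_gap(sos_by_team):
    """Difference between the hardest and easiest schedule in one season."""
    values = list(sos_by_team.values())
    return max(values) - min(values)


def sos_gap_by_year(seasons):
    """Map each year to its hardest-minus-easiest schedule gap."""
    return {year: sos_gap(teams) for year, teams in seasons.items()}


# Placeholder values for illustration only (NOT actual DVOA data).
example_seasons = {
    "year_a": {"team_1": 0.12, "team_2": -0.10, "team_3": 0.03},
    "year_b": {"team_1": 0.05, "team_2": -0.04, "team_3": 0.01},
}

for year, gap in sos_gap_by_year(example_seasons).items():
    print(year, f"{gap:.2f}")
```

Comparing the gaps before and after 2002 would then be a matter of splitting the resulting dictionary by year, assuming the real schedule-strength numbers are loaded in place of the placeholders.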

80
by DisplacedPackerFan :: Wed, 02/06/2013 - 6:11pm

On point 1, I'm fairly convinced the lack of talent isn't on the players' physical side of things. I don't have access to all the data, but I'm very convinced that the average player is bigger, stronger, and faster. I think there are in fact enough starting level players, and that the physical gifts of the worst starting player today are significantly better than those of the worst starter 20 years ago.

I also think they get better coaching at lower levels, but I'm not convinced they get better coaching or utilization at the highest level. I'm pretty sure that the average trend will show specialization is more common (snap count data going forward can help with this) in that more players see the field in various packages than ever before. Teams that don't match the players to the jobs correctly, or the jobs to the players correctly, are then out-executed; with everyone being more talented, a mistake is actually more costly.

So, as others have mentioned, the value of scouting and development becomes more important. If the margins on physical talent are shrinking, being able to coach, or evaluate, the mental talent becomes more important. Judging football IQ (not general processing like the Wonderlic does), and being able to increase that, becomes important. Or being able to make sure that your guy is doing a job he can do. Capers' biggest fault over the years has been not being able to adjust as well as needed to only having one good outside pass rusher. The concepts he always tries to use tend to be predicated on pressure and keeping coverage time shorter. He hasn't always been able to scheme around the player weaknesses he's had.

The other analogy I like to use for this is simply Brownian Motion. Bigger, faster, more skilled players means you have a system with more energy in it, which makes it harder to predict the state of the system. You'll have more collisions, some of them resulting in more motion for a particle than you would get with less energy. It means that you'll get more variance and less predictability. It means that any player is more likely to make a big play (be that offensively or defensively) and more so in the passing game where plays become more isolated and the speed of the ball has a practical upper limit (it can only fly so fast).

70
by QW (not verified) :: Wed, 02/06/2013 - 4:12pm

Here is 1 possible theory.

Many of the best DVOA teams of recent years (NE for example) seem to use efficient, precision-based passing offenses. Stats also show that in the playoffs, refs call pass interference/holding at a much lower rate than in the regular season.

It is quite possible that the difference in rule enforcement is playing a part and that the teams that dominate under the set of regular season rules are not the same teams that can best take advantage of how the playoffs are called.

71
by Will Allen :: Wed, 02/06/2013 - 4:19pm

Again, perfectly plausible. It's one I'd like to be true, because I wish running the ball would become a bigger part of NFL offenses again, since I really like watching the teamwork of a good offensive line when it is allowed to be aggressive, and move defenders off their mark. As much as I want it to be true, I can't, unfortunately, have much confidence that it is true.

I will say this. I really liked this year's playoffs, because it brought o-line play back into prominence.

73
by QW (not verified) :: Wed, 02/06/2013 - 4:58pm

I know it is hardly a bastion of statistical info, but on Mike and Mike the other day, they were saying that according to ESPN Stats Inc, that PI/Holding in the playoffs are only called at like 50% the Regular Season rate (or something to that effect) and at an almost impossibly low rate in late game situations (0.8% I believe).

This may or may not lead to what we have seen, but I sure know that if I was playing a team like NE that DVOA loves (since DVOA loves those type of offenses over big play offenses), I would much prefer playing them in an environment more conducive to clutching and grabbing their WRs to upset their timing patterns than in an environment where I was more likely to be penalized.

74
by dmstorm22 :: Wed, 02/06/2013 - 5:02pm

While that is true, the refs also call offensive holding far less in the playoffs, which somewhat mitigates the fewer DPI/IC calls on defenses.

75
by Karl Cuba :: Wed, 02/06/2013 - 5:33pm

Which would result in the maximum benefit for teams that look to throw deep as the line can keep the qb upright and the receivers will have time to get off the grabby defenders.

This is what irritates me so much about the 'let them play' approach, shouldn't the biggest games have the same rules as the rest of the year?

77
by Will Allen :: Wed, 02/06/2013 - 5:47pm

The "let them play" philosophy indicates to me that the rules and their application for the vast majority of the games are inherently deficient. How about we plainly give defenders the latitude to play the game in a manner which keeps defense competitive, while strictly enforcing actions which exceed that latitude?

81
by Karl Cuba :: Wed, 02/06/2013 - 6:11pm

I take a different message, that the custodians of the game are more interested in proffering a product that avoids controversy than defending the integrity of the game on its biggest stage. They'd rather have a non-call of a penalty than give the impression of interference in order to maintain the rules of competition they normally use.

78
by Will Allen :: Wed, 02/06/2013 - 5:48pm

misplaced

83
by JMM* (not verified) :: Wed, 02/06/2013 - 7:45pm

I am reminded of Stephen Jay Gould's "Full House." He examines the disappearance of the .400 hitter in baseball. The short answer is the talent distribution in the league has narrowed and improved. The reduction in standard deviation controls over the improvement. I'm wondering if there is a similar effect operating here - the improvement could be in the quality of the average and mid- performance.

88
by Aaron Brooks Go... :: Thu, 02/07/2013 - 11:05am

I wonder if he examined the effect that decreased playing field size has had on baseball. I suspect it was far easier to be a singles hitter playing for average in the Polo Grounds and some of that era's other enormous fields than it is in some of the bandboxes you see today.

I'm not sure even baseball has the play-by-play data from the Cobb-Hornsby era, though, to reliably track BABIP.

86
by MITCH (not verified) :: Wed, 02/06/2013 - 8:34pm

I think you're off-base with this article.

Free agency, and possibly the pass-happy era, have weakened the best teams. Sure, more teams post great records during the regular season; this began 3 or 4 years after free agency began in 1993.

One can see a big difference in teams, their performances and the overall league in general, from before and a couple of seasons after free agency began.

And those differences seem to be peaking right now, I could write a 30 chapter book on all the differences.

What we are seeing today is more and more weak no. 1 and 2 seeds that are weak statistical teams that do not possess any of the common denominators of past SB champs.

Honestly, no disrespect but I think you're reaching a bit here because your DVOA metric is weak.

This season, DVOA called the Patriots one of the best teams in the DVOA era, even though they were out-played in a very key efficiency stat, ave gain per pass att. Teams out-played in this key stat have never been considered a great team of any 20 year period in history. Find one if you can, don't think you will.

Interesting that DVOA called 4 teams over the past 3 seasons as the best of the DVOA era and those 4 teams combined to go 2-4, not 1 team produced a winning record or made the SB. And half those teams were one & done in the playoffs.

89
by Aaron Brooks Go... :: Thu, 02/07/2013 - 11:17am

The 1965 Bills, the 1999 Tampa Bay Bucs.

91
by JMT (not verified) :: Fri, 02/08/2013 - 11:18pm

It may be that there's a greater number of talented players in the league now relative to the number of teams. There are so many good quarterbacks in the NFL right now that Super Bowls can be won by quarterbacks who are arguably not elite, like Eli Manning and Flacco.

92
by Subrata Sircar :: Wed, 02/13/2013 - 2:09am

The biggest issue that I see is that people seem to think that the chances of the best team losing to the worst team are 1 in a million, when in reality they're much more like 1-in-20 - say, the Patriots losing to Arizona.

And if the chances of the best losing to the worst are 1-in-20, the chances of the best losing to the 10th have to be closer to 35%. The issue is mostly that football teams vary only by small amounts in true-quality relative to each other, to the point where other factors play a much larger role than we think.
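
To put a rough number on that, here is a minimal sketch that takes the 35% figure above at face value and assumes, as a simplification, that the best team faces a 10th-best-caliber opponent in every playoff game and that the games are independent. The function name is hypothetical.

```python
def chance_to_win_out(per_game_win_prob, games):
    """Probability of winning every game in a single-elimination run,
    assuming each game is independent with the same win probability."""
    return per_game_win_prob ** games


# Best team loses 35% of the time to a 10th-best-caliber opponent,
# so it wins 65% of the time.
win_prob = 1 - 0.35

# A top seed with a bye needs 3 straight wins; a wild card needs 4.
for games in (3, 4):
    print(games, f"{chance_to_win_out(win_prob, games):.3f}")
```

Under those assumptions even the best team in the league wins three straight only about 27% of the time, so a title for someone else is the likelier outcome in any given year.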

93
by Furious Llama (not verified) :: Wed, 02/13/2013 - 3:17pm

I really do hope you're right about this, Aaron. I've been watching football fairly closely since about 2004 and I've been slowly getting more and more frustrated by the lack of relationship between success in the regular season and success in the post season.

Honestly, it's a problem I always ascribed to other sports (#8 seeds in hockey seem to be randomly successful). What on earth is the point of the regular season if a team can get it together at the right time and place and then win the whole thing?

Even more frustrating is that I'm probably wrong about this anyways. After all, the 2011 Giants had to beat the Falcons, Saints, 49ers and the Patriots to win the Super Bowl. The 2012 Ravens beat the Colts, Broncos, Patriots and 49ers. In both seasons, I'm not sure a more difficult route was possible.

In conclusion: someone call the WAAAAHmbulance for me...