07 Jan 2014
by Danny Tuccitto
Almost six years ago, Football Outsiders published a column by Bill Barnwell titled, "Why Doesn't Bill Polian's S--t Work in the Playoffs?" Eight days ago in the comments section of our DVOA ratings column, reader pm asked if we would be running an update of Barnwell's research. Well, it just so happens that, as part of FO's 10th anniversary, we were already planning on running a series of articles throughout the offseason updating some of the seminal research findings that helped propel the site to where it is today, and form the basis of our "Pregame Show" essay. So I thought, "What the heck! Give the people a taste of what's to come."
As the title of Barnwell's column lays bare, his research was a football adaptation of Baseball Prospectus' "Why Doesn't Billy Beane's S--t Work in the Playoffs?" essay from the book Baseball Between the Numbers. The central conceit of both was that the untimely demise of highly seeded teams might be explained by the idea that predictors of regular season success are different from (and often in direct conflict with) predictors of postseason success. Both articles attempted to find some kind of postseason "secret sauce" -- areas where a team should improve if it wanted better playoff results than its record would otherwise forecast. (As an aside, if only he had known in January 2007 that Polian would later call us morons, reveal that he doesn't get statistics, and actually win the Super Bowl, Barnwell might have recast the title role.)
In the NFL, for instance, research based on the regular season shows that offensive performance is more predictive and more consistent than defensive performance, which in turn is more predictive and more consistent than special teams performance. But what if an analysis based on postseason stats showed that defense and special teams are more predictive than offense? What's a cap-constrained general manager supposed to do in that case besides use a quantum accelerator to leap into the body of Bobby Beathard? Well, unfortunately for GMs everywhere -- at least those without best friends named Al and Ziggy -- that's exactly the situation Barnwell described six years ago: DVOA splits for defense and special teams were more predictive of postseason success than offensive splits. (For reasons that will become apparent shortly, I'm not going to list his more detailed results here. Feel free to click through to the original article, though.)
But of course, two crucial aspects of Barnwell's analysis have obviously changed since 2007. First, historical DVOA was only available for the 1997-2005 era back then, and we've since added it for 1989-1996 and 2006-2012. That development gives us more data to work with, and also allows us to see whether or not predictors of postseason success have changed over time. Second, we've normalized DVOA to put it in the context of a season-specific league environment, so even the data available to Barnwell at the time is more valid now than it was back then.
Besides including more data, I also made a couple of methodological improvements, one of which addresses issues with our measure of playoff success, while the other simply indulges the "hardcore stats" side of my brain. With respect to the former, Barnwell used a measure of playoff performance called Playoff Success Points (PSP), which he adapted to the NFL from what Baseball Prospectus used in their analysis of MLB. All Barnwell's NFL version entailed was assigning two points to each team for a home playoff win, three points for a road playoff win, and five points for a Super Bowl win. Using this system, for instance, three teams earned the maximum 14 points by winning three road games and the Super Bowl: the 2005 Pittsburgh Steelers, the 2007 New York Giants, and the 2010 Green Bay Packers.
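For readers who like to see the arithmetic, here is a minimal sketch of the PSP scoring rule as described above. The function name and argument names are my own invention for illustration, not anything from Barnwell's original piece.

```python
def playoff_success_points(home_wins, road_wins, won_super_bowl):
    """PSP as described above: 2 per home playoff win, 3 per road
    playoff win, and 5 for winning the Super Bowl."""
    return 2 * home_wins + 3 * road_wins + (5 if won_super_bowl else 0)


# Three road wins plus a Super Bowl win, the path taken by the
# 2005 Steelers, 2007 Giants, and 2010 Packers.
max_psp = playoff_success_points(home_wins=0, road_wins=3, won_super_bowl=True)
print(max_psp)
```

Note that the Super Bowl itself counts only through the five-point bonus here, which is how the three-road-win path adds up to the 14-point maximum.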
PSP was fine as a first step in this kind of analysis, and Barnwell freely admitted it wasn't ideal, welcoming future improvements. Well, the future is now, and so I'm going to fix its biggest flaw: the assumption that playoff games are created equal from a win probability perspective. For starters, the fixed ratio of road win points to home win points implies that home teams win 60 percent of the time in the playoffs, and that this is true of every game. However, even in the 1997-2005 data set Barnwell used, home teams won 67 percent of the time, and win probabilities based on the Vegas line averaged 65 percent, ranging from 34 percent for the New Orleans Saints against the St. Louis Rams in 2000 (which the Saints won) to 89 percent for the Minnesota Vikings against the Arizona Cardinals in 1998 (which the Vikings won). Furthermore, these probabilities change from round to round. For instance, over that same time frame, home teams had a line-based expectation of 61 percent in the Wild Card round (winning 67 percent of the time), an expectation of 68 percent in the Divisional round (winning 78 percent), and an expectation of 65 percent in the Conference Championship round (winning 44 percent).
Piggybacking off that idea, a related problem is that having a static five-point reward for winning the Super Bowl implies that, at the start of the playoffs, every team has the same chance of doing so. Now, Barnwell specifically addressed that critique in the original piece -- and kudos to him for acknowledging it -- but I still think it doesn't pass muster. His argument was that it's reasonable for a team that wins three road games but loses the Super Bowl to score the same PSP as a team that wins two home games and the Super Bowl. And it does seem reasonable at first glance -- even with the common sense knowledge that it's harder for a No. 6 seed to make the Super Bowl than it is for a No. 1 seed to go all the way. On second glance, though, the question becomes, "Well, how much harder is it, exactly?"
Spend way too many hours of free time figuring out the math, and you learn the answer: It's about three times harder, and that renders Barnwell's PSP proposition unreasonable. Without boring you with details, if you assume that every home team wins 60 percent of the time and that the Super Bowl is a 50-50 game (both of which are wrong for any specific matchup of two teams, but they're what PSP assumes), and you plot out every possible trajectory for the six seeds in a conference, it turns out that the No. 1 seed has an 18 percent chance of winning the Super Bowl, whereas the No. 6 seed has a 6 percent chance of even getting there. And if you change the home-team assumption to 67 percent (i.e., to something slightly more in line with reality), the likelihoods diverge even more: 22 percent for the No. 1 seed winning the Super Bowl, but only 4 percent for the No. 6 seed winning its conference.
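If you'd rather not spend the free time, the trajectory-plotting can be sketched in a few lines. This is a rough reconstruction under stated assumptions, not necessarily the exact procedure I used: it assumes the six-team format with reseeding (No. 3 hosts No. 6, No. 4 hosts No. 5, No. 1 then hosts the worst surviving seed), that the better seed always hosts, and that every home team wins with the same fixed probability. The function name is hypothetical.

```python
from itertools import product


def conference_title_odds(home_win_prob):
    """Return {seed: P(seed wins the conference)} for a six-team bracket
    in which every home team wins with probability home_win_prob."""
    odds = {seed: 0.0 for seed in range(1, 7)}

    def game(host, visitor):
        # Two possible outcomes of one game: (winner, probability).
        return [(host, home_win_prob), (visitor, 1.0 - home_win_prob)]

    # Wild Card round: No. 3 hosts No. 6, No. 4 hosts No. 5.
    for (wc_a, p_a), (wc_b, p_b) in product(game(3, 6), game(4, 5)):
        better, worse = sorted([wc_a, wc_b])
        # Divisional round: No. 1 hosts the worst surviving seed.
        for (div_a, p_c), (div_b, p_d) in product(game(1, worse), game(2, better)):
            host, visitor = sorted([div_a, div_b])
            # Conference Championship: the better seed hosts.
            for champ, p_e in game(host, visitor):
                odds[champ] += p_a * p_b * p_c * p_d * p_e
    return odds


for h in (0.60, 0.67):
    odds = conference_title_odds(h)
    # The Super Bowl is treated as a 50-50 game, as PSP implicitly assumes.
    print(h, round(0.5 * odds[1], 3), round(odds[6], 3))
```

Halving the No. 1 seed's conference odds gives its Super Bowl win probability, while the No. 6 seed's conference odds are its chance of merely getting there.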
The fix for this involves allowing win probability to vary across games. The solution I devised is based on that old statistics standby: observed minus expected. First, I went to Pro Football Reference (PFR) and got all the necessary data for playoff games from 1990 to 2012. (Even though we have DVOA stats for 1989, I'm starting with 1990 because that's the year the NFL added a sixth playoff team in each conference.) Then, I used the model that PFR introduced this season, which is based on the Vegas line, to calculate each playoff team's win probability for each of the games they played. Next, I simply subtracted the number of games each team was expected to win from the number of games they actually won to produce a statistic we'll call "Observed Playoff Wins Minus Expected Playoff Wins (OPWMEPW)." Just kidding, let's go with "Playoff Success Added (PSA)?" What? That acronym's taken? Alright, fine, then it's "Playoff Wins Added (PWA)."
According to PWA, here are the 12 most overachieving and 12 most underachieving playoff teams since 1990:
You'll recall that the 2005 Steelers, 2007 Giants, and 2010 Packers scored the maximum according to PSP. And not surprisingly, each of them appears in the top 12 according to PWA. However, it's clear from the table that the Giants' run was much harder than those of the Steelers and Packers. In each of their four games, the Giants were no better than a 3-to-2 underdog, and they became bigger underdogs with each successive round: 40 percent at Tampa Bay in the Wild Card round, 30 percent at Dallas in the Divisional round, 29 percent at Green Bay in the NFC Championship game, and 18 percent against New England in the Super Bowl. Meanwhile, the Packers were no worse than a 3-to-2 underdog, and actually were favorites in both the NFC Championship game and the Super Bowl. Pittsburgh was also a favorite in two games, including the Super Bowl.
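To make the observed-minus-expected idea concrete, here is a minimal sketch of the PWA calculation applied to the Giants' four win probabilities quoted above. The function name and data layout are hypothetical, and because the inputs are the rounded percentages from this paragraph, the result may differ slightly from the table value.

```python
def playoff_wins_added(games):
    """games: list of (pregame_win_probability, won) pairs for one team's
    postseason. Returns observed wins minus expected wins."""
    observed = sum(1 for _, won in games if won)
    expected = sum(prob for prob, _ in games)
    return observed - expected


# 2007 Giants: at Tampa Bay, at Dallas, at Green Bay, vs. New England.
giants_2007 = [(0.40, True), (0.30, True), (0.29, True), (0.18, True)]
giants_pwa = playoff_wins_added(giants_2007)
print(round(giants_pwa, 2))
```

A team that loses its only game as a heavy favorite ends up with a PWA equal to the negative of its win probability, which is why bye teams that go one-and-done dominate the underachiever list.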
The ability of PWA to quantitatively differentiate between PSP peers is also an advantage on the other side of the ledger -- perhaps even more so. That's because, according to PSP, every team that doesn't win a playoff game gets 0 points, and there were 117 of them from 1990-2012. In other words, PSP considers over 40 percent of playoff teams in the past quarter-century to be equally bad even though we know that a No. 1 seed losing to a No. 6 seed in the Divisional round is a much worse outcome than a No. 6 seed losing in the Wild Card round. To wit, 10 of the 12 biggest playoff underachievers according to PWA were heavy home favorites after a first-round bye; and that's the way it should be. The only two exceptions are the 2010 New Orleans Saints, who infamously lost to Beast Mode and the 7-9 Seattle Seahawks, and the 1996 Buffalo Bills, who lost as an 8.5-point home favorite to the same Jaguars team that crowned the Broncos as PWA's top underachiever the following week.
So with a more valid playoff success measure in tow, all that's left to do is calculate correlations between PWA and the hundred or so regular-season DVOA splits we have in our Premium database, and answer the following two questions:
To answer both of these questions, I added one methodological wrinkle because statistical inference tests are a crutch. Namely, in order for me to conclude that a DVOA split was predictive, the correlation had to have a p-value less than or equal to 0.05. So, without further ado, below is a table showing PWA correlations for each of the three time periods provided that at least one of them was statistically significant. It's sorted in the best way possible to delineate which DVOA splits were important during each time period, and the color of the shading corresponds to the correlation's level of significance (i.e., p ≤ 0.01 is darker green if this split leads to more playoff success and darker red if this split leads to less playoff success, p ≤ 0.05 is lighter green or lighter red, respectively, and nonsignificance is unshaded):
| DVOA Split | 1990-1996 | 1997-2005 | 2006-2012 |
|---|---|---|---|
| Pass Defense, 1st Down | -0.006 | -0.290 | -0.069 |
| Defense, 1st Down | -0.042 | -0.260 | -0.049 |
| Run Defense, 2nd Down | 0.105 | -0.252 | -0.050 |
| Special Teams, Variance | -0.008 | 0.251 | 0.035 |
| Defense, Red Zone | 0.082 | -0.249 | -0.061 |
| Run Defense, Unadjusted | 0.084 | -0.230 | 0.015 |
| Special Teams, Punt Returns | 0.402 | 0.219 | -0.154 |
| Special Teams, Unadjusted | 0.153 | 0.198 | -0.002 |
| Run Defense, Weeks 10-17 | -0.010 | -0.198 | -0.029 |
| Defense, Tied/Winning Small | 0.137 | -0.194 | -0.048 |
| Run Defense, Weighted | 0.013 | -0.187 | 0.019 |
| Pass Offense, Weeks 1-9 | 0.273 | -0.086 | -0.213 |
| Offense, Weeks 1-9 | 0.250 | -0.075 | -0.170 |
| Offense, 2nd Down | 0.248 | -0.003 | -0.265 |
| Special Teams, Weather Points | -0.228 | 0.020 | -0.148 |
| Offense, Winning Small | 0.227 | -0.078 | -0.267 |
| Pass Offense, 2nd Down | 0.227 | 0.042 | -0.262 |
| Offense, 1st Quarter | 0.213 | -0.037 | -0.339 |
| Pass Offense, Unadjusted | 0.209 | -0.091 | -0.292 |
| Offense, 3rd Down | 0.195 | -0.116 | -0.246 |
| Pass Offense, 3rd Down | 0.185 | -0.118 | -0.261 |
| Pass Offense, Weighted | 0.141 | -0.107 | -0.292 |
| Offense, 1st Half | 0.165 | -0.078 | -0.280 |
| Pass Offense, Weeks 10-17 | 0.118 | -0.017 | -0.260 |
| Pass Offense, Red Zone | 0.107 | 0.049 | -0.248 |
| Offense, Weeks 10-17 | 0.091 | -0.059 | -0.245 |
| Offense, Tied/Losing Small | 0.067 | -0.098 | -0.222 |
| Offense, Late & Close | 0.165 | -0.130 | -0.220 |
(Before moving on, here are a few more notes about reading the table. First, remember that since DVOAs get lower as defenses get better, playoff success for better defensive DVOAs is shown by negative correlations. Second, the "Momentum" split is just the difference between the unit's weighted DVOA and unweighted DVOA, so a positive correlation means teams playing better towards the end of the season had more playoff success. Third, "Unadjusted" means VOA, which is not adjusted for opponents. Finally, "Special Teams, Weather Points" is a measure of how much weather and altitude were responsible for a team's success on special teams. It will be high for Denver and dome teams, and low for cold-weather teams other than Denver.)
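For anyone wanting to try this at home, here is a minimal sketch of the two calculations behind each cell: the Pearson correlation itself, and the t statistic used to test whether it differs from zero. The toy numbers are made up purely to exercise the function; the second call plugs in the 2006-2012 first-quarter offense correlation from the table along with that period's sample of 84 playoff teams. Turning the t statistic into a p-value requires a t distribution with n - 2 degrees of freedom, which I leave to your stats package of choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r == 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical toy data, not real DVOA or PWA values.
toy_dvoa = [1, 2, 3, 4, 5]
toy_pwa = [2, 1, 4, 3, 5]
toy_r = pearson_r(toy_dvoa, toy_pwa)

# Offense, 1st Quarter, 2006-2012: r = -0.339 across 84 playoff teams.
t_first_quarter_offense = t_statistic(-0.339, 84)
print(round(toy_r, 3), round(t_first_quarter_offense, 2))
```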
With respect to comparing the results for 1997-2005 using PWA to those using PSP, the details are slightly different, but the general conclusion is the same. Of the 15 statistically significant correlations in that time period, none involved offense. To boot, the closest any offensive DVOA split came isn't even on the table because it also wasn't significant for the other two time periods (strength of schedule at -0.175). Focusing in on the DVOA splits that were predictive of PWA from 1997 to 2005, four matched up with Barnwell's column: First-down pass defense, first-down defense, red-zone defense, and away defense. This isn't surprising when you consider the teams that overachieved during that time. At -74.4%, the 2002 Tampa Bay Buccaneers remain the best first-down pass defense in DVOA history, and they posted +1.55 PWA during that postseason. The second-best pass defense from 1997 to 2005 was the 2000 Baltimore Ravens (waaaaaay behind the Bucs at -35.3% DVOA), and they ended up with the highest PWA of that era (+2.15).
In terms of our second question, the pattern of correlations leaves no doubt that the recipe for playoff success has changed over time: There wasn't a single DVOA split that was statistically significant across all three time periods, and only eight of the 46 in the table overlapped across two time periods.
What's more, it's as if we're looking at three distinct eras of different s--t working in the playoffs. (Again, Barnwell's timing was impeccable, writing his piece about DVOA correlations that ended up being mostly inapplicable to eras before or after.) I already discussed the "defense plus special teams" recipe for 1997-2005, so let's focus on the other two. From 1990 to 1996, playoff success was most enjoyed by teams with a good punt return unit and a good overall offense that slumped towards the end of the regular season. The poster child for that era was its first champion and owner of its highest PWA (+1.71): the 1990 New York Giants. New York finished the regular season ranked seventh at 10.5% Offense DVOA, but their Weighted Offense DVOA was only 4.9% because of a -33.3% showing in a Week 13 loss to San Francisco (Breathe, Danny. Breathe.), and Dave Meggett propelled their punt return unit to a No. 1 finish (+10.3 net expected points).
The last seven postseasons have been like Bizarro 1990-1996: Offense is important again, but it's bad offenses that have won more games than expected. Of the 24 statistically significant correlations for this time period, only Total VOA doesn't have "offense" in the name. And yet, every one of those offensive DVOAs has a negative effect on playoff success. The 2007 New York Giants -- I'm sensing a theme here -- had the highest PWA since 1990 (not just from 2006 to 2012), but ranked 18th with -1.1% Offense DVOA. The 2009 New York Jets -- seriously, what's with the New York thing today -- amassed +1.06 PWA despite finishing the regular season at -12.5% Offense DVOA (ranked 22nd). Meanwhile, the 2010 New England Patriots, who currently own the second-best Offense DVOA of all time (so far), were one and done thanks to the N+1 incarnation of that Jets team despite being a 3-to-1 favorite to win the game. Finally, the team with the worst PWA of this era (-0.79) was the 2007 Indianapolis Colts, they of the 22.2% Offense DVOA and Divisional round exit at the hands of Billy Volek.
The fact that having a good offense -- especially a good pass offense -- seems to be a recipe for playoff failure these days is puzzling to me for two reasons. First, if that's the case, then why isn't having a good defense -- especially a good pass defense -- part of the recipe for success? Of 126 DVOA splits, the most influential defensive correlation ranks 28th (-.191 for front zone) and pass defense doesn't show up until 48th (-.129 for Weeks 10-17). Second, and more importantly, what the hell's going on out here? Anyone who is either in tune with NFL stat analysis or has happened to watch an NFL game over the past few years knows that having a high-octane pass offense, usually on the shoulders of an elite quarterback, is the shortest distance between showing up and winning. Over the past seven years, however, 14 of the 18 teams that finished the regular season in the top 3 of Pass Offense DVOA underachieved in the playoffs, including the last six No. 1s. Of course, in the irony to end all ironies, the only No. 1 pass offense to win the Super Bowl during this period was the 2006 Indianapolis Colts. After 2,500 words, it turns out Bill Polian's s--t comes up smelling like roses to this moron.
If I had to guess, I would point the finger at two culprits: sample size and missing variables. Regarding the former, my sample for the 2006-2012 correlations comprised 84 teams, and that's woefully small in this era of big data. That said, I had no problem finding that good offense led to playoff success from 1990 to 1996, which involved an identical sample size. Therefore, it's probably more an issue of missing variables. I've made a couple of crucial improvements to Barnwell's analysis, but what's really needed is to see how well regular-season DVOA splits autocorrelate with postseason DVOA splits. It might very well be that bad regular-season offenses are sleeping giants these days, raising their games (for whatever reason) during the playoffs. In other words, this might be a case of statistical mediation: Postseason success may very well depend on good offense, but it's the worse regular-season offenses that are more likely to be good come playoff time.
To finish things up, I'm going to apply what I've learned from this analysis to the 2013 playoffs. For fun, I'll apply it two different ways: (1) Assuming the 2006-2012 postseasons are the most predictive, and (2) assuming this postseason is best predicted by an amalgam of the previous 23. Under both assumptions, we don't know the final Vegas lines past the Wild Card round, so in the spirit of Baseball Prospectus' original foray into the topic, I'm just going to create a composite score for all 12 teams using the following method: (1) Include rankings only for those DVOA splits that were statistically significant over the time period assumed to be the most predictive, using some common sense discretion when it comes to overlapping splits; and (2) weight the included rankings by the magnitude of the correlation. (If you want more details about the weighting procedure, ask me in the comments.)
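Since I'm not spelling out the weighting procedure here, what follows is only a hypothetical sketch of one way a correlation-weighted composite of rankings could work, not the exact method behind the table. It assumes a 32-team league, flips the ranking for bad-is-good splits so that a low score always means "more likely to overachieve," and weights each split by the absolute value of its correlation. The example plugs in five of the Colts' 2013 rankings quoted later in this article, paired with the matching 2006-2012 correlations from the table above.

```python
NUM_TEAMS = 32  # assumption: rankings run from 1 (best) to 32 (worst)


def composite_score(splits):
    """splits: list of (abs_correlation, league_rank, bad_is_good) tuples.
    Returns the correlation-weighted average of oriented ranks, where a
    lower score means the team profiles as a likelier overachiever."""
    total_weight = sum(weight for weight, _, _ in splits)
    weighted_sum = 0.0
    for weight, rank, bad_is_good in splits:
        # For bad-is-good splits, a poor ranking counts in the team's favor.
        oriented_rank = NUM_TEAMS + 1 - rank if bad_is_good else rank
        weighted_sum += weight * oriented_rank
    return weighted_sum / total_weight


# (|r| for 2006-2012, Colts' 2013 rank, bad_is_good)
colts_2013 = [
    (0.339, 16, True),  # Offense, 1st Quarter
    (0.292, 20, True),  # Pass Offense, Weighted
    (0.267, 20, True),  # Offense, Winning Small
    (0.265, 16, True),  # Offense, 2nd Down
    (0.246, 22, True),  # Offense, 3rd Down
]
colts_score = composite_score(colts_2013)
print(round(colts_score, 2))
```

Under this scheme, a team ranked dead last in every bad-is-good split would score 1.0, and the 12 playoff teams would simply be sorted by score.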
So, for the 2006-2012 method, after paring down the 24 statistically significant DVOA splits that appear in the table, I ended up with these 11 DVOA predictors of playoff success:
You might have read that and yelled, "But at least five of those involve things we know to be nonpredictive in general!" If so, you're right... in general. Some of these splits are kind of random, but we're talking about the playoffs, where randomness abounds. In small sample applications, I don't mind including things that tend to only work in a small sample.
For the second method, I'll use rankings for the following eight DVOA splits, which are based on a correlation analysis of data from 1990 to 2012 (listed from strongest to weakest significance, direction in parentheses):
Below is a table showing the 2013 playoff teams ranked from first to 12th in expected playoff success according to the two methods I just described:
In the AFC, the sans-Polian Indianapolis Colts are the team most likely to overachieve during this year's playoffs, especially now that they've eliminated the team second-most likely to overachieve. The Colts ranked in the bottom half of the NFL for each of the six most influential DVOA splits from 2006 to 2012 (see above) for which low rankings are advantageous: 16th in first-quarter offense, 20th in weighted pass offense, 20th in offense when winning by a touchdown or less, 16th in second-down offense, 22nd in third-down offense, and 24th in red-zone offense. They also ranked 15th or worse in four of the five bad-is-good splits I used from 1990 to 2012. Speaking of which, the full-sample method was bullish on the Chiefs because of their No. 1 Weighted Special Teams DVOA, their No. 1 punt return unit, and their No. 2 kick return unit. Too bad these "random" variables ended up playing far less of a role than another "random" variable: injuries.
Meanwhile, both methods hate the chances of both the Chargers and the Broncos, so the Patriots-Colts winner has the inside track to a Super Bowl berth -- at least according to this analysis. Of the 19 DVOA splits I'm looking at across both systems, Denver is on the wrong end of 18. For the 11 bad-is-good predictors based on 2006-2012, they rank fourth or better in all of them. And for the 1990-2012 predictors, the only bright spot is a slumping offense (-6.6% difference between Offense DVOA and Weighted DVOA), but that's offset by low rankings on two of the three good-is-good predictors: Weighted Special Teams DVOA and net expected points on punt returns. For San Diego, the main detractor is recent postseason history: The Chargers don't measure up well in 10 of those 11 predictors. The fact that they won this weekend (I think) tells us more about just how awfully Cincinnati played than about San Diego's prospects going forward.
The results are far less clear-cut for the NFC, where the most concrete thing I can say is that the top two seeds have a slight edge, and -- in the reverse of the Colts-Chiefs situation -- New Orleans' elimination of Philadelphia anointed the Saints as the remaining team most likely to underachieve from here out. The Eagles got killed in both systems by having offense too good and special teams too poor. The Saints, meanwhile, get demerits in the 1990-2012 system for their special teams rankings (27th in Weighted Special Teams DVOA and 27th in net expected points on punt returns). And the 2006-2012 system does them no favors because they rank in the top half of the NFL in 10 of 11 bad-is-good DVOA splits.
Elsewhere, Green Bay's loss appears on the surface to be an indictment of the 2006-2012 system considering they were its most likely team to overachieve. And even in the 1990-2012 system, they were still a better bet than the 49ers. However, a few seconds of deep thought reveals that the systems were so high on the Packers mainly because of the bad offense they displayed during Aaron Rodgers' injury absence. No doubt, a full season of Rodgers would have placed Green Bay higher than 22nd in Weighted Pass Offense DVOA, which has been the second-most (negatively) influential split in recent times. Furthermore, the Packers' offensive momentum wouldn't have been -8.0% DVOA with a healthy Rodgers, and that's the No. 1 predictor -- albeit in a negative direction -- according to the full-sample correlations.
In closing, I'll go ahead and state a few things I've taken away from this research project. First, regardless of what you just read, the Colts remain long shots. Going back to the tedious math I mentioned a couple thousand words ago, the No. 4 seed is theoretically about 12-to-1 to even make it to the Super Bowl, and the Colts' win over Kansas City only improved those odds to 7-to-1. In other words, even giving credit to the two systems for correctly identifying Indianapolis as a playoff overachiever this postseason, that doesn't necessarily mean they're going all the way. Second, I'll reiterate what our dear leader has said before: Over time, parity has decreased during the regular season, but increased during the playoffs. To wit, this exercise has proven to me that, like Hulk Hogan was in the 80s, playoff success is really hard to pin down cleanly. Finally, even with the improvements I've made to Barnwell's original analysis, mine is only a second step. There are plenty of ways to make it even better (e.g., using a win probability model based on a multivariate logistic regression rather than the Vegas line). Let's work on that over the next 10 years, OK?