Secret Sauce Revisited

by Danny Tuccitto

Almost six years ago, Football Outsiders published a column by Bill Barnwell titled, "Why Doesn't Bill Polian's S--t Work in the Playoffs?" Eight days ago in the comments section of our DVOA ratings column, reader pm asked if we would be running an update of Barnwell's research. Well, it just so happens that, as part of FO's 10th anniversary, we were already planning on running a series of articles throughout the offseason updating some of the seminal research findings that helped propel the site to where it is today, and that form the basis of our "Pregame Show" essay. So I thought, "What the heck! Give the people a taste of what's to come."

As the title of Barnwell's column lays bare, his research was a football adaptation of Baseball Prospectus' "Why Doesn't Billy Beane's S--t Work in the Playoffs?" essay from the book Baseball Between the Numbers. The central conceit of both was that the untimely demise of highly seeded teams might be explained by the idea that predictors of regular season success are different from (and often in direct conflict with) predictors of postseason success. Both articles attempted to find some kind of postseason "secret sauce" -- areas where a team should improve if it wanted better playoff results than its record would otherwise forecast. (As an aside, if only he had known in January 2007 that Polian would later call us morons, reveal that he doesn't get statistics, and actually win the Super Bowl, Barnwell might have recast the title role.)

In the NFL, for instance, research based on the regular season shows that offensive performance is more predictive and more consistent than defensive performance, which in turn is more predictive and more consistent than special teams performance. But what if an analysis based on postseason stats showed that defense and special teams are more predictive than offense? What's a cap-constrained general manager supposed to do in that case besides use a quantum accelerator to leap into the body of Bobby Beathard? Well, unfortunately for GMs everywhere -- at least those without best friends named Al and Ziggy -- that's exactly the situation Barnwell described six years ago: DVOA splits for defense and special teams were more predictive of postseason success than offensive splits. (For reasons that will become apparent shortly, I'm not going to list his more detailed results here. Feel free to click through to the original article, though.)

But of course, two crucial aspects of Barnwell's analysis have obviously changed since 2007. First, historical DVOA was only available for the 1997-2005 era back then, and we've since added it for 1989-1996 and 2006-2012. That development gives us more data to work with, and also allows us to see whether or not predictors of postseason success have changed over time. Second, we've normalized DVOA to put it in the context of a season-specific league environment, so even the data available to Barnwell at the time is more valid now than it was back then.

Besides including more data, I also made a couple of methodological improvements, one of which addresses issues with our measure of playoff success, while the other simply indulges the "hardcore stats" side of my brain. With respect to the former, Barnwell used a measure of playoff performance called Playoff Success Points (PSP), which he adapted to the NFL from what Baseball Prospectus used in their analysis of MLB. All Barnwell's NFL version entailed was assigning two points to each team for a home playoff win, three points for a road playoff win, and five points for a Super Bowl win. Using this system, for instance, three teams earned the maximum 14 points by winning three road games and the Super Bowl: the 2005 Pittsburgh Steelers, the 2007 New York Giants, and the 2010 Green Bay Packers.
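The PSP scoring rule described above is simple enough to sketch in a few lines. This is only an illustration of the rule as stated here; the function and argument names are made up for the example and do not come from Barnwell's column.

```python
def playoff_success_points(home_wins, road_wins, won_super_bowl):
    """PSP as described above: 2 per home playoff win, 3 per road playoff win,
    and 5 for a Super Bowl win (the Super Bowl is scored only through the 5)."""
    return 2 * home_wins + 3 * road_wins + (5 if won_super_bowl else 0)


# Three road wins followed by a Super Bowl win is the maximum possible score.
max_psp = playoff_success_points(home_wins=0, road_wins=3, won_super_bowl=True)
print(max_psp)  # 14
```

Note that every team without a playoff win scores exactly zero under this rule, no matter how heavily it was favored.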

PSP was fine as a first step in this kind of analysis, and Barnwell freely admitted it wasn't ideal, welcoming future improvements. Well, the future is now, and so I'm going to fix its biggest flaw: the assumption that playoff games are created equal from a win probability perspective. For starters, the fixed ratio of road win points to home win points implies that home teams win 60 percent of the time in the playoffs, and that this is true of every game. However, even in the 1997-2005 data set Barnwell used, home teams won 67 percent of the time, and win probabilities based on the Vegas line averaged 65 percent, ranging from 34 percent for the New Orleans Saints against the St. Louis Rams in 2000 (which the Saints won) to 89 percent for the Minnesota Vikings against the Arizona Cardinals in 1998 (which the Vikings won). Furthermore, these probabilities change from round to round. For instance, over that same time frame, home teams had a line-based expectation of 61 percent in the Wild Card round (winning 67 percent of the time), an expectation of 68 percent in the Divisional round (winning 78 percent), and an expectation of 65 percent in the Conference Championship round (winning 44 percent).

Piggybacking off that idea, a related problem is that having a static five-point reward for winning the Super Bowl implies that, at the start of the playoffs, every team has the same chance of doing so. Now, Barnwell specifically addressed that critique in the original piece -- and kudos to him for acknowledging it -- but I still think it doesn't pass muster. His argument was that it's reasonable for a team that wins three road games but loses the Super Bowl to score the same PSP as a team that wins two home games and the Super Bowl. And it does seem reasonable at first glance -- even with the common sense knowledge that it's harder for a No. 6 seed to make the Super Bowl than it is for a No. 1 seed to go all the way. On second glance, though, the question becomes, "Well, how much harder is it, exactly?"

Spend way too many hours of free time figuring out the math, and you learn the answer: It's about three times harder, and that renders Barnwell's PSP proposition unreasonable. Without boring you with details, if you assume that every home team wins 60 percent of the time and that the Super Bowl is a 50-50 game (both of which are wrong for any specific matchup of two teams, but they're what PSP assumes), and you plot out every possible trajectory for the six seeds in a conference, it turns out that the No. 1 seed has an 18 percent chance of winning the Super Bowl, whereas the No. 6 seed has a 6 percent chance of even getting there. And if you change the home-team assumption to 67 percent (i.e., to something slightly more in line with reality), the likelihoods diverge even more: 22 percent for the No. 1 seed winning the Super Bowl, but only 4 percent for the No. 6 seed winning its conference.
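For anyone who wants to check the math above, here is a minimal sketch that enumerates every path through one conference's bracket. It assumes the 1990-2012 format (No. 3 hosts No. 6, No. 4 hosts No. 5, the top two seeds get byes, No. 1 hosts the worst surviving seed, and the better seed hosts the conference title game), and it applies a single home-win probability to every game, which is PSP's simplification rather than reality. The function name is invented for the example.

```python
from itertools import product


def conference_title_odds(p_home):
    """Exact chance that each seed (1-6) wins its conference when every
    game is won by the home team with probability p_home."""
    odds = {seed: 0.0 for seed in range(1, 7)}

    def outcomes(home, away):
        # Both possible results of one game as (winner, probability) pairs.
        return [(home, p_home), (away, 1.0 - p_home)]

    # Wild Card round: No. 3 hosts No. 6, No. 4 hosts No. 5.
    for (wc_a, p_a), (wc_b, p_b) in product(outcomes(3, 6), outcomes(4, 5)):
        better, worse = sorted([wc_a, wc_b])
        # Divisional round: No. 1 hosts the worst survivor, No. 2 hosts the other.
        for (div_a, p_c), (div_b, p_d) in product(outcomes(1, worse), outcomes(2, better)):
            host, visitor = sorted([div_a, div_b])
            # Conference Championship: the better seed hosts.
            for champ, p_e in outcomes(host, visitor):
                odds[champ] += p_a * p_b * p_c * p_d * p_e
    return odds


for p_home in (0.60, 0.67):
    odds = conference_title_odds(p_home)
    top_seed_wins_super_bowl = odds[1] * 0.5  # Super Bowl treated as a coin flip
    sixth_seed_reaches_super_bowl = odds[6]
    print(p_home, round(top_seed_wins_super_bowl, 2), round(sixth_seed_reaches_super_bowl, 2))
```

With a 60 percent home-win rate this prints 0.18 and 0.06, and with 67 percent it prints 0.22 and 0.04, in line with the figures quoted above.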

The fix for this involves allowing win probability to vary across games. The solution I devised is based on that old statistics standby: observed minus expected. First, I went to Pro Football Reference (PFR) and got all the necessary data for playoff games from 1990 to 2012. (Even though we have DVOA stats for 1989, I'm starting with 1990 because that's the year the NFL added a sixth playoff team in each conference.) Then, I used the model that PFR introduced this season, which is based on the Vegas line, to calculate each playoff team's win probability for each of the games they played. Next, I simply subtracted the number of games each team was expected to win from the number of games they actually won to produce a statistic we'll call "Observed Playoff Wins Minus Expected Playoff Wins (OPWMEPW)." Just kidding, let's go with "Playoff Success Added (PSA)?" What? That acronym's taken? Alright, fine, then it's "Playoff Wins Added (PWA)."
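The exact formula behind the line-based win probabilities isn't reproduced in this article, so the sketch below is only a generic stand-in: it treats the final margin as normally distributed around the point spread. The standard deviation of 13.45 points is an assumed, illustrative value, and the function name is hypothetical; neither should be read as the PFR model itself.

```python
import math

# Assumption for illustration only: spread of actual margins around the Vegas line.
ASSUMED_MARGIN_SD = 13.45


def win_probability_from_spread(points_favored, sd=ASSUMED_MARGIN_SD):
    """Approximate win probability for a team favored by `points_favored`
    (use a negative number for an underdog), via a normal approximation."""
    z = points_favored / sd
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(round(win_probability_from_spread(0.0), 2))  # 0.5 for a pick'em game
print(round(win_probability_from_spread(8.5), 2))  # about 0.74 under this assumption
```

Whatever model is used, the key property is the same: each game gets its own probability, so expected wins can differ from team to team and round to round.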

According to PWA, here are the 12 most overachieving and 12 most underachieving playoff teams since 1990:

Biggest Playoff Overachievers
Year  Team  Wins  Exp Wins  PWA
2007  NYG   4     1.18      +2.82
2012  BAL   4     1.61      +2.39
2011  NYG   4     1.72      +2.28
2000  BAL   4     1.85      +2.15
1997  DEN   4     1.86      +2.14
2005  PIT   4     1.87      +2.13
2001  NE    3     0.97      +2.03
2010  GB    4     2.07      +1.93
1990  NYG   3     1.29      +1.71
2006  IND   4     2.34      +1.66
2002  TB    3     1.45      +1.55
2008  ARI   3     1.50      +1.50

Biggest Playoff Underachievers
Year  Team  Wins  Exp Wins  PWA
1996  DEN   0     0.82      -0.82
2007  IND   0     0.79      -0.79
2008  CAR   0     0.77      -0.77
2010  NO    0     0.77      -0.77
2010  NE    0     0.76      -0.76
1995  SF    0     0.76      -0.76
2012  DEN   0     0.75      -0.75
2009  SD    0     0.75      -0.75
1996  BUF   0     0.74      -0.74
2005  IND   0     0.74      -0.74
2011  GB    0     0.72      -0.72
1995  KC    0     0.72      -0.72

You'll recall that the 2005 Steelers, 2007 Giants, and 2010 Packers scored the maximum according to PSP. And not surprisingly, each of them appears in the top 12 according to PWA. However, it's clear from the table that the Giants' run was much harder than those of the Steelers and Packers. In each of their four games, the Giants were no better than a 3-to-2 underdog, and they became bigger underdogs with each successive round: 40 percent at Tampa Bay in the Wild Card round, 30 percent at Dallas in the Divisional round, 29 percent at Green Bay in the NFC Championship game, and 18 percent against New England in the Super Bowl. Meanwhile, the Packers were no worse than a 3-to-2 underdog, and actually were favorites in both the NFC Championship game and the Super Bowl. Pittsburgh was also a favorite in two games, including the Super Bowl.
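To make the observed-minus-expected arithmetic concrete, here is a small sketch that recomputes the 2007 Giants' PWA from the four win probabilities quoted above. The function name and input layout are invented for the example, and because the inputs are rounded to whole percentages, the result differs from the table in the second decimal place.

```python
def playoff_wins_added(results):
    """results: one (won, win_probability) pair per playoff game played.
    Returns observed wins minus expected wins."""
    observed = sum(1 for won, _ in results if won)
    expected = sum(prob for _, prob in results)
    return observed - expected


# 2007 Giants: at Tampa Bay, at Dallas, at Green Bay, Super Bowl vs. New England.
giants_2007 = [(True, 0.40), (True, 0.30), (True, 0.29), (True, 0.18)]
print(round(playoff_wins_added(giants_2007), 2))  # 2.83 from rounded inputs; the table lists +2.82
```

A one-and-done favorite works the same way: a single loss at a 76 percent win probability yields a PWA of -0.76.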

The ability of PWA to quantitatively differentiate between PSP peers is also an advantage on the other side of the ledger -- perhaps even more so. That's because, according to PSP, every team that doesn't win a playoff game gets 0 points, and there were 117 of them from 1990-2012. In other words, PSP considers over 40 percent of playoff teams in the past quarter-century to be equally bad even though we know that a No. 1 seed losing to a No. 6 seed in the Divisional round is a much worse outcome than a No. 6 seed losing in the Wild Card round. To wit, 10 of the 12 biggest playoff underachievers according to PWA were heavy home favorites after a first-round bye; and that's the way it should be. The only two exceptions are the 2010 New Orleans Saints, who infamously lost to Beast Mode and the 7-9 Seattle Seahawks, and the 1996 Buffalo Bills, who lost as an 8.5-point home favorite to the same Jaguars team that crowned the Broncos as PWA's top underachiever the following week.

So with a more valid playoff success measure in tow, all that's left to do is calculate correlations between PWA and the hundred or so regular-season DVOA splits we have in our Premium database, and answer the following two questions:

  • For 1997-2005, does PWA come to the same general conclusions about what DVOA splits predicted playoff success as Barnwell's original analysis using PSP did?
  • Has the formula for playoff success (in terms of DVOA splits) changed over time?

To answer both of these questions, I added one methodological wrinkle because statistical inference tests are a crutch. Namely, in order for me to conclude that a DVOA split was predictive, the correlation had to have a p-value less than or equal to 0.05. So, without further ado, below is a table showing PWA correlations for each of the three time periods provided that at least one of them was statistically significant. It's sorted in the best way possible to delineate which DVOA splits were important during each time period, and the color of the shading corresponds to the correlation's level of significance (i.e., p ≤ 0.01 is darker green if this split leads to more playoff success and darker red if this split leads to less playoff success, p ≤ 0.05 is lighter green or lighter red, respectively, and nonsignificance is unshaded):

DVOA Split 1990-1996 1997-2005 2006-2012
Pass Defense, 1st Down -0.006 -0.290 -0.069
Defense, 1st Down -0.042 -0.260 -0.049
Run Defense, 2nd Down 0.105 -0.252 -0.050
Special Teams, Variance -0.008 0.251 0.035
Defense, Red Zone 0.082 -0.249 -0.061
Run Defense, Unadjusted 0.084 -0.230 0.015
Defense, Away 0.064 -0.225 0.029
Special Teams, Punt Returns 0.402 0.219 -0.154
Defense, Unadjusted 0.169 -0.214 -0.026
Defense, Goal-to-Go 0.057 -0.211 0.102
Special Teams, Unadjusted 0.153 0.198 -0.002
Run Defense, Weeks 10-17 -0.010 -0.198 -0.029
Defense, Tied/Winning Small 0.137 -0.194 -0.048
Special Teams 0.188 0.190 0.037
Run Defense, Weighted 0.013 -0.187 0.019
DVOA Split 1990-1996 1997-2005 2006-2012
Offense, Goal-to-Go 0.357 -0.013 -0.205
Pass Offense, Weeks 1-9 0.273 -0.086 -0.213
Offense, Momentum -0.266 -0.140 -0.137
Offense, Weeks 1-9 0.250 -0.075 -0.170
Offense, 3rd-and-Long 0.249 -0.116 -0.161
Offense, 2nd Down 0.248 -0.003 -0.265
Offense, Home 0.245 -0.009 -0.180
Offense, 2nd-and-Short 0.237 -0.007 -0.277
Defense, 2nd-and-Short 0.236 -0.052 -0.002
Special Teams, Weather Points -0.228 0.020 -0.148
Offense, Winning Small 0.227 -0.078 -0.267
Pass Offense, 2nd Down 0.227 0.042 -0.262
Pass Offense 0.226 -0.062 -0.274
Offense, 1st Quarter 0.213 -0.037 -0.339
Pass Offense, Unadjusted 0.209 -0.091 -0.292
Offense 0.205 -0.081 -0.240
Offense, Unadjusted 0.201 -0.100 -0.264
Offense, 3rd Down 0.195 -0.116 -0.246
Offense, 2nd-and-Long 0.193 -0.047 -0.239
Pass Offense, 3rd Down 0.185 -0.118 -0.261
DVOA Split 1990-1996 1997-2005 2006-2012
Pass Offense, Weighted 0.141 -0.107 -0.292
Offense, 1st Half 0.165 -0.078 -0.280
Offense, Weighted 0.103 -0.124 -0.267
Pass Offense, Weeks 10-17 0.118 -0.017 -0.260
Pass Offense, Red Zone 0.107 0.049 -0.248
Offense, Weeks 10-17 0.091 -0.059 -0.245
Offense, Away 0.093 -0.133 -0.239
Total, Unadjusted 0.092 0.145 -0.225
Offense, Tied/Losing Small 0.067 -0.098 -0.222
Offense, Late & Close 0.165 -0.130 -0.220
Offense, 3rd-and-Short 0.027 -0.064 -0.216

(Before moving on, here are a few more notes about reading the table. First, remember that since DVOAs get lower as defenses get better, playoff success for better defensive DVOAs is shown by negative correlations. Second, the "Momentum" split is just the difference between the unit's weighted DVOA and unweighted DVOA, so a positive correlation means teams playing better towards the end of the season had more playoff success. Third, "Unadjusted" means VOA, which is not adjusted for opponents. Finally, "Special Teams, Weather Points" is a measure of how much weather and altitude were responsible for a team's success on special teams. It will be high for Denver and dome teams, and low for cold-weather teams other than Denver.)
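As a rough guide to how the p ≤ 0.05 cutoff maps onto the correlations in the table, the sketch below converts a correlation and a sample size into an approximate two-sided p-value. The article doesn't say which test produced the shading, so the Fisher z-transformation used here is an assumption chosen because it needs only the standard library; the sample size of 84 assumes seven postseasons of 12 teams each. Function names are invented for the example.

```python
import math
from statistics import NormalDist


def approx_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r computed
    on n observations, using the Fisher z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def is_predictive(r, n, alpha=0.05):
    """Apply the article's rule: a split counts only if p <= alpha."""
    return approx_p_value(r, n) <= alpha


n_2006_2012 = 84  # assumed: 7 postseasons x 12 playoff teams
print(is_predictive(-0.339, n_2006_2012))  # Offense, 1st Quarter
print(is_predictive(0.015, n_2006_2012))   # Run Defense, Unadjusted
```

Correlations sitting right at the cutoff may land on either side depending on the exact test, so this approximation shouldn't be used to second-guess borderline entries in the table.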

With respect to comparing the results for 1997-2005 using PWA to those using PSP, the details are slightly different, but the general conclusion is the same. Of the 15 statistically significant correlations in that time period, none involved offense. To boot, the closest any offensive DVOA split came isn't even on the table because it also wasn't significant for the other two time periods (strength of schedule at -0.175). Focusing in on the DVOA splits that were predictive of PWA from 1997 to 2005, four matched up with Barnwell's column: First-down pass defense, first-down defense, red-zone defense, and away defense. This isn't surprising when you consider the teams that overachieved during that time. At -74.4%, the 2002 Tampa Bay Buccaneers remain the best first-down pass defense in DVOA history, and they posted +1.55 PWA during that postseason. The second-best pass defense from 1997 to 2005 was the 2000 Baltimore Ravens (waaaaaay behind the Bucs at -35.3% DVOA), and they ended up with the highest PWA of that era (+2.15).

In terms of our second question, the pattern of correlations leaves no doubt that the recipe for playoff success has changed over time: There wasn't a single DVOA split that was statistically significant across all three time periods, and only eight of the 46 in the table overlapped across two time periods.

What's more, it's as if we're looking at three distinct eras of different s--t working in the playoffs. (Again, Barnwell's timing was impeccable, writing his piece about DVOA correlations that ended up being mostly inapplicable to eras before or after.) I already discussed the "defense plus special teams" recipe for 1997-2005, so let's focus on the other two. From 1990 to 1996, playoff success was most enjoyed by teams with a good punt return unit and a good overall offense that slumped towards the end of the regular season. The poster child for that era was its first champion and owner of its highest PWA (+1.71): the 1990 New York Giants. New York finished the regular season ranked seventh at 10.5% Offense DVOA, but their Weighted Offense DVOA was only 4.9% because of a -33.3% showing in a Week 13 loss to San Francisco (Breathe, Danny. Breathe.), and Dave Meggett propelled their punt return unit to a No. 1 finish (+10.3 net expected points).

The last seven postseasons have been like Bizarro 1990-1996: Offense is important again, but it's bad offenses that have won more games than expected. Of the 24 statistically significant correlations for this time period, only Total VOA doesn't have "offense" in the name. And yet, every one of those offensive DVOAs has a negative effect on playoff success. The 2007 New York Giants -- I'm sensing a theme here -- had the highest PWA since 1990 (not just from 2006 to 2012), but ranked 18th with -1.1% Offense DVOA. The 2009 New York Jets -- seriously, what's with the New York thing today -- amassed +1.06 PWA despite finishing the regular season at -12.5% Offense DVOA (ranked 22nd). Meanwhile, the 2010 New England Patriots, who own the second-best Offense DVOA of all time (so far), were one and done thanks to the N+1 incarnation of that Jets team despite being a 3-to-1 favorite to win the game. Finally, the team with the worst PWA of this era (-0.79) was the 2007 Indianapolis Colts, they of the 22.2% Offense DVOA and Divisional round exit at the hands of Billy Volek.

The fact that having a good offense -- especially a good pass offense -- seems to be a recipe for playoff failure these days is puzzling to me for two reasons. First, if that's the case, then why isn't having a good defense -- especially a good pass defense -- part of the recipe for success? Of 126 DVOA splits, the most influential defensive correlation ranks 28th (-.191 for front zone) and pass defense doesn't show up until 48th (-.129 for Weeks 10-17). Second, and more importantly, what the hell's going on out here? Anyone who is either in tune with NFL stat analysis or has happened to watch an NFL game over the past few years knows that having a high-octane pass offense, usually on the shoulders of an elite quarterback, is the shortest distance between showing up and winning. Over the past seven years, however, 14 of the 18 teams that finished the regular season in the top 3 of Pass Offense DVOA underachieved in the playoffs, including the last six No. 1s. Of course, in the irony to end all ironies, the only No. 1 pass offense to win the Super Bowl during this period was the 2006 Indianapolis Colts. After 2,500 words, it turns out Bill Polian's s--t comes up smelling like roses to this moron.

If I had to guess, I would point the finger at two culprits: sample size and missing variables. Regarding the former, my sample for the 2006-2012 correlations comprised 84 teams, and that's woefully small in this era of big data. That said, I had no problem finding that good offense led to playoff success from 1990 to 1996, which involved an identical sample size. Therefore, it's probably more an issue of missing variables. I've made a couple of crucial improvements to Barnwell's analysis, but what's really needed is to see how well regular-season DVOA splits correlate with postseason DVOA splits. It might very well be that bad regular-season offenses are sleeping giants these days, raising their games (for whatever reason) during the playoffs. In other words, this might be a case of statistical mediation: Postseason success may very well depend on good offense, but it's the worse regular-season offenses that are more likely to be good come playoff time.

To finish things up, I'm going to apply what I've learned from this analysis to the 2013 playoffs. For fun, I'll apply it two different ways: (1) Assuming the 2006-2012 postseasons are the most predictive, and (2) assuming this postseason is best predicted by an amalgam of the previous 23. Under both assumptions, we don't know the final Vegas lines past the Wild Card round, so in the spirit of Baseball Prospectus' original foray into the topic, I'm just going to create a composite score for all 12 teams using the following method: (1) Include rankings only for those DVOA splits that were statistically significant over the time period assumed to be the most predictive, using some common sense discretion when it comes to overlapping splits; and (2) weight the included rankings by the magnitude of the correlation. (If you want more details about the weighting procedure, ask me in the comments.)
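The exact weighting procedure isn't spelled out here, so the following is only one plausible reading of "weight the included rankings by the magnitude of the correlation": a weighted average of each team's rank on the included splits, flipped where a good rank helps. Everything in it (the function name, the orientation flag, the 32-team default, and the two example teams) is hypothetical; only the two correlations are taken from the 2006-2012 column of the table.

```python
def composite_score(ranks, predictors, n_teams=32):
    """Correlation-weighted average of oriented ranks; a higher score means
    a better expected playoff outlook.

    ranks: split name -> regular-season rank (1 = best unit in the league)
    predictors: split name -> (correlation, good_rank_helps)
    """
    total_weight = sum(abs(corr) for corr, _ in predictors.values())
    score = 0.0
    for split, (corr, good_rank_helps) in predictors.items():
        rank = ranks[split]
        # Flip the rank when being good at the split helps in the playoffs;
        # leave it alone for "bad-is-good" splits, where a worse rank is favorable.
        oriented = (n_teams + 1 - rank) if good_rank_helps else rank
        score += abs(corr) * oriented
    return score / total_weight


# Two "bad-is-good" splits with their 2006-2012 correlations from the table.
predictors = {
    "Offense, 1st Quarter": (-0.339, False),
    "Pass Offense, Weighted": (-0.292, False),
}
middling_offense = {"Offense, 1st Quarter": 16, "Pass Offense, Weighted": 20}  # hypothetical team
elite_offense = {"Offense, 1st Quarter": 2, "Pass Offense, Weighted": 3}  # hypothetical team

print(composite_score(middling_offense, predictors) > composite_score(elite_offense, predictors))  # True
```

The explicit orientation flag is a deliberate choice: the sign of the correlation alone can't say whether a good rank helps, because better defenses have lower DVOAs.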

So, for the 2006-2012 method, after paring down the 24 statistically significant DVOA splits that appear in the table, I ended up with these 11 DVOA predictors of playoff success:

  • Offense, First Quarter
  • Pass Offense, Weighted
  • Offense, Winning Small
  • Pass Offense, Second Down
  • Pass Offense, Third Down
  • Pass Offense, Red Zone
  • Offense, Away
  • Total, Unadjusted
  • Offense, Late and Close
  • Offense, 3rd-and-Short
  • Pass Offense, Weeks 1-9

You might have read that and yelled, "But at least five of those involve things we know to be nonpredictive in general!" If so, you're right... in general. Some of these splits are kind of random, but we're talking about the playoffs, where randomness abounds. In small sample applications, I don't mind including things that tend to only work in a small sample.

For the second method, I'll use rankings for the following eight DVOA splits, which are based on a correlation analysis of data from 1990 to 2012 (listed from strongest to weakest significance, direction in parentheses):

  • Offense, Momentum (-)
  • Pass Defense, First Down (-)
  • Special Teams, Kickoff Returns (+)
  • Special Teams, Weighted (+)
  • Special Teams, Punt Returns (+)
  • Offense, Away (-)
  • Offense, Front Zone (-)
  • Pass Offense, Weighted (-)

Below is a table showing the 2013 playoff teams ranked from first to 12th in expected playoff success according to the two methods I just described:

Team 2006-2012 1990-2012
DEN 12 12
NE 6 8
CIN 4 6
IND 1 3
KC 2 2
SD 11 9
SEA 5 4
CAR 3 7
PHI 10 10
GB 7 1
SF 9 5
NO 8 11

In the AFC, the sans-Polian Indianapolis Colts are the team most likely to overachieve during this year's playoffs, especially now that they've eliminated the team second-most likely to overachieve. The Colts ranked in the bottom half of the NFL for each of the six most influential DVOA splits from 2006 to 2012 (see above) for which low rankings are advantageous: 16th in first-quarter offense, 20th in weighted pass offense, 20th in offense when winning by a touchdown or less, 16th in second-down offense, 22nd in third-down offense, and 24th in red-zone offense. They also ranked 15th or worse in four of the five bad-is-good splits I used from 1990 to 2012. Speaking of which, the full-sample method was bullish on the Chiefs because of their No. 1 Weighted Special Teams DVOA, their No. 1 punt return unit, and their No. 2 kick return unit. Too bad these "random" variables ended up playing far less of a role than another "random" variable: injuries.

Meanwhile, both methods hate the chances of both the Chargers and the Broncos, so the Patriots-Colts winner has the inside track to a Super Bowl berth -- at least according to this analysis. Of the 19 DVOA splits I'm looking at across both systems, Denver is on the wrong end of 18. For the 11 bad-is-good predictors based on 2006-2012, they rank fourth or better in all of them. And for the 1990-2012 predictors, the only bright spot is a slumping offense (-6.6% difference between Offense DVOA and Weighted DVOA), but that's offset by low rankings on two of the three good-is-good predictors: Weighted Special Teams DVOA and net expected points on punt returns. For San Diego, the main detractor is recent postseason history: The Chargers don't measure up well in 10 of those 11 predictors. The fact that they won this weekend (I think) tells us more about just how awfully Cincinnati played than about San Diego's prospects going forward.

The results are far less clear-cut for the NFC, where the most concrete thing I can say is that the top two seeds have a slight edge, and -- in the reverse of the Colts-Chiefs situation -- New Orleans' elimination of Philadelphia anointed the Saints as the remaining team most likely to underachieve from here on out. The Eagles got killed in both systems by having offense too good and special teams too poor. The Saints, meanwhile, get demerits in the 1990-2012 system for their special teams rankings (27th in Weighted Special Teams DVOA and 27th in net expected points on punt returns). And the 2006-2012 system does them no favors because they rank in the top half of the NFL in 10 of 11 bad-is-good DVOA splits.

Elsewhere, Green Bay's loss appears on the surface to be an indictment of the 2006-2012 system considering they were its most likely team to overachieve. And even in the 1990-2012 system, they were still a better bet than the 49ers. However, a few seconds of deep thought reveals that the systems were so high on the Packers mainly because of the bad offense they displayed during Aaron Rodgers' injury absence. No doubt, a full season of Rodgers would have placed Green Bay higher than 22nd in Weighted Pass Offense DVOA, which has been the second-most (negatively) influential split in recent times. Furthermore, the Packers' offensive momentum wouldn't have been -8.0% DVOA with a healthy Rodgers, and that's the No. 1 predictor -- albeit in a negative direction -- according to the full-sample correlations.

In closing, I'll go ahead and state a few things I've taken away from this research project. First, regardless of what you just read, the Colts remain long shots. Going back to the tedious math I mentioned a couple thousand words ago, the No. 4 seed is theoretically about 12-to-1 to even make it to the Super Bowl, and the Colts' win over Kansas City only improved those odds to 7-to-1. In other words, even giving credit to the two systems for correctly identifying Indianapolis as a playoff overachiever this postseason, that doesn't necessarily mean they're going all the way. Second, I'll reiterate what our dear leader has said before: Over time, parity has decreased during the regular season, but increased during the playoffs. To wit, this exercise has proven to me that, like Hulk Hogan was in the 80s, playoff success is really hard to pin down cleanly. Finally, even with the improvements I've made to Barnwell's original analysis, mine is only a second step. There are plenty of ways to make it even better (e.g., using a win probability model based on a multivariate logistic regression rather than the Vegas line). Let's work on that over the next 10 years, OK?


144 comments, Last at 11 Jan 2014, 10:12pm

1 Injuries

seem to be the most obvious potentially important variable that's worth digging into. Both regular season as a whole, injuries down the stretch, and injuries for playoff games (at the least, I'd guess that most injuries are pre-game and are known in advance). I don't know how good AGL is, but it seems like a reasonable starting point.

69 Re: Injuries

In reply to cfn_ms

AGL won't take into account players returning from injuries at the end of a season. See Ngata and Lewis last year.

You'd need to create a metric that accounts for both the quality of the starters and the first line of backups - available for play after Week 17.

76 Re: Injuries

In reply to Anonymous49 (not verified)

Really? I figured AGL was tracked by week including playoffs. Isn't that fairly doable at this point?

2 Re: Secret Sauce Revisited

This article confused me more than anything.

#1. How did the Sample Size = 1 of 2006 Colts disprove Barnwell?
#2. What do I want my team to start getting good at?

12 Re: Secret Sauce Revisited

Well, the analysis itself was based on a small sample. Also, the Colts had to win more than one game to win the Super Bowl.

Data mining theories like this can be fairly fragile when it comes to the addition of counter-examples. The basic problem is that there are so many possible theories that any small data set is going to support some of them, even when no real pattern should exist.

125 Re: Secret Sauce Revisited

RickD wrote: so many possible theories that any small data set is going to support some of them

Yes, that compounded with our human proclivity to see patterns where none exist. We need explanations, so we invent them. And, Stevie Wonder sings "Superstition" in the background and our beer ads tell us "it's only crazy if it doesn't work".

3 Re: Secret Sauce Revisited

Would there be some way to factor PWA into regular season DVOA in the context of each individual matchup? And, if yes, is there an argument that such an approach might yield a good representation of which teams were best situated to take advantage of their 'secret sauce' to marginally increase their chances for victory?

4 Re: Secret Sauce Revisited

Hazy intuition: could it be true that defense and special teams are more important than offense in the playoffs because they're more important in close games and playoff games are more likely to be close? And that teams that win a lot of blowouts in the regular season are just .500 teams in close games like everyone else so they underperform their regular-season WP% in the playoffs when their opponent is non-blowout-able?

Again, I have no idea whether this makes even basic sense when it's quantified. I'm just brainstorming.

8 Re: Secret Sauce Revisited

That makes a lot of sense to me, intuitively.

Although, Peyton Manning has historically done way better in close games than stats would predict, so you might find the opposite is true.

27 Re: Secret Sauce Revisited

Yeah, and we'll see how Luck works with all this. His Colts are something like 16-2 in one-score games the past two seasons. 11 of them, of course, the famous 4Q come-from-behind games. That's kind of a ridiculous stat. Not like being 10-0 in your first ten playoff games, but it's a start.

5 Re: Secret Sauce Revisited

Unless I missed this - did you do correlations across all 3 playoff time periods? so, don't chop up 1990-2012 into 3 different time periods, just treat it as one time period. are there any predictive factors then?

58 Re: Secret Sauce Revisited

This is what I was hoping to see. For some reason the author decided to chop up the data based on the period that barnwell had to work with in the original piece. This seems pretty strange to me.

91 Re: Secret Sauce Revisited

Given the three time periods are broadly the same, isn't it possible to just add the correlations together to get something vaguely akin to a total for each metric? That's what I just did. It reveals that the top 5 positive correlations are:

Punt Returns
Special Teams
Defence (Unadjusted)
Special Teams Variance
Unadjusted Special Teams

The top 5 negatives are:

Offensive Momentum
Pass Defence (1st Down)
Special Teams Weather
Defence (1st Down)
Offence (weighted)

6 Re: Secret Sauce Revisited

I'm sorry, but I found this entire exercise to be questionable at best.

"So with a more valid playoff success measure in tow, all that's left to do is calculate correlations between PWA and the hundred or so regular-season DVOA splits we have in our Premium database"

That feels like blindly throwing darts at a wall and drawing bull's-eyes around the darts. First, you have the problem of unfettered data-mining. If you test 100 things, 5 will be "statistically significant" at the .05 level and 1 will be at the .01 level, just due to chance. But--and I'm no expert on this--are the DVOA splits normally distributed? The problem could be a lot worse if not. Also, there is probably a lot of correlation between certain specific splits, which seems like it would also exacerbate this problem.

"In terms of our second question, the pattern of correlations leaves no doubt that the recipe for playoff success has changed over time"

I disagree--"the pattern of correlations" I see only makes me think "random random random", not that any "recipe for playoff success has changed." First Down pass defense is correlated with success for one arbitrary time period and then not for another? What the hell do you do with that information? Especially when you've been testing 100 other splits? So I hate to say it, but I found this article shoddy in terms of statistical inference, and I left feeling like I haven't learned anything of value.

At the same time, this is the best football stats site probably on the internet...I only post this comment because you guys are the best and the level of debate here is so high.

13 Re: Secret Sauce Revisited

Dan S states well concerns that I would also have.

One has to be careful with data mining of this sort. It's important to have a divide between your hypothesis generation and your hypothesis testing. It's typical in machine learning circles to use a subset of the data to generate hypotheses and then test them against the rest of the data. That way you're not simply over-valuing random artifacts.
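
A rough sketch of that discovery/holdout idea, using nothing but noise so that any "finding" is an artifact by construction. The sample sizes, function names, and seed are all hypothetical; this is not the article's data or method.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def one_trial(rng, n_teams=40, n_splits=100):
    """Pick the best-looking split on half A, then re-check it on half B."""
    def noise(n):
        return [rng.gauss(0, 1) for _ in range(n)]

    # Pure noise: no split has any real relationship with the outcome.
    outcome_a, outcome_b = noise(n_teams), noise(n_teams)
    splits_a = [noise(n_teams) for _ in range(n_splits)]
    splits_b = [noise(n_teams) for _ in range(n_splits)]

    # Hypothesis generation: strongest correlation in the discovery half.
    r_a = [pearson(s, outcome_a) for s in splits_a]
    best = max(range(n_splits), key=lambda i: abs(r_a[i]))

    # Hypothesis testing: the same split, scored on the holdout half.
    r_b = pearson(splits_b[best], outcome_b)
    return abs(r_a[best]), abs(r_b)


rng = random.Random(7)  # arbitrary seed, only for repeatability
results = [one_trial(rng) for _ in range(50)]
mean_discovery = sum(a for a, _ in results) / len(results)
mean_holdout = sum(b for _, b in results) / len(results)

print(f"mean |r| of the best split, discovery half: {mean_discovery:.2f}")
print(f"mean |r| of the same split, holdout half:   {mean_holdout:.2f}")
```

The best split always looks impressive in the half it was chosen from, and the holdout half is what shows whether it was a random artifact.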

127 Re: Secret Sauce Revisited

RickD writes: It's typical in machine learning circles to use a subset of the data to generate hypotheses and then test them against the rest of the data.

And actually, by dividing the data up into different periods, that's what Danny did. The fact that the predicted correlations from one period did not match the correlations from another suggests that our machine learning algorithm has not successfully learned. (At least that's what I understand from what I've read on the topic, not being a statistician, machine learning expert, or having any other credentials.)

I suspect that the issue has more to do with Danny's definition of success, which I'll talk about in another reply.

140 Re: Secret Sauce Revisited

I promised to write more on the definition of success. However, I think that is pretty well covered by numerous people, so I'll try to be brief. There are two main issues with PWA.

1) It compares the results to "Las Vegas Lines". That means that if the lines already properly compute success, the best one should do is break even. Note, I think the lines are more about getting even betting on both sides, so there may be some room for improvement, but it would be slight.

2) It defines success by "exceeding expectations". As many have noted, this skews the measurement toward rewarding teams with depressed expectations, rather than rewarding teams that won (even if they were expected to win).

As several people (me included) have mentioned, a simpler metric based solely on wins would be more convincing. And, note, that should be the point of the article: presenting convincing evidence.

I would take it further, to allow DVOA-like analysis, e.g. do the various DVOA metrics still properly correlate in the postseason? If they diverge, that would be news. Of course, if they diverge, the bar for proving that will be high, because you have small sample issues and an ingrained belief that they shouldn't diverge.

17 Re: Secret Sauce Revisited

I'm with you. This is pretty average work. The table of correlations makes me think that the playoffs are just random.

Also, I find this type of statistical work with dependent events shaky (since one team winning a game necessitates another team losing). I took some statistics courses, but not to the degree of quite a few others on FO boards, so maybe I'm off base, but it just seems less sound to do this analysis with a small set of dependent events (Team X cannot win two games unless they win the first game, Team B cannot win three games unless Teams C-G lose a game, etc.)

30 Re: Secret Sauce Revisited

The thing you're missing is that even if you conclude the playoffs are random, that is important in and of itself. Why? Because the regular season isn't. Not even close - you can actually show that very easily by just looking at the distribution of wins, and it's not even close to what you would expect from a random draw.

Since we know they actually are playing the same game in the playoffs (so it can't magically become purely random) it must mean that the playoff selection process produces some bias in the remaining teams that changes things.

If you believed that the playoff selection process was perfect, it could possibly be random. Imagine, for instance, a perfect BCS, selecting only the top 2 teams - if the talent distribution in the NFL was such that there isn't much difference between the #1 and #2 team, then you'd end up with a nearly-random playoff.

But the playoff selection process certainly isn't perfect - the division winners are almost random, in some sense: it could produce a very bad playoff contender. So how in the world can a selection process that lets in between 4-6 almost random teams convert a 'non-random' game into a 'random' game?

I think Danny's conclusion is spot on: that there's some variable that's not being measured that's driving playoff success.

34 Re: Secret Sauce Revisited

He is starting with the Vegas line as a baseline. All this seems to show is that there's no secret sauce to beat Vegas. It's not at all showing that the playoffs are random. Just that playoff over/under-performance relative to the Vegas line is random.

47 Re: Secret Sauce Revisited

Which means the Vegas line is appropriately calibrated, and if there's any additional information about what works better in the playoffs than the regular season it's already factored into the Vegas lines.

72 Re: Secret Sauce Revisited

Thank you. This article really has an underlying definitional problem. It defines success as essentially doing "better than expected", and failure as the opposite.

I guess it's mildly interesting that Vegas has slightly overvalued passing offense in recent playoffs? I wonder how much of that is driven by Pats '07 and '10.

25 Re: Secret Sauce Revisited

Adding to Dan S's comments, I may have missed something, but it seems like there's another big flaw here.

Let's say that teams with great offenses tend to do well in the regular season and get high seeds in the playoffs. Therefore, teams with great offenses tend to have a relatively high number of expected wins. This means that if a great offensive team/high seed loses early in the playoffs it will score a relatively high negative PWA. At the same time, when a team with a great offense/high seed wins the Super Bowl, it will have accrued a relatively small (for a Super Bowl winner) PWA. Therefore, the ineffectiveness of great offenses is overstated when the team loses early and the effectiveness is understated when the team ultimately wins the Super Bowl.

33 Re: Secret Sauce Revisited

Yup. Now that should balance out in the long run because the relatively high negative PWA shouldn't happen very often. However, we know that an unexpectedly high number of top seeds lost from 2007-2012, meaning top seeds accrued lots of negative PWA during that timeframe, much more than we expected.

The question is: did those top seeds lose because something about the playoffs is different than the regular season, from which we based our expectations, or is it simply a randomly unlucky stretch that would have regressed to the mean if given more trials? The results would look the same either way.

46 Re: Secret Sauce Revisited

I agree there is a problem with the basic logic of the scoring system. It seems you have created a measure of success that rewards the teams that perform badly in the regular season but still squeeze into the playoffs. This eliminates the effect of splits we know lead to success in the regular season, although these splits are very likely to be relevant in the playoffs as well. What you have left is a tiny bit of information and a lot of statistical noise, and it is impossible to distinguish the two.

I suggest you reward success points only weighted by home field advantage, which is pretty much undeniable, possibly awarding more points for more important games (conf finals and SB) based on some subjective definition of 'playoff success'. Award negative points for losses, zero for byes so you don't create another bias for wild card teams.

I am guessing the results will still be noisy, since an entire playoff season represents fewer games than a single week of the regular season, but you might come up with something a little better than 'bad offense is good'.

Or, we could just move on and everyone agree that playoff success is more likely for good teams but still pretty random.

80 Re: Secret Sauce Revisited

Why is that a flaw, if the point of the exercise (as I understood it) is: why do some teams overperform/underperform in the playoffs? To answer that question, you have to categorize teams as Meets Expectations, Exceeds Expectations, and Does Not Meet Expectations. If a team that is rated the best team in football wins the Super Bowl, by definition it Meets Expectations. If it loses earlier, by definition that team has Underperformed.

118 Re: Secret Sauce Revisited

Well and good, but the whole point of Secret Sauce (as I understand it) is not to function as a betting aid, but to predict which teams will succeed and fail in the playoffs. You might want something that will help you beat the spread, but I'm pretty sure that's not what Barnwell and the Baseball Prospectus article before him were trying to do, nor do I think that's what readers are looking for when they read a Secret Sauce article.

139 Re: Secret Sauce Revisited

"which teams will succeed and fail in the playoffs"
Define 'succeed'.
My first gut feeling would be wins and losses, and that a Super Bowl win is more of a success than a divisional win. But random chance is always a part of success or failure (if you define them with W's and L's), and a good game in the conference championship is just as good a game as the same game in the Super Bowl.
I don't think 'Wins' is what you're chasing here. What you want to know is which teams perform well; playoff DVOA is what I would be looking at.

The question I'd ask is: What part of a team increases a team's playoff DVOA?

[EDIT] I read that Eddo is saying pretty much the same thing in post #19 and I agree with him totally.
Why would you want to measure whether a team exceeds expectations when you're actually looking for the special ingredient of success in the playoffs?

36 Re: Secret Sauce Revisited

Agree completely.... this seems like a horrible case of trying to over-fit the data.

No reason to split the data into eras with smaller sample sizes unless you think there was something rules-wise that would make those eras significant theoretically.

So much of the rest of the over-analysis is similar.

It could have been a very nice article/analysis... look at some common large factors (e.g. OFF vs. DEF/ST)... then maybe look at Pass/Run splits. Doing so, we could see if indeed there are different "predictors of postseason success" than otherwise. You could also use theory to concoct other testable hypotheses (e.g. number of starters lost for the playoffs might be an interesting one, which would show teams that got healthy for playoff runs).

But this "throw spaghetti against the wall to see if it sticks" method really calls the whole exercise into question.

10 Re: Secret Sauce Revisited

Another point--
Can you just tell us whether overall DVOA, defensive DVOA, offensive DVOA, and ST DVOA predict anything over your entire data sample? That's the most obvious thing to test but I didn't see it (and apologies if I missed it). Maybe it's not here because these obvious stats have no explanatory power.

Given that--maybe the only thing you can say about playoff success is that it's random as balls. Why do a bunch of questionable arithmetic acrobatics to convince yourself otherwise?

11 Re: Secret Sauce Revisited

I wouldn't submit this as a paper in an Intro to Stats class. You guys will probably cross-post this to ESPN. Keep up the good work.

14 Survival Analysis

Actually, survival analysis would be a more appropriate statistical method for studying playoff success, as the teams keep playing until they lose. It's pretty simple to plug the teams' stats into a multivariate survival model and use something like AIC to select the variables most important for playoff survival.

15 Re: Secret Sauce Revisited

After one read-through my head is spinning but here was my first thought.

1) A team's postseason "expectation" is established in large part by their regular season success.

2) Regular season success is predicted strongly by certain offensive DVOA splits.

So a team with a great regular season offensive DVOA will be expected to succeed in the playoffs.

The article is then computing the "special sauce" from observed playoff results relative to that expectation. In that case it's unsurprising that factors positively correlated with high "expected" values will be negatively correlated with "observed minus expected".

To say it colloquially, good offense means you should be favored in playoff matchups. Which makes it hard for good offenses to do better than expected.

37 Re: Secret Sauce Revisited

Yes, this.

This entire article isn't looking at what makes a team successful in the playoffs. It's about what makes a team more (or less) successful than people expect it to be (as measured by the Vegas lines). So it's as much about measuring what drives people to think highly of a team as it is about how well the team actually does.

I wonder if some of the cause of the negative recent trends you see on offense are tied to fantasy football. One effect of the increased popularity of fantasy football is that people are, in general, more knowledgeable about offensive players, and more likely to think highly of a team in real life that got lots of fantasy points. So offensive juggernauts with porous-defenses are more highly thought of than tough defenses that win a lot of 20-7 games. Hence, if a team that won all its games 34-27 or so goes up against a 20-7 defensive team, I would wonder if the Vegas lines would favor the offensive team more than they should, and hence we'll see a negative correlation in what you've done.

I think your methodology might be worth applying to simply playoff success--NOT playoff success compared to the expected success. Something closer to Barnwell's original methodology, but corrected for the more accurate win probabilities. I would bet that you would see very different predictive splits.

53 Re: Secret Sauce Revisited

The baseball prospectus PSP score was not based on expectation at all, it merely assessed wins and losses and how far teams advanced in the playoffs. I think adding expectation to Barnwell's PSP and PWA was a bad idea.

Obviously it's harder to measure success in the NFL playoffs because there are fewer games and teams play an uneven number of games. But I think a better scoring system would be based solely on how far teams advanced. In baseball, losing a playoff series 4-3 is better than losing 4-0, but otherwise that's all PSP is measuring: how far a team advanced.

So 1 point for advancing (via win or bye) to the divisional round, another point for advancing to the conference championships, another for making the Super Bowl, and a fourth for winning it. That would take expectation out of the measure, and the special advantage that teams with a bye receive seems deserved and therefore acceptable anyway.
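
As a sketch, that scoring rule is just a count of rounds advanced. The stage labels below are made up for the example; each one names the furthest point a team reached.

```python
# Furthest stage reached, in order. "champion" means the team won the Super Bowl.
STAGES = ["wild_card", "divisional", "conference", "super_bowl", "champion"]


def advancement_points(furthest_stage):
    """One point per round advanced, whether by a win or a bye."""
    return STAGES.index(furthest_stage)


# A bye team that loses its first game still gets the point for
# reaching the divisional round.
for stage in STAGES:
    print(stage, advancement_points(stage))
```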

16 Re: Secret Sauce Revisited

Were there any patterns that were statistically significant over the full 1990-2012 sample? If not, my takeaway from this is that there is in fact no "secret sauce". I find that much more likely than the idea that the secret sauce exists, but changes completely every 7 years or so for no discernible reason. Random wins and losses happen, and if in a 6-8 year period a couple of those random winners have similar profiles (and/or random losers have opposite profiles) then you will see patterns emerge that have no meaning.

19 Re: Secret Sauce Revisited

My biggest concern is that PWA [EDIT: removed parenthetical(*)] isn't measuring "what works in the playoffs", but rather "what works more in the playoffs than in the regular season".

Consider the extreme example of a team that was 16-0, won every game by 70 points, and wound up being considered 99% favorites in every playoff game by Vegas. Then, that same team goes out and wins its three playoff games by 70 points each.

Clearly, that team's shit works in the playoffs, as it dominated three opponents on its way to a championship. But its PWA is going to be very small, and thus whatever it does well wouldn't have much effect on the conclusions of this study.
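
To put rough numbers on it, here is a sketch that assumes PWA behaves like "actual wins minus the sum of pre-game win probabilities" over the games a team actually played. That formula is a simplification for illustration and may not match the article's exact calculation. The underdog's 40% figure is made up.

```python
def wins_above_expectation(games):
    """games: list of (pre-game win probability, won?) for each game played.

    Assumed stand-in for PWA: actual wins minus expected wins.
    """
    actual = sum(1 for _, won in games if won)
    expected = sum(p for p, _ in games)
    return actual - expected


# The 99% favorite from the example, winning all three games.
favorite_sweep = wins_above_expectation([(0.99, True)] * 3)

# The same favorite losing its first playoff game.
favorite_upset = wins_above_expectation([(0.99, False)])

# A hypothetical 40% underdog winning three straight.
underdog_sweep = wins_above_expectation([(0.40, True)] * 3)

print(f"favorite wins title:  {favorite_sweep:+.2f}")
print(f"favorite loses early: {favorite_upset:+.2f}")
print(f"underdog wins title:  {underdog_sweep:+.2f}")
```

Under that assumption the dominant champion earns about +0.03, the upset favorite about -0.99, and the underdog champion about +1.80, so the same title counts for sixty times as much when nobody saw it coming.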

Part of me thinks you should use VOA/DVOA or SRS in playoff games to determine playoff success. This would be a proxy for how well teams played once they got to the playoffs. Instead, what you're measuring is how well they do as compared to expectations, which isn't quite the same thing.

(*) I never really cared for Barnwell's PSP measure either, mainly due to its arbitrary assignment of value for various wins.

63 Re: Secret Sauce Revisited

Another method for calculating PWA would be using Bill James's Log5 method, which Barnwell used when he calculated the probability of teams making the playoffs.
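
For reference, Log5 estimates the chance that team A beats team B from each team's winning percentage. A minimal sketch, with made-up winning percentages and no adjustment for home field:

```python
def log5(p_a, p_b):
    """Bill James's Log5: probability that A beats B.

    p_a, p_b are each team's winning percentage, as fractions between 0 and 1.
    Undefined (division by zero) if both are 0 or both are 1.
    """
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# Hypothetical matchup: a .750 team against a .500 team.
print(f"{log5(0.75, 0.50):.3f}")

# Evenly matched teams come out as a coin flip.
print(f"{log5(0.60, 0.60):.3f}")
```

Summing these game-by-game probabilities would give an expected-wins baseline that doesn't depend on the Vegas line.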

An alternative method for looking for the "secret sauce" would be comparing the most successful teams to each other. You could (arbitrarily) group all the teams that won two or more playoff games and see if there was any stat that was similar across teams.