Guest column by Ajit Kirpekar
There are few statistics more hallowed within NFL circles than the sack. Like touchdown passes, sacks are one of the NFL's sacred cows and often the go-to barometer for Pro Bowl teams, All-Pro votes, and likely even contract extensions. If you've been a fan of football as long as I have, you have probably heard of, and are no doubt a fan of, the recently passed Paul "Dr. Z" Zimmerman, one of football's most prodigious writers. The inspiration for this article comes from one of his All-Pro articles, where in his inimitable style of humor, he lamented the fact that so many All-Pro teams are filled with yearly sack leaders despite the fact that sacks amount to only a dozen or so plays out of hundreds.
A lot has changed since that article was written. Pressure statistics are now being charted from a variety of places, including here at Football Outsiders. In fact, FO has done a number of terrific articles showing the value of pressures and their quantifiable impact on passing success. Of course, this being Football Outsiders, we know that context plays an important role in every statistic and pressures are no different.
Thanks to the enormous contributions by FO's game charting data (collected first by FO staff and volunteers, then in conjunction with ESPN Stats & Info, and since 2015 by Sports Info Solutions), we have numbers going all the way back to 2006, including the result and documentation of every play. Using this data, I created a Football Outsiders-style opponent adjustment for every team's pressure rate. I'm calling the statistic RAP -- Rate of Adjusted Pressure. At the risk of putting everyone to sleep, I'm going to give a high-level and hopefully intuitive explanation of how RAP is derived. This takes place in three parts: defining a set of variables that explain how pressure is generated; coming up with a good model that helps predict pressure; and finally, layering in the opponent adjustment to the final pressure rate of each team in each season. For those who wish to skip this, feel free to scroll to the bottom where I have presented the results.
I started by selecting a set of variables that should predict how pressure is generated. When I say "pressure" here, I really mean all hurries plus sacks and plays with intentional grounding, minus all coverage sacks and "self sacks" where a quarterback slips or just drops the ball without pressure. As it turns out, a lot of the variables involved are pretty intuitive. Down-and-distance, time remaining, location on the field, AGL injury data, home-field advantage, etc., are all influential. After all, as Donovan McNabb famously said, "It's a lot harder to play quarterback when you're on the road, down by six while facing a blitz." And of course, it also matters a lot which opponent or quarterback you are facing. Modeling the latter two, however, is tricky since these are not ordinal values like down-and-distance or time remaining. These are categorical variables -- namely, a list of names of quarterbacks and teams -- and that makes using them more difficult.
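The pressure definition above is simple bookkeeping. Here is a minimal sketch, assuming a team's season counts are already tallied. The function and argument names are hypothetical and not taken from the charting data, and the numbers in the usage line are made up for illustration.

```python
def pressure_rate(hurries, sacks, intentional_groundings,
                  coverage_sacks, self_sacks, pass_plays):
    """Share of pass plays with pressure, following the definition in the text.

    Assumes coverage sacks and self sacks are already counted inside `sacks`,
    which is why they are subtracted back out.
    """
    if pass_plays <= 0:
        raise ValueError("pass_plays must be positive")
    pressures = (hurries + sacks + intentional_groundings
                 - coverage_sacks - self_sacks)
    return pressures / pass_plays


# Made-up counts: 100 hurries, 40 sacks, 2 groundings, 5 coverage sacks,
# 2 self sacks, 600 pass plays.
example_rate = pressure_rate(100, 40, 2, 5, 2, 600)
```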
The normal statistical approach to categorical variables is to convert them into some kind of numerical format. One way to do this is to literally code each name in the list in terms of 0s and 1s -- i.e., did you face Team X and Quarterback Y on this play? If so, then it's a 1 for both; if not, it's 0 for both, or some combination of the two. This method works, but has its own drawbacks which I won't go into here. Suffice to say, whenever possible, it's better to find a numerical representation for your categories than it is to just create a mass of 0s and 1s. For the list of quarterbacks, this was unavoidable. For teams, however, I could use some kind of number that quantifies how good or bad a particular offense or defense was. Pressure rate has an effect on DVOA, which means if I used DVOA to measure team quality, I would be committing a statistical sin of mixing the right-hand and left-hand variables in my model. Instead, I had to develop an alternative rating system, which I did with Las Vegas expected point spreads and expected point totals for each game. Using an Elo-style rating system, every team received an opponent-adjusted rating value for their offense and defense. Not perfect, sure, but the results were good enough to use and happily correlated well with DVOA.
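To make the "mass of 0s and 1s" concrete, here is a minimal sketch of that kind of encoding for a quarterback column. The quarterback labels are placeholders, and the helper `one_hot` is written for illustration rather than taken from the model described here.

```python
def one_hot(names):
    """Turn a list of category names into 0/1 indicator rows.

    Returns the sorted list of distinct categories and one row per input
    name, with a 1 in the column for that name and 0 everywhere else.
    """
    categories = sorted(set(names))
    column = {name: i for i, name in enumerate(categories)}
    rows = []
    for name in names:
        row = [0] * len(categories)
        row[column[name]] = 1
        rows.append(row)
    return categories, rows


# Three hypothetical plays, two distinct quarterbacks.
categories, rows = one_hot(["QB_A", "QB_B", "QB_A"])
```

With hundreds of distinct quarterbacks, every play becomes a row that is almost entirely zeros, which is one reason a single numeric rating per team is attractive where one is available.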
Next, I had to go about picking the right model. I'll again spare you a lot of the sausage-making process, but suffice to say I approached this problem in the same way I approach my own problems at work: start simple and add complexity as is needed. I tried several models but ultimately settled on a multilayered deep learning model. For those unfamiliar with neural networks, let me try, with trepidation, to give an intuitive explanation.
Think of a simple regression model -- one where we try to predict Y from X: Y = W1*X. The W1 here refers to how much a change in X results in a change in Y. X could be any relevant variable we think predicts pressure.
Now, suppose we did some non-linear operation on W1*X, called f(x). What f(x) looks like is not important here, just that it performs some kind of non-linear operation. Now, we say Y relates to X via the following equation: Y = f(W1*X).
Now suppose we do a second non-linear operation, call it g(x), on f(x). Then we get: Y = g(f(W1*X)).
And what if we repeat this again with h(x): Y = h(g(f(W1*X))).
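If each added layer also carries its own weight, the chain can be written out as below. The symbols $W_2$ and $W_3$ are assumptions introduced for illustration. Only $W_1$, $f$, $g$, and $h$ are named in the text, and setting $W_2 = W_3 = 1$ recovers the plain nesting described above.

```latex
\begin{align*}
Y &= W_1 X \\
Y &= f(W_1 X) \\
Y &= g\big(W_2\, f(W_1 X)\big) \\
Y &= h\Big(W_3\, g\big(W_2\, f(W_1 X)\big)\Big)
\end{align*}
```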
Below is a simple drawing of a neural network. Here, the word "hidden" refers to each layer of the non-linear operators and the "act()" refers to the nonlinear function being used.
In the above example, all of the Ws interact with each set of non-linear operations which ultimately begin with our first input X. By doing this chain of operations, we can extend our original linear regression into a non-linear model that can capture more interesting dynamics of X onto Y. Of course, we don't need to be limited to three nonlinear layers. We can add as many as we want. In fact, the more you add, the more nonlinear interaction the model expects from the data. Furthermore, not only can we add additional layers, we can also add more weights in each layer. In doing so, the drawing above (which depicted only one set of weights) becomes a more complex structure. See the drawing below.
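The layered structure can be sketched in a few lines of plain Python. This is a toy forward pass, not the model used for RAP. The weights are arbitrary, there are no bias terms (matching the W-only description above), and `tanh` stands in for whatever `act()` might be.

```python
import math


def forward(x, layers, act=math.tanh):
    """Feed inputs through a stack of hidden layers, then a linear output layer.

    `layers` is a list of weight matrices. Each matrix is a list of rows,
    with one row of weights per unit in that layer.
    """
    values = list(x)
    for weights in layers[:-1]:
        # Hidden layer: weighted sum of the previous values, then the
        # non-linear activation.
        values = [act(sum(w * v for w, v in zip(row, values)))
                  for row in weights]
    # Output layer: plain weighted sum with no activation.
    return [sum(w * v for w, v in zip(row, values)) for row in layers[-1]]


# Two inputs -> one hidden layer with two units -> one output.
toy_layers = [
    [[0.5, -0.5], [1.0, 0.0]],  # hidden layer weights (arbitrary)
    [[1.0, 1.0]],               # output layer weights (arbitrary)
]
toy_output = forward([1.0, 2.0], toy_layers)
```

Adding another matrix to `toy_layers` adds a layer, and adding rows to a matrix adds more weights within a layer, which are the two ways of growing the network described above.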
Why, you may ask, is all of this needed? Well, sometimes relationships aren't so easily captured in a linear model. A good example would be something like how distance influences pressure. Rivers McCown had a great article on how longer distance to go for a first down usually means higher pressure, but too far a distance and you are more likely to see a screen pass or a quick hitch to specifically avoid pressure. The sign of the relationship flips! This is an example where a linear model struggles since the effect of a distance increasing or decreasing is the same regardless of what the distance is -- i.e., a distance going from 0 to 10 has the same effect as the distance going from 30 to 40. Machine learning algorithms are there precisely to capture these nuanced effects.
The final component involves opponent adjustments. In order to get a sense of how this was accomplished, we need to think about what we're really trying to account for with opponent adjustments. Let's imagine, for example, Defense X achieves a 20 percent pressure rate for the year. We know that said defense played a certain schedule, either a difficult one or a relatively easy one, but one that is unlikely to be exactly average. What we want to know is, if we took the same set of circumstances but forced Defense X to play an average schedule, what would their pressure rate look like?
Our model allows us to do precisely this. We first run Defense X through their actual schedule along with all of the other variables observed in our model. Their expected pressure rate is 25 percent. Here, I emphasize "expected pressure." This is what our model expected the team to achieve. It's predicting 25 percent, while the team achieved only 20 percent. Why is there a discrepancy? Well, the extra 5 percent could have been bad luck or something deeper we are missing, but no matter, this is what our model expected Defense X to get given everything we've observed. Now, let's run Defense X through our model a second time, only now, we replace their schedule with a league-average one. Now every week they are facing the same average offensive opponent rating and the same average quarterback. Now, our model predicts Defense X should have a pressure rate of 30 percent.
Think about what this means. The first model run-through predicted 25 percent. The second model predicted 30 percent. Again, a 5 percent difference, but a 5 percent difference coming entirely from schedule. And it has to be entirely from schedule, because all of the rest of our variables are unchanged. What does this imply? That facing an average schedule, Defense X should have achieved 30 percent pressure, but given the schedule they actually faced, Defense X should achieve only 25 percent pressure. In other words, Defense X had a schedule that was harder than an average schedule, and that was why the prediction was 5 percent less.
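The two run-throughs can be sketched as follows. Everything here is hypothetical: `predict_pressure` is a toy logistic stand-in with made-up weights, not the fitted network, and the plays are invented. The sketch also swaps only an opponent rating, leaving out the quarterback. The point is the mechanics: score the same plays twice, changing nothing but the schedule inputs.

```python
import math


def predict_pressure(play):
    """Toy stand-in for the fitted model: a logistic curve with made-up weights."""
    z = -1.2 + 0.08 * play["distance"] - 0.5 * play["opp_rating"]
    return 1.0 / (1.0 + math.exp(-z))


def expected_rate(plays):
    """Average predicted pressure probability over a list of plays."""
    return sum(predict_pressure(p) for p in plays) / len(plays)


def with_average_schedule(plays, league_avg_rating=0.0):
    """Copy each play, swapping only the opponent rating for the league average."""
    return [{**p, "opp_rating": league_avg_rating} for p in plays]


# Hypothetical plays against above-average offenses (positive opp_rating).
plays = [
    {"distance": 10, "opp_rating": 0.6},
    {"distance": 7, "opp_rating": 0.4},
    {"distance": 3, "opp_rating": 0.8},
]
actual = expected_rate(plays)
average = expected_rate(with_average_schedule(plays))
delta = average - actual  # positive: the actual schedule was harder than average
```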
Confused? OK, let's try another example. Say Offense Y had a pressure rate of 25 percent and just happened to face an average schedule. Let's say their expected pressure rate with their actual schedule was 20 percent. Then when we re-run the model with an average opponent, the result will still be 20 percent. No change. Why? Because Offense Y's opponents were average.
What if they faced a difficult schedule? Now their expected pressure rate would be something like 25 percent -- i.e., expected to yield 5 percent more pressure per pass due entirely to the harder schedule. What if it were easier? Then the expected pressure rate would be something like 15 percent, or 5 percent fewer pressures per pass.
In all cases, the delta, or opponent adjustment, is the model's expected pressure rate against an average schedule minus its expected pressure rate against the actual schedule. After we calculate this delta, we add it to the team's actual pressure rate (a negative delta lowers the rate), and that becomes their RAP.
Below is a chart to make things clearer:
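| Case | Pressure Rate | Expected Pressure Rate (Average Schedule) | Expected Pressure Rate (Actual Schedule) | Delta | Rate of Adjusted Pressure |
|---|---|---|---|---|---|
| Case 1: Hard Schedule | 15% | 20% | 25% | -5% | 10% |
| Case 2: Average Schedule | 15% | 20% | 20% | 0% | 15% |
| Case 3: Easy Schedule | 15% | 20% | 15% | 5% | 20% |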
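The arithmetic behind the three cases can be checked with a short sketch. It assumes the second numeric column is the model's expectation against an average schedule and the third is its expectation against the actual schedule, which is the reading that matches the worked examples above. The function name `rap` is only a label for this illustration.

```python
def rap(pressure_rate, expected_avg_schedule, expected_actual_schedule):
    """Return (delta, adjusted pressure rate), all in percentage points."""
    delta = expected_avg_schedule - expected_actual_schedule
    return delta, pressure_rate + delta


# The three cases from the chart, for an offense with a 15% raw pressure rate.
cases = {
    "hard": rap(15.0, 20.0, 25.0),
    "average": rap(15.0, 20.0, 20.0),
    "easy": rap(15.0, 20.0, 15.0),
}
```

For an offense, a lower number is better, so the hard-schedule case is adjusted down and the easy-schedule case is adjusted up.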
And voila -- we have our opponent-adjusted pressure rate.
Two final notes: in much the same spirit as DVOA, RAP is also normalized by year to provide a clearer year-to-year comparison. This was done by normalizing such that each season had the same average pressure rate. In the tables below, you will see the normal pressure rate and then the year-adjusted pressure rate alongside. There have been higher raw pressure rates in recent seasons, and it can be tough to tell whether this is related to real increased pressure or changes in charting baselines.
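One way to give every season the same average is to rescale each season's rates by a common target. The article does not say whether the adjustment was multiplicative or additive, so the multiplicative scaling below is an assumption, and the two-team seasons are made-up numbers.

```python
def normalize_by_season(rates_by_season):
    """Rescale each season so its mean equals the mean of all season means."""
    season_means = {s: sum(r) / len(r) for s, r in rates_by_season.items()}
    target = sum(season_means.values()) / len(season_means)
    return {
        season: [rate * target / season_means[season] for rate in rates]
        for season, rates in rates_by_season.items()
    }


# Made-up pressure rates (percent) for two teams in each of two seasons.
raw = {2006: [20.0, 30.0], 2017: [30.0, 50.0]}
adjusted = normalize_by_season(raw)
```

After the rescaling, both seasons share the same mean, so a team's adjusted rate reflects its standing relative to its own season rather than league-wide drift in raw rates.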
In addition, when it came to adjusting a team by schedule, this was done strictly via opponent rating and quarterback only. I thought about adjusting for weather and stadium effects, but decided against it for this particular iteration.
Here are the results. We'll begin with the best offenses at preventing pressure.
|Best Offenses at Preventing Pressure, 2006-2017|
Five of the top 15 teams in offensive RAP, including each of the top three, are helmed by Peyton Manning, a familiar name to everyone reading this. With some exceptions, most of the top finishing teams are helmed by traditional pocket passers. This is pretty consistent with the view that sacks, along with pressures, say as much or more about the quarterback as they do about the offensive line.
And now the worst offenses at preventing pressure:
|Worst Offenses at Preventing Pressure, 2006-2017|
Russell Wilson is a terrific quarterback, but achieving a preposterous near-50 percent RAP is likely a combination of Wilson himself and an atrocious offensive line -- something that should surprise no one. It's also fascinating that the two worst teams in RAP managed to get to the Super Bowl and come within a yard of winning both (though that gets easier to explain in the next table).
The rest of this list feels more like a reflection on line quality than on the quarterbacks involved, as few of these teams, aside from the Seahawks, were helmed by scramble-first players.
And now, the defenses:
|Best Defenses at Generating Pressure, 2006-2017|
The 2013 Seahawks will be remembered for their all-time great secondary, but I wonder how many people knew or appreciated just how dominant their pass rush was. Even just looking at its raw total, that unit was a monster.
One of the more interesting teams on this list is the 2009 Vikings. Despite a terrific pass-rushing unit, their pass defense was ranked 23rd in DVOA. A more extreme example not displayed on this list is the 2010 Houston Texans, ranked 25th in defensive RAP while finishing with one of the all-time worst pass defenses. I guess pass rush can only do so much.
The worst pass-rushing teams on record:
|Worst Defenses at Generating Pressure, 2006-2017|
A few teams here stick out to me. The 2009 Bills actually had a very good pass defense and yet they finish among the worst teams in defensive RAP. In fact, I've run correlations on pass defense and pass rush before -- both on sack rate and pressure rate. There is a correlation, but it's very slight. Ultimately, I think the relationship between pass rush and pass coverage is still too nuanced to be explained by the pressure statistic.
A rather humorous team of note is the 2006 Giants. In terms of raw pass rush, they were the worst team in the entire list. Team averages and schedule bumped them up some, but still an all-time awful pass-rush unit. Strange, isn't it, given how they won the Super Bowl the next year? In fact, the 2007 Giants finished as the 70th best pass-rushing team -- though the difference between them and the 50th best team is less than 0.2 percent.
Here are the numbers for 2017:
|Offenses and Pass Pressure, 2017|
|Defenses and Pass Pressure, 2017|
Ajit Kirpekar is a data scientist based out of San Francisco, CA. Despite this, he roots for the Indianapolis Colts. You can follow him on Twitter @akirp.