Defense and Rest Time
Guest column by Ben Baldwin
Whenever I ask why teams run the ball so frequently, a common response is that by running the ball and chewing up clock, a team can keep its defense off the field and rested for the next drive, thereby allowing it to perform at its highest level.
Anecdotes supporting the idea that rested defenses are more effective come to mind quickly. In Super Bowl LII, the Brandon Graham strip sack -- the only sack of the game -- came after a 15-play drive (counting the attempted two-point conversion) that chewed up more than seven minutes of game time. In the AFC Championship Game, the Jaguars' defense finally showed cracks after a series of three-and-outs by the offense, with the game-deciding New England touchdown following a Jacksonville three-and-out that used less than a minute of game clock. And of course, in Super Bowl LI, Atlanta's defense looked gassed in the second half on the way to spending 93 snaps on the field.
Do we remember specific defensive performance following exceptionally long or short drives because of confirmation bias? Or is a given defense's time to rest actually predictive of how it will perform?
The only prior research I could find is this 2011 piece from Football Outsiders, in which Daniel Lawver found no relationship between a team's average offensive plays per drive and its defensive DVOA over the course of a season. However, team-level averages over a season could obscure a real effect (at the end of games, for example). This piece goes a step deeper and looks at every single drive and the relationship between defensive rest and performance.
For this piece, I used the public play-by-play data from the wonderful nflscrapR project that collects statistics from 2009 to 2017 (with special thanks to Ron Yurko). I excluded the first drives of each half (defenses are well-rested for those drives), drives beginning in the last four minutes of the game (when teams are substantially less likely to score due to the clock being a factor), and sample sizes with fewer than 10 drives. Code is here.
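As a sketch of these exclusions, here is how the filters might be coded against a drive-level table. The column names (`first_drive_of_half`, `game_seconds_remaining`, `bucket_size`) are hypothetical stand-ins for illustration, not the actual nflscrapR schema:

```python
import pandas as pd

# Toy drive-level table standing in for nflscrapR output. Column names
# are hypothetical stand-ins, not the real nflscrapR schema.
drives = pd.DataFrame({
    "drive_id": [1, 2, 3, 4, 5],
    "first_drive_of_half": [True, False, False, False, False],
    "game_seconds_remaining": [3600, 2000, 1000, 300, 120],  # at drive start
    "bucket_size": [500, 500, 500, 8, 500],  # drives sharing this rest value
})

FOUR_MINUTES = 4 * 60

kept = drives[
    ~drives["first_drive_of_half"]                       # drop first drives of each half
    & (drives["game_seconds_remaining"] > FOUR_MINUTES)  # drop final four minutes
    & (drives["bucket_size"] >= 10)                      # drop samples with fewer than 10 drives
]
print(kept["drive_id"].tolist())  # drives 2 and 3 survive the filters
```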
The question addressed in this piece is whether the number of plays or amount of time that a defense has recently rested predicts how many points it allows on the subsequent drive, all else being equal. However, a simple look at points per drive versus time of rest (or plays of rest) isn't quite what we want, because drives that follow extremely short drives are more likely to begin after turnovers and thus have better field position.
Every figure in this piece will look at the same four factors as shown in the following graph. Starting from the upper left and moving clockwise, we have:
- defensive rest time on the most recent drive by number of plays run;
- defensive rest time on the most recent drive by time of possession;
- the number of plays faced by a defense as of the start of the drive;
- and time of possession faced by a defense as of the start of a drive.
The first two factors measure the amount of rest a defense has had since the last time it was on the field, and the last two measure how long a defense has been on the field to that point in a game.
The above figure is interesting and could be a piece by itself. The two top graphics show that teams start closer to the opponent's end zone following short drives, whether measured by number of plays or time of possession. This is due to two factors. First, defenses that take the field after a very short drive (i.e., fewer than four plays run) have typically seen their offense go three-and-out or turn the ball over, both of which tend to place the defense in poor field position. Second, teams are more likely to pin their opponents deep following longer drives than shorter drives. The R-squared values of about 0.1 mean that about 10 percent of the variation in starting field position can be explained by the length of the previous drive.
The two bottom graphics in the figure above show that starting field position is mostly constant throughout the game (close to the 30-yard line). The exceptions come at the very beginning (because the game opens with a kickoff, rather than a punt or turnover) and the very end (probably because teams that have run a lot of plays tend to be in the lead, with trailing opponents taking greater risks).
A technical note on the R-squared values listed throughout the piece (feel free to skip this paragraph): R-squared is obtained through a drive-level regression of the outcome of interest on a cubic polynomial in the explanatory rest variable of interest (the order of the polynomial doesn't turn out to matter). The R-squared is small in, for example, the lower left graphic above despite the appearance of a relationship because the collapsed data in the graph obscures the tremendous variation at each point. For example, for the point (1, 74) -- the far left point on the lower left graph -- the vertical coordinate of 74 is the average of more than 4,600 drives which range in starting position from the 1-yard line to the 99-yard line. In technical terms, the variation in field position within previous plays run is enormous relative to the variation across previous plays run.
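The within-versus-across distinction is easy to demonstrate with simulated data (the numbers below are invented for illustration, not drawn from the article's dataset): even with a genuine trend present, a drive-level regression yields a tiny R-squared when per-drive noise dwarfs the signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated drives: rest has a small real effect on the outcome, but the
# per-drive noise is enormous. All numbers are invented for illustration.
rest = rng.integers(1, 17, size=20000).astype(float)
outcome = 0.05 * rest + rng.normal(0, 7, size=20000)

# Drive-level regression on a cubic polynomial in rest, as in the piece.
coeffs = np.polyfit(rest, outcome, deg=3)
fitted = np.polyval(coeffs, rest)
ss_res = np.sum((outcome - fitted) ** 2)
ss_tot = np.sum((outcome - outcome.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # near zero: within-rest variation dwarfs across-rest variation
```

Binning such data and plotting bucket means (as the figures do) can make the trend visible to the eye while the drive-level R-squared stays near zero.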
The following figure shows expected points per drive by defensive rest time, where expected points is defined relative to starting field position. For those interested, I performed a drive-level regression of points scored on a fifth-order polynomial in field position (yards from opponent end zone) to obtain expected points for a given drive based on starting field position. For those familiar with expected points added (EPA), this is not quite the same calculation, because EPA takes into account possible scores on subsequent drives, while I am only interested in the number of points scored on a given drive.
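A minimal sketch of this expected points calculation, using simulated drives with a made-up scoring curve (the article fits real nflscrapR drive outcomes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated drives: starting position in yards from the opponent's end zone,
# with an invented touchdown-probability curve for illustration only.
yards = rng.integers(1, 100, size=50000).astype(float)
p_td = np.clip(0.65 - 0.006 * yards, 0.05, 1.0)
points = 7.0 * rng.binomial(1, p_td)

# Fifth-order polynomial in field position, as described in the text.
coeffs = np.polyfit(yards, points, deg=5)

def expected_points(yards_from_end_zone):
    """Expected points for a drive starting this many yards out."""
    return float(np.polyval(coeffs, yards_from_end_zone))

# The adjusted outcome for a single drive is then actual minus expected.
print(expected_points(25) > expected_points(75))  # shorter fields project more points
```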
Because drives following very short drives tend to begin with better field position, we would expect more points to be scored based on field position effects alone. This is shown in the following figure:
Since the goal is to gauge the extent to which a more rested defense is a more effective defense, holding field position constant, for the remainder of the piece I show points per drive relative to expected points per drive. (I also experimented with holding field position constant by looking only at a sample of drives following kickoffs, or drives starting between a team's own 20- and 30-yard lines, with similar results.)
Points Per Drive Versus Defensive Rest Time
Here is actual points per drive minus expected points per drive, where expected points per drive is based on starting field position as described above:
This is the main relationship of interest. The key points:
1. Running a lot of plays on a drive does not make your defense perform better on the subsequent drive (as shown in the upper left).
2. Chewing up a lot of clock on a drive does not make your defense perform better on the subsequent drive (upper right).
3. Running a lot of plays against a defense does not make it easier to score against that defense as the game goes on (lower left).
4. Running up a lot of time of possession against a defense does not make it easier to score against that defense as the game goes on (lower right).
Despite working with enormous sample sizes (nearly 38,000 drives are used to construct the upper left figure, for example), the error bands always include zero (except for the very end of the time of possession graph, where offenses are less likely to score once they've reached high time of possession), and the R-squared values are always 0.000. This includes about 3,300 drives with 11 to 15 plays of defensive rest and more than 300 drives with 16-plus plays of defensive rest. In the upper right figure, there are more than 2,300 drives with defensive rest time exceeding six minutes.
If it were the case that rested defenses performed better, we would expect the top two graphics in the above figure to be downward-sloping (more rest = harder to score) and the bottom two graphics to be upward-sloping (defense on the field longer = easier to score). Instead, we see little relationship for any of the measures. The one possible exception would be the lower right figure, where defenses that have been on the field for a long time tend to allow fewer points at the very end of games (likely because at that point some offenses are trying to run out the clock).
I could stop here and conclude that there is no evidence that how long a defense has rested affects its performance. However, when digging into the numbers, I noticed that the flat relationship in the bottom two graphs of the above figure is the result of two factors: teams with the lead being less likely to score and teams trailing being more likely to score. Here is what the figure looks like when excluding drives that began in the fourth quarter with a lead:
We now see an upward slope at the end of games. Is this evidence that tired defenses perform worse as the game goes on? While this seems plausible, another possibility is that we are seeing the impact of teams trailing late in the game making a more concerted effort to score (by, for example, passing the ball more and taking more risks). Here is the ratio of pass plays to all plays, excluding teams that are leading in the fourth quarter:
And indeed, we see in the bottom two graphs that teams that are tied or trailing late in games pass substantially more often. Since passing is more efficient than rushing, we would expect teams to be harder to stop once they start passing more, whether defensive rest time is important or not.
Let's isolate situations where the run/pass ratio isn't changing dramatically to investigate whether defensive rest time matters late in games. Here is the ratio of passes to total plays on drives that begin with four to 10 minutes left in the game, with the offensive team trailing:
Much better! We have isolated a situation where the run/pass ratio is roughly constant regardless of defensive rest time. In this game state (four to 10 minutes left in the fourth quarter with the possession team trailing), do rested defenses perform better? Let's take a look:
If drives beginning with four to 10 minutes left in the game were more likely to produce points because of the trailing offense's aggression rather than the defense being tired, we would expect these drives to score more points regardless of defensive rest time. The above figure shows exactly that: teams in this situation are nearly universally more likely to score, regardless of how rested the opposing defense is. However, while the confidence intervals in all four figures are mostly above zero (more likely to score), the lines are mostly flat, and less than 1 percent of the variation in adjusted points per drive can be explained by defensive rest time. Thus, the increased scoring when excluding teams leading in the fourth quarter appears to be due to the aggressiveness of trailing teams rather than to tired defenses.
For defenses trying to protect a lead with four to 10 minutes left in the game, the number of plays or time of possession they have already been on the field tells one nothing about how they will perform (R-squared of less than 1 percent). For example, a defense that has already been on the field for 55 plays is no better at holding a lead in the fourth quarter than a defense that has been on the field for 65 plays. A defense that has been on the field for 32 minutes is no worse at holding a lead in the fourth quarter than one that has been on the field for 25 minutes, or even 20 minutes. A team that allowed its defense to rest for eight minutes should expect its defense to perform just as well as one that only rested for one minute.
On Rushing and Defense
If rushing carried inherent value relative to passing in improving defensive performance, we would expect time of possession (the graphs on the right of all figures shown) to be more important than plays run (the graphs on the left) because the clock is more likely to continue running after a rushing play. In reality, neither matters, and we can cross off another purported benefit of rushing.
Putting this all together, the main -- and perhaps only -- channel through which an offense can help a defense on a per-drive basis is through field position. Turnovers and quick three-and-outs make a team more likely to give up points on the following drive, but this appears to have everything to do with field position and nothing to do with defensive rest time. In other words, whether it's one minute or eight minutes, knowing how long a defense has had to rest tells one nothing about how the defense will perform given its starting field position.
Why is the myth that a running game can help a defense so prevalent? I suspect that a contributing factor is the conflation of pace effects (in which defenses allow fewer total points when they take the field on fewer drives) with actual changes in defensive efficiency. Because the two teams in a game possess the ball a roughly equal number of times, there is nothing inherently valuable about limiting the opponent's possessions: your own team gives up possessions at the same rate (unless, perhaps, an underdog is pursuing a high-variance strategy). In the end, barring defensive or special teams scores, the team with more points per drive will win, whether there are many drives or few. And there is no evidence that time of possession helps a defense perform better when it is on the field.
An economist by trade, Ben Baldwin uses large datasets to try to learn about human behavior. His work can be found on Field Gulls and GridFE; reach him on Twitter at @guga31bb.
30 comments, Last at 23 Mar 2018, 3:23pm
#10 by Pat // Mar 20, 2018 - 11:43am
It's definitely not. You can always dilute an R^2 to arbitrary levels by saturating the output. That is, if you make a function "if (x < 0) y = 0; if (x > 1000) y = 1000; else y = x", the R^2 over the region from x=0 to x=1000 is 1.0, but over the range x = (-infinity, +infinity) the R^2 approaches 0.
You can see that in the first graph of average starting field position. R^2 is testing the agreement of (m*resttime + b = fieldpos) for the best m/b, and obviously that model doesn't fit well. But (if (resttime < a) m*resttime + b = fieldpos; else fieldpos = c) would fit that data really, really well.
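A quick numerical check of this clamping example (the exact diluted value depends on how x is spread across the flat regions, but the drop away from 1.0 once the flat tails dominate is clear):

```python
import numpy as np

def r2_of_line(x, y):
    """R^2 of the best-fit straight line through (x, y)."""
    m, b = np.polyfit(x, y, deg=1)
    resid = y - (m * x + b)
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# y = x clamped to [0, 1000], per the example above.
x_narrow = np.linspace(0, 1000, 2001)
x_wide = np.linspace(-10000, 11000, 42001)

r2_narrow = r2_of_line(x_narrow, np.clip(x_narrow, 0, 1000))  # purely linear region
r2_wide = r2_of_line(x_wide, np.clip(x_wide, 0, 1000))        # flat tails dominate

print(round(r2_narrow, 3), round(r2_wide, 3))  # 1.0 on the linear region; lower over the wide range
```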
The interesting thing is this: look at the final graph, "defensive rest time on recent drive (plays)" versus "actual - expected points per drive". Now look at nplays < 5. That *really, really* looks like a pretty strong correlation there. And this is *totally* what you would expect if getting rest helps. Getting 10 plays off isn't that much better than getting 5 plays off.
... That being said, I don't think that shows that rest time helps. I think it shows that teams whose offenses average fewer plays per drive have defenses that are worse than average, because bad teams tend to be bad.
#12 by guga31bb // Mar 20, 2018 - 1:44pm
Note that your description of what R^2 in the first graph is testing isn't quite right -- see the paragraph beginning with "A technical note on the R-squared values...". It's a cubic function of rest time rather than linear, and it fits reasonably well but not perfectly (comparing the estimated function to the red line on the graphs). I can add higher-order polynomials and get the R^2 to go from 0.100 to 0.103. The R^2 isn't low because of the model; it's low because there's so much unexplained variation in points per drive. (And I'm never sure how deep to go into the statistical nitty-gritty because I feel like most readers don't care that much, but I appreciate the discussion here in the comments!)
On the last graph, the correlation increases from 0.04 to 0.06 when restricting the sample to nplays < 5. So it increases but is still very small.
#22 by Pat // Mar 21, 2018 - 12:05pm
Wait, now I'm confused. I thought I was just reading this wrong in the text: you're computing the R^2 on the *drive-by-drive* results and not the average? Is that right?
... Why? The drive-by-drive results are obviously going to have big variations, because... it's a game? Sorry if I'm missing something here: I mean, if I do a simple linear regression on extra point results (1 or 0) versus temperature, the resulting R^2 will be extremely low, because obviously the spread is dominantly random. But if you do the R^2 on the *average* results at each temperature, the R^2 will be a ton higher, because the random fluctuation in the average is greatly reduced. The result in both cases will still be statistically significant, though (depending on the number of data points, obviously).
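The extra-point analogy can be sketched with toy numbers (the probabilities below are invented, not real kick data): R^2 on the raw 0/1 outcomes is tiny, while R^2 on binned averages of the same data is large, because averaging shrinks the noise but not the effect.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy extra-point data: a tiny real temperature effect on a 0/1 outcome.
# All probabilities here are invented for illustration.
temp = rng.uniform(0, 100, size=50000)
made = rng.binomial(1, 0.93 + 0.0005 * temp)

def r2_of_line(x, y):
    """R^2 of the best-fit straight line through (x, y)."""
    m, b = np.polyfit(x, y, deg=1)
    resid = y - (m * x + b)
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# R^2 on kick-by-kick 0/1 outcomes: tiny, since the spread is mostly random.
r2_raw = r2_of_line(temp, made.astype(float))

# R^2 on averages within 10-degree bins: far higher, same underlying effect.
bin_idx = (temp // 10).astype(int)
centers = np.arange(10) * 10.0 + 5.0
bin_means = np.array([made[bin_idx == b].mean() for b in range(10)])
r2_binned = r2_of_line(centers, bin_means)

print(r2_raw < r2_binned)  # averaging shrinks the noise, not the effect
```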
I guess it depends on what you're trying to show with the R^2. Are you trying to show that the "rest time" effect is small compared to the natural variation in the drives (as in, small compared to the outcome of the plays)? Because that's what it seems is being shown, if I'm understanding things correctly.
To me, the more interesting thing is if it's statistically significant, which at least for nplays < 5, it looks pretty obvious that it's significant. Like I said previously, I don't think it actually shows that rest time = better performance, because there's going to be a correlation between offensive performance and defensive performance no matter what (bad teams tend to be bad all around).
#23 by guga31bb // Mar 21, 2018 - 1:30pm
>Are you trying to show that the "rest time" effect is small compared to the natural variation in the drives (as in, small compared to the outcome of the plays)?
Yes, this is exactly right. If you're trying to predict the outcome of a given coming drive, defensive rest appears to provide basically no information (in contrast to what TV broadcasters may try to tell us).
#26 by Pat // Mar 22, 2018 - 11:12am
"If you're trying to predict the outcome of a given coming drive, defensive rest appears to provide basically no information (in contrast to what TV broadcasters may try to tell us)."
Okay, let's give a direct example of what you're talking about: icing the kicker (*). For kicks around 40 yards or so, icing the kicker is something like a 5% reduction in chance of making the kick. If you do a regression on a kick-by-kick basis and compare "iced" (1) to "non-iced" (0), you'll get an R^2 of ~0.05, with a decent significance (depending on the number of kicks and number of kickers you've got in the sample). So you go ahead and conclude the effect's tiny, and when an announcer says "oh, should've iced him there" you say that's stupid, it's a tiny effect, it's not like icing works all the time.
And you're right. It's a small effect. But it's there. If it doesn't cost them anything (say, end of half/game), why would a coach turn down a chance at 5% chance at fewer points by their opponent?
Same thing here. If this would be real (and again, I *don't think it is*), the fact that it's a small effect just changes the cost/benefit for how you play offense. It doesn't make the effect unimportant. For icing the kicker, the fact that it's only a small effect doesn't mean you don't do it. It just means you have to consider how much you value the timeout needed. Likewise here. The fact that giving your defense more plays off doesn't change the result of the drive *much* doesn't mean you don't do it. You just have to consider how much it hurts you when you play offense.
I mean, going from 3 plays off to 5 plays off is about a quarter-point difference. Obviously the drive by drive results vary a ton (it's not like you can score a quarter point!), but if you could go from a 3 play drive to a 5 play drive with no cost (... you can't, but pretend you can) why wouldn't you do it?
(*: for the curious, most of the good studies at this point - see LeDoux, 2016, plus others - I think have concluded that icing does work, even though it's a small effect, and difficult to pull out of the data due to the limited dataset and the other variables. So it's a very good analog for this.)
#27 by Will Allen // Mar 22, 2018 - 12:11pm
And again, absent breaking it down to snap counts, probably consecutive snap counts, per player, for each team, you really don't have a sound grasp of how useful it will be to try to drain pass rushers of energy (and it is pass rushers that most matter, by a wide margin).
#30 by Pat // Mar 23, 2018 - 3:23pm
Yeah, significance ("p-value", or some other kind of test to see if it's real, or just random). It's one thing for the effect to be small, and something else entirely for it to be not there at all. Like icing the kicker, or weather effects. Or home-field advantage, for that matter. HFA would have an r^2 of something like 0.05 on win/loss. Obviously true that it's not a big effect, and the effect can of course be overstated, but doesn't mean it's not real.
#2 by ssereb // Mar 19, 2018 - 5:41pm
1. Would real-time as opposed to game-time be a more helpful measure?
2. Could the tendency of teams late in games to make a concerted effort not to run clock be skewing things? I'm talking about plays where receivers try to get out of bounds and teams avoid running.
#8 by guga31bb // Mar 20, 2018 - 10:36am
1. Yes, that would be the ideal measure, but I don't have it in my data (and I don't know if anyone does). Given the effect sizes here, I'd be surprised if it changed anything, but that would be the very best thing to use.
2. This probably isn't a driving factor because outside of the last five minutes of the game, the clock eventually restarts (after a brief pause) when players go out of bounds, and I'm throwing out the last four minutes of a game anyway.
#3 by theslothook // Mar 19, 2018 - 11:17pm
I've done similar regressions and found essentially the same result.
Since Ben is an economist, one thought I had was the potential issue of endogeneity -- namely, teams change their behavior in anticipation of fatigue; i.e., they call different plays, rotate more heavily, maybe dress more players based on how effective the offense they're facing is. One could try to control for this, but I certainly didn't attempt to.
#4 by ChrisLong // Mar 20, 2018 - 1:13am
I guess I think that your analyses don’t really address your question, or maybe you’re asking the (only slightly) wrong question. I think that you should be somehow doing this and accounting for team effects. Because there are sooooo many data points, and therefore a ton of noise, you need to try to account for that noise as much as possible. I don’t think I necessarily expect that across all drives and teams more rest= better defensive performance. I think that, given the situation and teams involved, a better rested defense does better at preventing the other team from succeeding than that same defense on less rest.
Idk how these guest articles work so maybe they don’t let you use proprietary data, but this seems like a perfect time to use DVOA as your metric.
#5 by RickD // Mar 20, 2018 - 1:49am
These graphs don't show that wearing out a defense cannot help your offense. They show that there is no trend demonstrating that teams wear out defenses to help their offenses.
Games like Super Bowl LI are the exception, not the rule. Usually a team that dominates time of possession that much in a game doesn't need to score again late in the game - usually a team that dominates the time of possession that much is way ahead, and late in the game they are just burning clock.
Did the Pats offense wear down the Falcons' defense in Super Bowl LI? Yes, that is something we could observe with our eyes.
The next question is: how do you wear down a defense? In Super Bowl LI the Patriots threw 63 passes and made 25 rushes. By way of comparison, the Falcons made 23 passes and 18 rushes. The time of possession margin was 40:31 for the Patriots to 23:27 for the Falcons. The Patriots used a passing game to wear down the passing defense. In other games you might see a rushing game used to wear down a rushing defense, but you don't see much of that anymore for the simple reason that teams just don't rush the ball as much as they used to. Also, while the rushing game burns the clock, the passing game doesn't. So five minutes of defending against a passing attack is going to wear down defenders a lot more than five minutes of defending against a rushing attack.
I've seen plenty of other games where a defense just wore out, including the Patriots' pass D versus the Colts with Manning, and Super Bowl XXXVI, when their D was losing its ability to stop the Rams by the end of the game.
Then there's this:
' I excluded the first drives of each half (defenses are well-rested for those drives), drives beginning in the last four minutes of the game (when teams are substantially less likely to score due to the clock being a factor), and sample sizes with fewer than 10 drives.'
I don't understand why you are excluding the first drive of each half if you're trying to determine how fatigue impacts performance. It would seem to me that the largest contrasts would be between the first and last drives of each half, but you've cut them out. The reason cited (defenses are well-rested) for excluding these drives would seem to be good reason to include them.
#6 by guga31bb // Mar 20, 2018 - 8:59am
The problem with drives at the beginning of halves is that there's no way to quantify rest time for the top two graphs in each figure, because I don't have the amount of real time that elapsed since the defense was last on the field, and I didn't want to lump them in with other drives when the defense has had more time to rest. You're right in the sense that this quantifies the effect of rest only among drives that aren't the first of a half, and it's possible that defenses perform better when "fully" rested (to start a half). However, even if that were the case, there's no actionable information for teams, because there's no way to manipulate the number of first drives of halves a team gets.
#13 by Scott P. // Mar 20, 2018 - 1:51pm
Games like Super Bowl LI are the exception, not the rule. Usually a team that dominates time of possession that much in a game doesn't need to score again late in the game - usually a team that dominates the time of possession that much is way ahead, and late in the game they are just burning clock.
This is incorrect -- there is very little if any correlation between time of possession and dominating an opponent. In fact, dominant teams tend to score quickly and often don't gain much TOP.
#9 by Hoodie_Sleeves // Mar 20, 2018 - 11:24am
The reason the Graham stripsack happened had nothing to do with the defense being fresh - but it did have to do with the long drive.
That drive significantly changed the game state - the Patriots no longer had enough time on the clock to run a varied offense, and the Eagles no longer had to play an honest balanced defense - they were able to sell out to stop the pass.
#11 by jlaw37 // Mar 20, 2018 - 1:00pm
I'm curious to know what would it look like if you limited the sample size to good defenses. You could eliminate the noise of a team being scored on because it's just bad as opposed to tired. Bad defenses would likely have more variance on how and when teams score on them. Good defenses, however, can usually prevent teams from scoring on a more consistent basis which would allow you to see if they are performing worse when tired.
#14 by ChrisS // Mar 20, 2018 - 2:29pm
I am skeptical that a defense gets significantly more tired than an offense over the course of an average football drive. There are probably exceptions in hurry-up situations, where the defense may get more tired, but those situations (long drives with little time per play) are inherently unsuccessful for the defense. Perhaps those situations could be isolated and the DVOA of the first few plays compared to the last few.
#15 by Will Allen // Mar 21, 2018 - 8:24am
Very much disagree with your skepticism. Pass rushers use significantly more energy than pass blockers, and defensive players are trained to expend maximum effort from the snap, to the whistle, to get to the ball, whereas offensive players can let up once the chance of throwing a useful block ends. There's a reason teams endeavor to use a rotation of pass rushers, whereas absent injury, starting offensive linemen tend to play a higher percentage of the total snaps.
#17 by jtr // Mar 21, 2018 - 8:55am
I think you give the answer to your own point here when you bring up rotations. Yes, pass rushers expend more energy on a per-snap basis than the offensive tackles they're going up against. But those pass rushers rotate heavily, so they're playing something like half or two-thirds as many snaps as a tackle. I think Ben's work here shows that defensive coaches are running close to the optimal rotations, where defenders are getting enough rest that they fatigue at the same pace as opposing linemen.
#20 by ChrisS // Mar 21, 2018 - 10:30am
Yes. On offense 6 players generally play 100% of the snaps (7 if you have a very good WR). On defense maybe 2-3 players play more than 90% of the snaps. So on the vast majority of drives the defense rotates out to compensate for their extra effort. Which is why I said looking at hurry up situations (which limit subs) might be more fruitful.
#29 by Mountain Time … // Mar 22, 2018 - 6:40pm
Defenders being required to pursue the ball on a pass completed downfield is another reason they tire faster. This data (as well as anecdotal memories of a big DT hustling to make a tackle many yards past the LOS) suggests this is good strategy that shouldn't change.
I don't want to get into the possible effects of a rotation, but individually d-linemen absolutely will tire faster than o-linemen, and those are really the only two positional groups people mean when they talk about this.
#16 by Will Allen // Mar 21, 2018 - 8:29am
In fact, the more specific question should be "Do pass rushers get tired, and do defenses with tired pass rushers give up more points?". I strongly suspect the answer is yes, and you have to start looking at snap counts (teams vary a lot with pass rusher depth) to gain insight.
#21 by Will Allen // Mar 21, 2018 - 10:56am
Exactly. You're blind or don't observe how pass rusher fatigue allows for completed passes, and how offenses scheme to promote pass rusher fatigue and score points as a result. Of course defenses get tired, and thus allow more points relative to what they would if their pass rushers were fresh. We just lack the specific data needed to illuminate this phenomenon more clearly.
#24 by theslothook // Mar 21, 2018 - 1:57pm
I have tried. I have included it in regular regressions on points allowed, controlling for down and distance, etc. It just never showed up as a good variable. I tried number of plays on the field; I tried time accumulated per drive. Both just didn't work out in the data.
I agree with you in principle. You see them get tired, and implicitly you know it's going to cause problems.
My longstanding theory: most offenses just don't take advantage of it like the Patriots or Packers do, and it's the other teams in the sample that are overwhelming these results.
#25 by Will Allen // Mar 21, 2018 - 3:20pm
That, and the fact that you need to track consecutive snap counts for individual pass rushers to really get a handle on it, along with accounting for the quality difference between, say, each defense's third- or fourth-best edge rusher. Some teams have one really good edge rusher, and the rest are bad. Another team might have three slightly above-average rushers. The latter team probably has a distinctly better pass defense as the game wears on.
Game's complicated, isn't it?