Guest column by Cole Jacobson
As NFL offensive play-callers have become more aggressive over the past decade, fourth-down decisions have taken on growing importance throughout the league. The benefits of leaving the offense on the field on fourth down more often have already been well documented by data analysts nationwide. But while risk aversion is slowly fading in football, the conclusion that "teams should go for it on fourth down more often" says nothing about how those teams should approach those attempts. Often, coaches call timeouts immediately before fourth downs in order to decide whether they actually want to leave the offense out there, and/or to figure out what the optimal play call will be. Because of this, a natural question arises: does calling a timeout before a fourth-down play help an offense's chances of succeeding? Using data from NFLScrapR, I attempted to find out.
For the sake of keeping this to a reasonable length, I won't publish the R code here (though I'm not opposed to sharing it privately with anyone interested). But as an abridged version, I used all of the play-by-play data available via NFLScrapR for the past 10 completed seasons, which I placed into a data frame. I removed all broken plays from that set (e.g., a botched field goal attempt where the holder keeps and runs with the ball, which NFLScrapR classifies as neither a rush nor a pass). From there, I isolated all fourth downs not resulting in punts or field goals into a data frame of 4,633 plays. I then split that data frame into the 907 plays that came immediately after timeouts and the 3,726 plays that didn't. I performed some further manipulations which will be detailed later, including classifying plays as "Short" for fourth-and-2 or less, "Med" for fourth-and-3 to -6, and "Long" for fourth-and-7-plus.
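The distance classification and the timeout split described above can be sketched in a few lines. This is a minimal Python illustration, not the author's unpublished R code; the function names and the field name `after_timeout` are hypothetical and do not come from NFLScrapR.

```python
def distance_bucket(yards_to_go):
    """Label a fourth down by distance, using the article's cutoffs."""
    if yards_to_go <= 2:
        return "Short"  # fourth-and-2 or less
    if yards_to_go <= 6:
        return "Med"    # fourth-and-3 to fourth-and-6
    return "Long"       # fourth-and-7 or more


def split_by_timeout(plays):
    """Split plays into (after a timeout, not after a timeout).

    Each play is assumed to be a dict with a boolean 'after_timeout'
    field; that field name is an assumption made for this sketch.
    """
    after = [play for play in plays if play["after_timeout"]]
    other = [play for play in plays if not play["after_timeout"]]
    return after, other
```

Applied to the full set of 4,633 plays, a split like this would yield the 907 and 3,726 plays mentioned above.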
The summary of the project: when it comes to all fourth downs overall, there is no evidence that calling a timeout helps the offense in a statistically significant manner. Furthermore, if we stratify the data by play type and distance, we can reach more detailed conclusions. The data suggests that if we split our set into six categories based on both distance to go (i.e. short vs. medium vs. long) and by play type (run vs. pass), calling a timeout does not appear to clearly benefit the offense in any of those six instances. At first glance, the numbers suggest that in one of the six categories (fourth-and-long runs), calling a timeout is actually detrimental to the offense, but a deeper analysis of those plays implies that this conclusion is very noisy based on its small sample size.
The Basics: Timeout vs. No Timeout
To embark on this project, the clear first step was to compare all plays following timeouts with all plays not following timeouts. Doing so without considering play type or distance to go is quite simple, requiring only a two-sample T-test. If we perform a T-test on the conversion rates of all fourth-down plays following timeouts vs. all fourth-down plays not following timeouts, we see that the mean conversion rate for plays following timeouts was slightly higher than that for plays not following timeouts (0.504 to 0.501), but the p-value is north of 0.8, which gives no indication of a systematic edge in favor of the plays following timeouts.
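For readers who want to check the arithmetic, here is a rough Python sketch of that comparison, rebuilt from the rounded rates and sample sizes quoted in this article. It is not the author's R code, and it makes two assumptions: it uses the Welch (unequal-variance) form of the T-test, and it approximates the p-value with the normal distribution, which is reasonable at these sample sizes. Because the inputs are rounded, the result is only approximate.

```python
import math


def welch_t_from_rates(rate1, n1, rate2, n2):
    """Welch two-sample t statistic for two groups of 0/1 outcomes.

    For 0/1 data, the sample variance is rate * (1 - rate) * n / (n - 1).
    The two-sided p-value uses a normal approximation (an assumption of
    this sketch), not the exact Student's t distribution.
    """
    var1 = rate1 * (1 - rate1) * n1 / (n1 - 1)
    var2 = rate2 * (1 - rate2) * n2 / (n2 - 1)
    std_err = math.sqrt(var1 / n1 + var2 / n2)
    t_stat = (rate1 - rate2) / std_err
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, p_value


# Rounded conversion rates and play counts quoted in the article.
t_stat, p_value = welch_t_from_rates(0.504, 907, 0.501, 3726)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

With these inputs the p-value comes out above 0.8, in line with the result reported above.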
One potentially interesting side note is that if we run the same T-test using win probability added (WPA) instead of conversion rate, we see a somewhat different result. The mean WPA for fourth-down plays following a timeout was 0.022, while the mean WPA for those not following a timeout was 0.014, and the T-test gave a relatively small p-value of 0.058. The gap between the two numbers still could be, and probably is, due to random chance, but it could also signify that calling a timeout might be more likely to lead to a more explosive play, even if it doesn't necessarily increase the chance of merely converting the fourth down. This is a concept that will be explored later in the piece.
Splitting by Distance
The next step is to stratify our data by the distance to go on each play. Splitting the categories by each individual yard likely would've led to sample sizes that were too small, which is why I went with the "Short," "Medium," and "Long" system. The data frame has 2,300 fourth-and-short plays, 1,211 fourth-and-medium plays, and 1,122 fourth-and-long plays. A breakdown of all of these plays can be seen in the following table:
For those who favor a more visual approach, see the following graph:
The black brackets represent 95% confidence intervals. Generally, if these intervals overlap, it means that we can't confidently say that there's a significant difference between the two groups being compared. All three pairs of above intervals overlap, indicating that in all three distance categories, calling a timeout doesn't seem to noticeably help or hurt the offense.
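As a rough illustration of the overlap check, here is a Python sketch that builds 95% intervals with the simple normal-approximation formula for a proportion and tests whether two of them overlap. Whether the graphs used this exact interval is an assumption, and the function names are hypothetical. The example uses the overall conversion rates and play counts quoted earlier in the article rather than per-distance figures.

```python
import math


def proportion_ci(rate, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z=1.96 corresponds to a 95% interval. This formula is an assumption
    of the sketch, not necessarily the one behind the graphs.
    """
    half_width = z * math.sqrt(rate * (1 - rate) / n)
    return rate - half_width, rate + half_width


def intervals_overlap(first, second):
    """Return True if two (low, high) intervals share any point."""
    return first[0] <= second[1] and second[0] <= first[1]


# Overall rates and play counts quoted earlier in the article.
timeout_ci = proportion_ci(0.504, 907)
no_timeout_ci = proportion_ci(0.501, 3726)
print(timeout_ci, no_timeout_ci, intervals_overlap(timeout_ci, no_timeout_ci))
```

Overlap is only a rule of thumb: two intervals can overlap slightly while a formal test still finds a difference, so it is best read alongside the T-tests rather than in place of them.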
Runs vs. Passes: Does it Make a Difference?
If distance alone doesn't reveal any impact from timeouts, dividing the plays further by play type is the logical next step. As a frame of reference, all fourth-down passes have a mean conversion rate of 0.429, and fourth-down runs have a mean rate of 0.646.
With that knowledge, we can bring timeouts into the mix again. The following is a table that breaks down success rates for all fourth-down runs and passes separately, both with and without preceding timeouts (but not yet accounting for distance to go):
That data can be seen in graphical form here:
A similar conclusion to our first graph is reached. Even when we break into runs and passes separately, calling a timeout has no apparent impact on the success of either.
Combining it All: Play Type and Distance to Go
We've seen that the length of a fourth-down attempt on its own doesn't seem to indicate any positive value in calling a timeout, nor does the "run vs. pass" designation of a play. What happens if we look at both simultaneously? The following table is the most detailed one yet, grouping all fourth-down plays separately based on distance, run vs. pass, and whether a timeout was called or not:
For more visual thinkers, the same information is conveyed in the following three graphs:
At first glance, there are hints that a timeout can have some effect. In particular, the gap between "timeout" and "no timeout" on fourth-and-long runs looks worth investigating. (As an aside, the first of the three graphs clearly indicates that on fourth-and-short, a run is more effective than a pass.)
Do the graphs deceive us, or does calling a timeout actually harm the offense on fourth-and-long runs? Once we analyze more deeply, we can see it's the former. There are only 13 fourth-and-long runs following a timeout, a small enough sample that it's reasonable to dissect the plays one by one. After doing so, we can see that the data is misleading: three of the 13 plays came late in the fourth quarter of blowout games, where the offensive team either took an intentional safety or ran around in the backfield for as long as possible to kill clock.
As such, there are only 10 total "real" plays meeting the criteria of fourth-and-long, run, and following a timeout, over the last 10 seasons. This is simply far too small a sample from which to draw any conclusions. Furthermore, of the 52 fourth-and-long runs not following a timeout, 28 are fake punts or field goals. The bottom line is that running the ball on fourth-and-7-plus out of a traditional offensive formation is both incredibly rare and ill-advised, and the presence of a timeout doesn't change that.
Does a Timeout Help One Get "The Big Play?"
Our earlier data suggested that a timeout could be more likely to lead to a more explosive play. To find out if there's any substance to this, we can create a new metric called "Big Plays," which I deemed to be any fourth-down play that was either a touchdown, or a conversion that picked up 10-plus yards. The following graph breaks down how likely big plays are by distance to go:
At first glance, our discrete intervals on the right suggest that a big play becomes far more likely when a timeout is called on fourth-and-short. But a series of successive T-tests proved this to be misleading. Specifically, the average fourth-and-short run that came after a timeout was 7 yards closer to the end zone than the average one that came without a timeout, with a p-value of 9.96 × 10^-7 (in other words, very unlikely to be due to random chance). Thus, we can't confidently say that calling a timeout makes a "big play" more common; rather, coaches simply tend to use timeouts more often when closer to the end zone.
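The "big play" definition used in this section can be written as a small Python sketch. This is an illustration rather than the author's R code; the function names and the field names `touchdown`, `converted`, and `yards_gained` are hypothetical stand-ins for whatever the play-by-play data provides.

```python
def is_big_play(touchdown, converted, yards_gained):
    """A big play is a touchdown, or a conversion gaining 10-plus yards."""
    return touchdown or (converted and yards_gained >= 10)


def big_play_rate(plays):
    """Share of plays that count as big plays (0.0 for an empty list).

    Each play is assumed to be a dict with 'touchdown', 'converted',
    and 'yards_gained' fields; those names are assumptions.
    """
    if not plays:
        return 0.0
    big = sum(
        1
        for play in plays
        if is_big_play(play["touchdown"], play["converted"], play["yards_gained"])
    )
    return big / len(plays)
```

Note that a 12-yard gain on fourth-and-15 would not count, since the play must also convert the fourth down.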
Possible Sources of Error/Other Comments
As is the case with any statistical analysis project, there are some factors that are very difficult to account for. For starters, there's the issue of how to handle the misclassified plays (like the example of the botched field goal attempt). From a coding standpoint, omitting them made the most sense, because the fact that NFLScrapR classifies them as neither passes nor runs would complicate the calculations.
Another key question is how to handle the fake field goals/punts that were actually done on purpose. These are technically offensive plays, but because they are defended so differently due to both sides having special teams formations pre-snap, it's easy to argue that these should also be omitted. However, it would require significantly more complex code involving text analysis to detect and remove those plays, and those plays were infrequent enough that they didn't hugely skew any of the data (save for the fourth-and-long runs subset, which was already very small).
There are also some plays where a fourth-down conversion wasn't that meaningful, and the defense didn't care about allowing it. Consider a fourth-and-short from near midfield on the last play of the first half, for example. Even if the offense converts the fourth down, the play is still a success from the defense's standpoint. As was the case with fake field goals, these plays were infrequent enough that it would've been counterproductive to detect and remove them.
Finally, in perhaps the most debatable decision here, I didn't isolate plays where the offensive team called timeout, instead looking at all fourth downs that followed any timeout. This was partially an effort to boost our reasonably small sample size, but primarily because, in the context of the project, which team called the timeout isn't that important. Whether the offense or defense makes the call, both teams get 60-plus seconds to figure out how they are approaching the upcoming play, and the purpose of the project is to determine whether a fourth-down conversion is more or less likely after that period. Whether it matters which side calls the timeout could certainly be material for a future project.
Thanks for the read, and I hope to hear any positive or negative feedback. I'd like to give a special thanks to Keegan Abdoo of NFL Next Gen Stats and Bailey Joseph of the Oklahoma City Thunder for giving specific R tips.
Cole Jacobson is an Editorial Researcher at the NFL Media office in Los Angeles. He played varsity sprint football as a defensive lineman at the University of Pennsylvania, where he was a 2019 graduate as a mathematics major and statistics minor. With any questions, comments, or ideas, he can be contacted at Cole.Jacobson@nfl.com. You can also follow him on Twitter @ColeJacobson32.