
Timeouts and Fourth Downs


Guest column by Cole Jacobson


As NFL offensive playcallers have become more aggressive over the past decade, fourth downs have grown in importance throughout the league. Data analysts have already documented the benefits of keeping the offense on the field for more fourth downs. But while risk aversion is slowly fading in football, the conclusion that "teams should go for it on fourth down more often" says nothing about how those teams should approach those attempts. Coaches often call timeouts immediately before fourth downs to decide whether they actually want to keep the offense on the field, to settle on the best play call, or both. This raises a natural question: does calling a timeout before a fourth-down play improve an offense's chances of success? Using data from NFLScrapR, I attempted to find out.

For the sake of keeping this to a reasonable length, I won't publish the R code here (though I'm not opposed to sharing it privately with anyone interested). In brief, I placed all of the play-by-play data available via NFLScrapR for the past 10 completed seasons into a data frame. I removed all broken plays from that set (e.g., a botched field goal attempt where the holder keeps the ball and runs with it, which NFLScrapR classifies as neither a rush nor a pass). From there, I isolated all fourth downs not resulting in punts or field goals, yielding a data frame of 4,633 plays. I then split that data frame into the 907 plays that came immediately after timeouts and the 3,726 that didn't. I performed some further manipulations, detailed later, including classifying plays as "Short" for fourth-and-2 or less, "Med" for fourth-and-3 to -6, and "Long" for fourth-and-7-plus.
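The filtering and splitting steps described above can be sketched in a few lines. This is a minimal illustration in Python, not the author's R code. The `Play` record and its field names (`down`, `ydstogo`, `play_type`, `after_timeout`, `converted`) are hypothetical simplifications, not actual NFLScrapR columns. The sketch also assumes that a "broken" play is anything not labeled a run or a pass.

```python
from dataclasses import dataclass


@dataclass
class Play:
    # Hypothetical simplified play record; these are illustrative field
    # names, not actual NFLScrapR column names.
    down: int
    ydstogo: int
    play_type: str       # "run", "pass", "punt", "field_goal", ...
    after_timeout: bool  # True if a timeout immediately preceded the snap
    converted: bool


def distance_bucket(ydstogo: int) -> str:
    """Short: 4th-and-2 or less; Med: 4th-and-3 to -6; Long: 4th-and-7-plus."""
    if ydstogo <= 2:
        return "Short"
    if ydstogo <= 6:
        return "Med"
    return "Long"


def split_fourth_downs(plays):
    """Keep fourth-down runs and passes, then split on a preceding timeout.

    Filtering to runs and passes drops punts, field goals, and any "broken"
    play that is classified as neither.
    """
    attempts = [p for p in plays if p.down == 4 and p.play_type in ("run", "pass")]
    after_to = [p for p in attempts if p.after_timeout]
    no_to = [p for p in attempts if not p.after_timeout]
    return after_to, no_to


# Small made-up example: only the first two plays are fourth-down runs/passes.
plays = [
    Play(4, 1, "run", True, True),
    Play(4, 5, "pass", False, False),
    Play(4, 8, "punt", False, False),
    Play(3, 2, "run", False, True),
]
after_to, no_to = split_fourth_downs(plays)
print(len(after_to), len(no_to))  # 1 1
```

Fake punts and fake field goals would pass this filter whenever the data labels them as runs or passes. That matches the column's later note that they remained in the set.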

In summary: across all fourth downs, there is no evidence that calling a timeout helps the offense in a statistically significant way. Stratifying the data by play type and distance allows more detailed conclusions. If we split the set into six categories based on both distance to go (short vs. medium vs. long) and play type (run vs. pass), calling a timeout does not appear to clearly benefit the offense in any of the six. At first glance, the numbers suggest that in one category (fourth-and-long runs) calling a timeout actually hurts the offense, but a closer look at those plays shows that this result rests on a very small, noisy sample.

The Basics: Timeout vs. No Timeout

To embark on this project, the clear first step was to compare all fourth-down plays following timeouts with all those not following timeouts. Without considering play type or distance to go, this is quite simple, requiring only a two-sample t-test. If we run a t-test on the conversion rates of the two groups, the mean conversion rate for fourth downs following timeouts is slightly higher than for those not following timeouts (0.504 vs. 0.501), but the p-value is north of 0.8, which is not at all indicative of a systematic edge in favor of plays following timeouts.
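For readers who want to see the mechanics, here is a minimal Python sketch of a two-sample t-test on 0/1 conversion outcomes. It uses Welch's unequal-variance form, which R's `t.test` uses by default, though the column doesn't say which options were chosen. As an assumption, it approximates the p-value with the normal distribution, which is very close to the t distribution at these sample sizes. The counts below are hypothetical, chosen only to roughly reproduce the reported 0.504 and 0.501 rates. They are not the actual data.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Welch two-sample t statistic with a normal-approximation two-sided p-value."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    # With hundreds of plays per group, t is approximately standard normal,
    # so P(|T| > |t|) is approximated by erfc(|t| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Hypothetical 0/1 outcomes (1 = converted), sized like the 907/3,726 split.
after_timeout = [1] * 457 + [0] * 450   # rate ~ 0.504
no_timeout = [1] * 1867 + [0] * 1859    # rate ~ 0.501
t, p = welch_t_test(after_timeout, no_timeout)
print(f"t = {t:.3f}, p = {p:.2f}")
```

With these made-up counts, the p-value comes out around 0.88, in line with the result reported above. The same function works unchanged on per-play WPA values instead of 0/1 outcomes.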

One potentially interesting side note: if we run the same t-test using win probability added (WPA) instead of conversion rate, the result is somewhat different. The mean WPA for fourth-down plays following a timeout was 0.022, versus 0.014 for those not following a timeout, and the t-test gave a relatively small p-value of 0.058. The gap between the two numbers could still be, and probably is, due to random chance. But it could also signify that calling a timeout makes a more explosive play more likely, even if it doesn't increase the chance of merely converting the fourth down. I explore this idea later in the piece.

Splitting by Distance

The next step is to stratify the data by the distance to go on each play. Splitting the categories by each individual yard would likely have left sample sizes too small, which is why I went with the "Short," "Medium," and "Long" system. The data frame has 2,300 fourth-and-short plays, 1,211 fourth-and-medium plays, and 1,122 fourth-and-long plays. A breakdown of all of these plays can be seen in the following table:

For those who favor a more visual approach, see the following graph:

The black brackets represent 95% confidence intervals. As a rough rule, if two intervals overlap, we can't confidently say that there's a significant difference between the groups being compared. All three pairs of intervals above overlap, indicating that in all three distance categories, calling a timeout doesn't seem to noticeably help or hurt the offense.
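The column doesn't say how its 95% intervals were computed. The sketch below assumes the simple normal-approximation (Wald) interval for a conversion rate and adds the overlap check described above. The counts in the example are invented for illustration.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a conversion rate (Wald method)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def intervals_overlap(ci_a, ci_b):
    """True if two (low, high) intervals share any point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Invented counts: 60 of 100 after a timeout vs. 250 of 500 without one.
ci_timeout = wald_ci(60, 100)        # roughly (0.504, 0.696)
ci_no_timeout = wald_ci(250, 500)    # roughly (0.456, 0.544)
print(intervals_overlap(ci_timeout, ci_no_timeout))  # True
```

Overlap is a conservative rule of thumb. Two 95% intervals can overlap slightly even when a direct two-sample test finds a difference at the 5% level. Non-overlap is therefore stronger evidence of a difference than overlap is evidence of no difference.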

Runs vs. Passes: Does it Make a Difference?

If distance alone doesn't reveal any impact from timeouts, the logical next step is to divide the plays further by play type. For reference, all fourth-down passes have a mean conversion rate of 0.429, and fourth-down runs have a mean rate of 0.646.

With that knowledge, we can bring timeouts into the mix again. The following is a table that breaks down success rates for all fourth-down runs and passes separately, both with and without preceding timeouts (but not yet accounting for distance to go):

That data can be seen in graphical form here:

We reach a conclusion similar to that of the first graph: even when we look at runs and passes separately, calling a timeout has no apparent impact on the success of either.
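Each of these breakdowns, and the finer three-way split in the next section, comes down to counting conversions and attempts within groups. Here is a minimal Python sketch. It assumes a hypothetical simplified input of `(ydstogo, play_type, after_timeout, converted)` tuples rather than the real play-by-play columns.

```python
from collections import defaultdict


def bucket(ydstogo):
    """Short: 2 or fewer yards to go; Med: 3 to 6; Long: 7 or more."""
    return "Short" if ydstogo <= 2 else "Med" if ydstogo <= 6 else "Long"


def conversion_table(plays):
    """Map (distance, play type, timeout) -> (conversion rate, attempts)."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [conversions, attempts]
    for ydstogo, play_type, after_timeout, converted in plays:
        cell = (bucket(ydstogo), play_type,
                "Timeout" if after_timeout else "No timeout")
        counts[cell][0] += int(converted)
        counts[cell][1] += 1
    return {cell: (made / n, n) for cell, (made, n) in counts.items()}


# Made-up plays for illustration only.
sample = [
    (1, "run", True, True),
    (1, "run", True, False),
    (8, "pass", False, True),
]
for cell, (rate, n) in sorted(conversion_table(sample).items()):
    print(cell, f"rate={rate:.2f}", f"n={n}")
```

Keeping the attempt count next to each rate matters. Once plays are split three ways, some cells contain only a handful of plays.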

Combining it All: Play Type and Distance to Go

We've seen that neither the distance to go on a fourth-down attempt nor the "run vs. pass" designation of a play, on its own, indicates any positive value in calling a timeout. What happens if we look at both simultaneously? The following table is the most detailed one yet, grouping all fourth-down plays by distance, run vs. pass, and whether or not a timeout was called:

For more visual thinkers, the same information is conveyed in the following three graphs:

At first glance, there are hints that a timeout might matter. In particular, the gap between "timeout" and "no timeout" on fourth-and-long runs looks worth investigating. (As an aside, the first of the three graphs clearly indicates that on fourth-and-short, a run is more effective than a pass.)

Do the graphs deceive us, or does calling a timeout actually harm the offense on fourth-and-long runs? A deeper analysis shows it's the former. There are only 13 fourth-and-long runs following a timeout, a sample small enough to dissect play by play. Doing so shows that the apparent effect is hollow: three of the 13 were plays late in the fourth quarter of blowouts where the offense either took an intentional safety or ran around in the backfield for as long as possible to kill clock.

That leaves only 10 "real" plays over the last 10 seasons meeting the criteria of fourth-and-long, run, and following a timeout. This is far too small a sample to draw any conclusions from. Furthermore, of the 52 fourth-and-long runs not following a timeout, 28 are fake punts or field goals. The bottom line is that running the ball on fourth-and-7-plus out of a traditional offensive formation is both incredibly rare and ill-advised, and the presence of a timeout doesn't change that.
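To see why 10 plays is too few, it helps to look at how wide a confidence interval on a conversion rate gets at that sample size. The sketch below uses the Wilson score interval. That is a standard choice for small samples but not necessarily what the column used. It also assumes a hypothetical 50% success rate rather than the actual results of those 10 plays.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical: a 50% conversion rate observed on 10 plays vs. 1,000 plays.
for made, n in [(5, 10), (500, 1000)]:
    low, high = wilson_ci(made, n)
    print(f"n={n}: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```

With 10 plays, the interval spans roughly 0.24 to 0.76, so almost any plausible true conversion rate is consistent with the data. With 1,000 plays, it narrows to about 0.47 to 0.53.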

Does a Timeout Help One Get "The Big Play?"

Our earlier WPA results suggested that a timeout could make an explosive play more likely. To see whether there's any substance to this, we can create a new metric called "Big Plays," which I defined as any fourth-down play that was either a touchdown or a conversion that gained 10-plus yards. The following graph breaks down how likely big plays are by distance to go:

At first glance, the intervals on the right side of the graph suggest that a big play becomes far more likely when a timeout is called on fourth-and-short. But a series of follow-up t-tests showed this to be misleading. Specifically, the average fourth-and-short run that came after a timeout started 7 yards closer to the end zone than the average one that came without a timeout, with a p-value of 9.96 × 10⁻⁷ (in other words, very unlikely to be due to random chance). Thus, we can't confidently say that calling a timeout makes a "big play" more common; rather, coaches simply tend to use timeouts more often when closer to the end zone.
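The "Big Play" metric defined earlier in this section is a simple rule to apply once each play has touchdown, conversion, and yards-gained information. Below is a minimal sketch. The argument names and example plays are hypothetical.

```python
def is_big_play(touchdown: bool, converted: bool, yards_gained: int) -> bool:
    """A touchdown, or a fourth-down conversion that gained 10 or more yards."""
    return touchdown or (converted and yards_gained >= 10)


def big_play_rate(plays):
    """Share of (touchdown, converted, yards_gained) plays that are big plays."""
    flags = [is_big_play(td, conv, yds) for td, conv, yds in plays]
    return sum(flags) / len(flags)


# Made-up plays: a 12-yard conversion, a 3-yard conversion,
# a 1-yard touchdown, and a failed attempt.
example = [(False, True, 12), (False, True, 3), (True, True, 1), (False, False, 0)]
print(big_play_rate(example))  # 0.5
```

A touchdown qualifies regardless of yardage, so plays starting near the goal line are more likely to count as big plays. That is consistent with the field-position effect the t-tests uncovered.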

Possible Sources of Error/Other Comments

As is the case with any statistical analysis project, there are some factors that are very difficult to account for. For starters, there's the issue of how to handle the misclassified plays (like the example of the botched field goal attempt). From a coding standpoint, omitting them made the most sense, because the fact that NFLScrapR classifies them as neither passes nor runs would complicate the calculations.

Another key question is how to handle the fake field goals/punts that were actually done on purpose. These are technically offensive plays, but because they are defended so differently due to both sides having special teams formations pre-snap, it's easy to argue that these should also be omitted. However, it would require significantly more complex code involving text analysis to detect and remove those plays, and those plays were infrequent enough that they didn't hugely skew any of the data (save for the fourth-and-long runs subset, which was already very small).

There are also plays where a fourth-down conversion wasn't very meaningful and the defense didn't care about allowing it. Consider, for example, a fourth-and-short from near midfield on the last play of the first half: even if the offense converts, the play is still a success from the defense's standpoint as long as the offense doesn't score. As with fake field goals, these plays were infrequent enough that it would have been counterproductive to detect and remove them.

Finally, in perhaps the most debatable choice here, I didn't isolate plays where the offense called the timeout, instead looking at all fourth downs that followed a timeout by either team. This was partially to boost our relatively small sample size, but primarily because, in the context of this project, which team called the timeout isn't that important. Whether the offense or defense makes the call, both teams get 60-plus seconds to figure out how to approach the upcoming play, and the purpose of the project is to determine whether a fourth-down conversion is more likely after that break. Whether the offense or defense calling the timeout makes a difference could certainly be material for a future project.

Thanks for the read, and I hope to hear any positive or negative feedback. I'd like to give a special thanks to Keegan Abdoo of NFL Next Gen Stats and Bailey Joseph of the Oklahoma City Thunder for giving specific R tips.


Cole Jacobson is an Editorial Researcher at the NFL Media office in Los Angeles. He played varsity sprint football as a defensive lineman at the University of Pennsylvania, where he was a 2019 graduate as a mathematics major and statistics minor. With any questions, comments, or ideas, he can be contacted at You can also follow him on Twitter @ColeJacobson32.


9 comments, Last at 01 Nov 2019, 1:43pm

1 Nice analysis. Thanks! There…

Nice analysis. Thanks!

There's no data, of course, to indicate why time outs don't help, so I'll speculate it has to do with the symmetrical nature of time outs. The offence gets to discuss what they want to do, but the defense also gets to discuss what to look for.

5 Thanks for the read,…

Thanks for the read, appreciate it. To be honest, entering the project, I thought a timeout might actually hurt the offense. The basic premise was that a quick snap might create chaos for the defense, since they are the ones who have to react to the offense rather than vice versa. But in retrospect, the symmetrical nature makes sense. Chaos can harm the offense as well, e.g., the defense blitzes the house and the offensive line is confused about whether to protect with a slide, man, or mixed scheme. Overall, it seems like having time to talk through specific roles benefits both sides relatively evenly.

2 Another thing to consider

It seems like some noise in the data might be attributable to the motivation of the team calling the timeout. In some cases, particularly in late-game situations, the timeout is used simply to save time or to make sure the right personnel is on the field, not to design or call the perfect play. For example, in the last two minutes a team might be forced to take a timeout to avoid a 10-second runoff after a replay-review stoppage with the clock running, or to stop the clock after a third-down sack or in-bounds tackle in a similar situation.

6 Definitely a good call, and…

Definitely a good call, and something that space constraints would've made difficult to explore here. I think the logical next step for this project would be to analyze whether the offense or defense called the timeout, as mentioned in the latter paragraphs. A further step from there would be to explore those motives, like you said. E.g., if we isolate situations where a team is trailing with under 2 minutes remaining, that timeout was probably called to stop the clock rather than to scheme a play. So it's certainly fodder for a future project.

3 Why use a T-test?

Can you speak a bit to the statistical methodology used? I was under the impression that a T test is not the appropriate hypothesis test to use when you have binary data, which it seems like you do - the “conversion rate” being tested is just the average of a binary variable. Wouldn’t a proportion test/chi square be more appropriate in this case?

7 Good questions for sure…

In reply to takeleavebelieve

Good questions for sure. Given the binary nature of fourth downs (you either convert or you don't), a chi-squared test would've given very similar results here. I opted for the t-test because its p-value gives a straightforward read on whether the difference in means between two sampled populations is statistically significant. A chi-squared test would have been just as effective, though. Because of space constraints, I cut nearly all of the actual R code from this version of the piece, but I still have it all. If you email me, I'd be happy to send over screenshots of the individual t-tests and explain the thought process behind each.
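For comparison with the reply above, here is a minimal Python sketch of the Pearson chi-squared test on a 2x2 table (timeout vs. no timeout, converted vs. not). It uses hypothetical counts chosen only to roughly match the article's 0.504 and 0.501 conversion rates. They are not the actual data.

```python
import math


def chi_squared_2x2(made_a, n_a, made_b, n_b):
    """Pearson chi-squared test (1 df, no continuity correction) on a 2x2 table.

    Rows are the two groups; columns are converted / not converted.
    """
    observed = [[made_a, n_a - made_a], [made_b, n_b - made_b]]
    row_totals = [n_a, n_b]
    col_totals = [made_a + made_b, (n_a - made_a) + (n_b - made_b)]
    total = n_a + n_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: 457 of 907 after a timeout vs. 1,867 of 3,726 without.
chi2, p = chi_squared_2x2(457, 907, 1867, 3726)
print(f"chi2 = {chi2:.4f}, p = {p:.2f}")
```

With these counts, the p-value lands around 0.88, essentially what a t-test on the matching 0/1 outcomes gives, as the reply anticipates. R's `chisq.test` and `prop.test` apply Yates' continuity correction to 2x2 tables by default, so their p-values would be slightly larger than this uncorrected version's.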

4 Trade-off in WPA

I like that you touched on the idea of WPA in terms of explosive plays, and my first thought was whether the cost of using a timeout, even if it allowed you to successfully call the right play, was worth not having it later. As you mentioned, cutting it down to offensive timeouts will limit the sample size more, but it seems like many head coaches still haven't figured out exactly how valuable each timeout can be.

8 Really good observation, and…

In reply to jtrucksis

Really good observation, and honestly something I didn't even consider while working on this piece. It plays into the broader concept of "exactly how valuable is the average timeout," and how that value varies with score, time, and down-and-distance. Probably something I wouldn't have been able to fit here even if I had wanted to, but it could definitely be explored more later.

9 Good article, it's nice to…

Good article, it's nice to find fairly definitive results in this sort of noisy dataset. And I learned there is something called Sprint Football.