Guest column by Caio Brighenti
"Running backs don't matter." If you've read any analytics article about running in the NFL, odds are you've heard that phrase. But, if you watched the 2020 playoffs, then you saw firsthand how Derrick Henry against the Patriots and Raheem Mostert against the Packers proved that a quality run offense can win big games. So, what makes for a good run game?
I explored this question in my submission for this year's NFL Big Data Bowl student subcompetition. Instead of tackling the strategic question of whether it's worth it to run at all, I decided to investigate what separates runs that work from those that don't.
To start, let's look at two plays that were nearly identical from the runner's perspective but had vastly different results. In both plays below, the runner received the ball 4 yards behind the line of scrimmage, approximately one second after the snap, and was moving towards the left end. However, Bilal Powell gained 12 yards, while Wendell Smallwood lost 4.
So, what happened? Why did the defense stop Smallwood almost immediately while Powell escaped for a first down? It's not always easy to tell from the film, so we can turn to the bird's-eye view offered by the dots. In these plots, the runner is shown as the white dot.
Here, it becomes clear where these two plays differed: Smallwood had two defenders moving into the space he intended to run towards, while Powell was fortunate to have wide open space ahead of him.
While looking at the dots gives us the information we need, this approach isn't quantitative and doesn't provide any measurable way to describe either play. To quantify what's happening in both plays, we can borrow a concept from soccer analytics: pitch control, or field control for those offended by the word pitch.
The math behind field control can get complicated, but the concept is straightforward. If we assume that each player on the field controls some area around them based on where they are and where they're moving to, and we can find a way to calculate this, then we can arrive at each team's overall field control by calculating each player's control and adding it up across the entire team. Then, if we take the difference between the offensive and defensive control, we find which team has ownership over each point on the field.
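As a rough illustration of the idea, here is a minimal Python sketch. It is not the model used in the submission: the Gaussian shape of each player's influence, the `lookahead` and `sigma` values, and the names `Player`, `influence`, and `field_control` are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Player:
    x: float       # position along the field, in yards
    y: float       # position across the field, in yards
    vx: float      # velocity along the field, in yards per second
    vy: float      # velocity across the field, in yards per second
    offense: bool  # True for the offense, False for the defense


def influence(p, gx, gy, lookahead=0.5, sigma=3.0):
    """Assumed influence of one player at grid point (gx, gy).

    The influence is a Gaussian centered slightly ahead of the player,
    in the direction of motion. Both parameters are illustrative guesses.
    """
    cx = p.x + p.vx * lookahead
    cy = p.y + p.vy * lookahead
    dist_sq = (gx - cx) ** 2 + (gy - cy) ** 2
    return math.exp(-dist_sq / (2 * sigma ** 2))


def field_control(players, xs, ys):
    """Return grid[i][j] = offensive minus defensive control at (xs[i], ys[j])."""
    grid = []
    for gx in xs:
        row = []
        for gy in ys:
            off = sum(influence(p, gx, gy) for p in players if p.offense)
            dfn = sum(influence(p, gx, gy) for p in players if not p.offense)
            row.append(off - dfn)
        grid.append(row)
    return grid
```

Positive values mark points where the offense has the advantage, negative values mark points the defense controls, and values near zero correspond to neutral space.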
With this concept of field control, we can get a measurable understanding of what made the two example plays different. In this plot, the yellow and purple areas represent areas of offensive and defensive control, respectively. Blueish-green areas represent neutral space, where neither team has an advantage.
While Powell has plenty of neutral or offense-controlled space between him and the first-down marker, in Smallwood's case the dark purple defensive control wraps around him, cutting off his path.
These two examples suggest that quantifying field control is a good approach for identifying differences between plays that work and those that don't, but if we're interested in drawing conclusions about running in general, then looking at just two plays isn't enough. Instead, I calculated field control in a standard area around the line of scrimmage for all of the roughly 23,000 plays available in this year's Big Data Bowl dataset.
Once I had field control calculated for each play in the dataset, I grouped runs based on the runner's direction of motion and computed the average field control for each type of run. The interpretation for these plots is the same as before -- yellow represents offensive control, and purple defensive control.
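The grouping and averaging step can be sketched in a few lines of Python. The angle convention (0 degrees is straight downfield, negative is towards the left), the 15-degree cutoff, and the function names are assumptions for illustration only. The sketch also assumes every play's grid covers the same standard area and so has the same shape.

```python
from collections import defaultdict


def direction_group(angle_degrees, cutoff=15.0):
    """Assign a run to one of three groups by the runner's direction.

    Assumed convention: 0 is straight downfield, negative is to the left.
    The cutoff is a hypothetical threshold, not a value from the analysis.
    """
    if angle_degrees < -cutoff:
        return "left"
    if angle_degrees > cutoff:
        return "right"
    return "middle"


def average_grids(grids):
    """Cell-by-cell mean of a non-empty list of equally sized grids."""
    n = len(grids)
    rows, cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(g[i][j] for g in grids) / n for j in range(cols)]
        for i in range(rows)
    ]


def average_by_direction(plays):
    """plays is an iterable of (angle_degrees, grid) pairs."""
    buckets = defaultdict(list)
    for angle, grid in plays:
        buckets[direction_group(angle)].append(grid)
    return {group: average_grids(grids) for group, grids in buckets.items()}
```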
Initially, these plots aren't particularly interesting. In all three groups, the defense has greater control to the right of the line of scrimmage, and the offense to the left of the line of scrimmage. But, if we instead look at the difference between successful and unsuccessful plays, the results are far more interesting. In this case, I define successful plays as those gaining more than 1 yard.
Note that the interpretation here is slightly different -- yellow represents areas where successful plays have greater offensive control than unsuccessful plays, and the opposite for purple.
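For readers who want to see the comparison spelled out, here is a minimal Python sketch of the difference between successful and unsuccessful runs. The success rule (more than 1 yard gained) comes from the definition above. The function names and the data layout are hypothetical.

```python
def mean_grid(grids):
    """Cell-by-cell mean of a non-empty list of equally sized grids."""
    n = len(grids)
    rows, cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(g[i][j] for g in grids) / n for j in range(cols)]
        for i in range(rows)
    ]


def success_difference(plays, threshold=1):
    """Mean control on successful runs minus mean control on unsuccessful runs.

    plays is an iterable of (yards_gained, grid) pairs. A run counts as
    successful when it gains more than `threshold` yards.
    """
    good = [grid for yards, grid in plays if yards > threshold]
    bad = [grid for yards, grid in plays if yards <= threshold]
    if not good or not bad:
        raise ValueError("need at least one successful and one unsuccessful run")
    good_mean, bad_mean = mean_grid(good), mean_grid(bad)
    return [
        [g - b for g, b in zip(good_row, bad_row)]
        for good_row, bad_row in zip(good_mean, bad_mean)
    ]
```

Positive cells are spots where successful runs had more offensive control than unsuccessful ones, and negative cells are the reverse.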
Finally, we can answer the question we set out to address: what separates runs that work from those that don't? In these plots, the biggest difference between successful and unsuccessful runs is control over specific spots on the line of scrimmage. The exact position of this spot also varies from group to group, demonstrating a relationship between the runner's initial direction and where this gap should be.
Interestingly, successful plays actually have less control over the space past the line of scrimmage than unsuccessful plays. This suggests that the amount of space the offense can control is finite -- instead of aiming to control more space overall, teams might want to focus their resources on producing a single gap at the line of scrimmage.
In short, this application of tracking data confirms what analytics has long suggested about the run game: the space created by the offensive line is what makes for consistently good runs. Derrick Henry wouldn't have gotten any more yards than Wendell Smallwood did, and I probably could've gotten the first down with the amount of space Bilal Powell had.
This isn't to say that running backs are irrelevant. In the words of Josh Hermsmeyer, running backs are just all good -- they're a solved problem.
Caio Brighenti is an undergraduate in his final year at Colgate University and a finalist in this year's NFL Big Data Bowl. You can follow him and his football analytics work at @CaioBrighenti on Twitter.