10 Nov 2006
by Mike Tanier
It's first-and-10 in the first quarter. The home team breaks the huddle. Their first play from scrimmage is a handoff to the running back. He's hit at the line but dives forward. Gain of two yards.
You grumble. Two yards just doesn't cut it on first-and-10. The running back should be able to do better; after all, he averages over four yards per carry, and the league rushing average is around four yards per attempt. Even average runners gain yardage in four-yard chunks. Why can't your favorite halfback be more consistent?
It turns out that the problem isn't with the running back, but with our perception of how that league rushing average is achieved. We may expect four yards per attempt, but a two-yard gain is the most common result for a running play, and no-gainers are almost as common as four-yard runs. We think of running plays as a way to generate consistent yardage with minimal risks. In fact, running plays, like passes, carry a high risk for a non-positive result.
To illustrate how rushing yardage is actually generated, we're going to analyze every rushing play over a two-and-a-half year period. We'll examine how many yards were gained on each play. Then we'll look at specific down-and-distance situations to determine the actual results for rushing on first-and-10 or second-and-long. Finally, we'll break down the statistics of some individual players so we can learn more about the differences between "big play" running backs and more consistent runners.
The NFL rushing average has hovered around 4.0 yards per carry for so long that I refer to the number as "Planck's Constant" in Pro Football Prospectus 2006. That number has colored our perception of what running backs should do on a play-by-play basis. Unfortunately, it's a misleading figure.
Last season, the official yards-per-carry average crept up to 4.1. The Football Outsiders figure was 4.17: 13,903 carries for 58,043 yards. Our data is always slightly different from the official data because we remove quarterback kneel plays (as well as spikes and Hail Marys) and make other minor changes to the official data. Either way, there's no indication that the modest increase is anything but year-to-year fluctuation.
The 4.0-4.1 yard average is an arithmetic mean: add up all the yards, divide by the attempts. The arithmetic mean is easily skewed by extremes in data. A 75-yard run can increase a starting running back's rushing average by several tenths of a point by the end of a season. This skewing always increases rushing averages: there are several 50+ yard rushes every year, but no 50+ yard losses on running plays.
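A quick sketch of that skew, using made-up numbers: the list of carries and the 250-carry, 1,000-yard season below are hypothetical and chosen only for illustration. They are not taken from the Football Outsiders data.

```python
from statistics import mean, median, mode

# Hypothetical carries for one game (illustrative numbers, not real data).
carries = [-2, 0, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 22]

avg = mean(carries)          # arithmetic mean, pulled upward by the 22-yard run
mid = median(carries)        # middle value of the sorted carries
most_common = mode(carries)  # single most frequent result

# Effect of one 75-yard run on a hypothetical 250-carry, 4.0-average season.
season_carries, season_yards = 250, 1000
with_long_run = (season_yards + 75) / (season_carries + 1)
bump = with_long_run - season_yards / season_carries

print(f"mean={avg:.2f} median={mid} mode={most_common}")
print(f"one 75-yard run adds {bump:.2f} yards per carry")
```

In this toy sample the mean lands above four yards even though the middle carry gains three and the most common result is two. That is the same shape the league-wide numbers take.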
We all know that a few big plays can make a mediocre running back's rushing average look great. But how much effect do long gains have on the league rushing average? The best way to see this is to break down every running play by distance. Table 1 shows the distribution of yardage gained on every rushing play in the NFL in three years: 2000, 2005, and the first six weeks of 2006. The three seasons were chosen so that the data would be current, but would also reflect any changes in the distribution over the last half decade. The table reveals a surprising fact: the mean carry may yield four yards, but the median carry yields only three yards, and the data distribution is centered at two yards.
|Table 1: Rushing Yardage Distribution|
Almost 90 percent of all runs fall into the range shown in the table. Over 20 percent of running plays gain zero or one yards. Factor in losses, and over one-fourth of all runs result in negative or negligible yardage. The rushing average for the plays in the -4-to-10 yard range in 2005 was 2.95 yards per attempt. Long runs make up only about nine percent of all rushing plays, but they increase the league rushing average by over 40 percent.
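The "over 40 percent" figure can be checked from the 2005 numbers quoted above. This is only a back-of-the-envelope sketch: it treats every run outside the -4-to-10 range as part of the lift, which also sweeps in the rare losses of five or more yards.

```python
# Figures quoted in the article for 2005.
carries = 13_903
yards = 58_043
in_range_mean = 2.95  # average of runs gaining -4 to 10 yards

overall_mean = yards / carries
# Relative lift of the overall mean over the in-range mean.
boost = overall_mean / in_range_mean - 1

print(f"overall: {overall_mean:.2f} yards per carry")
print(f"lift from runs outside the range: {boost:.1%}")
```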
The percentages you see in Table 1 aren't broken down by down or field position. Obviously, some of those one- and two-yard runs are positive plays: first downs or touchdowns. But not many. Only about 1.7 percent of one-yard runs and 2.0 percent of two-yard runs yield first downs or touchdowns. Still, there are pollutants in the data which we should address. Are the distributions in Table 1 distorted by short-yardage carries or other factors?
Let's break the data down. We'll start by isolating one of the most basic situations in football: first quarter, first-and-10, in what Football Outsiders calls the "back zone" (between your own 20- and 39-yard lines). The offense is probably working through its pre-game script, and there's nothing about the score of the game or the field position to force the offense's hand or to tip off the defense about what to expect. Table 2 shows the gain distribution in these first down, early-game situations as compared to the overall distribution:
|Table 2: First-and-10, Back Zone vs. All Runs|
Pretty similar, right? The "on the chart" mean (the average of all rushes represented in the table) for first-and-10 situations is actually 2.66, lower than the overall average, and it represents 88.6 percent of all rushes.
The overall Success Rate for rushing plays hovers around 40 percent. But on first down in the first quarter, the Success Rate was 31 percent in 2005 and the start of 2006 and 35 percent in 2000. For comparison's sake, let's look at the passing data. We don't usually use Success Rate in our analysis of the passing game, but we can compute it, and the Success Rate on first-down, first-quarter, back zone passing plays was 49 percent in 2005, 45 percent in 2000, and 57 percent in a small (188 play) sample of the first few games of 2006.
Now let's analyze the second down data. We'll use full-game, full-field data, but we'll break the carries down by the distance situation: 10+ yards to go, 7-9 yards to go, 4-6 yards to go, and 1-3 yards to go. Table 3 contains all of the data, plus the overall yardage distribution for easy comparison. For the record, the percentages in the chart represent nearly 10,000 rushing plays across three seasons.
|Table 3: All Second Downs|
Before we really crunch the numbers, let's get some John Madden wisdom out of the way. Madden is fond of saying that teams often follow up an incomplete pass on first down with a run on second-and-10. Our data shows that teams run on second down and 10 or more yards about 36.2 percent of the time. The data includes all manner of 2nd-and-long situations, some of which may have occurred after sacks or penalties. If we skimmed away all of these situations, the run percentage on 2nd-and-10 would probably creep up to around 40 percent. So teams don't run "all the time" on 2nd-and-10, but give Madden the benefit of the doubt: teams do run a lot on a down that many of us would associate with passing.
In fact, teams probably run more than they should, considering the 24 percent Success Rate on second-and-very long. The "in-the-box" mean on second-and-10+ is 3.12, indicating that teams are getting some benefit from running against a defense that is anticipating a pass. But the modest increase in yards-per-carry doesn't offset the high likelihood that the team will face a third-and-long situation. Even in "unpredictable" down-and-distance combinations like second down with 4-6 yards to go, the in-the-box mean, covering 89.2 percent of all carries, is just 2.85.
On early downs, the data is very stable and predictable. An offense's chance of gaining three or fewer yards on a running play hovers around 50-55 percent, even on first down in the first quarter or on second-and-10. Only about 17 percent of runs gain the 4-5 yards we would hope to get when a good running back is getting the ball in favorable circumstances. And of the 35-40 percent of runs that gain more than six yards, most are 6-8 yard gains that, when averaged with all of the losses and no gainers, don't get us anywhere near four yards per carry.
The rushing distributions clearly explain why teams throw so often on early downs. Fans may wish that teams would "establish the run" more early in games, but there are good reasons why they don't. That league average of four yards per attempt suggests that teams would face a lot of favorable third-and-two situations if they just plowed ahead with the running game. But working through the charts, we discover that the team that hands off on first and second down has about a 33 percent chance of getting caught in third-and-7 and nearly a 15 percent chance of ending up in a third-and-8 or worse scenario. The rewards for running the ball are often so low that teams are willing to assume the increased risk of a turnover and pass more frequently.
Rushing distributions tell us a great deal about how the modern NFL running game really works. The distributions may also tell us something about the merits of individual running backs. The data samples are smaller, and of course the overall quality of each running back's team is a huge, unaccounted for variable when making comparisons. As a way of negating the importance of team strength as well as studying the contrasts between rushing styles, let's examine a pair of teammates from 2005.
Last season, Tatum Bell gained 920 yards and averaged 5.3 yards per carry. Mike Anderson gained 1,014 yards but averaged just 4.2 yards per carry. Despite the wide disparity in yards per carry, DVOA and DPAR ranked Anderson as the better back. Anderson was 37.0 points above replacement level, Bell 16.4. Anderson was 20.3 percent better than the average back, Bell just 7.6 percent.
Bell's rushing average was inflated by several long runs: he had runs of 68, 67, and 55 yards in 2005, plus several 35-yard runs. Anderson's longest carry of the season was 44 yards, and that was his only run longer than 25 yards. We all know that Bell is a "home run threat" while Anderson is more consistent, and we're inclined to downgrade Bell somewhat because so much of his value is contained in a few plays. But is that really fair? After all, gaining four yards at a time is great and all, but big plays are pretty important, too.
If we look at the rushing breakdowns for Anderson and Bell (Table 4), we can clearly see the contrast in their contributions. Anderson's yardage distribution is centered in the 2-3 yard range, while Bell's is centered in the 1-2 yard range, giving Anderson a full yard-per-play advantage on carry after carry. Bell's advantage, of course, is on runs of more than 10 yards. All but 6.5 percent of Anderson's runs gain from -4 to 10 yards, while 10.5 percent of Bell's runs are outside the chart (he only lost five yards on one play last season). Give them both 200 carries, and Bell will have eight more long runs than Anderson, and those runs will be longer than what Anderson can usually muster. But Anderson will gain an extra yard that Bell couldn't on dozens of other
|Table 4: Denver Yard Distributions, 2005|
|Mike Anderson||Tatum Bell||Overall|
Anderson's in-the-box mean was 3.36 yards per attempt, noting again that his "box" is larger. Bell's was just 2.67. What's interesting is that we tend to think of backs like Anderson as "ordinary" while backs with Bell's big-play potential are held in higher esteem. But Bell's rushing distribution is more in line with the league norms than Anderson's. He's very good, but his contributions are typical of what backs around the league provide. Anderson, at least in 2005, was the unique player, providing hard-to-get, down-in, down-out production.
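The 200-carry comparison can be put into rough numbers using only the figures quoted for the two backs. This is a sketch, not the DVOA or DPAR calculation. It counts every run outside the -4-to-10 range as a "long run" (which slightly overstates Bell's total, since one of his was a five-yard loss) and says nothing about how many yards those runs gain.

```python
# Figures quoted in the article for 2005; the 200-carry workload is hypothetical.
CARRIES = 200
backs = {
    "Anderson": {"outside_share": 0.065, "in_box_mean": 3.36},
    "Bell": {"outside_share": 0.105, "in_box_mean": 2.67},
}

for name, b in backs.items():
    b["outside_runs"] = CARRIES * b["outside_share"]
    in_box_carries = CARRIES - b["outside_runs"]
    b["in_box_yards"] = in_box_carries * b["in_box_mean"]
    print(f"{name}: {b['outside_runs']:.0f} runs outside the box, "
          f"{b['in_box_yards']:.0f} yards inside it")

extra_long_runs = backs["Bell"]["outside_runs"] - backs["Anderson"]["outside_runs"]
in_box_edge = backs["Anderson"]["in_box_yards"] - backs["Bell"]["in_box_yards"]
print(f"Bell: {extra_long_runs:.0f} more runs outside the box")
print(f"Anderson: {in_box_edge:.0f} more yards inside the box")
```

Under these assumptions, Bell's eight extra long runs would have to outgain Anderson's by roughly 150 yards in total just to break even.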
The difference between Bell and Anderson suggests that "cloud of dust" backs are more valuable than "boom or bust" backs, but we must be careful when using cheesy labels. Our perceptions of a back's production profile are often way off. How would you classify Marshall Faulk in his prime? Probably as a boom-or-bust back, albeit one with lots of boom and only a little bust.
But Faulk's running distributions show that in his prime he was much more than a big-play machine. Table 5 shows Faulk's distributions from 2000. For comparison, let's also take a look at James Stewart's breakdowns from that season.
|Table 5: Marshall Faulk vs. James Stewart, 2000|
|M. Faulk||J. Stewart|
Faulk averaged 5.2 yards per carry and finished first in the NFL in DPAR. His longest run from scrimmage that year was just 38 yards; unlike Bell, his rushing average wasn't pumped up by a few huge gains. Stewart averaged 3.5 yards per carry. Despite gaining 1,184 yards, he posted a negative DVOA, and his DPAR of 3.6 ranked him 26th in the league.
Faulk's in-the-box mean was 3.37, a very good figure. What's more, his "box" only included 86 percent of his runs. Faulk had seven 12-yard runs, six 16-yard runs, and three 18-yard runs in 2000, giving him a very high percentage of 11-20 yard runs. But what's most remarkable about his production was his ability to avoid no-gainers and his above-average totals in the 3-5 yard range. Fast, shifty Faulk was just as good at using his skills to gain a yard or two as he was at burning defenses for long gains.
By contrast, Stewart's ability to avoid losses and pick up two or three yards couldn't offset his complete lack of big-play potential. At first glance, Stewart's distribution looks similar to Anderson's. But his in-the-box mean of 2.8 is over a half-yard lower. The differences are subtle -- Anderson is a little more likely to gain five or six yards and a little less likely to lose yardage -- but they add up over a few hundred carries. And Stewart, like Anderson, concentrated 95 percent of his carries in the -4-to-10 yard range, so he had few 10-20 yard bursts to increase his productivity. Stewart, like Anderson, was providing a unique skill, which is why he was able to stay in the league for several years. Unlike Anderson, he wasn't a great exemplar of that skill, and the Football Outsiders metrics took him to task for it.
These players were carefully selected to illustrate certain points. If we analyzed dozens of backs, we may find common distribution patterns. If we studied the same back from year-to-year, we could determine if those patterns are stable and predictive. Some of that research is incorporated in the calculations for Adjusted Line Yards and other stats. The rest, unfortunately, is outside the scope of this little article.
Teams don't generate rushing yards in three-, four-, or five-yard bursts. They gain it through punctuated equilibrium, waiting through dozens of minimal gains for a few big plays per game.
And those big plays aren't that big. We've focused on gains of ten or less in this article, ignoring the 10.5 percent or so of plays that yield more yardage. The vast majority of those runs gain 11-20 yards: 6.9 percent overall. Almost 25 percent of the rushing yardage gained in the NFL is generated on runs of 11-20 yards. There were 960 such runs last year: 30 per team, or just under two per team per game. Amazingly, nearly 10 percent of all rushing yardage is generated on runs of 30 or more yards, plays which occur about four times per year for a typical team.
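The frequency of 11-20 yard runs is easy to verify. The sketch assumes the 2005 league format of 32 teams and a 16-game regular season, and it applies the 6.9 percent share to the 2005 carry total purely as a consistency check.

```python
TEAMS = 32            # NFL teams in 2005
GAMES_PER_TEAM = 16   # regular-season games per team in 2005

runs_11_to_20 = 960   # 11-20 yard runs in 2005, from the article
carries_2005 = 13_903
share_11_to_20 = 0.069

per_team = runs_11_to_20 / TEAMS
per_team_per_game = per_team / GAMES_PER_TEAM
# Consistency check: the quoted share times total carries should be close to 960.
implied_runs = share_11_to_20 * carries_2005

print(f"{per_team:.0f} per team, {per_team_per_game:.2f} per team per game")
print(f"6.9% of {carries_2005} carries is about {implied_runs:.0f} runs")
```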
These distribution breakdowns are so interesting that they might seduce us into making some wacky conclusions. Keep in mind that all of these averages and distribution patterns are situation dependent. We might look at the data and suggest that teams stop running the ball altogether on second-and-10, but of course the Success Rate on passing plays would dip sharply if teams stopped threatening to run. These league-wide averages don't necessarily apply to individual teams, so teams with a quality running game may have different distributions that would suggest different optimal strategies. The Bell and Anderson data, for example, indicates that the Broncos have a more versatile running game than the average team, and observation (i.e. actually watching games instead of craning over spreadsheets) bears this out.
Without further study, we shouldn't leap to grand conclusions. But we know this much: if we expect to gain four or five yards on every running play, we're going to be disappointed most of the time. No wonder passing totals have been creeping up for decades. If all a handoff gets you is two yards and a cloud of dust, you might as well throw the ball.