METHODS TO OUR MADNESS
THE SHORT VERSION: DVOA is a method of evaluating teams, units, or players. It takes every single play during the NFL season and compares each one
to a league-average baseline based on situation. DVOA measures not just yardage, but yardage towards a first down: five yards on 3rd-and-4 are worth more than five yards on 1st-and-10 and much more than five yards on 3rd-and-12. Red zone plays are worth more than other plays. Performance is also adjusted for the quality of the opponent. DVOA is a percentage, so a team with a DVOA of 10.0% is 10 percent better than the average team, and a quarterback with a DVOA of -20.0% is 20 percent worse than the average quarterback. Because DVOA measures scoring, defenses are better when they are negative. For more detail, read below.
Please feel free to contact us with questions and comments about our new statistics
using the contact form.
The majority of the ratings featured on FootballOutsiders.com are based on DVOA, or Defense-adjusted Value Over Average.
DVOA breaks down every single play of the NFL season to see how much success offensive players achieved in each specific situation compared to the league
average in that situation, adjusted for the strength of the opponent.
The NFL determines the best players by adding up all their yards, no
matter what situations they came in or how many plays it took to get them. Now
why would they do that? Football has one objective (to get to the end zone) and
two ways to achieve that: by gaining yards and by getting first downs. These two
goals need to be balanced to determine a player's value or a team's performance.
All the yards in the world aren't useful if they all come in eight-yard chunks
on third-and-10.
The popularity of fantasy football only exaggerates the problem. Fans have
gotten used to judging players based on how much they help fantasy teams win and
lose, not how much they help real teams win and lose. But fantasy scoring skews
things by counting the yard between the one and the goal line as 61 times more
important than all the other yards on the field. Let's say Steve Smith catches a
pass on third-and-15 and goes 50 yards but gets tackled two yards from the goal
line, and then DeShaun Foster takes the ball on first-and-goal from the two-yard
line and plunges in for the score. Or, let's say that the Falcons take a
touchback on the opening kickoff, and the Carolina defense stuffs Warrick Dunn
twice, and on third-and-10 Michael Vick throws the ball into the arms of Ken
Lucas, who gets taken down by Alge Crumpler at the two-yard line. Then on the
ensuing first-and-goal, Foster scores a touchdown.
Has Foster done something special? Not really. When an offense gets the ball
on first-and-goal at the two-yard line, they are going to score a touchdown five
out of six times. In the first situation, Foster is getting the credit that
primarily belongs to the passing game. In the second situation, Foster is
getting the credit that primarily belongs to the defense.
DVOA does a better job of distributing credit for scoring points and winning
games. It uses a value based on both total yards and yards towards a first down,
based on work done by Pete Palmer, Bob Carroll, and John Thorn in their seminal
book, The Hidden Game of Football. On first down, a play is considered a
success if it gains 45 percent of needed yards; on second down, a play needs to
gain 60 percent of needed yards; on third or fourth down, only gaining a new
first down is considered success.
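The three thresholds above can be written as a small function. This is a minimal sketch of the basic success rule only, not the full "success points" system; the function and argument names are illustrative assumptions rather than anything used on the site.

```python
def is_successful(down: int, yards_to_go: int, yards_gained: int) -> bool:
    """Basic success rule: 45% of needed yards on first down, 60% on
    second down, and a new first down on third or fourth down."""
    if down not in (1, 2, 3, 4):
        raise ValueError("down must be 1, 2, 3, or 4")
    # Share of the needed yards that a play must gain to count as a success.
    required_share = {1: 0.45, 2: 0.60, 3: 1.0, 4: 1.0}[down]
    return yards_gained >= required_share * yards_to_go


# Five yards clears the bar on 3rd-and-4 and on 1st-and-10, but not on 3rd-and-12.
print(is_successful(3, 4, 5))    # True
print(is_successful(1, 10, 5))   # True
print(is_successful(3, 12, 5))   # False
```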
We then expand upon that basic idea with a more complicated system of
"success points." A successful play is worth one point, an
unsuccessful play zero points. Extra points are awarded for big plays, gradually
increasing to three points for 10 yards, four points for 20 yards, and five
points for 40 yards or more. There are
fractional points in between. (For example, eight yards on third-and-10 is worth
0.63 "success points.") Losing four yards is -1 point, losing 12 yards
is -1.8 points, an interception is -6 points, and a fumble is worth anywhere from -1.70 to -3.98
points depending on how often a fumble in that situation is lost to the defense,
no matter who actually recovers the fumble. Red zone plays are worth 20
percent more, and there is a bonus given for a touchdown.
(The system is a bit more complex than the one in Hidden Game thanks to a
number of improvements since we launched the site in 2003.)
Every single play run in the NFL gets a "success value" based on
this system, and then that number gets compared to the average success values of
plays in similar situations for all players, adjusted for a number of variables.
These include down and distance, field location, time remaining in game, and
current scoring lead or deficit. Teams are always compared to one standard, as
the team made its own choice whether to pass or rush. However, when it comes to
individual players, rushing plays are compared to other rushing plays, passing
plays to other passing plays, tight ends get compared to tight ends and wideouts
to wideouts.
Imagine two running backs who each gain three yards. If Player A gains three
yards under a set of circumstances where the average NFL running back gains only
two yards (for example, third-and-1), it can be argued that Player A has a
certain amount of value above others at his position. Likewise, if Player B
gains three yards on a play where, under similar circumstances, an average NFL
back would be expected to gain five yards (for example, second-and-15), it can
be argued that Player B has negative value relative to others at his position.
Once we have all our adjustments, we can find the difference between this
player's success and the expected success of an average running back in the same
situation (or between this defense and the average defense in the same
situation, etc.). Add up every play by a certain team or player, divide by the
total baseline for success in all those situations, and you get VOA, or Value
Over Average.
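One way to read that last step is as a ratio of sums: total value above the baseline divided by the total baseline. The sketch below assumes that reading; the function name, the pair layout, and the sample numbers are all hypothetical, and the situational adjustments described above are taken as already folded into each baseline value.

```python
def value_over_average(plays):
    """plays: iterable of (success_value, baseline_value) pairs, where
    baseline_value is the average success value for that situation."""
    plays = list(plays)
    total_baseline = sum(baseline for _, baseline in plays)
    if total_baseline == 0:
        raise ValueError("total baseline must be non-zero")
    total_over_average = sum(actual - baseline for actual, baseline in plays)
    return total_over_average / total_baseline


# Hypothetical three-play sample, for illustration only.
sample = [(1.0, 0.5), (0.0, 0.5), (2.0, 1.0)]
print(f"{value_over_average(sample):.1%}")  # 50.0%
```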
Of course, the biggest variable in football is the fact that each team plays
a different schedule. By adjusting each play based on the defense's average
success in stopping that type of play over the course of a season, we get DVOA,
or Defense-adjusted Value Over Average. Rushing and passing plays are adjusted
based on down and location on the field; receiving plays are also adjusted based
on how the defense performs against passes to running backs, tight ends, and
wide receivers. Defenses are adjusted based on the average success of the
offenses they are facing. (Yes, this is still called DVOA, for the sake of
simplicity.)
The biggest advantage of DVOA is the ability to break teams and players down to
find strengths and weaknesses in a variety of situations. In the aggregate, DVOA
may not be quite as accurate as some of the other, similar "power
ratings" formulas based on comparing drives rather than individual plays,
but, unlike those other ratings, DVOA can be separated not only by player but
also by down, or by week, or by distance needed for first down. This can give us
a better idea of not just which team is better but why, and what a team has to
do in order to improve itself in the future. You will find DVOA used by Football Outsiders
in a lot of different ways. Because it takes every single play into account,
it can be used to measure a player or a team's performance in any situation. All
Minnesota third downs can be compared to how an average team does on third down.
J.P. Losman and Kelly Holcomb can each be compared to how an average quarterback
performs in the red zone, or with a lead, or in the second half of the game.
Since it compares each play only to plays with similar circumstances, it
gives a more accurate picture of how much better a team really is compared to
the league as a whole. The list of top DVOA offenses on third down, for example,
is more accurate than the conventional NFL conversion statistic because it takes
into account that converting third-and-long is more difficult than converting
third-and-short, and that a turnover is worse than an incomplete pass because it
doesn't provide the opportunity to move the other team back with a punt on
fourth down.
One of the hardest parts of understanding a new statistic is grasping the idea
of what numbers represent good performance or bad performance. We try to make
that easy with DVOA, because it gets compared to average. Therefore, 0% always
represents league-average. A positive DVOA represents that the offense is more
likely to score, and a negative DVOA represents that the defense is more likely
to stop them. This is why the best offenses have positive DVOA ratings
(Indianapolis in 2005: +26.9%) and the best defenses have negative DVOA ratings
(Chicago in 2005: -21.8%).
Ratings for teams and starting players generally follow that scale, with the
best being around 30% and the worst being around -30% (opposite for defense).
However, because the baseline represents four years of play (2002-2005) no year
will average exactly 0%. Over the past four years, offensive levels have bounced
back and forth, so in 2002 and 2004 the league average was positive, and in 2003
and 2005 it was negative.
Team DVOA totals combine offense and defense, and the team total is given by
offense minus defense to take into account that better defenses are more
negative. (Special teams performance is also added, as described below.)
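The sign convention is easy to trip over, so here is a minimal sketch of the combination just described. The function name and the sample ratings are hypothetical; all values are in percentage points.

```python
def total_dvoa(offense: float, defense: float, special_teams: float = 0.0) -> float:
    """Total = offense minus defense plus special teams.
    Subtracting defense means a negative (good) defense raises the total."""
    return offense - defense + special_teams


# Hypothetical team: +10.0% offense, -5.0% defense, +1.0% special teams.
print(total_dvoa(10.0, -5.0, 1.0))  # 16.0
```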
After dealing with DVOA for a few months, we had to deal with a strange tendency: well-regarded players, particularly those known for their durability, had DVOA ratings that came out around average. Players along these lines included Deuce McAllister, LaDainian Tomlinson (in 2002, not 2003), and Jeremy Shockey.
The problem is that DVOA doesn't take into account the value of a player being involved in a greater number of plays, even if his performance is league-average. A player who is involved in more plays can draw the defense's attention away from other parts of the offense. If that player is a running back, he can take time off the clock with repeated runs. And most importantly, nearly every player is a starter for a reason: he is better than the alternative.
Let's say you have a running back who carries the ball 300 times in a season. What would happen if you were to remove this player from his team's offense? What would happen to those 300 plays? Well, the player would not be replaced by thin air. This is why you have to compare performance to some kind of baseline; two yards is not two yards better than the alternative. On the other hand, while comparing players to the league average works on a per play basis, it doesn't work on a total basis because a player removed from an offense is not generally replaced by a similar player. Those 300 plays will generally be given to a significantly worse player, someone who is the backup because he doesn't have as much experience and/or talent.
To take this into account, we borrowed the concept of replacement level from Baseball Prospectus. Using a scale similar to the scale BP uses to determine baseball's replacement level, we've determined that a replacement level player has a DVOA of roughly -13.3%. (If you want to know why, it is explained in the original article introducing PAR.) Instead of determining value by comparing each play's "success value" to the average, as in DVOA, each play is instead compared to a number roughly 13.3% below the average success value of similar plays. That gives us value over a replacement level player, a better representation of a player's total contribution to his team on all his plays.
Actually, while in general replacement level is -13.3%, technically it is different for each position depending on whether we are measuring passing, rushing, or receiving. And, of course, the real replacement player is different for each team in the
NFL. (Kansas City started 2005 with Larry Johnson as the backup running back,
while Houston had Vernand Morency. Big difference there.) No starter can be blamed for the poor performance of his backup,
so we create a general replacement level for use across the league.
Of course, giving a number of "success value points over replacement level" would be fairly useless to the average fan and even the non-average fan. If we told you that
Ben Roethlisberger was worth 119.5 success value points over replacement in
2005, you would have no idea what the heck we were talking about. So we
translate those success value points into a number that represents actual
points. After working through statistics from the past five seasons, our best
approximation is that a team made up entirely of replacement-level players
would be outscored 407 to 260, finishing with a 4-12 record. Conveniently, this
is close to the average record of the last four expansion teams. But part of
the reason this team gives up so many more points than it scores is that it has
replacement-level special teams. Those replacement level special teams are
worth -27 points, making the actual baseline for determining offensive value
274 points (the baseline for defensive value is 394 points).
With a bit of math, it works out that each "success value point" over replacement level is worth about .48 actual points above this offensive baseline. We also adjust this number for the strength of the opponents each player has faced.
Now we can tell you that Ben Roethlisberger was worth 57.4 points more than a
replacement level quarterback in 2005, or 57.4 DPAR (Defense-adjusted Points
Above Replacement). Tom Brady was worth 104.0 DPAR, Kyle Orton was worth -38.9
DPAR, and so on.
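The arithmetic behind that conversion can be sketched in a few lines. This is a rough illustration using only the two approximate constants quoted above (13.3% and .48); it leaves out the opponent adjustment and the position-specific replacement levels, and the function names are assumptions.

```python
REPLACEMENT_GAP = 0.133          # replacement level, roughly 13.3% below average
POINTS_PER_SUCCESS_POINT = 0.48  # approximate conversion quoted in the text


def success_points_over_replacement(plays, gap=REPLACEMENT_GAP):
    """plays: iterable of (success_value, average_value) pairs.
    Each play is compared to a bar set `gap` below the average value."""
    return sum(actual - average * (1.0 - gap) for actual, average in plays)


def points_above_replacement(success_points):
    """Convert success value points over replacement into actual points."""
    return success_points * POINTS_PER_SUCCESS_POINT


# The Roethlisberger figure from the text: 119.5 success value points.
print(round(points_above_replacement(119.5), 1))  # 57.4
```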
HOW CAN A 16-GAME SEASON BE SIGNIFICANT?
Football statistics can't be analyzed in the same way baseball statistics are. After all, there are only 16 games in a season. Baseball has ten times more, and even the NBA offers five times more. The more games, the more events to analyze, and the more events to analyze, the more statistical significance.
That is true, but the trick is to consider each play in an NFL game as a separate event.
For example, Eli Manning played only 16 games in
2005, but in those 16 games he had 586 passing plays (including sacks) and 29 rushing plays (including scrambles) for a total of
615 events. Manny Ramirez in 2005 played in 152 games and had 650 plate
appearances. For the most part, a quarterback who plays a full season will have
almost the same number of plays as a baseball hitter who plays in most of his team's games.
A running back will have fewer plays than a quarterback, and wide receivers and tight ends will have even fewer. But there should still be enough plays with most starting running backs and receivers to allow for analysis with some significance. As an example, LaDainian Tomlinson ran the ball
339 times in 2005, and was the target of 77 passes (including incompletes), for a total of
416 plays. In general, a starting running back will have 375-450 plays over 16 games. Receivers are used a bit less, and therefore their stats are likely not as accurate. In general, starting wide receivers have 75-150 pass targets over a full season.
You need to have the entire play-by-play of a season in order to compute DVOA, so it is useless for comparing players of today to players of history. As of this writing, we have processed
nine seasons, 1997-2005.
DVOA is limited by what's included in the official NFL play-by-play, so we can't say which teams have the best offensive DVOA when play-faking, or the best defensive DVOA against three-receiver sets.
Since play-by-play lists tackles, sacks, and interceptions, but not attempted tackles, or attempted sacks or interceptions, we don't have individual DVOA or DPAR for defensive players at this point. We're working on these issues with the Football Outsiders game charting project.
DVOA is still far away from the point where we can use it to represent the value of a player separate from the performance of his ten teammates that are also involved in each play. That means that when we say, "Larry
Johnson has a DVOA of 27.6%," what we are really saying is "Larry
Johnson, playing in the Kansas City offensive system with the Kansas City offensive line blocking for him and Trent Green selling the fake when necessary, has a DVOA of 27.6%."
With fewer situations to measure, the numbers spread out a bit more, so you'll see more extreme DVOA ratings for part-time players and for measurements of teams in more specific situations (for example, passing on third downs). The charts listing players in order of DVOA have cut-offs for number of attempts, because players with just a handful of plays end up with absurd VOA and DVOA numbers. (In 2002, for example, Henry
Burris had a -103% passing DVOA.)
Passing statistics include sacks as well as fumbles on aborted snaps. Receiving statistics include all passes intended for the receiver in question, including those that are incomplete or intercepted. At some point,
we hope to be able to determine just how much impact different receivers have on complete vs. incomplete passes; for now, various regression analyses make it clear that both quarterback and receiver have an impact on whether a pass is complete or not. The word "passes" refers to both complete and incomplete pass attempts.
Unless we say otherwise, all references to third down also include the
handful of rushing and passing plays that take place on fourth down (primarily
fourth-and-1).
The problem with a system based on measuring both yardage and yardage towards a first down, of course, is what to do with plays that don't have the possibility of a first down. Special teams are an important part of football and we needed a way to add that performance to the team DVOA ranking. Our special teams metric includes five separate measurements: field goals (and extra points), net punting, punt returns, net kickoffs, and kick returns.
The foundation of most of these special teams ratings is the concept that each yard line has a different value based on how the likelihood of scoring changes with better field position. In
Hidden Game, the authors suggested that the value of field position for the offense existed on a straight line with your own goal line being worth -2 points, the 50-yard line 2 points, and the opposing goal line 6 points. (-2 points isn't just the value of a safety; it also reflects the fact that when you are backed up in your own zone, you are likely going to see your drive stall, and you'll need to punt and give the ball to the other team in good field position. Thus, the defense is more likely to score next.)
We use a more refined set of values based on our research, but the idea is the
same.
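The straight-line version from Hidden Game is simple enough to write down. The sketch below implements only that linear scale (-2 at your own goal line, 2 at midfield, 6 at the opposing goal line), not the refined values used for the actual ratings; the function name is an assumption.

```python
def field_position_value(yards_from_own_goal: float) -> float:
    """Linear Hidden Game scale: -2 points at your own goal line,
    +2 at the 50-yard line, +6 at the opposing goal line."""
    if not 0 <= yards_from_own_goal <= 100:
        raise ValueError("yards_from_own_goal must be between 0 and 100")
    # Eight points of value spread evenly over 100 yards.
    return -2.0 + 8.0 * yards_from_own_goal / 100.0


print(field_position_value(0))    # -2.0
print(field_position_value(50))   # 2.0
print(field_position_value(100))  # 6.0
```

On this scale each yard is worth 0.08 points, and the break-even spot (a value of zero) is a team's own 25-yard line.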
The special teams ratings compare each kick or punt to the league average, based on the point value of field position at the spot of each kick, catch, and return. We've determined a league average for how far a kick goes based on the yard line from where the kick occurs (almost always the 30-yard line for kickoffs, variable for punts) and a league average for how far a return goes based on both the yard line where the ball is caught and the distance that it traveled in the air.
The kicking or punting team is rated based on net points compared to average, taking into account both the kick and the return if there is one. Because the average return is always positive, punts that are not returnable (touchbacks, out of bounds, fair catches, and punts downed by the coverage unit) will rate higher than punts of the same distance which are returnable. (This is also true of touchbacks on kickoffs.) There are also separate individual ratings for kickers and punters that are based only on distance and whether the kick is returnable, otherwise assuming an average return in order to judge the kicker separate from the
coverage. For the return team, the rating is only based on how many points the return is worth compared to average, based on the location of the catch and the distance the ball traveled in the air. Return teams are not judged on the distance of kicks, nor are they judged on kicks that cannot be returned.
Field goal kicking is measured differently. Measuring kickers by field goal percentage is a bit absurd, as it assumes that all field goals are of equal difficulty. In our metric, each field goal is compared to the average number of points scored on all field goal attempts from that distance. The value of a field goal increases as distance from the goal line increases.
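As a rough sketch of that comparison, each attempt can be scored as the points it produced minus the average points scored from the same distance. The function name and the make rates below are hypothetical placeholders, not league data, and the weather and altitude adjustments are left out.

```python
def field_goal_points_over_average(made: bool, league_make_rate: float) -> float:
    """Points scored on this attempt minus the average points scored on
    attempts from the same distance (3 points times the league make rate)."""
    if not 0.0 <= league_make_rate <= 1.0:
        raise ValueError("league_make_rate must be between 0 and 1")
    expected_points = 3.0 * league_make_rate
    actual_points = 3.0 if made else 0.0
    return actual_points - expected_points


# Hypothetical make rates: 75% from short range, 50% from long range.
print(field_goal_points_over_average(True, 0.75))   # 0.75
print(field_goal_points_over_average(True, 0.5))    # 1.5
print(field_goal_points_over_average(False, 0.5))   # -1.5
```

Because the league make rate falls as distance grows, a made kick from long range earns more credit than a made kick from short range.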
Kickoffs, punts, and field goals are then adjusted based on weather and altitude. It will surprise no one to learn that it is easier to kick the ball in Denver or a dome than it is to kick the ball in Buffalo in December. Because we do not yet have enough data to tailor our adjustments specifically to each stadium, each one is assigned to one of four categories: Cold, Warm, Dome, and Denver/Mexico. Beginning this year, there's an additional adjustment dropping the value of field goals in Florida and raising the value of punts in San Francisco.
Once we've totaled how many points above or below average can be attributed to special teams, another formula then transforms these numbers from points to DVOA so the ratings can be added to offense and defense to get total team DVOA.
There are three aspects of special teams that don't show up in our numbers because a team has little or no influence on them -- and yet, these plays do have an impact on wins and losses. The first is the length of kickoffs by the opposing team, because no matter how strong your return man is, you can't make the other guy kick it shorter. The other two are field goals against your team, and punt distance against your team. Research shows no indication that teams can influence the accuracy or strength of field-goal kickers and punters, except for blocks. And although blocked field goals and punts are definitely skillful plays, they are so rare that they have no correlation to how well teams have played in the past or will play in the future. Special teams ratings also do not include two-point conversions or onside kick attempts, which like blocks are so infrequent as to be statistically insignificant in judging future performance.
(Note: The Adjusted Line Yards formula was substantially overhauled in the summer of 2005.
Adjusted Line Yards in articles from 2003 and 2004 are based on a different formula and will look smaller.)
One exception to the use of DVOA/DPAR, and the use of "play success" instead of raw yardage, is the rating system for offensive and defensive lines. Actually, these are only measures of running plays, and of course the defensive numbers don't measure just the defensive line, but the whole front seven against the run.
One of the most difficult goals of statistical analysis in football is somehow isolating how much responsibility for a play lies with each of the 22 men on the field. Nowhere is this as obvious as the running game, where one player runs while up to nine other players -- including wideouts, tight ends, and fullbacks -- block in different directions. None of the statistics we use for measuring rushing -- yards, touchdowns, yards per carry -- differentiate between the contribution of the running back and the contribution of the offensive line. Neither do our advanced metrics DVOA and
DPAR.
We have enough data amassed that we can try to separate the effect that the running back has on a particular play from the effect of the offensive line (and other offensive blockers) and the effect of the defense. A team might have two running backs in its stable: RB A, who averages 3.0 yards per carry, and RB B, who averages 3.5 yards per carry. Who is the better back? Imagine that RB A doesn’t just average 3.0 yards per carry, but gets exactly 3 yards on every single carry, while RB B has a highly variable yardage output: sometimes 5 yards, sometimes –2 yards, sometimes 20 yards. The difference in variability between the runners can be exploited to not only determine the difference between the runners, but the effect the offensive line has on every running play.
We know that at some point in every long running play, the running back has gotten past all of his offensive line blocks. From here on, the rest of the play is dependent on the runner's own speed and elusiveness, combined with the speed and tackling ability of the defensive players. If Tiki Barber breaks through the line for 50 yards, avoiding tacklers all the way to the goal line, his offensive line has done a great job -- but they aren't responsible for most of that run. How much are they responsible for?
For each running back carry, we calculated the probability that the back involved would run for the specific yardage on that play, based on that back’s average yardage per carry and the variability of his yardage on every play. We also calculated the probability that the offense would get the yardage based on the team’s rushing average and variability without the back involved in the play, and the probability that the defense would give up the specific amount of yardage based on its average rushing yards allowed per carry and variability. For example, based on his rushing average and variability, the probability in 2004 that Tiki Barber would have a positive carry was 80% while the probability that the Giants would have a positive carry without Barber running was only 73%.
Yardage ends up falling into roughly the following combinations: Losses, 0-4 yards, 5-10 yards, and 11+ yards. In general, the offensive line is 20% more responsible for lost yardage than it is for yardage gained up to four yards, but 50% less responsible for yardage gained from 5-10 yards, and not responsible for yardage past that. Thus, the creation of Adjusted Line Yards.
Adjusted Line Yards take every carry by a running back and apply those percentages. (We don’t include carries by receivers, which are usually based on deception rather than straight blocking, or carries by quarterbacks, which are generally busted passing plays except in Atlanta.) Those numbers are then adjusted based on down, distance, and situation as well as opponent (similar to DVOA) and then normalized so that the league average for Adjusted Line Yards per carry is the same as the league average for RB yards per carry (in
2005, 4.07).
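The percentage weights can be sketched as a per-carry function. This assumes a direct reading of the figures above (losses count 120%, yards 0-4 count 100%, yards 5-10 count 50%, yards past 10 count nothing) and stops before the situational adjustments, opponent adjustments, and normalization; the function name is hypothetical.

```python
def line_yards(gain: int) -> float:
    """Share of a single carry's yardage credited to the offensive line."""
    if gain < 0:
        return 1.2 * gain                      # losses: 20% more responsibility
    first_band = min(gain, 4)                  # yards 0-4: full credit
    second_band = max(0, min(gain, 10) - 4)    # yards 5-10: half credit
    return first_band + 0.5 * second_band      # yards 11+: no credit


for gain in (-2, 3, 8, 50):
    print(gain, line_yards(gain))
```

Under this reading no carry can credit the line with more than seven yards, which is why a 50-yard run counts the same as a 10-yard run.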
Runs are listed by the NFL in seven different directions: left/right end, left/right tackle, left/right guard, and middle. Further research showed no statistically significant difference between how well a team performed on runs listed middle, left guard, and right guard, so we also list runs separated into five different directions. Note that there may not be a statistically significant difference between right tackle and middle/guard either, but until we can research further (and for the sake of symmetry) we do still split out runs behind the right tackle separately.
The system is far from perfect. We don't know when a guard is pulling and when a guard is blocking straight ahead. We know that some runners are just inherently better going up the middle, and some are better going side to side, and we can't measure how much that impacts these numbers. We have no way of knowing the blocking contribution made by fullbacks, tight ends, or wide receivers.
Other numbers we use to measure the running game:
- 10+ Yards gives the percentage of the team's rushing yards that come from double-digit runs, past the first 10 yards of each run. So for a 15-yard run, five yards are counted; for an 80-yard run, 70 yards are counted. This number gives you an idea of how much of a team's running game was based on the breakaway speed of the running backs -- not to mention the opportunity provided by getting past the front seven with a lot of field in front of you. After all, you can only run 80 yards if you're on your own 20. This number is not adjusted in any way.
- Power success measures the success of specific running plays rather than the distance. This number represents how often a running attempt on third or fourth down, with two yards or less to go, achieved a first down or touchdown. Since quarterback sneaks, unlike scrambles, are heavily dependent on the offensive line, this percentage does include runs by all players, not just running backs. This is the only stat given that includes quarterback runs. It is not adjusted based on game situation or opponent.
- Stuffed measures the percentage of runs that result in (on first down) zero or negative gain or (on second through fourth down) less than one-fourth the yards needed for another first down. Note that this is slightly different from the definition of "stuffed" used by STATS, Inc.
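The three definitions above can be sketched as follows. The function names and the tuple layout are assumptions made for illustration, and the sample carries are invented.

```python
def ten_plus_share(gains):
    """Share of rushing yards that come past the first 10 yards of each run."""
    total = sum(gains)
    if total <= 0:
        raise ValueError("total rushing yards must be positive")
    return sum(max(0, gain - 10) for gain in gains) / total


def is_stuffed(down: int, yards_to_go: int, gain: int) -> bool:
    """First down: zero or negative gain. Second through fourth down:
    less than one-fourth of the yards needed for another first down."""
    if down == 1:
        return gain <= 0
    return gain < yards_to_go / 4


def power_success_rate(plays):
    """plays: iterable of (down, yards_to_go, gain, touchdown) tuples.
    Counts runs on third or fourth down with two yards or less to go."""
    attempts = [p for p in plays if p[0] in (3, 4) and p[1] <= 2]
    if not attempts:
        raise ValueError("no power situations in sample")
    converted = sum(1 for _, to_go, gain, td in attempts if td or gain >= to_go)
    return converted / len(attempts)


# A 15-yard run counts 5 yards and an 80-yard run counts 70: 75 of 100 yards.
print(ten_plus_share([15, 80, 5]))  # 0.75
print(is_stuffed(2, 8, 1))          # True
```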
The stats section of our website also features drive
stats compiled by Jim Armstrong. These stats are computed from NFL Drive Charts
and are not adjusted for strength of schedule or situation. Take-a-knee drives
at the end of a half are discarded. Drive stats are generally self-explanatory, giving each team's total number
of drives as well as average yards per drive, points per drive, touchdowns per
drive, punts per drive, turnovers per drive, interceptions per drive, and
fumbles lost per drive. LOS/Drive represents average starting field position
(line of scrimmage) per drive from the offensive point of view. Drive stats are
given for offense and defense, with NET representing simply offense minus
defense.
Our data may differ slightly from official NFL numbers due to discrepancies in different play-by-play reports. In addition, we've adjusted clock plays, with kneels no longer counting as rush attempts and spikes no longer counting as pass attempts. We also count most aborted snaps as passing plays, not rushing plays, unless the play-by-play specifies that the play was an aborted handoff.