02 Oct 2003
by Aaron Schatz
In the months since Football Outsiders came online, we've received a lot of comments and questions about our VOA rating system. One question comes up again and again, however: Why are some very well-regarded players rated as average or near-average using our statistics? Do we really believe that Jeremy Shockey is an average pass-catching tight end, or that Kenny Watson and Moe Williams were more valuable last year than Ricky Williams and LaDainian Tomlinson?
We've been a bit troubled by this too. Comparing players to the mean league performance in a given situation makes a lot of sense, but there are a lot of well-regarded players who come out pretty ordinary. But while each individual Ricky Williams run might result, on average, in the same amount of success as a run from an average back, an average back isn't carrying the ball 35 times a game -- running down the clock, wearing out the opposing linemen, and taking the defense's attention away from the passing game to allow Jay Fiedler more success.
Therefore, we're going to take a page from the folks at Baseball Prospectus and introduce a concept called replacement level. I hope they don't mind. The idea of replacement level says that when a regular player gets injured, he isn't usually replaced by an average player; all the average players are starting for other teams. He gets replaced by a replacement level player. In baseball, that's a minor leaguer or bench scrub; in football, that's a backup quarterback riding the bench, or a free agent some other team dropped in preseason, or a fourth receiver who suddenly finds himself playing opposite Randy Moss.
So now, an average player who can be used repeatedly -- thus opening up other parts of the offense and gaining yards on a regular basis -- becomes more valuable. Because if you lose him, you aren't replacing him with a similar player. You're replacing him with a significantly worse player.
How high do we place the replacement level? That's a very good question, and one that may take a lot of research. For now, however, I'll use the baseball replacement level as a guide. I'm about to explain how this all works. If you are allergic to math, skip down a few paragraphs to find out how our new statistic upgrades the rankings of players like Williams and Tomlinson.
Football Outsiders has an all-encompassing statistic that gives value per play, which we call VOA. In the same way, Baseball Prospectus has an all-encompassing statistic that gives value per plate appearance, Equivalent Average (EqA). For the most part, EqA numbers vary from .180 to .340 with .260 representing an average offensive baseball player. Replacement level is set at .230, which means that it is roughly 3/8 of the difference from an average player to the worst player.
So, how do we then figure out the replacement level for, to give an example, running backs? The average offensive player has a VOA of 0%, of course. Based on 2002 numbers, the VOA rating for a running back bottoms out at about -36.5%, or Jonathan Wells. Measure 3/8 of the distance from 0% to -36.5%, and you get a reasonable approximation of replacement level: -13.7% VOA.
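For readers who would rather see that arithmetic than read it, here is a minimal Python sketch. The function and variable names are made up for illustration and are not part of any Football Outsiders code; the only inputs are the EqA landmarks and the -36.5% figure quoted above.

```python
# EqA landmarks quoted in the article (worst, average, replacement level).
EQA_WORST = 0.180
EQA_AVERAGE = 0.260
EQA_REPLACEMENT = 0.230


def replacement_fraction(worst, average, replacement):
    """How far replacement level sits along the way from average to worst."""
    return (average - replacement) / (average - worst)


def replacement_voa(worst_voa, fraction, average_voa=0.0):
    """Apply the same fraction to the distance from average VOA to the worst VOA."""
    return average_voa + fraction * (worst_voa - average_voa)


fraction = replacement_fraction(EQA_WORST, EQA_AVERAGE, EQA_REPLACEMENT)
rb_replacement = replacement_voa(-36.5, fraction)  # -36.5% is the 2002 floor for backs

print(f"fraction of the way to worst: {fraction:.3f}")
print(f"running back replacement level: {rb_replacement:.1f}% VOA")
```

The fraction comes out to 3/8, and applying it to -36.5% lands on roughly -13.7%, matching the figures in the text.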
In the VOA system, each play is given a "success value" which is then compared to the average success value of similar plays based on a number of variables (down, distance to go, score gap, direction, etc.). To determine value over replacement level, each play is instead compared to a number 13.7% below the average success value of similar plays.
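Here is a rough sketch of that comparison in Python. It assumes success values are positive numbers and reads "13.7% below" as multiplying the average success value by (1 - 0.137); both are assumptions on my part, as is every name in the code. The two backs are invented to show the effect and are not taken from the 2002 data.

```python
REPLACEMENT_PCT = -13.7  # running back replacement level from the article


def totals(plays, replacement_pct=REPLACEMENT_PCT):
    """Return (value over average, value over replacement) for a list of
    (success_value, average_success_value) pairs, one pair per play."""
    over_average = 0.0
    over_replacement = 0.0
    for success, average in plays:
        # Assumed reading: the replacement baseline is 13.7% below average.
        baseline = average * (1 + replacement_pct / 100)
        over_average += success - average
        over_replacement += success - baseline
    return over_average, over_replacement


# Hypothetical backs; every play is compared to an average success value of 1.0.
workhorse = [(1.0, 1.0)] * 300   # exactly average on 300 carries
part_timer = [(1.2, 1.0)] * 100  # 20% better than average on 100 carries

workhorse_avg, workhorse_repl = totals(workhorse)
part_timer_avg, part_timer_repl = totals(part_timer)

print(f"workhorse:  {workhorse_avg:.1f} over average, {workhorse_repl:.1f} over replacement")
print(f"part-timer: {part_timer_avg:.1f} over average, {part_timer_repl:.1f} over replacement")
```

With these made-up numbers the part-timer leads in value over average (about 20 to 0), but the workhorse leads in value over replacement (about 41.1 to 33.7), simply because he piles up the gap over a replacement player on three times as many carries.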
The next step is translating the number of "success value points" over a replacement-level player into a number that represents actual points. After working through the stats, my best approximation is that a team made up entirely of replacement-level players (roughly 14% worse than average) would be outscored 417 to 285, finishing with a 5-11 record (using the Pythagorean formula for expected wins). But part of the reason this team gives up so many more points than it scores is that it has replacement-level special teams. Those replacement-level special teams are worth -32 points, split evenly between points scored and points allowed, making the actual baseline for determining offensive value 301 points (the baseline for defensive value is 401 points). With a bit of math, it works out that each "success value point" over replacement level is worth .3841 actual points above this offensive baseline.
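The numbers in that paragraph can be checked with a short Python sketch. Two things here are assumptions rather than anything stated above: the exponent of 2.37, which is a commonly used football value (the classic exponent of 2 gives the same record after rounding), and the even split of the special teams points between offense and defense, which is simply what the 301 and 401 baselines imply. All function names are hypothetical.

```python
GAMES = 16           # regular season games
CONVERSION = 0.3841  # actual points per success value point, from the article


def pythagorean_wins(points_for, points_against, exponent=2.37, games=GAMES):
    """Expected wins from points scored and allowed (exponent is an assumption)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return games * pf / (pf + pa)


def baselines(points_for, points_against, special_teams):
    """Remove the special teams contribution, assuming it is split evenly
    between points scored and points allowed."""
    half = special_teams / 2
    return points_for - half, points_against + half


def points_above_replacement(success_points_over_replacement):
    """Convert success value points over replacement into actual points."""
    return CONVERSION * success_points_over_replacement


wins = pythagorean_wins(285, 417)
offense_base, defense_base = baselines(285, 417, special_teams=-32)

print(f"expected record: {round(wins)}-{GAMES - round(wins)}")
print(f"offensive baseline: {offense_base:.0f}, defensive baseline: {defense_base:.0f}")
print(f"100 success value points = {points_above_replacement(100):.2f} points")
```

The expected win total comes out between 4.5 and 5.5 under either exponent, which rounds to the 5-11 record quoted above, and the baselines come out to 301 and 401.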
OK, PEOPLE WHO DON'T LIKE MATH CAN COME BACK NOW. Here are tables showing the top ten running backs in three different rushing statistics. First, our new stat DPAR showing how many points this player's rushing attempts created compared to a replacement-level running back. Second, the stat DV+ showing total "play success" value over average. Finally, the stat DVOA showing Value Over Average per play. Each stat has a D before it to show that it represents value adjusted for defensive opponents.
To help demonstrate how using Points Above Replacement changes the rankings of the top running backs, I've included the rank for two running backs who finished surprisingly high in DVOA -- Stacey Mack and Kenny Watson -- on all three charts. I've also included the rank for two running backs who finished surprisingly low in DVOA -- Ricky Williams and LaDainian Tomlinson -- on all three charts.
When we compare all of these running backs using DVOA, the top performers are backs who had great performances in limited playing time. Priest Holmes ran the ball three times as often as Moe Williams or Stacey Mack, making his high DVOA rating all the more impressive. When we compare these running backs using DV+, which represents total value over an average back, players with more rushing attempts begin to climb higher up the rankings. But Ricky Williams and LaDainian Tomlinson are still farther down, despite their high number of attempts, because they performed only slightly above average.
Now look at the table showing DPAR, or defense-adjusted Points Above Replacement. Those workhorse backs like Williams and Tomlinson have climbed up the rankings, demonstrating the importance of their ability to carry the ball over and over again. After all, take a star player like Ricky Williams or LaDainian Tomlinson out of the game, and the team usually can't replace him with an average player. They bring in a benchwarmer -- a replacement player -- and when you add up all those carries, the difference between a replacement player and a player who is even slightly above average is gigantic.
In fact, the Points Above Replacement rankings will move a below-average running back with plenty of carries ahead of a great running back who isn't used as often. The tables above show Stacey Mack ranks #16 in DPAR, despite one of the highest rates of running successful plays in the league. Just above Mack at #15 is Eddie George, who has a below average DVOA (-1.0%) but more than three times as many rushing attempts.
While comparing players to replacement level instead of average will boost average players up the rankings, it still doesn't do much for players whose performance, even in a high number of attempts, is significantly below average. A good example here would be Bills running back Travis Henry. In our IN FOCUS article about the AFC East, I noted that the Bills' decision to draft Willis McGahee wasn't as strange as people assume. Henry's below-average, fumble-rific performance in 2002 was masked by a huge number of carries, not to mention a quarterback whose passing helped Henry get into position to score and make the highlight reel. Points Above Replacement reflects this as well. After adjusting for opponents, Henry ended up only 17.2 points over PAR in 2002. That ranks #30 among running backs with over 75 carries, which sounds to me like a player you might want to consider replacing.
(Yes, I said "over PAR," and yes, technically the word "over" and the word "above" are redundant, but we like the golf-like phrasing here.)
Here's another example of how Points Above Replacement can change the way we rank players. Remember Jeremy Shockey and his unexpectedly average DVOA rating of 0.0%? According to DVOA, he wasn't even one of the 20 best tight ends in football last year; he's down below such luminaries as Dustin Lyman and Matt Schobel. But rank last year's tight ends according to DPAR, defense-adjusted Points Above Replacement, and Shockey gets more credit for being such a frequent part of the Giants' passing game:
Shockey still isn't the top tight end in football -- Billy Miller is, by a very hefty margin -- but at least he's now in the top ten, and Tony Gonzalez also moves way up the charts.
So, now that we are going to rank players by PAR, does all this mean that VOA is no longer important? Not at all. We now have two important statistics that tell us two different things. Points Above Replacement rewards players who have a lot of success, but also rewards those who can get out there and catch pass after pass or make run after run, drawing the defense's attention and -- in the case of running backs -- working away at the clock. Value Over Average tells us which players are the best on a per-play basis, in some cases highlighting players who could be even more valuable if given more opportunities (for example, Moe Williams or Ashley Lelie).
To put it in baseball terms, the question, "Who's more valuable, the player with the higher PAR or the player with the higher VOA?" is similar to the question, "Who's more valuable to a baseball team, a middle reliever who pitches lights out for 80 innings with an ERA under 2.00 or a starting pitcher who gives you 200 innings with an ERA of 4.00?" If you can turn that middle reliever into a starting pitcher with similar statistics, he'll be a lot more valuable to you. This, in effect, is what Minnesota has done with Moe Williams this year.
Our rankings by position are now ordered by PAR to show total value for the season so far, but the tables also include VOA to show which players have been the best per play. To give an example of how those lists differ this season, here are the top 10 running backs after four weeks of 2003, in both PAR and VOA, minimum 16 carries, not yet adjusted for opponent quality:
Before we finish, I need to say a few things about these numbers and the methods that created them.
1) While I might say, for shorthand, "Priest Holmes was 41.3 points over PAR in 2002," what I really mean is that "plays where Priest Holmes ran the ball were worth 41.3 points over PAR to the Kansas City offense in 2002." In other words, these numbers in no way separate the performance of a running back from the performance of an offensive line. If you don't believe me, check out Stacey Mack, who has gone from one of the league's best backs behind the Jacksonville line to the league's worst back (-4.9 PAR) behind the Houston line. Stacey, your agent is a dumbass.
2) An important corollary to this is that PAR numbers for passes do not attempt to split value between the quarterback and the receiver. Unless the pass results in an interception or fumble after the catch, the "success" of the play is attributed equally to both players before the variables are applied that determine Value Over Average. So don't try to add the numbers up to get the points that a certain team has scored so far, because it will look like all passes are worth "double credit" and you'll get a headache.
3) Remember what I said here about "field position being fluid"? That means that the Points Above Replacement do not necessarily translate into offensive points. They might translate into points prevented by the defense, because even if a long drive doesn't result in a score, it will result in worse field position for the other offense. The number that translates "play success" into points is based on a team's point differential, not points scored and points allowed as separate numbers.
4) All of these numbers and equations are based on only one year of data. That includes the numbers representing replacement level and the multiple that translates "successful plays over replacement" into a number resembling actual points. Therefore they are subject to lots of future fiddling, particularly at the end of the season when I have two (or, if I'm lucky, more) seasons to work with.
5) The names of all these statistics are also subject to change if I think of anything better or easier to understand. Feel free to make suggestions.