11 Jul 2008
by Aaron Schatz
When we announced the new version of our individual stats on Monday, it spawned a lot of discussion among Football Outsiders readers. Some people thought we had sold out. A few readers were worried that changing our nomenclature to yardage was a sign that we no longer were measuring performance based on situation and opponent. Others liked the new numbers and their names, filled with piratey goodness.
I'm not surprised there was a lot of discussion, because it took a lot of discussion to get to this point. Believe it or not, the change from DPAR to DYAR has been in the works for almost two years. These new statistics were bandied about on the Football Outsiders staff e-mail list numerous times. There was an ad hoc focus group that gave their thoughts on how we could best improve the accessibility of our stats. Instead of seeking advice from hardcore readers, I went to people who deal with the same issue of trying to explain complicated concepts to a wide audience -- other Internet sportswriters, guys like Joe Sheehan and Roland Beech and King Kaufman. Everybody had different ideas. We threw around the idea of putting everything on a 0 to 100 scale, but that's something you do with a rate stat, not a total stat. We played around with a "tweaked" DYAR that always looked exactly like actual yardage, with the best passing games worth roughly 400 yards. However, if you make replacement level zero yards, you have to make "average" something more than zero yards, and that leaves you with guys getting 160 DYAR on three pass attempts. Try adding that up for season totals and you get a total mess.
The reader discussion on the original post from Monday ended up somewhere around 200 comments, with people disagreeing with each other about how our stats are computed, what they mean, and what they should be named. If that's not a good example of how difficult it is to give an advanced football metric a name and form that everybody can agree on, I don't know what is.
Of course, it didn't help things that one of the new stats we introduced, "Equivalent Yards (EqYds)," didn't seem to actually show what we wanted it to show.
The idea behind "Equivalent Yards (EqYds)" was to create a simple number that an average fan could compare to standard yardage in order to see if a player was more or less valuable than his stats otherwise would indicate. Unfortunately, it turned out that "players with DVOA above 0% will have more Equivalent Yards (EqYds) than actual yards" was not true, and we were ending up with strange things like Adrian Peterson's 2007 season, with 1,344 rushing yards but only 1,060 "Equivalent Yards (EqYds)."
The reason is pretty simple. The original Equivalent Yards (EqYds) was just "success points" adjusted to look like yards. We wouldn't list guys by just success points without looking for opportunity, so thinking this was going to work was pretty darn stupid on my part.
The good news is that I went back to the drawing board, and it didn't take me long to come up with a newer version of Equivalent Yards (EqYds) that does what we want: provides a simple way to see if a player's performance was better or worse than his actual yardage total in that game or that season.
Our new version of Equivalent Yards (EqYds) is based on DVOA compared to league-average yardage. For example, the average running back over the past few years has gained 4.16 yards per carry, and Adrian Peterson's DVOA of 16.4% ends up as 4.85 yards per carry. Then, the new formula adjusts for the baseline on each play, so that guys who succeed in a situation with a low baseline (say, a first down on third-and-25) don't end up with 300 "true yards."
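As a rough sketch of the first step only, a DVOA percentage can be read as a percentage above or below league-average yards per carry. The function name is hypothetical and simple multiplicative scaling is an assumption, since the exact formula is not spelled out here. With the figures above it comes out to about 4.84, within a hundredth of a yard of the quoted 4.85 (the gap is presumably rounding in the inputs). The per-play baseline adjustment is not modeled at all.

```python
def dvoa_to_yards_per_attempt(league_avg_ypa: float, dvoa_pct: float) -> float:
    """Scale league-average yards per attempt by a DVOA percentage.

    Assumption: simple multiplicative scaling. The per-play baseline
    adjustment described in the text is not modeled here.
    """
    return league_avg_ypa * (1.0 + dvoa_pct / 100.0)


# Figures from the text: 4.16 league-average yards per carry, 16.4% DVOA.
peterson_ypc = dvoa_to_yards_per_attempt(4.16, 16.4)
print(round(peterson_ypc, 2))
```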
This means that the statement "players with DVOA above 0% will have more Equivalent Yards (EqYds) than actual yards" is still not correct. However, players with DVOA above 0% will have more Equivalent Yards (EqYds) per attempt than the league average of yards per attempt. That makes a lot of sense. The standard for a strong passing day stays 300 yards, for example, but now Football Outsiders readers are looking for 300 Equivalent Yards (EqYds), not 300 passing yards. When Jon Kitna throws the ball 50 times and ends up only completing half his passes, with 350 yards and three interceptions, he won't have 300 Equivalent Yards (EqYds). That's not a 300-yard game that really has the value we tend to associate with 300-yard games.
Here are a couple of examples of how the new method works. I'm going to use examples from single games, rather than full seasons, because that's what Equivalent Yards (EqYds) are really designed to help readers understand. Here are eight quarterback games from last year that had roughly 250 standard passing yards. I think the new Equivalent Yards (EqYds) stat does a good job of showing which players really had good passing days and which players did not.
Here's another example, the top running backs from Week 9, the week Peterson broke the all-time rushing record. I'll run both the new and old versions of Equivalent Yards (EqYds), with the nine players who had 100 rushing yards and two others who are particularly interesting.
Note that Peterson still has fewer Equivalent Yards (EqYds) than actual yards in Week 9 -- because as good as he was, he wasn't 9.87 yards per carry good. He was stuffed on some third downs, lost yardage on three carries, and fumbled the ball once. That's not to say that this game was poor by any means. Peterson's totals of 253 Equivalent Yards (EqYds) and 91 DYAR still end up as the best of the year by a significant margin, and one of the top games of all-time.
Meanwhile, Robinson and Westbrook give a great example of how two games with similar yardage can be very different. Westbrook had a 63 percent Success Rate on the day against a good run defense, never losing yardage on a carry. Robinson had a 41 percent Success Rate against a poor run defense, with a fumble and four different carries that lost yardage.
Justin Fargas might give us the best example here of where Equivalent Yards (EqYds) can be useful. Fargas had a very consistent day, with nine carries between six and 11 yards. He had seven first downs and a touchdown, converting four out of five runs with 1-2 yards to go. Because he didn't have any really long runs, his total for the day was just 104 yards. 23 carries for 104 yards sounds like a pretty good day, but not a great one. 23 carries for 152 Equivalent Yards (EqYds) gives you a better idea of how valuable he really was to his team.
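The Fargas line is easy to check on a per-carry basis. This is only arithmetic on the numbers given above, compared against the 4.16 league-average yards per carry quoted earlier; the helper name is made up for illustration. It works out to roughly 4.5 actual yards per carry versus roughly 6.6 Equivalent Yards per carry.

```python
LEAGUE_AVG_YPC = 4.16  # league-average yards per carry quoted earlier in the article


def per_carry(total: float, carries: int) -> float:
    """Return a per-carry rate, guarding against zero carries."""
    if carries <= 0:
        raise ValueError("carries must be positive")
    return total / carries


# Justin Fargas's Week 9 line, as given in the text.
carries = 23
actual_ypc = per_carry(104, carries)
equivalent_ypc = per_carry(152, carries)

print(f"actual: {actual_ypc:.2f} per carry")
print(f"equivalent: {equivalent_ypc:.2f} per carry")
print(f"equivalent vs. league average: {equivalent_ypc - LEAGUE_AVG_YPC:+.2f}")
```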
Hopefully, this new method for computing Equivalent Yards (EqYds) answers a lot of questions. (For example, the best tight ends no longer all have fewer Equivalent Yards (EqYds) than actual yards.) Now let's see if I can answer some of the other questions. These are all from the discussion thread on that Monday article.
Joe T.: So a red zone fullback would have his DPAR skewed in one direction, and his DYAR skewed in the other?
This question was one of many comments that seemed to misunderstand what we were doing by changing from "Defense-adjusted Points Above Replacement" to "Defense-adjusted Yards Above Replacement." The value of an actual yard didn't change. All our stats are still based on computing success towards both a first down and a touchdown, and comparing that success to a league-average baseline based on situation and opponent.
I should point out that the "success points" system is not actually based on finding out how often teams score from every specific down-and-distance at every time in the game, or anything like that. To be honest, it isn't quite that complex or exact. Reader Scott C. wrote:
Football Outsiders statistics start out by finding the value of plays, in points. It does this by looking at the down/distance/situation before and after a play:
How many points does a drive attain on average at first down and 10 from midfield?
How many points does a drive attain on average on second down and 5 from the opponent's 45?
These two point values differ slightly. That first down play for 5 yards in between those two points is worth a value in POINTS that is the difference between the values above. Thus, any play can be tied to expected value point differential. This needs further adjustments, but as far as I can tell, is the foundation of the statistics at FO.
Actually, Scott, it isn't the foundation of the statistics at FO. The "success points" system is just taken from The New Hidden Game of Football and then gradually tweaked over five years through trial and error to be as accurate as possible. It isn't as strict as the kind of system you describe, but it does its job.
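For contrast, the stricter expected-points approach Scott describes (which, again, is not how "success points" work) could be sketched like this. The table values are invented purely for illustration and are not Football Outsiders data; the lookup key and function name are also hypothetical.

```python
# Hypothetical expected-points table keyed by (down, yards_to_go, yards_from_goal).
# These numbers are invented for illustration and are not Football Outsiders data.
EXPECTED_POINTS = {
    (1, 10, 50): 2.0,  # first-and-10 at midfield
    (2, 5, 45): 2.2,   # second-and-5 at the opponent's 45
}


def play_value(before: tuple, after: tuple) -> float:
    """Value of a play as the change in expected points between two situations."""
    return EXPECTED_POINTS[after] - EXPECTED_POINTS[before]


# A five-yard gain on first-and-10 from midfield.
value = play_value((1, 10, 50), (2, 5, 45))
print(round(value, 2))
```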
Alex: Why can't we keep DPAR and bring in DYAR? What, is DPAR going to get jealous?
Once again, this comes down to accessibility to the general public. There's no reason to have two statistics that measure the exact same thing in almost the exact same way. That's confusing.
However, I have to admit that DPAR and DYAR are not quite exactly the same. In a quick note in the original Monday discussion thread, I said that changing from DPAR to DYAR was no different from changing the temperature scale from Fahrenheit to Celsius. I apologize, because after thinking further, I realized two things that changed with DYAR.
First, the new, improved version of DYAR is more accurate, because it accounts for the fact that the average situation faced by one player may be harder than the average situation faced by another player. The old method took total success points, subtracted the baseline on that play (or those plays), and then multiplied by a coefficient to get DPAR. The new method takes total success points and multiplies by a coefficient, and then subtracts the baseline multiplied by a slightly different coefficient in order to get DYAR. Therefore, the ratio between DPAR and DYAR will be different on each play. Those differences get smaller if you add the plays up into games, and even smaller if you add the games up into seasons, but they are still there.
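A small sketch shows why the ratio moves from play to play. The coefficients and plays below are hypothetical, chosen only to make the effect visible; the real coefficients are not given here. When the success and baseline coefficients are equal, the new method collapses to a rescaled version of the old one and the ratio is constant.

```python
def old_dpar(success: float, baseline: float, coef: float) -> float:
    """Old method: subtract the baseline first, then scale."""
    return (success - baseline) * coef


def new_dyar(success: float, baseline: float,
             success_coef: float, baseline_coef: float) -> float:
    """New method: scale success points and baseline separately, then subtract."""
    return success * success_coef - baseline * baseline_coef


# Hypothetical coefficients and plays, chosen only to show the effect.
POINTS_COEF = 0.1
YARDS_COEF = 3.0
BASELINE_YARDS_COEF = 2.8

plays = [(2.0, 1.0), (3.0, 1.0), (1.5, 1.0)]  # (success points, baseline)
ratios = []
for success, baseline in plays:
    dpar = old_dpar(success, baseline, POINTS_COEF)
    dyar = new_dyar(success, baseline, YARDS_COEF, BASELINE_YARDS_COEF)
    ratios.append(dyar / dpar)
    print(f"success={success}, baseline={baseline}, DYAR/DPAR={dyar / dpar:.1f}")
```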
Second, the translation of "success points" into yardage didn't end up with the same coefficients for every position. For an explanation, let's look at short-yardage situations. Last year, the average pass attempt on third-and-1 or fourth-and-1 gained 5.4 yards and was worth 1.5 "success points" according to the basic system at the foundation of DVOA and DYAR. The average running back carry on third-and-1 or fourth-and-1 gained 3.2 yards... but was worth 1.6 "success points." How on earth can passes gain more yardage and lead to less success? Because "success points" are not based simply on gaining yards. They are an attempt to balance yardage with progress towards a first down, and the big successful event here is the first-down conversion -- which is more frequent on a run -- rather than the yardage.
That means that even if you ignore the small differences in the ratio of DPAR to DYAR caused by the improved accuracy of the new method, there are still larger differences in the ratio from position to position. Perhaps the better way to create DYAR would have been to come up with a universal translation from "success points" to yardage based on team totals, rather than figuring the equations separately for each position. I'm certainly willing to look into the possibility of making that change next offseason, but there really isn't the time to do the work now. (Even just re-doing Equivalent Yards (EqYds) put me behind on a lot of business things I needed to deal with this week.)
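Using only the short-yardage averages quoted above, the implied exchange rate between yards and "success points" can be worked out directly. This is just division on the published numbers, not the actual position coefficients, which are not given here.

```python
# Third-and-1 / fourth-and-1 averages quoted in the text.
short_yardage = {
    "pass attempt": {"yards": 5.4, "success_points": 1.5},
    "running back carry": {"yards": 3.2, "success_points": 1.6},
}

yards_per_success_point = {
    play_type: avg["yards"] / avg["success_points"]
    for play_type, avg in short_yardage.items()
}

for play_type, rate in yards_per_success_point.items():
    print(f"{play_type}: {rate:.1f} yards per success point")
```

On these plays a success point corresponds to about 3.6 yards on a pass but only 2.0 yards on a run, which is why a single conversion factor would not fit both.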
Patrick: Are we now double counting? In the olden days, the DPAR for a completed pass were divided between the QB and receiver, right?
No, DPAR has never been split between the quarterback and the receiver. It counts for both. We would like to one day be able to make such a split, but we have not done enough research. This is also one of the reasons why, although I do it in Quick Reads, I am hesitant to add together values in rushing and receiving (or for quarterbacks, passing and rushing) to get some sort of "total value." We don't know if a yard of DYAR rushing is equal to a yard of DYAR receiving, so we can't say that a player with 50 rushing DYAR and 20 receiving DYAR is equal to a player with 20 rushing DYAR and 50 receiving DYAR.
Which brings us to another question from Patrick:
Patrick: With DPAR (or VORP), you could take a replacement offense (maybe 150 points? '06 raiders?), then add up all the DPAR from individual players and it would give you a 'predicted points scored'. With DYAR, can I take a replacement team yardage (4000-yard season maybe?) and add all the individual DYAR totals and get a predicted yardage? will their be a table for how many points a team with a given teamwide True Yardage would be predicted to score?
Actually, we never really meant for it to work this way, for a number of reasons. First, you couldn't add together everyone's DPAR because passing yards counted for both quarterbacks and receivers -- but you couldn't just use half of the quarterback's DPAR and half of the receiver's DPAR, because receiver DPAR didn't account for interceptions. You also had to deal with the fact that field position is fluid. That means that to approximate how well a team played compared to its DPAR stats, you would also have to consider the DPAR stats of all the offensive players who faced their defense -- except you would have to re-do DPAR so that "replacement level" represented more offense, not less offense, since a replacement-level defense would allow more points.
LearnFromTheMasters: Funny that Baseball Prospectus is mentioned in this article. While they provide stats like EQA and QERA (which are on the same scale as batting average and ERA, respectively), the stats they use for overall player comparisons are from the VORP/WARP category -- expressed in runs ("points") and wins ("tens of points").
At least this guy mentioned EqA. Another reader wrote "[DYAR] is tantamount to baseball sabermatricians moving from a runs-based system to adjusted batting average just because old-school fans love BA." Apparently, he didn't know that Baseball Prospectus does precisely this. That's what Equivalent Average (EqA) is meant to be, "adjusted batting average." It measures all offense, not just hits per at bats, but it does it on a scale the average person understands, where .300 is good and .200 is bad.
Baseball Prospectus expresses things in terms of runs because that's how baseball people think. Players are listed with runs scored and runs batted in. How many points does a star football player generally create in a good season? Who knows? How many runs does a star baseball player generally create in a good season? Everybody knows the answer: 100. When I asked BP guys about changing our individual statistics to yards, they were surprised I didn't do that in the first place. Let's be honest -- a lot of people are attached to DPAR because that's what we have been doing for five years. If I had put this all in terms of yards back in 2003, it wouldn't have been very controversial.
As for putting things in terms of wins, the baseball people have been doing research far longer than we have, and they have ten times as many games to work with each season, so they have a much better model of how runs translate into wins. It also helps that the model can be simplified to "ten runs is roughly one win." Would it be as easy to understand if 23.6 runs was roughly one win? Maybe if you live in a society where everyone is born with 23.6 fingers and the language uses a base-23.6 number system or something.
Goo: Isn't the whole point to move beyond yards and think about what actually helps teams win games? You don't win by getting more yards than the opponent, you win by scoring more points, and that was one of the big appeals of DPAR... At least with DPAR I could subtract it from a team's point total or swap players and use the old pythagorean formula to run "what ifs" with different players; how do you do that with DYAR?
This is the other problem with creating some sort of "wins over replacement" statistic. Baseball stats, as we all know, are much less dependent upon teammates than football stats are, plus there's no crossover between offense and defense. When Brian Urlacher intercepts a pass, Rex Grossman may get to start a drive already in field goal position, but Justin Upton does not get to start on second base if Brandon Webb strikes out the side. While game situation may slightly affect pitch selection, for the most part a Manny Ramirez home run has nothing to do with who is on deck or who is on first base. That's why baseball people can determine very specific values for each player, switch out one player for another and approximate how many wins it would add, run models that work out "contract dollars per win produced," and so forth. Football just doesn't work that way -- not yet, anyway. You can't take Randy Moss's stats from New England, stick them in Oakland's offense instead of Johnnie Lee Higgins, and say "this is how many points Oakland would have scored if they had not traded Moss."
This is all sort of wrapped up with a general problem that we have. To put it bluntly, there is a portion of the readership which seems to believe that our stats are more accurate than they really are, and they often try to use those stats to do things they weren't really meant to do. DYAR will give you a more accurate picture of how good players are when compared to players at the same position, and that's the goal of the individual statistics. We don't mean for people to use the individual stats to build team models.
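For reference, the Pythagorean formula Goo mentions works on team point totals, which is part of why it has nothing to say about individual DYAR. A minimal sketch follows. The exponent of 2.37 is an assumption (it is a commonly cited value for the NFL, not something specified in this article), and the team totals are hypothetical.

```python
def pythagorean_win_pct(points_for: float, points_against: float,
                        exponent: float = 2.37) -> float:
    """Expected winning percentage from points scored and allowed.

    Assumption: 2.37 is a commonly cited exponent for the NFL; the article
    does not specify one.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


# Hypothetical team: 400 points scored, 300 allowed over a 16-game season.
expected_wins = 16 * pythagorean_win_pct(400, 300)
print(round(expected_wins, 1))
```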
Thok: Shouldn't it be Yards Adjusted for Replacement and Defense, a.k.a. YARD?
Yes, that exact name came up when we were figuring out what to call this stuff. Close your eyes and imagine that you are listening to me on the radio. Now you should understand why it can't be called "YARDs."
I understand the people who don't like the name "Equivalent Yards (EqYds)." Originally, my plan was to call them "Equivalent Yards," just like BP calls its stat "Equivalent Average." Then someone suggested "Equivalent Yards (EqYds)," and that name sounds so much simpler. But I'm willing to entertain suggestions on name changes before the season begins and we start putting this out there in Quick Reads (which will be moving from FOXSports.com to ESPN.com). Do people like "Equivalent Yards" better? I would rather have something that is a name than something that is yet another acronym in our alphabet soup.
Ammek: Now that you are working from five years of data (2003-07), I wonder: has the value of Average in DVOA changed at all? If, to take an extreme scenario, passing success on first down increased by, say, 15 per cent in 2008, would DVOA adapt to that? Or does Average mean 'average for the era'?
Right now, all DVOA stats are done based on the same average baselines, which are based on 2002-2006 (team) and 2003-2007 (individual). (At a certain point, there are only so many plays I can fit in a spreadsheet, and it slowed things down too much to use 2002 in the individual baselines.) You are correct that this may create problems because the offensive environment and style of the league change over time. Separate baselines for each year are impossible, really, but eventually we may need to look into some sort of normalization process. I've thought about trying some sort of "rolling baselines" that change for each year based on the years before and after. However, what's interesting is that while there are small changes up and down each year, the DVOA Era can really be split into four parts based on major jumps in the offensive level: 1995, 1996-1999, 2000-2001, and 2002-2007. There was a huge drop in the offensive level in 1996, a jump in 2000, and another jump in 2002. Yes, 2004 was a little higher, 2005 a little lower, but in general, offensive level from 2002-2007 has been pretty steady.
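One way the "rolling baselines" idea could look in practice is a centered average, where each season's baseline is the mean of that season and its neighbors. This is only a sketch of the concept. The function name, the one-year window, and the yearly values are all hypothetical, and nothing like this is currently part of DVOA.

```python
from typing import Dict


def rolling_baselines(yearly: Dict[int, float], window: int = 1) -> Dict[int, float]:
    """Average each year's value with up to `window` years on either side."""
    result = {}
    for year in sorted(yearly):
        neighbors = [
            yearly[y] for y in range(year - window, year + window + 1) if y in yearly
        ]
        result[year] = sum(neighbors) / len(neighbors)
    return result


# Hypothetical league-wide offensive levels per season (invented numbers).
offense_by_year = {2002: 5.0, 2003: 5.1, 2004: 5.3, 2005: 5.0, 2006: 5.1, 2007: 5.2}
baselines = rolling_baselines(offense_by_year)
for year, value in baselines.items():
    print(year, round(value, 3))
```

The first and last seasons only have one neighbor, so their baselines rest on two years instead of three.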
Gat: Are kickers going to be ranked using DYAR instead of DPAR? That seems... stupid.
Actually, kickers have never been ranked using DPAR. The special teams stats are all based on scoring value compared to average, not replacement level. In addition, the methods for turning kickoffs and punts into an approximate number of points are much more accurate than the methods for turning rushing and passing into an approximate number of points, because we only have to worry about the value of field position, not the value of earning a new set of downs.
Chris: A lot of individual stats are dependant on the venue as a huge variable which doesn't effect either say the QB or the defense. Playing in a snowy windy game in Buffalo with 5 degree temps will hurt both QB's stats and help both secondaries.
This comment gets to the other thing I was supposed to fix this offseason. We talked a lot last year about adjusting DVOA for the weather, because passing games were making defenses look better than they really are and making quarterbacks look worse. You may be wondering what happened to this project. Well, I spent a ton of time on it in February and March. I looked at the last few years of DVOA based on the weather in each game, both the wind speed and the precipitation. Here's what I found:
But most importantly...
With those kinds of unclear results, I was simply uncomfortable adding the adjustment to all our stats. I don't want to make changes unless I feel confident they are improving things. So for now, we'll continue to note when bad weather may have affected DVOA, we'll continue to hope that readers read our stats through the filter of common sense, and we'll try to play with this again next spring.
Finally, a couple of questions about specific players.
Steelberger: Personally, as a Steelers fan, I find any system that ranks Sage Rosenfels ahead of Ben Roethlisberger (new DVOA) sort of ridiculous.
The reason is primarily opponent adjustments. Rosenfels went through the Gauntlet of Hell last year, with two games against the top pass defense in DVOA (TEN) as well as games against the pass defenses ranked second (SD), third (IND), fourth (TB), and eighth (JAC). He didn't play a game against a pass defense ranked below 19th.
Disco Dack: Quinn Gray is now top 10 in DVOA?
Yes, I know, that seems strange to me too. A lot of that comes from that meaningless final game against the Texans, but still, believe it or not, Gray has a positive DVOA in every single game from 2007 except one. That one, of course, was the Monday Night Football game against Indianapolis when he replaced an injured David Garrard and couldn't hit the broad side of a barn. Let's be honest, for most of us, that's about all the Quinn Gray we really saw last year, so all we remember is how bad he was in that one game.
Of course, just because Quinn Gray is in the top 10 in DVOA doesn't mean he was one of the top 10 quarterbacks in the league last year. You have to consider sample size. You have to consider how good the Jacksonville offense was around him. A nice DVOA in 155 attempts doesn't mean you have the stuff to be a starting quarterback in the National Football League. We call this the "Doug Johnson Effect."
With this now out of the way, early next week I'll get to the article on the best and worst quarterback seasons and games since 1995.