02 Nov 2004
by Aaron Schatz
The original version of the VOA (Value Over Average) rating system included only offense and defense. When Kansas City and Carolina got out to fast starts last season, fueled in part by great special teams, I developed a system to turn special teams into VOA that could be added in with offense and defense to get a more accurate picture of a team's performance. When I finally wrote an article describing all the special teams methods, I called it the Special Teams Manifesto.
Since last season, there have been a number of changes to the special teams method, and I constantly refer to them but I've never really explained them. This is the long-promised, long-awaited explanation of all those changes. But first, let's go back to the beginning and explain the basic ideas behind how the special teams values are derived.
The DVOA system gives each play on offense and defense a "success value" rather than simply judging plays by yardage, to take into account that the goal of each play is both to get closer to the goal line and to get closer to a new set of downs. But the same isn't true on special teams, where each play has a single goal: you want to get the ball through the uprights, you want to kick the ball really far, or you want to return the ball for as many yards as possible.
For kickoffs and punts, as well as returns, yardage is translated into points using a method that gives each yard line a point value based on the average next score for an NFL offense from that point on the field (as you'll see below, the values have changed slightly from the ones in the linked article). Behind the 25-yard line, this value is actually negative, reflecting the fact that when you are near your own goal line, the average next score belongs to your opponent.
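To make the idea concrete, here is a minimal Python sketch of how an expected-next-score table could be built from play-by-play data. It assumes each observation pairs a yard line (the offense's distance from its own goal line) with the next score in the game from that offense's point of view. The scoring values (7 for a touchdown, 3 for a field goal, 2 for a safety) and the sample rows are hypothetical illustrations, not Football Outsiders' actual code or data.

```python
from collections import defaultdict
from statistics import mean


def expected_next_score(observations):
    """Average next score for each yard line.

    observations: iterable of (yard_line, next_score) pairs. yard_line is the
    offense's distance from its own goal line (1-99). next_score is the next
    score in the game from the offense's point of view (ASSUMED values: +7/+3
    if the offense's team scores next, -7/-3/-2 if the opponent does, 0 if
    nobody scores before the half or game ends).
    """
    by_line = defaultdict(list)
    for yard_line, next_score in observations:
        by_line[yard_line].append(next_score)
    # Sort so the table reads outward from the offense's own goal line.
    return {line: mean(scores) for line, scores in sorted(by_line.items())}


# HYPOTHETICAL sample rows, for illustration only.
sample = [(5, -7), (5, 3), (5, -3), (50, 7), (50, 0), (50, 3)]
table = expected_next_score(sample)
print(table)
```

In this toy sample the average at the 5 comes out negative and the average at midfield positive, which is the same shape the article describes, with real data crossing zero around the 25.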
Punters and kickers are judged based on the difference in point value between each kick and an average kick from that position on the field. Kick returners are judged based on the difference in point value between each return and an average return from the spot where they picked up the ball. Punt returners are judged in the same way, but their baseline takes into account both the location of the punt and the location of the catch.
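Both comparisons reduce to "value of what happened minus value of the average outcome from the same spot," with the sign flipped for the kicking side. This Python sketch assumes a field-value function like the one described above is available; the linear toy curve (zero at the 25, as the article describes, but with an invented slope) and the baseline spot in the usage lines are hypothetical.

```python
from typing import Callable


def toy_field_value(yard_line: float) -> float:
    """HYPOTHETICAL point value at yard_line (distance from own goal line).

    Crosses zero at the 25 as the article describes; the 0.05 slope is invented.
    """
    return 0.05 * (yard_line - 25)


def returner_value(field_value: Callable[[float], float],
                   actual_end: float, average_end: float) -> float:
    """Points gained by a returner versus an average return from the same spot."""
    return field_value(actual_end) - field_value(average_end)


def kicker_value(field_value: Callable[[float], float],
                 actual_end: float, average_end: float) -> float:
    """Points gained by a kicker or punter.

    The receiving team's field position counts against the kicking team,
    so the sign is reversed relative to returner_value.
    """
    return field_value(average_end) - field_value(actual_end)


# A return ending at the 40 when the average return from that catch spot
# would have ended at the 25 (HYPOTHETICAL baseline).
print(returner_value(toy_field_value, 40, 25))
print(kicker_value(toy_field_value, 40, 25))
```

In practice `average_end` would come from a baseline table keyed by the catch spot for kick returns, or by both the line of scrimmage and the catch spot for punt returns.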
Field goals work a bit differently, comparing each field goal to the league-average percentage of field goals from that distance. A successful 30-yard field goal is worth .36 points above average. A successful 40-yard field goal is worth .77 points. A successful 50-yard field goal is worth 1.35 points.
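The exact field goal formula isn't spelled out here, but one natural reading is that a made kick earns its 3 points minus the league-average expected points from that distance, 3·p(d), and a miss costs 3·p(d). The Python sketch below assumes that rule and inverts it to see which league make rates the quoted values would imply. Both the rule and the inversion are assumptions, not the documented method.

```python
FG_POINTS = 3


def fg_points_over_average(made: bool, league_pct: float) -> float:
    """ASSUMED rule: points scored minus league-average expected points."""
    expected = FG_POINTS * league_pct
    return (FG_POINTS if made else 0) - expected


def implied_league_pct(value_if_made: float) -> float:
    """Invert 3 * (1 - p) = value to recover the make rate p the rule implies."""
    return 1 - value_if_made / FG_POINTS


for distance, value in [(30, 0.36), (40, 0.77), (50, 1.35)]:
    print(distance, round(implied_league_pct(value), 3))
```

Under that assumption, the three quoted values correspond to league make rates of roughly 88%, 74%, and 55%, which is why a long make is worth so much more: the baseline expectation is lower.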
For the purpose of measuring teams, we use five elements of special teams: field goals, punts, punt returns, kickoffs, and kick returns. Punts, punt returns, and kickoffs are based on net yardage, both the length of the kick and the length of the return. Kickoff returns, however, are judged on the return only, since you can't really block a kickoff or force the kicker to alter his angle because you have a nice kickoff rush going.
There's a more detailed explanation of everything in the original Special Teams Manifesto article, and rather than regurgitate it here, I decided to link to it. So here's the original discussion of kickoffs, punts, and field goals. Now, a rundown of all the improvements that have taken place between then and today. Let me warn you: there's some math ahead, so be prepared.
The value of kickoffs and punts in the original special teams method was based on scoring in the NFL in 2002 only. In the new method, the value of field position is based on scoring in the NFL from 2000-2003. This has caused a couple of changes in the general shape of the curve that represents "expected next score." First of all, expected next score is now lower from pretty much every place on the field. It turns out that an abnormal number of drives in 2002 ended in touchdowns rather than field goal attempts. Here is the ratio of touchdowns to field goals for the past five seasons as well as 2004 through Week 7:
|Year||TD to FG ratio||Year||TD to FG ratio|
Since 2002 was a bit abnormal, the expected next score table, now based on multiple seasons, is a little closer to reality. The other difference sounds a bit strange at first. You might expect the lowest expected next score to be at the one-yard line, with expected next score rising with every yard line until you get to the other goal line. Instead, the lowest expected next score is actually around the five-yard line. With just one year of data, this didn't mean much; there were only a couple of plays on the one- or two-yard line anyway. With four years of data, though, the pattern is pretty clear. In general, there is actually less of a chance that your opponent will get the next score if you are pinned right up against the goal line than if you have a couple of yards to work with behind you. It makes sense if you give it some thought: right up against the goal line, no team is going to take any chances, but once you get to the four- or five-yard line, you might try passing the ball, or run a pitch instead of a handoff, accepting more risk. That increased chance of a turnover means that expected next score is lower (or, more accurately, that it is higher in favor of the defense). Anyway, this doesn't have much effect on the value of special teams, but it is a feature of the new method and probably something worth looking at more in the future.
The original kickoff method had a major flaw, discovered by Jim Armstrong (a.k.a. the guy behind drive stats). All kickoffs were measured by comparing the end point of the kickoff return to the end point of an average kickoff return. This meant that all touchbacks were measured as if they were kickoffs that went to the 20-yard line and then stopped with no return. Since the average kick from the standard 30-yard line lands between the 10- and 11-yard line, this turned a touchback into a below average kick for the purpose of measuring gross kickoff value. Oops. In reality, while the average kick lands between the 10- and 11-yard line, the average kick also gets returned to a point between the 28- and 29-yard lines. The value of a touchback is that it can't possibly be returned, and so the kicker has put his team in a good position even if it isn't quite as good as the rare times that a kick returner gets stopped before the 20-yard line.
This change also affected kickoffs that went out of bounds, making them not quite as negative as they were in the original method. After all, just as a touchback is similar to a kick to the 20-yard line that can't be returned, an out-of-bounds kickoff is similar to a kick to the 40-yard line that can't be returned. That's a bad result, but not as bad as if a kickoff landed on the 60-yard line and actually was returned -- which is how the original method was judging out-of-bounds kickoffs.
The unforeseen side effect of this change was a drop in how important kickers are to the value of the kickoff according to our methods. The spread between the value of the best and worst kicker in the NFL in 2003 using the original method was 45.6 points; using the new method it's 19.0 points. That's because a longer kickoff, unless it is a touchback, has a longer average return than a shorter kickoff. The difference in value between a kickoff that lands at the 5-yard line and a kickoff that lands at the 25-yard line is much larger than the difference in value between the end of the average return of a kickoff that lands at the 5-yard line and the end of the average return of a kickoff that lands at the 25-yard line.
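The touchback fix and the shrinking spread can both be seen in a small Python sketch. It measures kick value in yards of the receiving team's field position rather than points (the real method would convert spots to points with the expected-next-score values), takes the 28.5-yard average return end and 10.5-yard average landing spot from the text above, and uses a HYPOTHETICAL model in which each extra yard of kick distance moves the average return end by only half a yard. That 0.5 slope is invented for illustration.

```python
from typing import Optional

TOUCHBACK_SPOT = 20        # touchback: receiving team starts at its own 20
OUT_OF_BOUNDS_SPOT = 40    # out of bounds: receiving team starts at its own 40
AVERAGE_RETURN_END = 28.5  # average return ends between the 28 and 29
AVERAGE_LANDING = 10.5     # average kick lands between the 10 and 11


def expected_return_end(landing_spot: float) -> float:
    """HYPOTHETICAL: where an average return ends for a kick landing at landing_spot.

    Spots are yards from the receiving team's goal line. Anchored to the
    averages above; the 0.5 slope is an illustrative assumption.
    """
    return AVERAGE_RETURN_END + 0.5 * (landing_spot - AVERAGE_LANDING)


def gross_kick_value(outcome: str, landing_spot: Optional[float] = None) -> float:
    """Yards of field position saved versus an average kickoff (positive favors the kicker)."""
    if outcome == "touchback":
        end = TOUCHBACK_SPOT       # can't be returned
    elif outcome == "out_of_bounds":
        end = OUT_OF_BOUNDS_SPOT   # can't be returned either, but a bad spot
    elif outcome == "returnable":
        if landing_spot is None:
            raise ValueError("a returnable kick needs a landing spot")
        end = expected_return_end(landing_spot)
    else:
        raise ValueError(f"unknown outcome: {outcome}")
    return AVERAGE_RETURN_END - end


print(gross_kick_value("touchback"))      # 8.5
print(gross_kick_value("out_of_bounds"))  # -11.5
print(gross_kick_value("returnable", 5) - gross_kick_value("returnable", 25))  # 10.0
```

With these toy numbers, a touchback is worth 8.5 yards over average instead of being scored as a short kick, an out-of-bounds kick costs 11.5 yards, and moving the landing spot 20 yards changes the kicker's value by only 10 yards. That is the same compression that shrank the spread between the best and worst kickers.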
Here's a list of the top five kickers in 2003 based on the old method and then the method that fixes the touchback problem (which, as you'll see in a minute, isn't the final version of the new method):
|Top Kickoffs 2003, Old Method||Top Kickoffs 2003, Touchbacks Fixed|
Mare has so many touchbacks that the original method had his kickoffs worth 9.0 points below average. The new list is much, much more accurate. I should add that I did not make this touchback change in the punt formulas, because unlike a kickoff that is almost a touchback, a punt that is almost a touchback (e.g. to the two- or three-yard line) is very rarely returned anyway. I may look at using this touchback fix in the punt equations over the offseason to see if I am wrong and it does in fact make a difference.
In the first special teams article, I wrote, "I hope to re-do the punt return stats so it uses two variables for the baseline, both the line of scrimmage of the punt and the location of the catch." And, in fact, this is now the case with measuring punt returns. Not much more to say about it. Drive safely. Tip your waitresses.
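A baseline with two variables amounts to averaging returns within groups keyed by both spots. The Python sketch below assumes five-yard bins (so each group has enough punts to average), measures the return in yards rather than points for simplicity, and uses hypothetical sample punts. The binning scheme and function names are illustrative, not the actual implementation.

```python
from collections import defaultdict
from statistics import mean


def bucket(yard_line: int, width: int = 5) -> int:
    """Group yard lines into bins of `width` yards (ASSUMED binning)."""
    return yard_line // width * width


def build_baseline(punts, width: int = 5) -> dict:
    """punts: iterable of (line_of_scrimmage, catch_spot, return_yards).

    line_of_scrimmage is the punting team's distance from its own goal line;
    catch_spot is the receiving team's distance from its own goal line.
    Returns {(scrimmage_bin, catch_bin): average return yards}.
    """
    groups = defaultdict(list)
    for los, catch, yards in punts:
        groups[(bucket(los, width), bucket(catch, width))].append(yards)
    return {key: mean(values) for key, values in groups.items()}


def return_over_baseline(baseline: dict, los: int, catch: int,
                         yards: float, width: int = 5) -> float:
    """Yards gained versus the average return for the same two-spot bin."""
    return yards - baseline[(bucket(los, width), bucket(catch, width))]


# HYPOTHETICAL punts, for illustration only.
punts = [(30, 12, 8), (32, 14, 4), (30, 40, 2)]
baseline = build_baseline(punts)
print(return_over_baseline(baseline, 31, 11, 10))
```

Keying on both spots lets the same catch location carry a different expectation depending on where the punt came from, which is the distinction the single-variable baseline missed.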
As soon as I began to use the special teams method in the middle of 2003, I noticed that special teams were performing much better than the baseline numbers for 2002. To give an example, at this point last season I had 22 of 32 teams listed with positive special teams ratings. Were my baseline numbers way off? Had special teams all improved dramatically in one year? Or did weather, perhaps, play a major role in special teams performance that wasn't reflected in early-season numbers? As the season went on, and special teams ratings dropped for the league as a whole, it looked like the weather was definitely the culprit.
Another reason to believe that weather had something to do with special teams numbers: look at the list of the top kickers. You'll notice a warm weather kicker, an altitude kicker, and two dome kickers. Each year, in fact, nearly every dome kicker showed up in the top half of the league. That matches the conventional wisdom which says that it is easier to kick in a dome.
So we had to take into account both weather and altitude in determining special teams performance compared to "league average," because league average included every single stadium both in the balmy Miami September and the snowy Buffalo December. I created four different groupings of stadiums, and then looked at the trend for each stadium type by week during the past four seasons. Here are the groups:
Unfortunately, it wasn't realistic to judge each game by its specific temperature and precipitation. I just didn't have that data for every game in my numbers, and I wanted as much data as possible to make sure I wasn't overestimating or underestimating the importance of weather and altitude. It's bad enough to try to judge the influence of that thin Denver air from just 32 games over the past four seasons, especially since I can only use half the plays in those games (we want to measure the league, not the quality of Jason Elam). But there's clearly such a huge effect from the altitude that Denver had to have its own category. There's also the problem that some stadiums sit right in the middle between Cold and Warm -- Kansas City, Oakland, San Francisco, and Tennessee -- and Houston is sometimes a dome, sometimes not. For now, Houston is always categorized as Dome, and I made KC "Cold" and the other three "Warm," but I may create a "Mid" category before next season.
Even without getting that specific, however, there are clear trends that can be used to make adjustments which create a level field for comparing kickers and punters. This next chart will probably give you a headache, but bear with me. It shows the average value of all kickoffs from 2000-2003 for each week of the season, separated into the four stadium categories. The dotted lines are the actual numbers, while the solid lines represent the trends, in matching colors. (Denver is orange, of course, and partially off the chart because with many fewer plays in the sample the numbers vary wildly.)
As you can see, the extra length of kickoffs in domes and Denver is enough to raise the league-average value of a kickoff far above the actual average in outdoor, sea level stadiums. Meanwhile, the value of the average kickoff drops faster in the cold weather cities than the warm weather cities. As a result, I now adjust kickoff values by week and stadium type. That trend where kicks actually travel farther in Denver later in the year seems a bit counterintuitive -- you will see in a moment that the opposite happens with punts -- so the adjustment for Denver, like for domes, is the same each week. Every Denver kickoff gets a penalty of .126 points. Every dome kickoff gets a penalty of .045 points. That same penalty goes for Week 1 kickoffs in all outdoor stadiums as well, but as the season goes forward those kickoffs begin to get a bonus adjustment instead of a penalty, until by Week 17 the warm weather kickoffs are getting a bonus of .03 points apiece, and the cold weather kickoffs a bonus of .12 points apiece.
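The kickoff adjustment can be sketched as a lookup by stadium type and week. The penalties and Week 17 bonuses below come straight from the paragraph above; the straight-line path between Week 1 and Week 17 for outdoor stadiums is an assumption (the trend lines are described but their formula isn't given), and the function names are hypothetical.

```python
DENVER_ADJUSTMENT = -0.126  # every Denver kickoff, every week
DOME_ADJUSTMENT = -0.045    # every dome kickoff, and outdoor kickoffs in Week 1
WEEK_17_OUTDOOR = {"warm": 0.03, "cold": 0.12}  # outdoor bonuses by Week 17


def kickoff_adjustment(stadium_type: str, week: int) -> float:
    """Points added to a kickoff's raw value (negative = penalty, positive = bonus)."""
    if not 1 <= week <= 17:
        raise ValueError("week must be between 1 and 17")
    if stadium_type == "denver":
        return DENVER_ADJUSTMENT
    if stadium_type == "dome":
        return DOME_ADJUSTMENT
    if stadium_type not in WEEK_17_OUTDOOR:
        raise ValueError(f"unknown stadium type: {stadium_type}")
    # ASSUMPTION: move linearly from the Week 1 penalty to the Week 17 bonus.
    start, end = DOME_ADJUSTMENT, WEEK_17_OUTDOOR[stadium_type]
    return start + (end - start) * (week - 1) / 16


def adjusted_kickoff_value(raw_value: float, stadium_type: str, week: int) -> float:
    """Raw points over average plus the stadium/week adjustment."""
    return raw_value + kickoff_adjustment(stadium_type, week)
```

Under the linear assumption, a cold-weather kickoff in Week 9 would get a bonus of about +0.04 points, roughly halfway between the Week 1 penalty and the Week 17 bonus. Because the adjustment depends on the game's stadium rather than the kicker's home field, the same kicker can be penalized one week and credited the next.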
Here's another example, with the average value of all punts from 2000-2003 for each week of the season, separated into the four stadium categories.
Basically the same story, except that the difference in field position value is smaller with punts, I use slightly curved trendlines for the outdoor stadiums, and with punts it becomes clear that cold weather makes punting harder in Denver just like any other outdoor stadium, even if on the whole punts travel farther there.
Even more confusing and requiring a lot of massaging on my part were the adjustments for field goals, because weather and altitude don't affect field goals of all distances in the same way. Short field goals only seem to be affected by cold weather, not warm. Denver has a much stronger effect on long field goals than short and medium range ones. It's a complicated set of adjustment values, but it works.
To demonstrate, here's that list of the top kickers from 2003 with the method that fixed the value of touchbacks. Now, however, we'll look at the top ten before and after adjustments for weather and altitude. The adjustments are based on the stadium of each game, not the home stadium of each player, so Micah Knorr is only getting penalized for altitude eight games a year, Mare gets a bonus for a December game in Foxboro, etc.:
|Top Kickoffs 2003, No Weather||Top Kickoffs 2003, Weather Adjusted|
Yes, the thin air of Denver means that much: enough to drop Micah Knorr off the top kickoffs list.
The final adjustment that needed to be made to the special teams method was actually noticed by my good friend Roland Beech at Two Minute Warning. The new method changed the baseline values for all kicks, punts, and returns so that they were based on the NFL averages from 2000-2003 rather than just for 2002. The problem with this change is that special teams didn't quite work the same for all four years of this period.
You may be familiar with the idea of the "K-Ball," a special ball introduced by the NFL in 1999 only for kicking plays, so that all kickers were using a standard ball out of the box instead of being able to doctor the balls to their own styles. For the first three years of the K-Ball, however, the NFL wasn't really very good about enforcement. That changed in 2002, and so did the length of kickoffs and punts. These numbers are from my spreadsheet, which counts touchbacks on kickoffs as 70 yards no matter how far into the end zone the kick goes, so the actual difference is a bit higher than this:
|Year||Avg. kickoff||Avg. punt|
|2000||63.2 yards||41.8 yards|
|2001||62.8 yards||42.1 yards|
|2002||62.0 yards||40.8 yards|
|2003||62.6 yards||41.3 yards|
This problem meant that kickers in 2000-2001, in total, were worth more "points over average" than kickers in 2002-2003, and the same with punters, because the average was actually different. So an adjustment was made that fixed this issue by treating 2000-2001 kicks slightly differently from 2002-2003 kicks. For kickoffs, I created two entirely different sets of baselines for kicking, one for 2000-2001 and one for 2002-2003. For punts, I had to create a general adjustment instead, since separate 2000-2001 and 2002-2003 baselines would leave much smaller samples, and much less statistical significance, at each specific line of scrimmage where punts take place. The 2004 special teams numbers reflect this adjustment.
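Using the season averages from the table above, a Python sketch of the era split might look like this. The kickoff side mirrors the two baseline sets described above; for punts, only a "general adjustment" is mentioned, so the shift toward the pooled mean below is one assumed form of it, and it uses unweighted season averages rather than per-punt data. The era labels and function names are hypothetical.

```python
# Season averages from the table above, in yards: (kickoff, punt).
# Kickoff touchbacks are counted as 70 yards, as in the original spreadsheet.
SEASON_AVERAGES = {
    2000: (63.2, 41.8),
    2001: (62.8, 42.1),
    2002: (62.0, 40.8),
    2003: (62.6, 41.3),
}


def era(season: int) -> str:
    """K-Ball enforcement tightened in 2002, so split the sample there."""
    return "pre-2002" if season < 2002 else "2002-on"


def era_means(averages: dict) -> dict:
    """Unweighted mean (kickoff, punt) length for each era."""
    grouped = {}
    for season, lengths in averages.items():
        grouped.setdefault(era(season), []).append(lengths)
    return {
        name: tuple(sum(values) / len(values) for values in zip(*rows))
        for name, rows in grouped.items()
    }


def punt_shift(averages: dict) -> dict:
    """ASSUMED general punt adjustment: each era's mean punt minus the pooled mean."""
    pooled = sum(punt for _, punt in averages.values()) / len(averages)
    return {name: means[1] - pooled for name, means in era_means(averages).items()}


print(era_means(SEASON_AVERAGES))
print(punt_shift(SEASON_AVERAGES))
```

From the table, the two eras differ by about 0.7 yards on kickoffs and 0.9 yards on punts; subtracting an era's shift from each punt would put 2000-2001 and 2002-2003 punters on a more equal footing.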
Strangely enough, the new K-Ball enforcement in 2002 doesn't seem to have affected field goals at all, so there's no adjustment there.
First of all, do special teams numbers include opponent adjustments? Not yet. I've thought about it and played around with it. Last year was a pretty good example of why you might want to use them -- shouldn't the kickoff coverage units of the teams in the AFC West get a little adjustment to make up for facing Dante Hall twice? But when I played around with opponent adjustments, they actually made special teams correlate worse from year to year, which seemed awfully counterintuitive. (Remember, the opponent adjustments improve the year-to-year correlation of offense and defense.) Because of that problem, and the fact that there are so few special teams plays to begin with, I've held off on using opponent adjustments on kickoffs and punts for the time being.
A second way in which the special teams ratings currently differ from offense and defense ratings is that fumbles are not all treated equally regardless of which team recovers them. The reason is that there is a major difference between a punt that gets muffed and then immediately picked up and returned as if nothing happened, and a punt that gets fumbled 20 yards into the return. Until I can go back and differentiate between these different types of fumbles, I can't punish teams for plays where a fumble is recovered by the return team.
A third major problem is the issue of hang time. As we all know, there is a lot more to punting the ball than just getting distance. The team ratings for punts and punt returns are based on net yardage, so they take everything into account: the punt rush, the punt itself, hang time and distance, the return, blocking, and pursuit. However, if we want to split out the punter from the coverage team in order to get a sense of the value of the punt itself, right now that can only be based on distance because only a couple of stadiums have official scorers who include hang time in their play-by-play logs. We just have to accept that a team that ranks as having a great punt coverage unit may, in fact, be getting this value partially because of a punter with good hang time and placement rather than good distance.
Other outstanding issues include what to do about onside kicks, two-point conversions, and laterals/handoffs on returns.
OK, let's put it all together again. The special teams table in the JUST THE STATS section shows the points over or under NFL average for all 32 teams for each of the five aspects of special teams counted by Football Outsiders. Then those five categories are added together, and multiplied by a coefficient (it changes depending on number of games played) to get a DVOA percentage that can be added into offense and defense. Let's look at an example. This is Arizona through Week 7:
That's right, folks. Neil Rackers, the patron saint of the Loser League, currently ranks as one of the top kickers in football. This is about as likely as the Boston Red Sox coming back from 3-0 to beat the New York Yankees in four straight games and then win the World Series. (um...) The total of these five categories is 1.7 points, which multiplied by the coefficient for six games becomes 0.8%. Now we can add it into offense and defense.
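The final step is simple arithmetic, sketched below in Python. The coefficient's formula isn't given (only that it depends on games played), so the sketch takes it as an input and backs out the roughly 0.47 implied by Arizona's 1.7 points becoming 0.8% after six games; because both of those figures are rounded, that implied value is approximate. The per-category numbers in the usage lines are hypothetical values chosen to sum to 1.7, not Arizona's actual breakdown.

```python
CATEGORIES = ("field_goals", "kickoffs", "kick_returns", "punts", "punt_returns")


def special_teams_dvoa(points_over_average: dict, coefficient: float) -> float:
    """Sum the five categories (points over average) and scale to a DVOA percentage."""
    missing = [c for c in CATEGORIES if c not in points_over_average]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    total = sum(points_over_average[c] for c in CATEGORIES)
    return total * coefficient


# Coefficient implied by the example above (1.7 points -> 0.8% after six games).
implied_coefficient = 0.8 / 1.7

# HYPOTHETICAL split that sums to 1.7 points.
example = {"field_goals": 2.5, "kickoffs": -0.4, "kick_returns": 0.3,
           "punts": -1.0, "punt_returns": 0.3}
print(round(special_teams_dvoa(example, implied_coefficient), 1))
```

Keeping the coefficient as a parameter leaves room for it to change with games played without touching the summing logic.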
What are the last two numbers? NON-ADJUSTED VOA represents the numbers without the adjustments for weather and altitude. The final number, w/HIDDEN, includes the impact of two types of plays that a team has virtually no control over: the length of kickoffs against them (since you can't exactly rush or block a kickoff) and the accuracy of field goals against them (because blocked field goals are fairly random). Some of us refer to this as "luck," or one element of luck anyway. Denver and the dome teams will always look pretty unlucky because this is also a non-adjusted number.
For fun, here are the totals after eight weeks of the 2004 season for all the different cuts of special teams that we can measure, both team and individual. These numbers all represent the weather-adjusted numbers.
Here are the best five teams and worst five teams in 2004 for net kickoff value, including both the kick and the return:
|BEST NET KICKOFFS||WORST NET KICKOFFS|
Now we'll separate the kickoff into its component parts, the kick and the return.
|BEST KICKOFF KICKS||WORST KICKOFF KICKS||BEST KICK RETURNS ALLOWED||WORST KICK RETURNS ALLOWED|
The fact that Kansas City and Houston are both in the top five for kicks and for kick returns allowed does make you wonder about the importance of hang time -- which, as I mentioned before, we don't have numbers for.
What about the other side of the kickoff? Here are the best and worst kickoff return teams. Touchbacks and out-of-bounds kicks are not included in this statistic.
|BEST KICKOFF RETURNS||WORST KICKOFF RETURNS|
Wow, could this be the year for Tampa Bay to finally take one all the way? They've looked good so far.
While most teams have the same player kicking the ball off all year -- Wes Welker aside -- there are multiple players making kickoff returns, and a ranking of the top kickoff returners will look different from a ranking of the top return teams. Here are the top five and bottom five kickoff returners for 2004, based on points over average.
|BEST KICKOFF RETURNERS||WORST KICKOFF RETURNERS|
Yes, that's Mewelde Moore, who has been so good rushing and receiving. Not so good returning kicks.
As I mentioned above in the section about reading the special teams stats table, the fact that our kickoff return statistic does not take into account the length of the kick, except as a basis for judging the baseline average return, means that the quality of kickoffs against a team makes up a "hidden indicator" that can help explain past wins and losses. Here are the teams that faced the best or worst combined kickoff quality from the opposition before their returners could even touch the ball. (The numbers on the special teams stats page don't include the weather adjustment, but the numbers below do.)
|GAINED FIELD POSITION DUE TO BAD OPPONENT KICKS||LOST FIELD POSITION DUE TO GOOD KICKS AGAINST|
Oh, gee, do you mean Jacksonville has had good luck? Shocker.
On to punting. Let's take a look first at the best and worst net punting teams in the league, along with each team's main punter.
|BEST NET PUNTING||WORST NET PUNTING|
|CLE (D.Frost)||+7.9||KC (S.Cheek)||-13.4|
|NO (M.Berger)||+6.9||ARI (S.Player)||-12.9|
|CHI (B.Maynard)||+6.9||WAS (T.Tupa)||-6.2|
|TEN (C.Hentrich)||+5.6||DEN (M.Knorr)||-5.2|
|IND (H.Smith)||+5.5||SF (A.Lee)||-3.8|
This includes blocked punts and aborted punts. As I mentioned in the first part of the Special Teams Manifesto, I'm not sure what to do about blocked punts. How much responsibility does the punter himself have for getting his punt blocked? It's a question to be answered in the future.
Here's a look at the best and worst punters for distance:
|BEST PUNT DISTANCE||WORST PUNT DISTANCE|
How about preventing punt returns? This list includes returns, fair catches, and downed punts, but not touchbacks or blocked punts, representing the five best and worst punt coverage teams in the league:
|BEST PREVENTION OF PUNT RETURNS||WORST PREVENTION OF PUNT RETURNS|
Here are the best and worst punt return teams in the league by net punt returns, including blocks, downed punts, touchbacks, and fair catches.
|BEST NET PUNT RETURNS||WORST NET PUNT RETURNS|
And here are the best and worst individual punt returners -- this time, not including non-returned punts like touchbacks and fair catches:
|BEST PUNT RETURNERS||WORST PUNT RETURNERS|
Finally, field goals:
|BEST FIELD GOAL KICKERS||WORST FIELD GOAL KICKERS|
While I don't think field goals kicked against a team say anything about that team's future performance, they are just as important as field goals kicked for the team when it comes to explaining past wins and losses. Here are the top five and bottom five teams this year in this hidden indicator, field goals kicked against (with the value lost or gained by the listed team):
|OPPONENTS WITH WORST FIELD GOAL KICKING||OPPONENTS WITH BEST FIELD GOAL KICKING|