17 Feb 2004
by Aaron Schatz
You may notice that the DVOA numbers given in the Top 5 boxes on the left side of the page are different now than they were before our recent unexpected hiatus. That's because I've been working on refinements to the DVOA method to make it better reflect the true quality of NFL teams, through correlation to wins as well as points.
There have been five major changes to the DVOA numbers:
1) The play-by-play logs for both 2002 and 2003 have been cleaned up to reflect official NFL statistics as accurately as possible. One big change related to this is that a number of turnovers that previously were not counted in the 2002 numbers are now taken into account. Because these were fumbles lost on "aborted snaps," I missed them when analyzing rushing and passing plays; however, in 2003 I began to mark aborted snaps as pass plays (or, on the rare occasion the abort was on a handoff, rush plays), so they are now included in the DVOA ratings.
2) The baseline variables for how much success is expected based on the situation (down, distance, time of game, etc.) have now been changed to reflect the NFL average for 2002 and 2003 combined. Hopefully, this leads to a more accurate portrayal of how each individual play compares to average, because we are now drawing the concept of average from a pool of plays twice the size. Incidentally, 2002 as a whole has a VOA of +1.4%, while 2003 as a whole has a DVOA of -1.4%, not counting special teams. Speaking of which...
3) Special teams are now twice as strong as they were before. I simply had them too low before, and doubling their strength improves the correlation between VOA and both wins and points. One problem, however, is that special teams seem to have been much more important in 2003 than in 2002. If I were only looking at 2002, it would make sense to increase special teams by only about 50%. If I were only looking at 2003, it would make sense to more than triple the value of special teams. Why the effect of special teams was so different from year to year bears study, and once we have the 2001 numbers broken down we'll have a better idea of which year was the aberration. I decided to err on the side of caution and only double the special teams number for now.
Just to give you another idea of how 2002 and 2003 were very different, let's take VOA completely out of the discussion. If you aren't a math person, I'll tell you that a higher correlation coefficient represents a greater connection between two variables. The correlation between point differential and wins in 2002 was .916. The correlation between point differential and wins in 2003 was .904. So this year, there seems to have been more "luck" involved in the standings than in the year before, if you consider the idea that the teams that score more points than their opponents should win the most games. This helps explain why the VOA formulas, which are supposed to have a relationship to scoring, did not do as good a job predicting winners in 2003 as they did in 2002, especially when all season long they were based on variables created from the 2002 play-by-play logs.
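For anyone who wants to check a correlation like this on their own numbers, here is a minimal sketch of the standard Pearson correlation coefficient in Python. The function name `pearson_r` and the five-team sample are made up for illustration; they are not the real 2002 or 2003 standings, and this is not necessarily how the figures above were computed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical sample: season point differential and wins for five teams.
# These values are invented for illustration only.
point_diff = [120, 45, -10, -80, 150]
wins = [12, 9, 8, 5, 13]

r = pearson_r(point_diff, wins)
print(f"correlation between point differential and wins: {r:.3f}")
```

A value near 1 means wins track point differential closely; the lower the value, the more room there is for what I called "luck" above.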
4) The "forest index" used for the statistic ESTIMATED WINS has been adjusted a little bit to reflect the variables that led to winning in both 2002 and 2003, not just 2003. Ian and Russell will enjoy this -- using the original "forest index" formula, the 2002 Buccaneers were estimated to win 16.5 games. That was a good indication something had to be fixed.
You may notice the new formula now gives Kansas City a slight edge over New England in estimated wins. This is both a good indication of how much Kansas City faded at the end of the season, and a good indication I still am not doing enough to blunt the effect on the formulas of an obscene blowout like Kansas City over Buffalo.
5) Finally, the 2002 data now looks like the 2003 data, including separate pages for offense and defense, a listing of estimated wins using both "forest index" and Pythagorean theorem, and weighted DVOAs to show you which 2002 teams got better as the year went along. Kansas City and Minnesota both had better weighted DVOAs, which might be an indication they were ready to play better in 2003. Then again, so did Pittsburgh, so we'll have to work on that theory.
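For readers unfamiliar with the Pythagorean estimate mentioned above, here is a rough sketch of the usual formula: expected winning percentage is points scored raised to an exponent, divided by the sum of points scored and points allowed each raised to that same exponent. The function name `pythagorean_wins` is hypothetical, and the default exponent of 2.37 is an assumption (a value often quoted for the NFL), not something stated in this article. The "forest index" formula is not shown because its variables are not given here.

```python
def pythagorean_wins(points_for, points_against, games=16, exponent=2.37):
    """Estimate season wins from points scored and allowed.

    The exponent default (2.37) is an assumed, commonly cited NFL value;
    pass exponent=2 for the classic form of the formula.
    """
    if points_for < 0 or points_against < 0:
        raise ValueError("point totals cannot be negative")
    if points_for == 0 and points_against == 0:
        raise ValueError("estimate is undefined with no points on either side")
    pf = points_for ** exponent
    pa = points_against ** exponent
    # Expected winning percentage, scaled to the number of games played.
    return games * pf / (pf + pa)


# Hypothetical team: 400 points scored, 300 allowed over a 16-game season.
print(f"{pythagorean_wins(400, 300):.1f} estimated wins")
```

A team that scores exactly as many points as it allows comes out at 8 wins in a 16-game season, whatever exponent is used.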
I'll be refining and fiddling with the VOA formulas throughout the offseason. When changes are made, the top of the VOA pages will show the date the stats were last adjusted to incorporate newer, better fixes to the VOA system.
One more note: Indianapolis was the "Carolina of 2002." Although they didn't have the same amazing improvement in the playoffs, the Colts were the clear outlier in the DVOA system, ranking #17 despite going 10-6. I would be interested in hearing theories as to why this might be.