29 Sep 2010
(I'd publish this on my own site, only I don't have much of one and anyway I can't guarantee I'd maintain and improve it over the years. So I'm offering it to FO to discuss as well as blogging it.)
Several years ago, while in college, I became interested in the mayhem that is trying to predict the NFL. At first, I was merely curious to see if I could get playoff teams correct from one year to another, but then I got interested in the week-to-week problem. I made several abortive attempts (college is bad for concentration on other things), and then abandoned the project once I started reading TMQ and his dismissal of the complicated systems.
This year, I came back to it. The project I'm working on is drive-based. I'm not yet sold on play-by-play, even with DVOA, as a solid predictor; besides, I don't have time.
The system produces a rating for each team, based on three stats: yards per possession (or "drive"), yards per point, and points per drive.
- I do count penalties, sort of. Offensive penalties will affect an offense's rating, and defensive a defense's.
- Turnovers are counted as negative, with additional penalties if the opponent scores off the turnover.
- I do not count yards in the endzone; anything that happens behind the pylons is 0 (or I guess 100, though I don't know what that would be). I can't justify this, but it primarily affects returns, which often lose a yard or two from their "official" NFL-published length.
- I do count returns, whether interception, kick, punt, or fumble, as part of a possession even though I use the term "drive". This is again for simplicity's sake, even if it conflates special teams with offense and defense.
- I make no account of punts, other than to mark them as a non-scoring drive that didn't end in a turnover. A muffed punt is not a possession for the returning team unless recovered by them.
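To make the three base stats concrete, here is a minimal sketch in Python of how they could be computed from a list of possessions. The `Drive` record, the `drive_stats` function, and the sample numbers are all made up for illustration; this is not the spreadsheet's actual layout, and it leaves out the turnover and penalty adjustments, whose exact values aren't given here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Drive:
    # Hypothetical record for one possession (returns included).
    yards: int   # net yards on the possession; can be negative
    points: int  # points scored; 0 for punts and turnovers


def drive_stats(drives: List[Drive]) -> Dict[str, Optional[float]]:
    """Compute yards per drive, points per drive, and yards per point."""
    n = len(drives)
    total_yards = sum(d.yards for d in drives)
    total_points = sum(d.points for d in drives)
    return {
        "yards_per_drive": total_yards / n if n else None,
        "points_per_drive": total_points / n if n else None,
        # Undefined when no points were scored, so return None
        # instead of dividing by zero.
        "yards_per_point": total_yards / total_points if total_points else None,
    }


# Made-up drives: a touchdown, a punt, a field goal, a three-and-out.
sample = [Drive(60, 7), Drive(12, 0), Drive(45, 3), Drive(-3, 0)]
stats = drive_stats(sample)
print(stats)
```

Note that yards per point is the one stat that can blow up: a team with no points yet has no defined value, which any real version would have to handle somehow.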
From these I manufacture a rating for each drive, for both offense and defense; the team's offensive or defensive rating is then the average of all its drives. I currently generate three rankings from these: a "raw" rank, which averages the ranks of a team's offense and defense; a combined rank, which generates a team rating from the offense and defense ratings and ranks teams by it; and an adjusted or "versus" rank, based on a comparison of a team's offense to all defenses and vice versa. The last two so far tend to be very close or identical; the first tends to differ quite a bit.
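As a rough illustration of the first two rankings, here is a Python sketch under stated assumptions. The `rank` helper and the teams "AAA" and "BBB" with their ratings are invented. Two things are inferred from the Pittsburgh sample line rather than stated outright: that a lower (more negative) defensive rating is better, and that the combined rating is (offense - defense) / 2, which happens to reproduce the 22.46 figure. The adjusted "versus" rank is left out because its formula isn't described in enough detail to sketch.

```python
from typing import Dict


def rank(values: Dict[str, float], reverse: bool) -> Dict[str, int]:
    """Map each team to its 1-based rank. Ties are not handled specially."""
    ordered = sorted(values, key=values.get, reverse=reverse)
    return {team: i + 1 for i, team in enumerate(ordered)}


# Pittsburgh's ratings are from the sample line; AAA and BBB are made up.
offense = {"PIT": 23.54, "AAA": 30.00, "BBB": 10.00}
defense = {"PIT": -21.38, "AAA": 5.00, "BBB": -10.00}

off_rank = rank(offense, reverse=True)   # higher offensive rating is better
def_rank = rank(defense, reverse=False)  # assumed: lower defensive rating is better

# "Raw" rank: average the two ranks, then rank teams by that average.
raw_avg = {t: (off_rank[t] + def_rank[t]) / 2 for t in offense}
raw_rank = rank(raw_avg, reverse=False)

# Combined rank: one team rating from both ratings (assumed formula),
# then rank teams by it.
combined = {t: (offense[t] - defense[t]) / 2 for t in offense}
combined_rank = rank(combined, reverse=True)

print(raw_rank)
print(combined_rank)
```

In this toy three-team league the two rankings agree, but they don't have to: the raw rank throws away how far apart the ratings are, which may be part of why it differs from the other two.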
A sample complete rating (actual rating, after three weeks):
Pittsburgh Steelers: Offense 23.54 (#9), Defense -21.38 (#1). Ranks: Average 5, #1. Combined 22.46, #2. Adjusted 99.41, #3.
Full ranking, adjusted rank, no ratings published:
1. Atlanta Falcons
2. Indianapolis Colts
3. Pittsburgh Steelers
4. Green Bay Packers
5. New England Patriots
6. Kansas City Chiefs
7. Philadelphia Eagles
8. Tennessee Titans
9. New York Jets
10. Seattle Seahawks
11. San Diego Chargers
12. Chicago Bears
13. St Louis Rams
14. Miami Dolphins
15. Denver Broncos
16. Cincinnati Bengals
17. Baltimore Ravens
18. Minnesota Vikings
19. Dallas Cowboys
20. New Orleans Saints
21. Washington Redskins
22. Tampa Bay Buccaneers
23. Detroit Lions
24. Cleveland Browns
25. Oakland Raiders
26. Houston Texans
27. Arizona Cardinals
28. New York Giants
29. Carolina Panthers
30. Buffalo Bills
31. San Francisco 49ers
32. Jacksonville Jaguars
Unfortunately, I have neither time nor data to truly adjust this for opponents/strength of schedule. I'm sure you can all find your own team which looks way out of place right now; the sore thumb is of course the Texans. Because of the turnover penalties, I think the system may bias towards good defenses, but I'm going to run it unchanged for at least a full season before I panic and start adjusting.
If you want to see my full data and formulas, email me at marshaldiaz, which is a gmail address. I can provide the spreadsheet I do the majority of the work on in at least xls or the native ods (OpenOffice) formats, maybe more.
Want to know how a team ended up where it is in the ranking? Ask me: I'll give at least a brief rundown, maybe some numbers.
Want to help collect data? I've been pulling data from NFL play-by-play, published on the gamecenters each week. This takes 15-25 minutes per game; I'll throw together a game-record sheet and email it to you: you fill it out (either from NFL or another play-by-play) and email it back. This could speed things up immensely.
Have ideas for improvement? I'd like to hear them, although I personally won't implement them this year - I'll send you a copy of my sheet and you can modify it as you like.
Formulas, method, and ranking data copyright Jonathan Frank, 2010. You can use this information freely for discussion, demonstration, ranting arguments with other fans, and discrediting other people's rankings and conclusions. If you're making money betting using my data, please let me know but I do not demand a cut. (If you want to give me a percentage anyway, go right ahead.) Ranking system and rankings may not be sold, rented, pimped, or otherwise directly used to make yourself money (not that I think they're solid enough yet to make money). You know, the usual copyright stuff.