07 Feb 2013
by Bill Connelly
It seems to happen every year. About midway through a given season, based on new conclusions or lessons learned, I get the itch to adjust some measures and add other ones. And because I'm swamped, I have to sit and stew about it until the offseason.
This past season, the urge was stronger, the philosophical shifts larger. Now, a month after Alabama beat Notre Dame for the 2012 national title, it's time to unroll this year's batch of data changes.
Here's a summary of the changes you'll find at these pages:
This is the largest shift, and really, it renders the original intent of S&P+ moot. But that's a good thing. As I delve further and further into the world of football data, I realize that the art of finishing drives is one of the most important aspects of a game, and it is something that can be slightly overlooked when focusing only on play-by-play data. Without committing to any specific change in advance, I decided to look into what would happen if I took some of my general approaches to play-by-play data -- the things that result in the S&P+ rankings -- and applied them to the drive data so wonderfully available at cfbstats.com. Would adding a drive aspect to S&P+ make it more predictive, more properly evaluative?
The answer was, as I assumed it would be, a rousing yes. Adding a per-drive aspect to the overall S&P+ number, I was able to create a better S&P+ and, therefore, F/+ (the combination of S&P+ and Brian Fremeau's FEI).
It also, however, created something a little awkward. For a long time now, we've discussed F/+ as, basically, a combination of a play-by-play measure and a drive-for-drive measure. With the changes I made to S&P+, that is no longer the case. Instead, it is simply the combination of my measure (and all it entails) and Brian's. Since Brian and I use different methods and approaches, I didn't feel I was stepping on his toes, and when I described to him what I was doing, he agreed. So I moved forward with it. The new S&P+ makes FO's data better, I think, and that should always be the point.
(I'm really excited about this change, in other words.)
I've known for a while that the 'P' in S&P+ (Points Per Play) was worth more than the 'S' (Success Rate), but I wasn't sure I was weighting the two properly in the combined S&P+ measure. Explosiveness (and the ability to prevent it) is more of a winner in college football than efficiency, and strong PPP numbers were more indicative of quality than strong success rates. After extensive tinkering and experimentation, I think I've got the two aligned properly now. In the play-by-play portion of the S&P+ measure, what you now get is approximately 60 percent PPP+ and 40 percent Success Rate+. This also makes the overall S&P+ measure better and more appropriately evaluative.
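As a rough illustration of that weighting, here is a minimal Python sketch. It is not the actual S&P+ formula: the function name, the idea that both inputs sit on a common scale where 100 is average, and the sample numbers are all assumptions made for the example. Only the 60/40 split comes from the text above.

```python
# Approximate weights quoted in the article for the play-by-play portion.
PPP_WEIGHT = 0.60
SUCCESS_RATE_WEIGHT = 0.40


def play_by_play_rating(ppp_plus: float, success_rate_plus: float) -> float:
    """Blend two opponent-adjusted ratings into one number.

    Assumption: both inputs are on the same scale (here, 100 = average).
    """
    return PPP_WEIGHT * ppp_plus + SUCCESS_RATE_WEIGHT * success_rate_plus


# Two hypothetical offenses: one explosive, one efficient.
explosive = play_by_play_rating(ppp_plus=120.0, success_rate_plus=100.0)
efficient = play_by_play_rating(ppp_plus=100.0, success_rate_plus=120.0)
print(round(explosive, 1), round(efficient, 1))
```

With these made-up inputs the explosive offense grades out higher than the efficient one, which is the practical effect of giving PPP+ the larger weight.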
As a nod to The Hidden Game of Football, I tinkered with my idea of equivalent points to see if I could make something better and more properly predictive. As a frame of reference, here was my last equivalent points chart. The idea behind it was to assign a point value to each yard line based on the likely points a team would score on a drive that included a play from that yard line. From the one-yard line, you can expect to score about 0.742 points per possession; therefore, the one-yard line was worth 0.742 equivalent points.
In the seminal Hidden Game, however, the authors approached this from the idea of net points -- not how many points you might expect to score when you have the ball at the one, but which team might be expected to score next. If you have the ball at the one, not only are you unlikely to score, but you're relatively likely to punt the ball to the other team (or turn it over) and give them pretty good field position.
So I attempted a compromise. Instead of using my original equivalent points definition, I met the Hidden Game's net points idea halfway. I created an equivalent point value based on the number of points you can expect to score on a drive that involves a given yard line and the average number of points an opponent typically scores on its next drive. I also smoothed out the rough spots a bit. So instead of the yardage chart above, you get something like this:
The general result of this: a one-yard gain near your goal line (from about the one to the nine) is worth about 0.05 equivalent points, from your 10 to your 21 is worth about 0.06 each, from your 22 to your 34 is worth about 0.07 each, from your 35 to your 46 is worth about 0.08 each, from your 47 to your opponent's 41 is worth about 0.09 each, from their 40 to their 29 is worth about 0.10 each, from their 28 to their 16 is worth about 0.11 each, from their 15 to their four is worth about 0.12 each, and from their three onward is worth about 0.13 each.
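Those bands can be written down as a small lookup. The Python sketch below is only an approximation built from the rounded figures in the previous paragraph, not the smoothed curve itself. Two things are assumptions for the example: field position is expressed as yards from your own goal line (so the opponent's 41 is 59), and a multi-yard gain is valued by summing the per-yard value of each yard line the play starts from or crosses. The function names are hypothetical.

```python
import bisect

# Inclusive upper bound of each band, in yards from your own goal line,
# paired with the approximate per-yard value quoted in the text.
BAND_UPPER_BOUNDS = [9, 21, 34, 46, 59, 71, 84, 96, 99]
BAND_VALUES = [0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13]


def yard_value(yards_from_own_goal: int) -> float:
    """Approximate equivalent-point value of gaining one yard from this spot."""
    if not 1 <= yards_from_own_goal <= 99:
        raise ValueError("field position must be between 1 and 99")
    band = bisect.bisect_left(BAND_UPPER_BOUNDS, yards_from_own_goal)
    return BAND_VALUES[band]


def gain_value(start: int, yards: int) -> float:
    """Assumed valuation of a gain: the sum of the per-yard values crossed."""
    if yards < 0 or start + yards > 100:
        raise ValueError("gain must be non-negative and stay on the field")
    return sum(yard_value(start + k) for k in range(yards))


# A five-yard gain from your own 20 crosses two 0.06 yards and three 0.07 yards.
print(round(gain_value(20, 5), 2))
```

A lookup like this makes the shape of the change easy to see: every yard is worth something, and yards get steadily more valuable as you approach the opponent's goal line.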
This is a pretty significant change in thinking, but the end result isn't that different. It is different, however, and makes the equivalent points idea, and the resulting PPP (Points Per Play) figure, more accurately descriptive and predictive.
The one aspect of my stats that has changed my own way of thinking more than perhaps anything else is the idea of standard downs (first down, second-and-6 or fewer, third-and-4 or fewer) and passing downs. Through the years I've come to evaluate offenses and defenses using measures based more on this than simply running and passing. And really, considering the proliferation of the passing game as a series of extended handoffs, this makes sense. Instead of gauging whether an offense can move the ball on the ground and through the air, it makes more sense to look at how well they move the ball on downs in which the defense has to account for both the run and pass (standard downs) and ones in which they most likely have to pass to get the requisite yardage.
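The standard-down definition above is simple enough to express as a classifier. This Python sketch follows the parenthetical definition literally; the function name and the sample plays are hypothetical, and treating every fourth down as a passing down is an assumption, since the definition here does not mention fourth down.

```python
def is_standard_down(down: int, distance: int) -> bool:
    """Return True for a standard down, False for a passing down.

    Standard downs per the text: first down, second-and-6 or fewer,
    third-and-4 or fewer.
    """
    if down not in (1, 2, 3, 4) or distance < 1:
        raise ValueError("down must be 1-4 and distance at least 1")
    if down == 1:
        return True
    if down == 2:
        return distance <= 6
    if down == 3:
        return distance <= 4
    # Assumption: fourth down is not covered by the definition above,
    # so this sketch lumps it in with passing downs.
    return False


# Hypothetical (down, distance) pairs.
plays = [(1, 10), (2, 6), (2, 7), (3, 4), (3, 5)]
labels = ["standard" if is_standard_down(d, dist) else "passing" for d, dist in plays]
print(labels)  # ['standard', 'standard', 'passing', 'standard', 'passing']
```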
Take Nebraska, for instance. Through excellent play-calling, the Cornhuskers were able to better utilize quarterback Taylor Martinez's unique skill set in 2012. Despite Martinez's general inability to throw effectively when he had to, Nebraska's offense ranked eighth overall in Off. S&P+ -- seventh in per-play effectiveness and ninth in per-drive effectiveness. The Huskers ranked fourth on standard downs, in part because of their willingness to throw the ball on standard downs and steal free yardage from defenses that were loading up against the run. As a result, they rarely found themselves in must-pass situations. Now, Nebraska ranked fourth on standard downs and 15th on passing downs, and they ranked third in Rushing S&P+ and 15th in Passing S&P+; the differences are minimal there, but the context is very important.
One problem, however: because standard and passing downs were something I set up almost as an afterthought back in 2008 when I was first diving into play-by-play data, the formulas I was using did not separate out garbage time. So teams that were beating teams by a lot would naturally have higher run percentages on such downs, while teams typically trailing by a lot would skew more toward the pass. Because of the changes required to go back into eight years' worth of play-by-play data (and the nearly one million plays they entail at this point), I never had the time to go back and filter out garbage time. Now I have.
After a lot of tinkering and Excel Goal Seek, the amount of weight carried by the opponent adjustments for S&P+ has changed. It has actually gotten a bit lighter, which is why you see teams like 2012 Michigan State (ninth in S&P+ per the previous calculations) dropping a bit (to 16th). The Spartans were carried by a pretty good strength of schedule and a lot of really tight losses, but ninth was just too damn high.
With these new adjustments, the S&P+ rankings should look more sensible (Michigan State at ninth was off-putting, to say the least), but that wasn't really the concern. They are also more accurate. After all of these changes, I went back and did some retro-fitting to see how the end-of-season numbers would fit with the actual results of each season going back to 2005. On average, F/+ has a correlation of about 0.93 with a team's overall percentage of points scored (a more telling number than simple win percentage), up from the mid- to high-0.8s, and it correctly predicts the winner of approximately 82 to 84 percent of games. There are about 10 points per game that no attempted adjustment could account for, but this makes sense; 10 points is the equivalent of about two turnovers, and turnovers are, in large part, random.
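For readers who want to run a similar check on their own ratings, here is a minimal Python sketch of the two pieces involved: a team's percentage of points scored and a plain Pearson correlation. It assumes "percentage of points scored" means points for divided by total points in a team's games, and the three teams below are toy numbers chosen only to show the mechanics, not real ratings or results.

```python
import math


def points_share(points_for: float, points_against: float) -> float:
    """Share of all points in a team's games that the team itself scored."""
    total = points_for + points_against
    if total == 0:
        raise ValueError("no points scored by either side")
    return points_for / total


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy data: hypothetical ratings and season point totals for three teams.
ratings = [10.0, 0.0, -10.0]
shares = [points_share(40, 20), points_share(30, 30), points_share(20, 40)]
print(round(pearson(ratings, shares), 3))
```

The toy data is perfectly linear, so the correlation comes out at the maximum; real seasons, with turnovers and other noise, will land lower.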
Like I said, these were some pretty massive changes. I started them in early December as time allowed, and I'm only now getting everything polished off.
On the above rankings pages, then, you'll see the following specific changes:
F/+: The S&P+ number has been adjusted to account for the new per-drive data, the new weight of PPP+ in the S&P+ formula, and the new weight of opponent adjustments. Therefore F/+, Off. F/+ and Def. F/+ have all changed.
S&P+: You will now see the overall S&P+ ranking (with new calculations based on per-drive data, PPP+ weighting and opponent adjustments), the offensive and defensive rankings, and the per-play and per-drive rankings for both offense and defense.
Off. and Def. S&P+: You will see the following categories listed (and defined on each page): S&P+ (changed to account for drive data and new PPP+ weighting), Play Efficiency (the per-play measure that used to make up S&P+ by itself), standard downs, passing downs, rushing and passing breakouts (as shared before, only with new weighting and opponent adjustments), Drive Efficiency (the new per-drive measure, calculated in very similar ways), and a measure I'm simply calling DNP, or Difference in Net Points. This is simply the raw difference between your expected points (based on starting field position for a given drive) and the number of points you actually scored. It is not opponent adjusted -- I actually thought it was more powerful to share the raw number for this so you can see the number of points an offense achieves (or a defense prevents) per drive compared to what was expected based on field position.
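To make the DNP idea concrete, here is a minimal Python sketch. It assumes the sign convention "actual points minus expected points" and averages the result per drive. The expected-points function is passed in by the caller, and the linear one used in the example is a placeholder invented for illustration, not the real expected-points values by field position.

```python
from typing import Callable, Iterable, Tuple

# A drive is (starting yards from own goal line, points actually scored).
Drive = Tuple[int, int]


def difference_in_net_points(
    drives: Iterable[Drive],
    expected_points: Callable[[int], float],
) -> float:
    """Average of (actual points - expected points) across drives.

    Raw figure, with no opponent adjustment, as described in the text.
    """
    drive_list = list(drives)
    if not drive_list:
        raise ValueError("need at least one drive")
    total = sum(points - expected_points(start) for start, points in drive_list)
    return total / len(drive_list)


def toy_expected_points(start: int) -> float:
    """Placeholder curve (an assumption): 0.05 expected points per yard."""
    return 0.05 * start


# Hypothetical offense: a touchdown from its own 20, nothing from its own 40.
drives = [(20, 7), (40, 0)]
print(round(difference_in_net_points(drives, toy_expected_points), 2))
```

A positive number means the offense scored more than its starting field position suggested. For a defense you would feed in the opponent's drives and read a negative number as points prevented.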
This is a lot to digest, obviously, but the bottom line is that I'm really excited about these changes, and I think they make FO's college data as a whole more in-depth, accurate, descriptive and interesting. Hopefully you feel the same.
(Ed. Note: These changes also give me some ideas for tweaking the NFL stats in the future, although the timing on that will depend on a lot of other issues. -- Aaron Schatz)
So what else would you like to see moving forward? As the offseason progresses, I will be working to set up pages for line stats (not only Adj. Line Yards and Adj. Sack Rates, but also some of the figures you've grown accustomed to on FO's NFL stat pages: Power Success, Stuffed Rate, etc.). I also want to more freely share some of the rusher and receiver data I have compiled, but there is so much data there that I haven't figured out the best cutoffs for sharing. Beyond that, however, is there more you would want to see here?