FO Basics: Our College Stats
by Brian Fremeau
Over the next couple weeks, we're going to run a series of articles we're calling FO Basics. We get a lot of questions about our work, but there are also a lot of readers who don't ask questions. We hope this series will help answer some questions and clarify some confusing things for even those readers who don't respond on the message boards.
- August 30: Where our stats come from, and the difference between charting stats and play-by-play stats.
- August 31: A summary of research from our first seven years.
- September 1: Our college stats, how they differ from our NFL stats and from each other.
- September 6: The importance (and limitations) of watching games on tape.
- September 7: Regression towards the mean -- what it means, and how we use it.
College Stats vs. NFL Stats
As detailed in the last two installments of FO Basics, a tremendous amount of research over the last seven years has been poured into the development of the NFL stats found here at Football Outsiders. DVOA and its brethren statistics have been meticulously fine-tuned with a singular goal -- to better identify the metrics that most lead to winning football. One might think that the same principles and statistics that produce championship professional teams should apply to college teams as well. Why, then, do we not use DVOA for college teams? Why do we take a different approach altogether?
In general terms, the same principles that lead to victory in the NFL do apply in college football. We haven't duplicated all of the same NFL research efforts at the college level, but we've certainly made similar observations. Like the pros, the best college football teams also dominate weak opponents and run when they win. Fumble recovery is a random phenomenon at the college level just as it is in the NFL. Starting field position value is a critical component of offensive success. The list goes on.
The difference between our college and pro statistics has less to do with the nature of the game, and everything to do with the much larger number of teams in college football. There are 120 FBS (formerly Division I-A) football programs and there is a vast gulf of power between the best and worst teams. In the NFL, the league is designed for parity, granting underachieving teams the ability to ascend into power via the draft and free agency and through scheduling. In college football, the top programs are exponentially wealthier than lesser conference teams, and they use those resources to remain in power for long periods of time.
Many of the statistics we use here rely on comparisons to average. In the NFL, since all teams have the same opportunities and resources for success, "average" is easily understood and applied. It is a different animal altogether in college, though, where "average" might mean completely different things to different observers. A 6-6 record in the SEC isn't anything like a 6-6 record in the MAC. But how different is it? That's a difficult question when there are few if any games played between those conferences. With only 12 regular season games, college football teams are far less connected to one another via common opponents than are NFL teams.
The breadth and distribution of team strength in college football is difficult even for avid fans to comprehend completely. On a neutral field, a team ranked approximately 25th nationally has a better chance of losing to a team ranked between 40th and 80th than it does of defeating a team ranked in the top 15. Check out the research here. This runs completely counter to the way most fans understand the relative strength of teams, which they usually visualize as a linearly ordered sequence rather than a normally distributed bell curve. We approach our college football team metrics with all these concerns in mind. And we take two very different approaches to the problem.
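To make the bell-curve point concrete, here is a minimal sketch. It assumes, purely for illustration, that team strength follows a standard normal distribution across the 120 FBS teams; the function name and the rank-to-percentile mapping are hypothetical and are not part of FEI or S&P+.

```python
from statistics import NormalDist

N_TEAMS = 120  # FBS programs, per the article


def rating_for_rank(rank: int, n_teams: int = N_TEAMS) -> float:
    """Hypothetical rating, in standard deviations, for a national rank.

    Assumption: team strength is normally distributed, so the k-th best
    team sits at the matching percentile of a standard normal curve.
    """
    # Midpoint plotting position keeps the percentile strictly inside (0, 1).
    percentile = 1 - (rank - 0.5) / n_teams
    return NormalDist().inv_cdf(percentile)


# Rating gap covered by 14 ranking spots at the top of the list...
top_gap = rating_for_rank(1) - rating_for_rank(15)
# ...versus the gap covered by 40 ranking spots in the middle.
mid_gap = rating_for_rank(40) - rating_for_rank(80)

print(f"No. 1 vs. No. 15:  {top_gap:.2f} standard deviations")
print(f"No. 40 vs. No. 80: {mid_gap:.2f} standard deviations")
```

Under this assumption, the 14 spots at the top of the rankings span a wider gap in team strength than the 40 spots in the middle, which is why treating rank as an evenly spaced ladder can mislead. The sketch says nothing about actual win probabilities, which would need a separate model.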
Two Rating Systems
It may surprise new readers to know that Bill Connelly and I produce our respective rating systems and supporting metrics entirely independently of one another. We actually collect our own data individually and don't share our complete formulas with each other. We are both in pursuit of identifying the "true" value of each team via stats, just like DVOA. Our methods find consensus about many teams and disagreement about many others (we'll get to this phenomenon shortly). But why take two approaches in the first place?
The short answer is that we each followed our instincts when we first started working with these stats. Each of us has been developing our respective methodologies for several years, and we came at the problem in our own unique ways. My first possession efficiency observations were made in 2002 while watching Notre Dame waste good drives via turnovers, poor execution, and questionable drive-ending decisions in an excruciating loss to Boston College. Following the game, I started collecting data to quantify drive success efficiency and FEI developed from there. A few years later, Bill started collecting play-by-play data in order to better understand and quantify Missouri football and the Big 12 at large, expanding the scope of his project soon after through the creation of S&P+.
Part of my motivation to use drives instead of play-by-play initially was a data management decision. There are nearly 20,000 individual possessions played annually by the 120 FBS teams, and drives provided more than enough data points to collect and try to wrap my head around at first. I know the limitations of drive data versus play-by-play data, and Bill's data splits have far more potential to be used for situational and player performance evaluations going forward. But as he and I have discussed, we think there is great value in the two distinct concepts.
We are each trying to define success and measure team performances against whatever that definition is. At the play level, offensive success is determined based on benchmarks that help lead to scoring according to down/distance effectiveness and points per play explosiveness. At the drive level, offensive success is determined based on benchmarks of starting/ending field position, scoring expectations, and drive-ending results.
It isn't simply a matter of play data describing drive data in more detail, but a distinct approach to measuring the success of an offense meeting its goals. Is it the goal of every play to achieve a certain percentage of available yards? Is it the goal of every drive to reach the end zone? What does failure of either goal mean for a team? Which observations of past results better predict future ones? There aren't necessarily black and white answers to these questions, but we think there's validity in trying to find out more than we already know.
How Our College Rating Systems Differ
Perusing this summer's College Football Almanac, or the FEI and S&P+ ratings on this site, you may notice some startling differences between team rankings. In 2009, the average difference in overall FEI and S&P+ ranking for each team was 14 spots. Twenty teams differed by at least 24 spots.
|Largest 2009 Ranking Differential Between S&P+ and FEI|
|Texas A&M||Big 12||30||68||38|
|Middle Tennessee||Sun Belt||47||77||30|
|Kansas State||Big 12||104||75||29|
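As a small worked example, the differential in the last column is just the absolute gap between a team's two rankings. The table as shown here does not label which rank column belongs to which system, so the names `rank_a` and `rank_b` below are placeholders.

```python
# Rows copied from the table above: team, conference, and two overall ranks.
# Which rank is S&P+ and which is FEI is not labeled, hence rank_a / rank_b.
rows = [
    ("Texas A&M", "Big 12", 30, 68),
    ("Middle Tennessee", "Sun Belt", 47, 77),
    ("Kansas State", "Big 12", 104, 75),
]


def rank_differential(rank_a: int, rank_b: int) -> int:
    """Absolute number of spots separating a team's two rankings."""
    return abs(rank_a - rank_b)


diffs = {team: rank_differential(a, b) for team, _conf, a, b in rows}
mean_diff = sum(diffs.values()) / len(diffs)

for team, diff in diffs.items():
    print(f"{team}: {diff} spots")
print(f"Mean over these rows only: {mean_diff:.1f}")
```

The mean printed here covers only the three listed rows, which are the extreme cases; it is not the 14-spot average across all 120 teams.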
What exactly is going on here? Is there something about the play-by-play and possession efficiency success raw measures that is responsible for these differences? Is there something incongruous about our respective data sets? Are our opponent adjustments the culprit? To find out, we decided to pull the data together from one team in 2009 and check our work against one another. Our test subject was Oklahoma, a team ranked seventh in S&P+ and 21st in FEI. The ranking difference was less dramatic than some of the other teams listed above, but the results were revealing.
|2009 Oklahoma Sooners Game-by-Game Final Ratings Comparison|
|S&P+ Ratings||FEI Ratings|
There are six sets of game ratings provided in this table, three for S&P+ and three for FEI. The S&P and Total GE (Game Efficiency) columns (gray) represent the complete, raw team efficiency measured by our respective systems. Notice that the ranking order of Oklahoma's individual game performances this year is similar across both systems. Whether we used drive data or play-by-play input data, we arrived at roughly the same conclusion about the order of Oklahoma's unadjusted team performances. The Sooners were at their most efficient against Texas A&M, Tulsa, Baylor, and Oklahoma State, and least efficient against Miami, Texas, and Texas Tech.
The next set of columns (blue) represents our first adjustments to the raw data. Bill and I each discount garbage plays and possessions, though we do so in unique ways. Bill's system eliminates non-close-game plays based on the scoring margin of the game at the time of the series. In my system, I count only those possessions up until a point in the game after which the outcome is no longer in doubt, based on remaining possessions and scoring margin. This distinction between our approaches to garbage time means that we each may be "counting" non-garbage plays and drives that the other considers to be garbage time. Our garbage-time-adjusted columns -- Close S&P and Non-Garbage GE -- are slightly different from the total efficiency measures, but still do not impact the distribution of Oklahoma's total game performances too dramatically.
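The two garbage-time filters can disagree about the same possession, which a rough sketch makes easier to see. Everything numeric below is an assumption for illustration: the article gives no thresholds, and `CLOSE_MARGIN`, `MAX_POINTS_PER_POSSESSION`, and the `Possession` fields are hypothetical names, not the actual S&P+ or FEI rules.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the article does not publish the real ones.
CLOSE_MARGIN = 16               # margin-only filter: counts while margin <= 16
MAX_POINTS_PER_POSSESSION = 8   # touchdown plus two-point conversion


@dataclass
class Possession:
    margin: int                 # absolute scoring margin when the series began
    trailing_drives_left: int   # drives the trailing team can still expect


def counts_by_margin(p: Possession) -> bool:
    """Margin-only filter: keep the series while the score is close."""
    return p.margin <= CLOSE_MARGIN


def counts_while_in_doubt(p: Possession) -> bool:
    """Margin-and-possessions filter: keep the drive while the trailing
    team could still catch up by scoring on every remaining drive."""
    return p.margin <= MAX_POINTS_PER_POSSESSION * p.trailing_drives_left


early_blowout = Possession(margin=21, trailing_drives_left=4)
late_two_score = Possession(margin=14, trailing_drives_left=1)

for name, p in [("early blowout", early_blowout), ("late two-score", late_two_score)]:
    print(name, counts_by_margin(p), counts_while_in_doubt(p))
```

With these made-up numbers, the early 21-point lead is garbage time under the margin-only filter but still counts under the in-doubt filter, while the late 14-point lead flips the other way. That is the kind of mismatch described above.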
The final set of columns (green) represents our fully adjusted individual game measures, S&P+ and FEI. As a reference, the final team rating of the given opponent is provided. This is where our unique systems clearly differentiate themselves from one another. Bill's system makes a far more subtle adjustment to Oklahoma's adjusted game performances than mine. In part, this may be a reflection of the relative weight our systems give to strength of opposition. It also has to do with the way we weight game data against a team's strongest opposition. My FEI ratings grant extra relevance to data points against top teams, which is why Oklahoma's adjusted FEI game ratings for Miami and Texas make the biggest leaps from non-adjusted data. Bill also weights top performances, but does so using a completely different formula.
And since all 120 team ratings are interrelated, the disruption of our raw efficiency consensus can be especially dramatic for certain teams and even entire conferences. Some teams have recently been boosted via FEI's method and other teams have been boosted by S&P+. One of us may be more right than the other, but until we know for sure, we think there's something to be said for splitting the differences.
The Creation Of F/+
F/+ is a simple combination of our final FEI and S&P+ outputs. We created F/+ as a way to smooth out the extremes of our individual systems, and this past offseason we got around to testing these systems against one another. We discovered that F/+ had a stronger correlation to next-year success than any other method at our disposal, so we based our entire College Football Almanac projections around it.
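The exact F/+ formula is not spelled out here, so the following is only one plausible sketch of a "simple combination." It assumes the two systems report on different scales, standardizes each across all teams, and averages the results. The team names and rating values are made up.

```python
from statistics import mean, pstdev


def standardize(ratings: dict[str, float]) -> dict[str, float]:
    """Express each rating as standard deviations from the all-team mean."""
    mu = mean(ratings.values())
    sigma = pstdev(ratings.values())
    return {team: (value - mu) / sigma for team, value in ratings.items()}


def combine(fei: dict[str, float], sp_plus: dict[str, float]) -> dict[str, float]:
    """Assumed combination: the plain average of the two standardized ratings."""
    z_fei = standardize(fei)
    z_sp = standardize(sp_plus)
    return {team: (z_fei[team] + z_sp[team]) / 2 for team in fei}


# Made-up ratings for made-up teams, on two deliberately different scales.
fei = {"Team A": 0.25, "Team B": 0.10, "Team C": -0.05, "Team D": -0.30}
sp_plus = {"Team A": 240.0, "Team B": 255.0, "Team C": 200.0, "Team D": 185.0}

combined = combine(fei, sp_plus)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

In this toy data, Team A is first in one system and second in the other, and the averaged rating settles the disagreement by how far apart the teams are in each system rather than by rank alone.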
We're not naïve enough to believe it is the end-all of our research, and Bill and I will each continue working with our individual systems to improve their effectiveness. But there just might be something to be said for acknowledging the value of our two-pronged approach, and the value of examining success via multiple lenses. College football's complexity demands it.
Bill and I have each done extensive research into other extensions of our FEI and S&P+ systems, from unique strength of schedule and projection methods to new play-by-play data slices to better understand the nation's top players and teams. We may introduce new F/+-specific metrics, but most of this research will continue to be hosted in our individual FEI and Varsity Numbers weekly columns. F/+ game predictions will be part of the weekly Seventh Day Adventure series throughout the season.