FO Basics: Our College Stats

by Brian Fremeau
Over the next couple of weeks, we're going to run a series of articles we're calling FO Basics. We get a lot of questions about our work, but there are also a lot of readers who don't ask questions. We hope this series will answer some questions and clarify some confusing things, even for those readers who don't post on the message boards.
The schedule:
- August 30: Where our stats come from, and the difference between charting stats and play-by-play stats.
- August 31: A summary of research from our first seven years.
- September 1: Our college stats, how they differ from our NFL stats and from each other.
- September 6: The importance (and limitations) of watching games on tape.
- September 7: Regression towards the mean -- what it means, and how we use it.
College Stats vs. NFL Stats
As detailed in the last two installments of FO Basics, a tremendous amount of research over the last seven years has been poured into the development of the NFL stats found here at Football Outsiders. DVOA and its brethren statistics have been meticulously fine-tuned with a singular goal -- to better identify the metrics that most lead to winning football. One might think that the same principles and statistics that produce championship professional teams should apply to college teams as well. Why, then, do we not use DVOA for college teams? Why do we take a different approach altogether?
In general terms, the same principles that lead to victory in the NFL do apply in college football. We haven't duplicated all of the same NFL research efforts at the college level, but we've certainly made similar observations. Like the pros, the best college football teams also dominate weak opponents and run when they win. Fumble recovery is a random phenomenon at the college level just as it is in the NFL. Starting field position value is a critical component of offensive success. The list goes on.
The difference between our college and pro statistics has little to do with the nature of the game, and everything to do with the much larger number of teams in college football. There are 120 FBS (formerly Division I-A) football programs, and there is a vast gulf of power between the best and worst teams. The NFL is designed for parity, granting underachieving teams the ability to ascend via the draft, free agency, and scheduling. In college football, the top programs are exponentially wealthier than lesser conference teams, and they use those resources to remain in power for long periods of time.
Many of the statistics we use here rely on comparisons to average. In the NFL, since all teams have the same opportunities and resources for success, "average" is easily understood and applied. It is a different animal altogether in college, though, where "average" might mean completely different things to different observers. A 6-6 record in the SEC isn't anything like a 6-6 record in the MAC. But how different is it? That's a difficult question when there are few if any games played between those conferences. With only 12 regular season games, college football teams are far less connected to one another via common opponents than are NFL teams.
The breadth and distribution of team strength in college football is difficult for even avid fans to comprehend completely. On a neutral field, a team ranked approximately 25th nationally has a better chance of losing to a team ranked between 40th and 80th than it does of defeating a team ranked in the top 15. Check out the research here. This runs counter to the way most fans understand relative team strength, usually visualized as a linearly ordered sequence rather than a normally distributed bell curve. We approach our college football team metrics with all of these concerns in mind. And we take two very different approaches to the problem.
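To make the bell-curve point concrete, here is a minimal Monte Carlo sketch in Python. It assumes team strengths drawn from a standard normal distribution and a logistic win model with slope K; both are illustrative assumptions on our part, not the model from the linked research.

```python
import numpy as np

rng = np.random.default_rng(42)
N_TEAMS, N_SIMS, K = 120, 5000, 1.0   # K: assumed slope of the win model

p_lose_mid, p_beat_top = [], []
for _ in range(N_SIMS):
    # Draw 120 team strengths from a bell curve; index 0 = rank 1 = strongest.
    s = np.sort(rng.normal(0.0, 1.0, N_TEAMS))[::-1]
    team25 = s[24]
    # Logistic win model: win probability rises with the strength gap.
    p_win = 1.0 / (1.0 + np.exp(-K * (team25 - s)))
    p_lose_mid.append(np.mean(1.0 - p_win[39:80]))   # vs. teams ranked 40-80
    p_beat_top.append(np.mean(p_win[:15]))           # vs. teams ranked 1-15

print(f"P(#25 loses to a 40-80 team): {np.mean(p_lose_mid):.3f}")
print(f"P(#25 beats a top-15 team):   {np.mean(p_beat_top):.3f}")
```

Under these toy assumptions, the two probabilities come out comparable, because the dense middle of the bell curve sits much closer to the No. 25 team than most fans' mental ladder suggests.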
Two Rating Systems
It may surprise new readers to know that Bill Connelly and I produce our respective rating systems and supporting metrics entirely independently of one another. We each collect our own data and don't share our complete formulas with each other. Like DVOA, both systems pursue the "true" value of each team via stats. Our methods find consensus about many teams and disagreement about many others (we'll get to this phenomenon shortly). But why take two approaches in the first place?
The short answer is that we each followed our instincts when we first started working with these stats. Each of us has been developing our respective methodologies for several years, and we came at the problem in our own unique ways. My first possession efficiency observations were made in 2002 while watching Notre Dame waste good drives via turnovers, poor execution, and questionable drive-ending decisions in an excruciating loss to Boston College. Following the game, I started collecting data to quantify drive success efficiency and FEI developed from there. A few years later, Bill started collecting play-by-play data in order to better understand and quantify Missouri football and the Big 12 at large, expanding the scope of his project soon after through the creation of S&P+.
Part of my motivation to use drives instead of play-by-play initially was a data management decision. There are nearly 20,000 individual possessions played annually by the 120 FBS teams, and drives provided more than enough data points to collect and try to wrap my head around at first. I know the limitations of drive data versus play-by-play data, and Bill's data splits have far more potential to be used for situational and player performance evaluations going forward. But as he and I have discussed, we think there is great value in the two distinct concepts.
We are each trying to define success and measure team performances against that definition. At the play level, offensive success is measured against benchmarks tied to scoring: down-and-distance effectiveness and points-per-play explosiveness. At the drive level, it is measured against benchmarks of starting and ending field position, scoring expectations, and drive-ending results.
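As a rough illustration of the two lenses, here is a Python sketch. The 50/70/100 percent down-and-distance thresholds are commonly cited play-success benchmarks, and the drive-level scoring expectation is a toy linear curve; neither is taken verbatim from S&P+ or FEI.

```python
def play_success(down: int, distance: int, yards_gained: int) -> bool:
    """Play-level success, using commonly cited down/distance benchmarks:
    gain 50% of needed yards on 1st down, 70% on 2nd, 100% on 3rd/4th."""
    fraction = {1: 0.5, 2: 0.7, 3: 1.0, 4: 1.0}[down]
    return yards_gained >= fraction * distance

def drive_value(start_yardline: int, points_scored: int, expected_points) -> float:
    """Drive-level success: points scored relative to the scoring expectation
    of an average offense from that starting field position."""
    return points_scored - expected_points(start_yardline)

# A toy linear scoring expectation (an assumption, not FEI's actual curve):
# own 20 (yardline 20) -> ~1 point, opponent 20 (yardline 80) -> ~4 points.
toy_ep = lambda yardline: yardline / 20.0

print(play_success(down=2, distance=10, yards_gained=7))                         # True
print(drive_value(start_yardline=65, points_scored=7, expected_points=toy_ep))  # +3.75
```

Note that the two functions can disagree: a drive full of "successful" plays can still end in a turnover and grade out as a failure at the drive level.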
It isn't simply a matter of play data describing drive data in more detail, but a distinct approach to measuring how well an offense meets its goals. Is it the goal of every play to achieve a certain percentage of available yards? Is it the goal of every drive to reach the end zone? What does failure at either goal mean for a team? Which observations of past results better predict future ones? There aren't necessarily black-and-white answers to these questions, but we think there's validity in trying to find out more than we already know.
How Our College Rating Systems Differ
Perusing this summer's College Football Almanac, or the FEI and S&P+ ratings on this site, you may notice some startling differences between team rankings. In 2009, the average difference in overall FEI and S&P+ ranking for each team was 14 spots. Twenty teams differed by at least 24 spots.
Largest 2009 Ranking Differential Between S&P+ and FEI

| Team | Conf | S&P+ Rank | FEI Rank | Delta |
| --- | --- | --- | --- | --- |
| Nevada | WAC | 42 | 93 | 51 |
| Troy | Sun Belt | 36 | 84 | 48 |
| Texas A&M | Big 12 | 30 | 68 | 38 |
| Colorado State | MWC | 59 | 97 | 38 |
| SMU | Conf USA | 102 | 67 | 35 |
| Baylor | Big 12 | 53 | 87 | 34 |
| Temple | MAC | 89 | 55 | 34 |
| Duke | ACC | 103 | 69 | 34 |
| Middle Tennessee | Sun Belt | 47 | 77 | 30 |
| Kansas State | Big 12 | 104 | 75 | 29 |
| Northwestern | Big Ten | 75 | 47 | 28 |
| Hawaii | WAC | 87 | 114 | 27 |
| UAB | Conf USA | 69 | 96 | 27 |
| Utah | MWC | 22 | 49 | 27 |
| Rutgers | Big East | 78 | 52 | 26 |
| Northern Illinois | MAC | 91 | 65 | 26 |
| BYU | MWC | 11 | 35 | 24 |
| Syracuse | Big East | 61 | 85 | 24 |
| Navy | Ind | 62 | 38 | 24 |
| Georgia Tech | ACC | 29 | 5 | 24 |
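For readers who want to reproduce the Delta column, or the 14-spot average cited above, the computation is just a mean absolute difference between the two sets of ranks. The three-team excerpt below is hypothetical shorthand for the full 120-team dictionaries.

```python
# Each system's final rank per team (excerpt; in practice all 120 teams).
sp_rank  = {"Nevada": 42, "Troy": 36, "Utah": 22}
fei_rank = {"Nevada": 93, "Troy": 84, "Utah": 49}

deltas = {team: abs(sp_rank[team] - fei_rank[team]) for team in sp_rank}
print(sorted(deltas.items(), key=lambda kv: -kv[1]))   # largest gaps first
print(sum(deltas.values()) / len(deltas))              # average gap
```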
What exactly is going on here? Is there something about the raw play-by-play and possession efficiency measures that is responsible for these differences? Is there something incongruous about our respective data sets? Are our opponent adjustments the culprit? To find out, we decided to pull together the data for one team from 2009 and check our work against one another. Our test subject was Oklahoma, a team ranked seventh in S&P+ and 21st in FEI. The ranking difference was less dramatic than for some of the teams listed above, but the results were revealing.
2009 Oklahoma Sooners Game-by-Game Final Ratings Comparison (left block: S&P+ ratings; right block: FEI ratings)

| Week | Opponent | W/L | OU Pts For | OU Pts Against | S&P | Rk | Close S&P | Rk | Opp Rk | S&P+ | Rk | Total GE | Rk | Non-Gbg GE | Rk | Opp Rk | FEI | Rk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | BYU | L | 13 | 14 | -.027 | 9 | -.027 | 9 | 11 | 286.1 | 5 | -.011 | 8 | -.011 | 8 | 35 | .183 | 9 |
| 3 | Tulsa | W | 45 | 0 | .461 | 2 | .647 | 1 | 67 | 316.1 | 3 | .459 | 2 | .714 | 1 | 88 | .441 | 2 |
| 5 | Miami | L | 20 | 21 | -.095 | 11 | -.095 | 11 | 12 | 245.4 | 10 | -.012 | 9 | -.014 | 9 | 10 | .458 | 1 |
| 6 | Baylor | W | 33 | 7 | .356 | 4 | .321 | 4 | 53 | 255.9 | 9 | .286 | 3 | .247 | 5 | 87 | -.025 | 11 |
| 7 | Texas | L | 13 | 16 | -.046 | 10 | -.046 | 10 | 5 | 292.7 | 4 | -.026 | 10 | -.027 | 10 | 6 | .427 | 3 |
| 8 | Kansas | W | 35 | 13 | .185 | 6 | .286 | 5 | 55 | 277.5 | 7 | .217 | 5 | .345 | 3 | 62 | .405 | 4 |
| 9 | Kansas State | W | 42 | 30 | .230 | 5 | .260 | 6 | 104 | 216.0 | 11 | .163 | 6 | .180 | 6 | 75 | .010 | 10 |
| 10 | Nebraska | L | 3 | 10 | .067 | 8 | .067 | 8 | 18 | 272.8 | 8 | -.061 | 11 | -.065 | 11 | 20 | .322 | 7 |
| 11 | Texas A&M | W | 65 | 10 | .601 | 1 | .399 | 2 | 30 | 337.7 | 2 | .462 | 1 | .402 | 2 | 68 | .283 | 8 |
| 12 | Texas Tech | L | 13 | 41 | -.355 | 12 | -.516 | 12 | 23 | 172.4 | 12 | -.320 | 12 | -.421 | 12 | 18 | -.029 | 12 |
| 13 | Okla. State | W | 27 | 0 | .405 | 3 | .387 | 3 | 31 | 412.8 | 1 | .234 | 4 | .266 | 4 | 42 | .323 | 6 |
| B | Stanford | W | 31 | 27 | .146 | 7 | .146 | 7 | 40 | 280.3 | 6 | .038 | 7 | .041 | 7 | 19 | .357 | 5 |
There are six sets of game ratings provided in this table, three for S&P+ and three for FEI. The S&P and Total GE (Game Efficiency) columns represent the complete, raw team efficiency measured by our respective systems. Notice that the ranking order of Oklahoma's individual game performances that season is similar across both systems. Whether we used drive data or play-by-play input data, we arrived at essentially the same conclusion about the order of Oklahoma's unadjusted team performances. The Sooners were at their most efficient against Texas A&M, Tulsa, Baylor, and Oklahoma State, and least efficient against Miami, Texas, and Texas Tech.
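One way to quantify that agreement is a rank correlation. The sketch below feeds the raw S&P and Total GE game ranks from the table into scipy's Spearman correlation; a rho near 1.0 means the two orderings largely match.

```python
from scipy.stats import spearmanr

# Oklahoma's 12 single-game ranks from the table above.
sp_rk       = [9, 2, 11, 4, 10, 6, 5, 8, 1, 12, 3, 7]    # raw S&P
total_ge_rk = [8, 2, 9, 3, 10, 5, 6, 11, 1, 12, 4, 7]    # raw Total GE

rho, _ = spearmanr(sp_rk, total_ge_rk)
print(f"Spearman rho = {rho:.2f}")   # ~0.94: the two orderings largely agree
```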
The next set of columns represents our first adjustments to the raw data. Bill and I each discount garbage plays and possessions, though we do so in unique ways. Bill's system eliminates non-close-game plays based on the scoring margin of the game at the time of the series. In my system, I count only those possessions up until the point in the game after which the outcome is no longer in doubt, based on remaining possessions and scoring margin. This distinction means that each of us may be "counting" plays and drives that the other considers garbage time. Our garbage-time-adjusted columns -- Close S&P and Non-Garbage GE -- differ slightly from the total efficiency measures, but do not dramatically alter the distribution of Oklahoma's game performances.
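Here is a sketch of the two filtering philosophies. The thresholds are made-up stand-ins for cutoffs neither of us publishes in full, but the structure shows how the same situation can be garbage to one system and meaningful to the other.

```python
def sp_keeps_play(quarter: int, margin: int) -> bool:
    """Close-game filter in the spirit of S&P+: drop plays once the score
    margin passes a quarter-dependent threshold (thresholds here are
    illustrative assumptions, not Bill's actual cutoffs)."""
    limit = {1: 24, 2: 21, 3: 16, 4: 16}[quarter]
    return abs(margin) <= limit

def fei_keeps_drive(margin: int, possessions_left: int) -> bool:
    """Outcome-in-doubt filter in the spirit of FEI: keep a drive only while
    the trailing team could still plausibly catch up with its remaining
    possessions (again, an illustrative rule, not the actual formula)."""
    MAX_PTS_PER_POSSESSION = 8   # touchdown plus two-point conversion
    return abs(margin) <= possessions_left * MAX_PTS_PER_POSSESSION

# A 20-point fourth-quarter margin with four possessions left:
print(sp_keeps_play(quarter=4, margin=20))              # False: not "close"
print(fei_keeps_drive(margin=20, possessions_left=4))   # True: still in doubt
```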
The final set of columns represents our fully adjusted individual game measures, S&P+ and FEI. For reference, the final rating rank of the given opponent is provided. This is where our systems clearly differentiate themselves from one another. Bill's system makes a far subtler adjustment to Oklahoma's game performances than mine does. In part, this may be a reflection of the relative weight our systems give to strength of opposition. It also has to do with the way we weight game data against a team's strongest opposition. My FEI ratings grant extra relevance to data points against top teams, which is why Oklahoma's adjusted FEI game ratings for Miami and Texas make the biggest leaps from the non-adjusted data. Bill also weights top performances, but does so using a completely different formula.
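To show the mechanics of an opponent adjustment of this general shape, here is a toy sketch. Every constant in it is an assumption chosen for illustration, not a value from either S&P+ or FEI.

```python
def adjust_game(game_eff: float, opp_rank: int, n_teams: int = 120):
    """Toy opponent adjustment: shift a raw game efficiency by opponent
    quality, and give games against top-15 opponents extra weight in the
    season aggregate. Every constant is an illustrative assumption."""
    opp_quality = (n_teams / 2 - opp_rank) / (n_teams / 2)   # +1 best, -1 worst
    adjusted = game_eff + 0.3 * opp_quality
    weight = 1.5 if opp_rank <= 15 else 1.0
    return adjusted, weight

def season_rating(games):
    """Weighted average of the adjusted single-game ratings."""
    scored = [adjust_game(ge, rk) for ge, rk in games]
    return sum(a * w for a, w in scored) / sum(w for _, w in scored)

# Oklahoma's raw efficiency vs. Miami was slightly negative (-.014), but
# tenth-ranked Miami's quality pushes the adjusted figure positive, and
# the game counts extra in the weighted average.
print(adjust_game(-0.014, 10))   # -> (~0.24, 1.5)
```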
And since all 120 team ratings are interrelated, the disruption of our raw efficiency consensus can be especially dramatic for certain teams and even entire conferences. Some teams are boosted by FEI's method, others by S&P+'s. One of us may be more right than the other, but until we know for sure, we think there's something to be said for splitting the differences.
The Creation Of F/+
F/+ is a simple combination of our final FEI and S&P+ outputs. We created F/+ as a way to smooth out the extremes of our individual systems, but this past offseason we got around to testing these systems against one another. We discovered that F/+ had a stronger correlation with next-year success than any other method at our disposal, so we based our entire College Football Almanac projections on it.
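As a sketch of what a simple combination can look like when the two inputs live on different scales, here is one plausible approach (not necessarily the published F/+ formula): standardize each system's ratings to z-scores, then average them so neither scale dominates.

```python
import statistics

def f_plus(fei: dict, sp_plus: dict) -> dict:
    """One plausible combination (the published F/+ formula may differ):
    convert each system's ratings to z-scores, then average them."""
    def zscores(ratings):
        mu = statistics.mean(ratings.values())
        sd = statistics.stdev(ratings.values())
        return {team: (r - mu) / sd for team, r in ratings.items()}
    zf, zs = zscores(fei), zscores(sp_plus)
    return {team: (zf[team] + zs[team]) / 2 for team in fei}

# Hypothetical ratings for three teams, each on its own system's scale.
print(f_plus(fei={"A": 0.25, "B": 0.10, "C": -0.05},
             sp_plus={"A": 270.0, "B": 245.0, "C": 220.0}))
```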
We're not so naïve as to believe it is the be-all and end-all of our research, and Bill and I will each continue working with our individual systems to improve their effectiveness. But there's something to be said for acknowledging the value of our two-pronged approach, and for examining success through multiple lenses. College football's complexity demands it.
Other Data
Bill and I have each done extensive research into extensions of our FEI and S&P+ systems, from unique strength of schedule and projection methods to new play-by-play data slices that better illuminate the nation's top players and teams. We may introduce new F/+-specific metrics, but most of this research will continue to be hosted in our individual FEI and Varsity Numbers weekly columns. F/+ game predictions will be part of the weekly Seventh Day Adventure series throughout the season.
Comments
#1 by jpeta // Sep 01, 2010 - 9:00pm
Brian (and if you're reading, Bill),
You've both written over the last couple weeks about fine-tuning the projection system to account for new research (program history, coaching changes, etc.) beyond what was in the FOA. As you know, the FOA only contained F/+ ratings for BCS teams and some other select teams. As a multi-year follower of the separate FEI and S&P rankings on a weekly basis, I'd like to know if we'll get to see the full projections this year for all 120 teams for all three systems? Love both your work. Joe
#2 by Bill Connelly // Sep 01, 2010 - 10:16pm
The final F/+ Top 120 will be unveiled Friday in Varsity Numbers. We finished it up (more or less) tonight, barring any further injuries/suspensions. Brian can answer where/if FEI projections will be made public. Honestly, there are no S&P+ projections -- all the things I'd have done to set them up, I already did to the F/+ projections. Using the same method for S&P+ seemed repetitive.
#3 by zlionsfan // Sep 01, 2010 - 10:17pm
I'd love to see full projections for everyone.
Of course, I'd love to see I-AA projections too, but it's hard enough simply to get box scores for those guys, and the audience for that data is probably, um ... two. Maybe three. Wait, more than that ... Brown is a I-AA school. (Forget that FCS stuff, no one likes those stupid acronyms.)
#4 by Joseph // Sep 02, 2010 - 10:40am
Personally, I think that, until you can better your individual systems to make them "more like reality", using the combination of both is the best.
It would be interesting to see if combining, for example, PFR's SRS rating system with DVOA would produce a more accurate model, similar to your F/+ system.
#5 by TravisF (not verified) // Sep 03, 2010 - 3:52pm
Brian and Bill,
You seem to be working on stats that focus on identifying the better team -- who has a better chance of winning, or how efficient a team is with its possessions.
Do you do any work that may level the playing field between teams that play different styles from different conferences? College basketball has been using possessions and pace to help show that just because one team scores 90 points per game, they may take 120 possessions to do it, while another team scores 80 on 80 possessions. I see that's a measure of efficiency, but it's just an example.
Another question: do you run any numbers that could settle the SEC D vs. Big 12 O debates, or measure dominant performances?
For example (all stats that follow are my own calculations for FBS games only), Florida gave up 135 yards passing and 89 yards rushing per game leading up to the SEC Championship. Alabama averaged 202 yards passing and 203 yards rushing, while the national average was 221 and 152, respectively. With Florida's defense being well above average, it wasn't a crazy idea to think that Bama was likely to get 130 yards passing and 130 yards rushing. When they came out and put up 239 yards passing and 251 yards rushing for 490 total yards, that may have been more equivalent to a 650 yard game against an average defense.
I'd enjoy reading more about those types of stats if you have them. If you don't, how do I join the club!
#6 by Subrata Sircar // Sep 16, 2010 - 6:56am
Another concern with college football is that you're pretty much guaranteed to turn over 25+% of your personnel each year as they exhaust eligibility. Given the way eligibility tends to work, that's almost certain to be largely the more productive folks, too. Put another way, just when you've got them housebroken and trained, they leave :<)