
26 Sep 2008

Varsity Numbers: The '+' Concept

by Bill Connelly

The most frustrating thing about college football stats is that while one team is putting up good numbers against a good team, some other team is putting up great numbers against a terrible team. Because of this, raw statistical rankings can only tell you so much.

On August 30, Graham Harrell threw for 536 yards against Eastern Washington; meanwhile, Sam Bradford threw for 395 yards against Cincinnati on September 6. Last year, Colt Brennan threw for 416 yards and 6 touchdowns against Northern Colorado on September 1, while Tim Tebow threw for 304 yards and 2 touchdowns against South Carolina on November 10. By all basic statistical accounts, Harrell's and Brennan's stats were insanely good and looked better than Bradford's and Tebow's performances on the ESPN scroll. However, could Brennan have put up Tebow's numbers against SC? What would Bradford have done against Eastern Washington? Thanks to the "+" concept, we can start to approximate that.

When I started entering all this play-by-play data, one of my main goals was simply to apply some of the basic sabermetric ideas to football. If they make sense in one sport, they should make sense in another, no? The idea behind my EqPts (and therefore Points Per Play) measure came from two baseball concepts: EqR, the Equivalent Runs concept that takes a series of offensive stats and determines how many runs those stats should have produced on average, and Expected Runs, the matrix that shows you, on average, how many runs you can expect out of specific "__ runners on, __ outs" situations. And of course the S&P measure (Success Rate + Points Per Play) was an admitted and obvious take-off of OPS.

Well, the "+" concept is a co-opt of the ERA+ and OPS+ (also known as Adjusted ERA or Adjusted OPS) figures. It starts with saying, basically, that not every 3.68 ERA or 0.890 OPS is created equal. Was it during the deadball era? Was it in a hitter's park or the Polo Grounds? You try to put everybody on as even a playing field as possible to evaluate their stats. That idea should work for football too, right?

The goal of the "+" concept is to adjust for what's expected against different opponents. There is no need to take things like "park factors" into account just yet; the "+" figures I am using for now simply compare a team's output to the average output of the opponent's opponents. For every major measure I use, both the ones I created and the ones I borrowed ("borrowed" sounds much better than "stole") from others -- Success Rate, PPP, S&P, Line Yards/Sack Rates, etc. -- you could create "+" measures that compare an offense's or defense's performance to what their opponents typically averaged.

Here's How it Works

Since I'm a Mizzou fan, and I'm still a bit bitter that the Tigers were left out of last year's BCS bowls, I'll use last year's Rose Bowl as an example. Last January 1, USC thumped Illinois, 49-17. Without taking special teams and turnovers into account, USC put up 48.44 EqPts and a 1.092 S&P to Illinois' 12.11 EqPts and 0.536 S&P. How did USC's offensive performance compare to what a typical team did to Illinois? As one could probably surmise, it compares quite favorably. For the season, the Illinois defense gave up an average of 18.95 EqPts and a 0.669 S&P. For EqPts, USC gained 2.56 times what the average Illinois opponent gained, 256 percent of normal. Meanwhile, its S&P was 163 percent what the typical Illinois opponent managed. One of the main ideas behind the "+" concept is that 100 = 100 percent of normal. Therefore, USC's EqPts+ against Illinois was a stellar 256, and its S&P+ 163.

Meanwhile, if you flip the equation, you can come up with a defensive score as well. (You have to flip the equation so a good defensive performance also results in a score above 100. Keeping everything on the same scale is good for sanity.) Illinois' average offensive numbers were 25.15 EqPts Per Game and a 0.815 S&P; that means USC put up a Defensive EqPts+ of 208 (25.15 divided by 12.11 = 2.08) and a Defensive S&P+ of 152 (0.815 divided by 0.536).
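The arithmetic above is simple enough to script. Here's a quick sketch of both calculations in Python, plugging in the Rose Bowl figures from the example:

```python
def plus_score(output, opponent_average):
    """Offensive '+' score: a team's output as a percentage of what the
    opponent's typical opponent managed (100 = exactly average)."""
    return round(output / opponent_average * 100)

def defensive_plus_score(allowed, opponent_average):
    """Defensive '+' score: the ratio is flipped so that holding an
    offense below its average also lands above 100."""
    return round(opponent_average / allowed * 100)

# USC vs. Illinois, 2008 Rose Bowl (figures from the example above):
print(plus_score(48.44, 18.95))            # USC EqPts+      -> 256
print(plus_score(1.092, 0.669))            # USC S&P+        -> 163
print(defensive_plus_score(12.11, 25.15))  # USC Def. EqPts+ -> 208
print(defensive_plus_score(0.536, 0.815))  # USC Def. S&P+   -> 152
```

Same four numbers as in the text, and keeping both sides of the ball on the "100 = average" scale means one function shape handles everything.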

Wasn't that fun? Quite simply, we can more accurately measure how good teams really were. The "+" concept is obviously similar to the VOA number that FO has mastered. If the "100 is good, above 100 is good, below 100 is bad" idea is hard to remember or grasp, you could easily think of a 163 S&P+ as something similar to a 63% VOA. Whatever floats your boat. I know that FO wants to move toward a collegiate DVOA figure if at all possible, but in the meantime consider this a crude substitute. Get used to the "+." You're going to be seeing a lot of it in future Varsity Numbers columns.

The bottom line is that the "+" concept gives you a way to factor in teams' strengths of schedule to their overall stats. Technically you could do this same thing with rushing yards, actual points, or any of the other standard box score stats, but since I've been doing all of this measuring of EqPts, success rates, etc., and since I'm very much sold on the quality of these measurements (and I want you to be as well), by god I'm using them.

Some Rankings

The best way to illustrate what the "+" concept can do is probably to show you some rankings.


Top 10 Offenses in EqPts+ Per Game
Rank Team EqPts+ Per Game
1 Florida 171.35
2 Oregon 158.70
3 Louisville 155.60
4 West Virginia 155.43
5 Tulsa 154.38
6 Missouri 150.59
7 Texas Tech 150.24
8 Kentucky 150.22
9 LSU 148.80
10 Navy 148.31

Now, none of the names on that list are particularly surprising, but how do these rankings compare to pure scoring and yardage rankings?


Top 10 Offenses, with Traditional Rankings
Team Points Per Game Rank Yards Per Game Rank
Florida 3 14
Oregon 12 10
Louisville 18 6
West Virginia 9 15
Tulsa 6 1
Missouri 8 5
Texas Tech 7 2
Kentucky 15 24
LSU 11 26
Navy 10 22

And what about some of the teams who ranked high in the "regular" rankings but didn't appear in the top 10 above?


Other Notable Offenses, with EqPts+ Rankings
Team Points Per Game Rank EqPts+ Per Game Rank
Hawaii 1 13
Kansas 2 14
Boise State 4 18
Houston 4* 37
Oklahoma 5 13
* This is Houston's ranking in yards per game.

As you would expect, teams with tougher slates -- i.e., a lot of SEC teams -- were held in higher regard using the "+" concept. And Houston played a really weak schedule.

So what about the S&P+ measure? It takes both efficiency and explosiveness into account, and it strips out the built-in points-per-game advantage enjoyed by spread offenses that rarely huddle and therefore run a lot more plays.


Top 10 Offenses in S&P+
Rank Team S&P+
1 Florida 157.54
2 West Virginia 136.17
3 Navy 133.04
4 Texas Tech 132.23
5 Louisville 130.88
6 Hawaii 130.47
7 Oregon 129.50
8 Missouri 128.67
9 LSU 128.33
10 California 126.42

First of all, kudos to Florida and to Heisman voters. Tim Tebow quarterbacked what was simply the best offense in the country according to these numbers, and since he basically was the rushing game ... yeah, Tebow gets some dap.

And just for fun...


Rushing S&P+, Offense Passing EqPts+, Offense
1 Florida Florida
2 West Virginia Louisville
3 Oregon Tulsa
4 Navy Oklahoma
5 LSU Hawaii

Some dap for Navy there as well -- of course, they put up big-time rushing numbers running Paul Johnson's option system, but they apparently did it against a series of respectable rushing defenses. Johnson is already seeing success on the ground at Georgia Tech, too.

On to defense.

But first, a caveat: It's very much possible for an offense to put up something like 0.32 EqPts in a given game. Since you're flipping the equation now, the opposing team's offensive average is in the numerator, and the 0.32 would be in the denominator. If you take that team's average (say, 15.0) and divide it by their 0.32 output for that game, you're going to get an insanely high defensive EqPts+ score (4687.5, to be exact), and obviously that would significantly skew averages. So I installed a cap: no "+" score for an individual game can be higher than 300. I'm open to suggestion on whether or not there's a better cap to use, but that's what I've applied to date.
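In code, the cap is just a min(). A sketch, using the hypothetical 15.0-average offense held to 0.32 EqPts from the paragraph above:

```python
PLUS_CAP = 300  # no single-game '+' score may exceed this

def defensive_plus_score(allowed, opponent_average, cap=PLUS_CAP):
    """Defensive '+' score with the single-game cap applied.
    allowed: EqPts the defense gave up in this game;
    opponent_average: EqPts that offense averages per game."""
    return min(round(opponent_average / allowed * 100), cap)

# A 15.0-EqPts-per-game offense held to 0.32 EqPts would otherwise score
# 15.0 / 0.32 * 100 = 4687.5; the cap trims it to 300.
print(defensive_plus_score(0.32, 15.0))    # -> 300
print(defensive_plus_score(12.11, 25.15))  # Rose Bowl example, unaffected -> 208
```

Ordinary games are untouched; only the near-shutout blowouts get clipped before they can poison a season average.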


EqPts+, Defense S&P+, Defense
1 Ohio State (134.91) Ohio State (163.65)
2 USC (123.62) USC (144.67)
3 LSU (122.75) LSU (144.45)
4 Virginia Tech (119.97) Virginia Tech (140.65)
5 Oklahoma (119.77) Rutgers (140.34)
6 TCU (118.16) Oregon State (140.24)
7 Rutgers (117.95) Oklahoma (134.54)
8 South Florida (117.72) Penn State (133.84)
9 Penn State (117.67) Boise State (133.01)
10 Auburn (117.41) Arizona State (131.68)

And...


Rushing S&P+, Defense Passing S&P+, Defense
1 Ohio State Ohio State
2 Oregon State Rutgers
3 UCLA Utah
4 Penn State Arkansas
5 Wyoming Virginia Tech

This obviously opens the door for a ranking system. This is nothing official by any means, but what happens if you add together the two EqPts+ measures and the two S&P+ measures? You get the following list:


Rank Team EqPts+ plus S&P+
1 LSU 544.33
2 Ohio State 535.40
3 Florida 531.91
4 USC 525.97
5 Oklahoma 519.91
6 West Virginia 516.41
7 Oregon 512.18
8 Missouri 502.32
9 Boise State 492.76
10 Kansas 491.05

Aside from the Boise State outlier (and the fact Georgia is inexplicably No. 27), that's really not a bad list. It's a start, anyway. Next week, we'll talk about which stats are the most correlated with actual victories, and that will give us a good idea of which categories need to be included in any sort of "+" ranking.

Tweaks

One thing I'll look into doing when I have more time on my hands is seeing which of the following methods is best for doing this.

Method A: What I've described above. Team A's S&P vs. Team B divided by Team B's average allowed S&P.

Method B: Comparing it more intricately to what would have been expected by saying "Team A rushed 27 times against Team B and passed 38 times. That should have produced 18.67 EqPts and a 0.734 S&P." It's more specific (and time-consuming) than the current method, and while it wouldn't change much when it comes to the major categories (Overall EqPts+, et al.), it might be better for the categories with smaller sample sizes ("Third Down Non-Passing Downs Rushing S&P+" and the like), where one really good or really bad game (or even one play) can skew the numbers.
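Method B amounts to building an expected baseline out of the run/pass mix itself. A sketch of the mechanics -- the 0.25 and 0.31 per-play rates (and the 22.0 actual output) are invented for illustration, not the column's real constants:

```python
def expected_eqpts(rushes, passes, rush_avg, pass_avg):
    """Method B baseline: what this run/pass mix 'should' have produced,
    given what the opponent typically allows per rush and per pass."""
    return rushes * rush_avg + passes * pass_avg

def method_b_plus(actual_eqpts, rushes, passes, rush_avg, pass_avg):
    """Compare actual output to the play-mix-specific expectation,
    on the same '100 = average' scale."""
    return round(actual_eqpts / expected_eqpts(rushes, passes, rush_avg, pass_avg) * 100)

# 27 rushes and 38 passes against a defense allowing 0.25 EqPts per rush
# and 0.31 per pass (hypothetical rates):
print(expected_eqpts(27, 38, 0.25, 0.31))       # ~18.53 expected EqPts
print(method_b_plus(22.0, 27, 38, 0.25, 0.31))  # -> 119
```

The appeal is that the baseline reflects what a team actually tried to do, rather than what the opponent's average opponent happened to try.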

Conclusion

As I said in my first column, I tend to think of three main reasons for getting as deep as possible into sports statistics: 1) to understand the game better, 2) to evaluate/rank stuff, and 3) to predict stuff. The "+" concept applies mostly to (2), and possibly a bit toward (3) -- if a team has an Offensive EqPts+ of 113, you could take their upcoming opponent's Defensive EqPts Per Game figure, multiply it by 1.13, and get a pretty decent read on how many points they may score. We'll go into a bit more detail on that in a future column. (All the degenerate gamblers' ears just perked up.)
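That projection is a one-liner. A sketch (the 21.3 defensive average is a number invented for the example):

```python
def project_eqpts(offense_eqpts_plus, opp_def_eqpts_per_game):
    """Rough projection: scale the opponent's defensive average by the
    offense's '+' score (an EqPts+ of 113 means 'multiply by 1.13')."""
    return offense_eqpts_plus / 100 * opp_def_eqpts_per_game

# An offense with a 113 EqPts+ facing a defense that allows
# 21.3 EqPts per game (hypothetical figure):
print(round(project_eqpts(113, 21.3), 2))  # -> 24.07
```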

There are two problems with the "+" concept, however:

1) As with most measures, it requires a pretty healthy sample size before it really becomes applicable. You probably need four or five games under your belt before your averages really mean anything.

2) It requires a full set of data/results from all 120 FBS teams (plus the six "tiers" of FCS teams into which I merge all FCS opponents) to give a 100 percent complete look, and right now that's not possible. I'm keeping up with all BCS conference results (40 to 50 play-by-plays) on a week-to-week basis, but there are still 15 to 20 non-BCS conference games that get left by the wayside. I'm working on ways to up my productivity in that regard, but until that happens I'm limited. For recent Mizzou previews (example here), I've been taking the 2007 "+" numbers and making manual adjustments as I see fit -- obviously not the most statistically pure way to go about it. But it's all I can do until I've got everybody's data.

In all, though, I like to consider the "+" concept a pretty strong step in the right direction, especially when some programmer takes pity on me and creates a play-by-play parser or something. Next week we'll begin to look at some of the ways the "+" can be used to both evaluate teams in detail and tell us what's most important when it comes to simply winning games.

Feedback

More responses to comments from last week's column...

Good statistics are either explanatory or predictive -- they either show why something happened, or predict what will happen in the future. These stats are neither.

Nor were they supposed to be. Think of last week's column as a table setter for future columns.

What you should be doing is considering punts as turnovers, because that's what they are.

This is actually an interesting idea. I might have to tinker with this. No matter how I do it, the basic point impact will be the same, but how it's applied to an offense's point total can still be tweaked.

Might the other 3 "missing" points be from OVERTIME possessions, as both teams start on the other's 25 yd line? Not sure, but I bet a 25 yd run/pass TD on 1st down wouldn't generate 6/7 EqPts but would on the scoreboard.

This is an absolutely fantastic point. Starting a possession at your opponent's 25, with no special teams occurrence or turnover to get you there, wipes a few points (3.706, to be exact -- that's the point value of the opponent's 25) off the board. There simply aren't a ton of overtime games, so I doubt this accounts for the full missing two points per game, but it has to account for something, and I slapped my forehead awfully hard when I read this.

Posted by: Bill Connelly on 26 Sep 2008

9 comments, Last at 30 Sep 2008, 1:05am by swc

Comments

1
by Pat (filler) (not verified) :: Fri, 09/26/2008 - 4:28pm

For the season, the Illinois defense gave up an average of 18.95 EqPts and a 0.669 S&P. For EqPts, USC gained 2.56 times what the average Illinois opponent gained, 256 percent of normal. Meanwhile, its S&P was 163 percent what the typical Illinois opponent managed. One of the main ideas behind the "+" concept is that 100 = 100 percent of normal. Therefore, USC's EqPts+ against Illinois was a stellar 256, and its S&P+ 163.

I should've mentioned this in a previous thread, because I was worried this was what you were leaning towards, and I don't think it's entirely appropriate for college football.

You're essentially saying that "performance is linear" - that is, if a team allows 20 EqPts per game, and you score 30 EqPts on them, that's the same as if a team allows 10 EqPts per game, and you score 15 on them.

I don't really think that's true. But more importantly, it's testable. You've got a database of the results of college football games, right? Select all teams (team set A) that allow, on average, 20-ish EqPts per game. Now select all teams that played those teams (team set B), and scored 30-ish EqPts per game on them. Now select all teams that those teams played (team set C), and bin them into "EqPts allowed per game", and plot the EqPts that teams from team set B scored, in each bin.

It'd be really interesting to see. Of course, to get something like "EqPts+", you'd have to make those graphs for all "team set Bs," but making just one of those plots might give an idea of what the others might look like.
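A rough sketch of that query in Python (the data layout, tolerances, and bin width are invented; it just shows the shape of the test):

```python
from collections import defaultdict

def avg_allowed(games):
    """Average EqPts allowed per defense across all games.
    games: list of dicts with 'off', 'def', and 'eqpts' keys."""
    totals = defaultdict(list)
    for g in games:
        totals[g["def"]].append(g["eqpts"])
    return {team: sum(v) / len(v) for team, v in totals.items()}

def linearity_bins(games, target_def=20.0, target_off=30.0, tol=2.5, width=5.0):
    """Find offenses ('set B') that scored ~target_off on defenses
    allowing ~target_def ('set A'), then bin every game those offenses
    played by the opponent's average allowed, recording actual output."""
    allowed = avg_allowed(games)
    set_a = {t for t, a in allowed.items() if abs(a - target_def) <= tol}
    set_b = {g["off"] for g in games
             if g["def"] in set_a and abs(g["eqpts"] - target_off) <= tol}
    bins = defaultdict(list)
    for g in games:
        if g["off"] in set_b:
            bins[int(allowed[g["def"]] // width) * width].append(g["eqpts"])
    return dict(bins)
```

If performance really is linear, the per-bin averages should scale with the bin: those offenses should put up roughly 1.5x average in every bin, not just against the 20-EqPts defenses.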

It also raises the question as to whether or not it's right to treat "a game" as the fundamental unit of football, as opposed to "a drive" (or "a play") - that is, does it make sense to compare how many points a team scores in two games that aren't the same length? I dunno.

3
by Bill Connelly :: Sat, 09/27/2008 - 11:45pm

This is why I wanted to write for FO. I wanted good suggestions, and I think there's quite a bit of good stuff here. Here are my initial responses.

You're essentially saying that "performance is linear" - that is, if a team allows 20 EqPts per game, and you score 30 EqPts on them, that's the same as if a team allows 10 EqPts per game, and you score 15 on them.

To me, scoring 15 points on a 10-point defense is really just about as impressive as scoring 30 on a 20-point defense. But even if you disagree with that, I do think a slightly different way of figuring EqPts+ (the way I referenced above--saying "this many runs and passes against these opponents should have produced this many EqPts and an S&P of ____" and comparing the actual output to it) would make things a little clearer in that regard.

It also raises the question as to whether or not it's right to treat "a game" as the fundamental unit of football, as opposed to "a drive" (or "a play") - that is, does it make sense to compare how many points a team scores in two games that aren't the same length?

In my lifetime, Lou Holtz has officially said one thing I actually took to heart: "You coach a different team every week." It's true in the NFL, but it's really true in college--look at the USC team that took the field against Ohio State and the one against Oregon State. I understand the sample size problem with delving too deeply into stats from 12-14 units, but with college football I think it's sometimes a necessary evil. If in one week, your offense is unstoppable over 50 plays, and the next it's horrendous over 75, the bad performance is worth 50% more than the good performance on a per-play basis. (I realize it's more likely to be the opposite--horrendous over 50, great over 75, but the point is still the same in that case.) But really, this only comes into play with the EqPts+ measure. S&P+ is, in the end, based on plays, especially the reconfigured way to measure it. Like I said, plenty of tweaking to do, but I'm pretty satisfied with the first step...

6
by Pat (filler) (not verified) :: Mon, 09/29/2008 - 5:59pm

In my lifetime, Lou Holtz has officially said one thing I actually took to heart: "You coach a different team every week."

Well, I agree there. But the problem with treating all games equally is that they're not all the same. They're not all the same length, for one, and so if a team wins a 50-play game 20-10, and then wins a 75-play game 31-14, was their offense better? They played 50% more plays, and scored 50% more points (roughly). If you consider plays the fundamental unit in football, they played exactly the same.

What if it's a 50-play game that's 12 drives long, versus a 75-play game that's 12 drives long? Is there really a difference there? Do you expect a scoring difference in that case?

What I'm really trying to bring up is a very basic problem in football: pace is difficult to deal with. In some sense, it's choice: running the ball a lot will bleed clock, and shorten a game. But in another sense, it's also execution: running the ball all the time, and generating 3-and-outs will lengthen a game just as much as passing.

I think you have to control for pace in some sense when you're averaging team performances, and I think the best way to do that is by number of drives. You just can't score 30 points as easily in an 8-drive game as you can in a 15-drive game, and how many drives an offense gets is determined by their defense, not them.

2
by Anonymous (not verified) :: Sat, 09/27/2008 - 3:17pm

"Good statistics are either explanatory or predictive -- they either show why something happened, or predict what will happen in the future. These stats are neither.

Nor were they supposed to be. Think of last week's column as a table setter for future columns."

So you're spending all these words to give us stats that:

1) Don't accurately explain what did happen.
2) Don't accurately predict what will happen.

And the reason? So future stats can be built on them? Why is there any expectation that useless statistics should suddenly become useful when complex adjustments are applied to them? You might as well be using opponent-adjusted equivalent cheer-leader breast-size. Hell, you'd probably get more page-views, too.

4
by NHPatsFan (not verified) :: Sun, 09/28/2008 - 10:01pm

Umm....If the man's done what seems to be implied - built an entire sabermetric analysis of college football, it would seem that there's a substantial amount of work to explain the numbers.

Unless you just want a number, the bigger the better, and who gives a crap what it means. Okay, the Michigan Wolverines are a kajillion times better than Ohio State. Feel better?

Give the man some time to tell his story.

Anybody have any idea how to define or describe the entire mountain of literature written to develop and define the original baseball sabermetric stats?

5
by Anonymous (not verified) :: Mon, 09/29/2008 - 12:06pm

That's exactly the problem: "there's a substantial amount of work to explain the numbers." The valuable thing is when there's a substantial amount of numbers to explain the work, so to speak. A statistic isn't measured by how hard it is to calculate, it's measured by how much information it provides.

Definition of baseball sabermetrics:

A group of stats which are highly predictive (context-free run-scoring/prevention) or explanatory (contextual win expectancy).

That was hard.

Sites like Baseball Prospectus use these tools appropriately -- they talk about how a closer is having a monster year based on WXRL (basically, how much he increased his team's chance of winning) and they talk about how a team will do next year based on their predicted runs scored/allowed. FO's stats are purely context-free (and thus, hopefully, predictive -- this is less clear in football, which is a more context-driven game than baseball).

There is still a clear line between "good statistical analysis" and...this. Any statistician who needs "time to tell his story" because his numbers have no value in analysis has, simply, failed.

7
by swc (not verified) :: Mon, 09/29/2008 - 7:28pm

Anonymous Disagreeing Guy:
Good statistics are either explanatory or predictive -- they either show why something happened, or predict what will happen in the future.

I think there's a critical third category that you're missing here, that is most important to college football. Given the fact that there are 120 D1-A teams that each play 12-14 games in a season, there is always a cloud of questions surrounding the end of the season. Did the "best" team win the national title? Which is the best conference? Do the polls reflect the results from the field? So college football needs descriptive statistics to show "what happened" so those of us who aren't satisfied with the narratives supplied by ESPN and other members of the media can make sense of the previous season. Also, unlike baseball and even NFL football, the variability of your opponents' strength needs to be taken into account as much as is possible. Using opponent-adjusted averages instead of raw stats is useful to that end.

Re: The fundamental unit of football
I'm satisfied with "game" as the fundamental unit of football. The reason "game" works is that, regardless of the number of plays or drives, a game represents what a team can do in 60 minutes of football. The number of plays by each team in a given game is entirely dependent on the offenses and defenses playing that game. In that case, plays per game should vary just as yards or points per game would.

You just can't score 30 points as easily in an 8-drive game as you can in a 15-drive game, and how many drives an offense gets is determined by their defense, not them.

This is correct, but the defense doesn't play the drive. The defense can only guarantee a maximum of 4 plays per drive; after that, the offense has to get first downs and actually move the ball. Similarly, drives are incredibly variable, much more so than games. Also, the goodness of a drive varies with game situation. If I score on a 15-play drive that took 9 minutes, that might be fantastic if it's early in the game or if my team is ahead by a small margin and trying to eat the clock. On the other hand, it would be a disaster if we were down by 14 and there was only 1 minute left afterward. Sometimes a drive is successful if it just results in poor field position for your opponent. With a game, success is clear. Did I get more points/yds/fds than my opponent? Did I perform well with respect to my team's average performance and the average performance allowed by my opponents?

8
by Anonymous (not verified) :: Mon, 09/29/2008 - 7:43pm

We're largely agreeing.

"So college football needs descriptive statistics to show "what happened" so those of us who aren't satisfied with the narratives supplied by ESPN and other members of the media can make sense of the previous season."

That's EXACTLY what explanatory stats are -- stats that tell you what happened. They may not be predictive (consistent with themselves, or other stats, week to week), but they tell you *what happened*. A good example from baseball is pitcher BABIP. It's *incredibly* important to a pitcher -- the difference between a .100 BABIP and a .600 BABIP is the difference between a Cy Young award and the humiliating end of a career. Yet it's totally out of the pitcher's control. 100% explanatory, not at all predictive.

Another example is fumbles. It turns out that fumble recovery is almost 0% predictive. Nonetheless, it's incredibly explanatory -- imagine trying to get to, and win, a super bowl, if you didn't recover a single fumble (or, on the other hand, if you recovered ALL of them).

"Also, unlike baseball and even NFL football, the variability of your opponents' strength needs to be taken into account as much as is possible. Using opponent-adjusted averages instead of raw stats is useful to that end."

100% agree. It's actually quite important in the NFL, too, though you're right that it's not as important.

That's not the point here, though. Opponent-adjusting a stat doesn't make it better, it eliminates defects. You can polish silver to make it even more shiny, but polishing a turd leaves you with a polished turd. Opponent adjusted score differential would be a great stat for college football (or opponent adjusted drive efficiency, etc.). The argument I'm making here is that opponent adjusting these stats leaves you with a polished turd.

9
by swc (not verified) :: Tue, 09/30/2008 - 1:05am

Opponent-adjusting a stat doesn't make it better, it eliminates defects. You can polish silver to make it even more shiny, but polishing a turd leaves you with a polished turd. Opponent adjusted score differential would be a great stat for college football (or opponent adjusted drive efficiency, etc.). The argument I'm making here is that opponent adjusting these stats leaves you with a polished turd.

First off, eliminating defects from something, by rule, makes that thing better, and if a polished turd is the best I can get, then I'll take that over a plain old turd any day. Going back to "explanatory" stats, I'm not sure I get your complaint. Every statistic has "what happened" information (Team A scored 33 points per game) but not "why it happened" information (which I thought was the point of the original post). The goal of this exercise is just to add some additional information. You could look at a team's EqPts (or whatever else) and then look at their opponents' EqPts given up and draw your own conclusions, or you could do a little math and roll that data into a single easily digested value. As far as predictive value goes, I'm not convinced that any statistic or linear combination thereof can predict with any reasonable accuracy the future performance of a college football team.