by Bill Connelly
It struck me a while back: I have been writing Varsity Numbers for a long time now! Plenty of other FO features have been around even longer, but a lot has changed since September 5, 2008.
My way of thinking about college football and its stats has shifted significantly through the years. Often I'll make what I feel is an improvement in the way I look at something; other times, ideas simply fall by the wayside. Periodically this season, I am going to look back at some older columns -- throwback Numbers -- and see what has changed and what I may have forgotten.
Some excerpts from that first column:
November 10, College Park, Maryland. Tight end Jason Goode catches two touchdown passes (of ten and seven yards) from Chris Turner as Maryland upsets No. 8 Boston College, 42-35. Good for Mr. Goode. However, what contributed more to Maryland's touchdowns -- Goode's two receptions or the 43-yard catch by Darrius Hayward that set up the first score and the 45-yard catch by Isaiah Williams that set up the second? What if we could apply a point value to all four catches?
"Darrius Hayward." Nice. No "Bey" yet.
This is one of the lasting ideas behind what I do. We have seen a shift toward using more per-play stats in print/web and television analysis, and I don't think people view touchdowns as end-all, be-all anymore, but I'm still not sure we give proper credit to the impact of big plays. College offenses are operated by 18- to 22-year-old males with lots on their minds; they're going to make mistakes, and while efficiency (in the form of success rate) seems to be a better predictor of ongoing success, big plays create easy scores, and easy scores win games.
So that was the goal. I determined the points scored on every possession of every game and assigned those points to each play of the possession. From there I was able to assign a 'point value' to every yard line based on the average number of points you could expect to score from there. And with that I was able to assign an Equivalent Point Value (EqPt) to every play. [...]
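A minimal sketch of the yard-line step described above, in Python. The data layout (a list of drives, each with the points it produced and the yard line of every snap, measured in yards from the offense's own goal line) and the function names are assumptions for illustration, not the original setup. Valuing a play as the change in yard-line value from its start to its end is also an assumption here, since the excerpt elides that step.

```python
from collections import defaultdict

def yard_line_values(drives):
    """Average points scored on possessions that ran a play from each yard line.

    `drives` is a hypothetical layout: each drive is a dict with
    'points' (points the possession produced) and 'yard_lines'
    (yards from the offense's own goal line at each snap, 1-99).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for drive in drives:
        for yl in drive["yard_lines"]:
            totals[yl] += drive["points"]  # credit the drive's points to every play
            counts[yl] += 1
    return {yl: totals[yl] / counts[yl] for yl in totals}

def play_eqpts(values, start_yl, end_yl):
    """Assumed play value: yard-line value after the play minus value before it."""
    return values[end_yl] - values[start_yl]

# Toy, made-up drives purely to exercise the functions (not real game data).
drives = [
    {"points": 7, "yard_lines": [25, 40, 80]},
    {"points": 0, "yard_lines": [25, 30]},
    {"points": 3, "yard_lines": [40, 80]},
]
values = yard_line_values(drives)
```

With a full season of play-by-play, each yard line would have enough snaps behind it for the averages to settle down; on a toy sample like this one, they are only there to show the bookkeeping.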
I have since shifted more toward the idea of net points, which was of course gleaned in part from The Hidden Game of Football. I don't project far into the future, though -- I basically only look at the offense's expected points on a given drive (which includes a play at a given yard line) and the opponent's expected points on the next drive. That means that, instead of a scale that reaches from about 0.7 (your own 1-yard line) to over 6.0 (the opponent's 1), the values stretch into the negative side of the ledger near your own goal line.
What's great about the EqPts idea is that you can break a game down and build it back up again through point values.
I still love this. When I create my weekly Advanced Box Scores at Football Study Hall, I can frequently get projected scoring margins that are nearly dead on with the actual margins simply by looking at play-by-play EqPts and the EqPt field position value of turnovers (where your drive was when you turned the ball over and where your opponent's ended up). (For example, last week's BYU-Nebraska game.)
Special teams are still a blind spot here, and if there's some sort of big return or special teams miscue, the projected margins can end up rather skewed. But this is an idea that still makes me happy.
I explored EqPt values in three different ways: 1) looking at yard line, 2) looking at down and yard line, 3) looking at down, distance and yard line. I will talk about (2) and (3) in the future -- that sort of point derivation has a purpose and gives you a lot of credit for passing downs success and go-to ability on third downs. But when tying EqPts to real points, looking at yard line was (unexpectedly) the method most directly attributable to real point totals. So that's what I am following for now.
Later in 2008, I would write a bit about what I was calling second- and third-level EqPts. Basically, it pinpointed teams that were a little too reliant on timely plays and weren't efficient enough, but I think we've come up with cleaner ways to do that since then.
It's important to note that, when correlated to wins and losses (another idea we'll explore deeply another week), PPP is more valuable than success rates (and is therefore more valuable than S&P in its current formulation).
A while back, I started tinkering with how to separate the efficiency piece from the explosiveness piece and found that the claim above was not quite right -- efficiency matters most in the long run.
On an individual player basis, S&P can show flaws that simpler figures like Total Yards or Yards Per Carry/Catch would not. Before Mizzou played Arkansas in last year's Cotton Bowl, I was analyzing the Hogs' offense and found a chink in Darren McFadden's armor: While his 1,700-plus yards and 5.6 yards per carry were certainly impressive, his 0.81 S&P (keyed by a pedestrian-for-such-a-star-RB 46.8 percent success rate) was not. It would have ranked No. 13 among running backs in the Big 12. ... [H]is 0.34 PPP were rather unimpressive. And this is, in part, why a theoretically explosive offense like Arkansas ground to a halt against a lot of good defenses like Auburn (they lost 9-7) and Missouri (38-7). The offense was perceived as better than it actually was, and its low overall success rate (44.6 percent) and S&P (0.81, the same as stagnant Texas A&M) betrayed it throughout the season.
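The McFadden figures above are consistent with S&P, in this early formulation, being success rate (as a decimal) plus points per play. That additive form is inferred here from the numbers quoted rather than stated in the excerpt, so treat the following as a sketch under that assumption. The function name is hypothetical.

```python
def s_and_p(success_rate, ppp):
    """Assumed additive form: success rate (0-1) plus EqPts per play (PPP)."""
    return success_rate + ppp

# McFadden's quoted figures: 46.8 percent success rate, 0.34 PPP.
mcfadden = s_and_p(0.468, 0.34)
print(round(mcfadden, 2))  # 0.81, matching the S&P quoted above
```

By the same arithmetic, Arkansas' quoted 0.81 S&P and 44.6 percent success rate would imply a team PPP of roughly 0.36, give or take rounding in the published figures.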
I've moved away from using S&P for individual rushing, passing, or receiving totals, but perhaps that's a mistake.
I say there are three main reasons why you get into sports statistics: 1) You want to understand the game better (whichever game that is); 2) you want to rank stuff; and 3) you want to predict stuff. I'm already very pleased with the progress of (1), and I'm rounding into shape with (2), though it's not been the highest priority of mine, but (3) is still a total mystery, as I have only one year of data at my disposal: 2007. By the end of the 2008 season, I believe I'll have much of both 2008 and 2006 rounded into shape, and that will be outstanding.
Thanks to both the passage of time and help from Marty Couvillon at CFB Stats and SportsSource Analytics, I'm now playing with 10 years of data. I think we've made pretty good progress on (3) now, too, though I think numbers have helped me with (1) above all else.
But for those of you who caught on to Football Outsiders later in the game and wished they'd have been able to watch (and take part in) some of the development as it was first happening, hop on. There are plenty of good seats still available on the College Football Stats Geek Bandwagon.
Seven years later ... still pretty true! Hop on!