05 Sep 2008
by Bill Connelly
1) If you grow up in a small Oklahoma town where your classmates worship Jamelle Holieway, Lydell Carr, The Boz, and the rest of the 1985 Oklahoma Sooners more than even Tony Dorsett and Danny White, you're probably going to develop an unhealthy obsession with college football (even if the love affair with OU doesn't stick).
2) If you grow up equally obsessed with both sports and numbers, you're probably eventually going to develop an unhealthy obsession with sabermetrics.
3) If you're unhealthily obsessed with both college football and sabermetrics, you're probably eventually going to spend most of a winter and spring entering 140,000 plays from 800 Division I football games into an Excel sheet (okay, a series of Excel sheets) just so you can go all sabermetric on college football.
As my life unfolds according to these truths, I find myself here, getting ready to explain any number of the ripe-for-dissection concepts I established as I slogged through play-by-play data from games like North Dakota State at Central Michigan (NDSU springs the upset in dominant fashion!), Georgia Southern at Colorado State (Rams win in this meeting of... uh... natural geographic rivals!), and West Virginia Tech at Western Kentucky (Hilltoppers recover from a 46-point loss to Florida with an EIGHTY-SEVEN-POINT WIN over not only an NAIA team, but an awful NAIA team!).
Much like a freshman in college, I got the itch to experiment when I moved out on my own. I'm a Mizzou guy, and I had posted on Mizzou message boards for years. That became unbearable, so I moved out and started a blog (Mizzou Sanity) that would get called up to the Sports Blog Nation big leagues a few months later (Rock M Nation). Trying to come up with original content, I started to compile box score data for Big 12 teams and correlate the major categories to wins and losses to see if there were any particular box score stats that were more important than others. When that didn't satisfy my burgeoning appetite, I started searching around for what other football stat nerds were doing, and naturally Football Outsiders was the first site I found. Success rates, line yards, value over average ... oh baby. I was intrigued. When I realized that play-by-play data was the holy grail (and that nobody else really had compiled play-by-play data for college in the way that I wanted), I got started, spending much of the summer of 2007 entering Big 12 play-by-plays.
Six months later, I'd managed to fine-tune my data entry process from 45 minutes per game to 8, and I was rolling. I had started with some of the basics -- success rates, line yards, along with things like percentage of success, percentage of first downs, passing downs vs. non-passing downs, what constitutes a passing down -- and kept expanding. As I discovered new cool things, I began posting (unsolicited) diaries on one of SBN's now-former flagship sites, Sunday Morning QB (now Dr. Saturday). A little bit of affirmation followed, and I started working double-time.
In the coming weeks, you'll learn about WinCorr, '+' Rankings, and other ideas I either created or stole from sabermetric concepts over the last 18 months or so. (In addition, over time we might adapt more FO concepts to college and vice versa.) For now, we'll start with my main metrics: EqPts, Points Per Play (PPP), and S&P.
I enjoyed having Jerome "2 carries, 3 yards, 2 TDs" Bettis on my fantasy team that last year of his career, but let's be honest: Just about anybody could have come in and plunged in from the 1. Getting the ball to the 1 was the much bigger accomplishment, no? In this way, touchdowns are basically the Runs Batted In of football -- get enough of them, and it's damn impressive, but you need quite a bit of help racking up a big number.
Case in point: November 10, College Park, Maryland. Tight end Jason Goode catches two touchdown passes (of ten and seven yards) from Chris Turner as Maryland upsets No. 8 Boston College, 42-35. Good for Mr. Goode. However, what contributed more to Maryland's touchdowns -- Goode's two receptions or the 43-yard catch by Darrius Hayward that set up the first score and the 45-yard catch by Isaiah Williams that set up the second? What if we could apply a point value to all four catches?
So that was the goal. I determined the points scored on every possession of every game and assigned those points to each play of the possession. From there I was able to assign a 'point value' to every yard line based on the average number of points you could expect to score from there. And with that I was able to assign an Equivalent Point Value (EqPt) to every play.
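The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual spreadsheet logic: the toy drives, the function names `yard_line_values` and `play_eqpts`, and the rule that a play's EqPts equal the point value of where it ended minus the point value of where it started are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy data: (points scored on the possession, plays), where each
# play is (yards from the end zone before the play, yards from it after).
possessions = [
    (7, [(80, 50), (50, 20), (20, 0)]),  # touchdown drive
    (0, [(80, 80), (80, 80)]),           # two incompletions, no points
    (3, [(50, 20), (20, 20)]),           # field goal drive
    (0, [(50, 50)]),                     # one play, no points
]

def yard_line_values(possessions):
    """Average the possession's points over every play run from each yard line."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for points, plays in possessions:
        for start, _end in plays:
            totals[start] += points
            counts[start] += 1
    return {yard_line: totals[yard_line] / counts[yard_line] for yard_line in totals}

def play_eqpts(start, end, values, touchdown_value=7.0):
    """Assumed rule: EqPts = point value at the end spot minus value at the start spot."""
    end_value = touchdown_value if end == 0 else values[end]
    return end_value - values[start]

values = yard_line_values(possessions)
for start, end in [(80, 50), (50, 20), (20, 0)]:
    print(start, end, round(play_eqpts(start, end, values), 2))
```

On this toy data the three plays of the touchdown drive come out to about 1.0, 1.67, and 2.0 EqPts. A real table would need enough plays at every yard line for the averages to mean anything, and this sketch does not handle possessions that end in a turnover.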
(Yes, I'm counting a touchdown as seven points instead of six. To me, it's not the kicker's job to make the PAT, it's his job not to miss it. I'm flexible on this, but that's how I'm viewing it for now.)
I should note that this is indeed slightly different from something FO came up with a few years ago. While they were looking at where the next points were coming from (which is actually a great idea I wish I'd stolen), I'm looking at how many points one could expect on a specific possession. Either way, what we find is that not every one-yard gain (or 10-yard, or 20-yard gain) is worth the same.
(Ed. Note: Actually, we stole it ourselves, from Hidden Game of Football.)
What's great about the EqPts idea is that you can break a game down and build it back up again through point values. Add an offense's EqPts to the value of the penalties, turnovers, and special teams events of the game, and you get a pretty damn accurate look at how the game should have gone down given average luck for both teams. We'll get to those other point values in another column, but for now let's look at what we can do with EqPts.
It stands to reason that the closer you get to the end zone, the steeper the slope of the point-value curve: each yard is worth more. Every yard you can gain once you get inside your opponent's 35-yard line or so increases the odds of at least getting a makeable field goal out of the possession. And naturally, the one yard between your opponent's 1 and the end zone is worth more than any other yard, as until you've actually punched it in, anything can happen. So Jason Goode's two touchdowns above were still worth quite a bit.
So if we look at plays based on points instead of yards, we're incorporating quite a few concepts at once: yards, scoring ability, the penchant for actually punching the ball into the end zone, even the ability to win the field position battle (gains deep in your own territory aren't worth much at all). And we can pretty much use "EqPts Per _____" in all the same ways we'd use "Yards Per ___": Per game stats, per play stats, etc. It's an all-encompassing figure. Like Okkervil River's last album, or Clemson every year, it's got a lot of potential.
One final digression before I dive into some rankings: I explored EqPt values in three different ways: 1) looking at yard line, 2) looking at down and yard line, 3) looking at down, distance and yard line. I will talk about (2) and (3) in the future -- that sort of point derivation has a purpose and gives you a lot of credit for passing downs success and go-to ability on third downs. But when tying EqPts to real points, looking at yard line was (unexpectedly) the method most directly attributable to real point totals. So that's what I am following for now.
OK, so what differences do we see looking at EqPts Per Game versus Yards Per Game or Points Per Game?
|Top 10 NCAA Teams in EqPts Per Game|
|Team||EqPts per game||Rank||Yards per game rank||Points per game rank|
So Houston led the nation in useless yards, I guess, and Florida really knew how to punch the ball into the end zone (see: Superman Tebow).
The number one thing we can do is look at explosiveness through Points Per Play (PPP), the EqPts equivalent to yards per carry/catch/pass. This can be a great resource for individual players, but I'll stick with the team theme today. This is a perfect measure of how dangerous an offense is...
|Top 10 NCAA Teams in Offensive Points Per Play|
...or how good a defense is at preventing easy scores.
|Top 10 NCAA Teams in Defensive Points Per Play|
(This does show that, while Tim Tebow's "20 rushing TD, 30 passing TD" accomplishment may not impress me tremendously -- if Chase Daniel didn't have a trustworthy goal-line running back, he could have almost done the same thing -- the offense at Tebow's helm was still ridiculously good, and he clearly deserved the Heisman. As a Mizzou fan, that's a big admission.)
One of the more common and telling stats in baseball is OPS (On-base percentage Plus Slugging average). It measures both efficiency/consistency and power/explosiveness. Well, if PPP equals explosiveness, what equals efficiency? Being that you're all readers of Football Outsiders, you should know that the answer to this is obvious: Success Rate. When I dove into play-by-plays, I wanted to make sure not to use others' tools too much because I wanted to see what I could come up with on my own. But the Success Rate is such a perfect tool that I had to apply it to the college game.
The first thing I had to do with Success Rate was adjust it for the college game. Yards are a bit easier to come by in college, so I made the following adjustments:
A lot of the offenses high on PPP are also strong in Success Rate (Florida, Texas Tech, Hawaii, Missouri, Oklahoma State, Tulsa, and Kansas were in the Top 10 in both categories), but there were a few new teams on the list: Navy (No. 4, 51.3 percent), Nebraska (No. 8, 48.4 percent) and Western Kentucky (No. 10, 48.3 percent). These teams had efficient offenses but were held back somewhat by their big-play ability, or lack thereof.
So if we have Success Rate playing the on-base percentage role and PPP as slugging average, then combining them would give us an OPS-style number, no?
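As a rough sketch of that combination, assuming S&P is a plain sum of Success Rate and PPP (the way OPS is a plain sum of on-base percentage and slugging average), the arithmetic looks like this. The function names are illustrative only; the sample figures are the Darren McFadden numbers quoted in this article (46.8 percent success rate, 0.34 PPP).

```python
def success_rate(successful_plays, total_plays):
    """Share of plays that counted as successes (the 'on-base' half)."""
    return successful_plays / total_plays

def points_per_play(eq_pts, total_plays):
    """EqPts divided by plays (the 'slugging' half)."""
    return eq_pts / total_plays

def s_and_p(sr, ppp):
    """Assumed formulation: Success Rate (as a fraction) plus PPP, unweighted."""
    return sr + ppp

# McFadden's figures as quoted in this article.
print(round(s_and_p(0.468, 0.34), 2))
```

With those inputs the sum rounds to the 0.81 S&P the article reports for him, which is why the unweighted sum seems a reasonable reading of the formula.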
|Top 10 NCAA Teams in Offensive S&P|
This way, teams like Navy, who eat a lot of clock between plays and would therefore never be as high in total yards as some hurry-up offenses, get credit for being as dangerous as they were last year.
It's important to note that, when correlated to wins and losses (another idea we'll explore deeply another week), PPP is more valuable than success rates (and is therefore more valuable than S&P in its current formulation). In the future I'll be looking to weight S&P more toward the points than the success rates. We're still in the gestation period here -- lots of development left. But this works for now.
On an individual player basis, S&P can show flaws that simpler figures like Total Yards or Yards Per Carry/Catch would not. Before Mizzou played Arkansas in last year's Cotton Bowl, I was analyzing the Hogs' offense and found a chink in Darren McFadden's armor: While his 1,700-plus yards and 5.6 yards per carry were certainly impressive, his 0.81 S&P (keyed by a pedestrian-for-such-a-star-RB 46.8 percent success rate) was not. It would have ranked No. 13 among running backs in the Big 12. To be sure, a lot of this was because defenses focused on him so much (his counterpart Felix Jones managed a 1.17 S&P due in part to preoccupation with McFadden), but his 0.34 PPP was rather unimpressive. And this is, in part, why a theoretically explosive offense like Arkansas ground to a halt against a lot of good defenses like Auburn (they lost 9-7) and Missouri (38-7). The offense was perceived as better than it actually was, and its low overall success rate (44.6 percent) and S&P (0.81, the same as stagnant Texas A&M) betrayed it throughout the season.
In other words, there's a reason why Darren McFadden didn't deserve the Heisman, no matter how good he looks in the open field. It wasn't his fault that the Arkansas passing game was four shades of terrible, and it wasn't his fault that defenses did whatever they could to force other players to beat them, but you have to be more than simply a threat to win the Heisman. You have to still produce as well. And while McFadden obviously produced quite a bit (LSU and South Carolina come to mind), he was held in check a few too many times.
College football play-by-play stats are a bit limited because gathering film of every D1 game would be next to impossible. Plus, there are simply way more games than there are in the NFL. So as much as I'd love to track the success rates of rushing plays aimed at specific defensive linemen, and as much as I'd love to come up with a "This linebacker made a play 13.4 percent of the time he was on the field" measurement, it's not going to happen. But from a simple line like this...
(1st and 10) Daniel, Chase pass complete to Maclin, Jeremy for 5 yards to the ISU8 (Rubin,Ahtyba;Parker,Rashawn).
... much can still be gleaned.
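To make that concrete, here is a hedged sketch of pulling structured fields out of the sample line with a regular expression. The pattern, the field names, and the `parse_completion` helper are assumptions written against this single example of a completed pass; real play-by-play feeds vary in wording, so other play types would need their own patterns.

```python
import re

LINE = ("(1st and 10) Daniel, Chase pass complete to Maclin, Jeremy "
        "for 5 yards to the ISU8 (Rubin,Ahtyba;Parker,Rashawn).")

# Hypothetical pattern for completed passes only.
PLAY_RE = re.compile(
    r"\((?P<down>\d)(?:st|nd|rd|th) and (?P<distance>\d+)\) "
    r"(?P<passer>[^,]+, \S+) pass complete to (?P<receiver>[^,]+, \S+) "
    r"for (?P<yards>-?\d+) yards? to the (?P<team>[A-Z]+)(?P<yard_line>\d+)"
    r"(?: \((?P<tacklers>[^)]*)\))?"
)

def parse_completion(line):
    """Return a dict of fields for a completed pass, or None if the line doesn't match."""
    match = PLAY_RE.match(line)
    if match is None:
        return None
    tacklers = match.group("tacklers")
    return {
        "down": int(match.group("down")),
        "distance": int(match.group("distance")),
        "passer": match.group("passer"),
        "receiver": match.group("receiver"),
        "yards": int(match.group("yards")),
        # Side of the field and yard line where the play ended.
        "ball_on": (match.group("team"), int(match.group("yard_line"))),
        "tacklers": tacklers.split(";") if tacklers else [],
    }

print(parse_completion(LINE))
```

From that one line you get the down, the distance, the passer, the receiver, the gain, the new spot, and both tacklers, which is enough to feed success rate and point-value calculations.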
I say there are three main reasons why you get into sports statistics: 1) You want to understand the game better (whichever game that is); 2) you want to rank stuff; and 3) you want to predict stuff. I'm already very pleased with the progress of (1), and I'm rounding into shape with (2), though it hasn't been my highest priority. But (3) is still a total mystery, as I have only one year of data at my disposal: 2007. By the end of the 2008 season, I believe I'll have much of both 2008 and 2006 rounded into shape, and that will be outstanding. But for those of you who caught on to Football Outsiders later in the game and wish you had been able to watch (and take part in) some of the development as it was first happening, hop on. There are plenty of good seats still available on the College Football Stats Geek Bandwagon.