29 Oct 2010
by Bill Connelly
I thought it would be fun to start an e-mail exchange with Ken Pomeroy of College Basketball Prospectus and KenPom.com fame this week. What follows is the conversation we had over the course of a couple of days.
Bill Connelly: So you decide tomorrow to create your own football ratings system. Where do you start? And why do you do it? Do you feel standard stats in football are as lacking as they sometimes are in basketball?
Ken Pomeroy: In many ways, there's more potential in football than hoops, at least on the team level. In basketball, we're pretty much limited to evaluating scoring plays. If a team scores frequently, we assume it has a good offense. Over the course of a full season, this is a good assumption, but over the course of a game, or a half, it may not be true. A team could have taken horrible shots but gotten lucky and made a lot of them, and it is impossible to determine that from the play-by-play. In football, you can at least look at things like total yards or yards per play after a game and get a sense of who won the line of scrimmage and how that matches up with the final score. I think you can tease out a lot more of the luck from an individual football game than from a basketball game.
So I'm envious of the additional granularity in football play-by-play data, and exploiting that data would be where I would start. Then, the question is how to measure what an offense or defense is trying to do. I think the main things there are accounting for strength of the opponent and field position. If I were going to start parsing play-by-play data, the first thing I would look at is percent of possible yards gained/allowed. If a team starts on its own 10 and gains 45 yards before the drive ends, then they gained 50 percent of their possible yards. Your "success rate" seems to be a more advanced version of that concept (which is part of the reason why I haven't spent my time diving into this).
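Pomeroy's percent-of-possible-yards idea translates directly into a few lines of code. A minimal sketch (the function name is mine, and the yard line is assumed to be measured from the drive's own goal line, so starting on your own 10 leaves 90 available yards):

```python
def pct_possible_yards(start_yardline, yards_gained):
    """Fraction of available yards a drive gained.

    start_yardline: distance from the offense's own goal line
    (own 10 -> 10), so the drive has (100 - start_yardline)
    yards available before reaching the end zone.
    """
    available = 100 - start_yardline
    return yards_gained / available

# Pomeroy's example: start on your own 10, gain 45 of 90 available yards.
print(pct_possible_yards(10, 45))  # 0.5
```

A drive that reaches the end zone scores 1.0 regardless of where it started, which is the appeal of the metric: it bakes field position into the denominator.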
And if you're asking why I, Ken Pomeroy (and not some generic spreadsheet lover) would do it, it's because I would snap after hearing one too many analysts draw erroneous conclusions by using bad stats. This almost happened one day in 2008. If there was another me that wasn't already consumed by basketball, he would have started writing code that day.
Connelly: My breaking point came in reading one too many preseason mags saying "(Random Star Linebacker/Safety) from (Random Bad Defense) is an All-American candidate because he had 175 tackles last season." I just kept yelling "Yeah, but how many were good tackles, and how many were just 'somebody had to make the tackle 17 yards downfield' tackles?" at the pages, but the words never changed. So I decided to do something about it. I discovered FO's Success Rate measure for the NFL, which led to me entering play-by-play data for some teams, which led to me entering play-by-play data for ALL teams, which led to me creating my EqPts measure, which ... so on, and so forth. When I started to realize that nobody else was apparently dumb enough to dive into the 800+ FBS games for a given season, I sensed an opportunity for an audience, and I worked as fast as I could.
You referenced accounting for strength of opponent. On a basic level, the way I personally do this is to compare a given team's output with the opponent's averages, then add a multiplier related to the teams against whom the opponent derived those averages. This wasn't intentional, but it is somewhat similar in intent to Jerry Palm's "your strength + your opponents' strength + your opponents' opponents' strength" method. In some form or another, we all compare output to expected output, but what do you see as some of the landmines involved with determining and applying a strength of schedule adjustment?
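Connelly's output-versus-opponent-average comparison can be sketched in a few lines. This is only an illustration of the shape of the idea: the function name, the ratio form, and the multiplier value are all assumptions, since the column doesn't publish the exact formula.

```python
def adjusted_output(team_output, opp_avg_allowed, opp_sos_multiplier=1.0):
    """Compare a team's single-game output to what the opponent
    typically allows, scaled by a (hypothetical) multiplier that
    reflects who the opponent compiled those averages against.
    Values above 1.0 mean the team beat expectations."""
    return (team_output / opp_avg_allowed) * opp_sos_multiplier

# A team gains 6.0 yards per play against a defense that allows 5.0
# on average; the defense built that average against weak offenses,
# so a multiplier below 1 discounts the accomplishment slightly.
print(adjusted_output(6.0, 5.0, opp_sos_multiplier=0.95))
```

The "opponents' opponents" layer Palm describes would come from computing that same multiplier recursively for each of the opponent's opponents.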
Pomeroy: When I am thinking about schedule-related adjustments, I want the adjustments to be as independent as possible from the team I'm adjusting. What I am saying is that if TCU truly had the best defense in the country, I would not want the fact that they played quite a few bad offenses to prevent them from attaining a top rating. That's where determining expected performance gets tricky with the pool of wildly different teams that exists at the collegiate level.
There are a few defenses that would be capable of shutting out Wyoming and Colorado State as TCU did. How do we account for that? And if TCU gives up a touchdown or two to those teams, should that matter that much when their offense is scoring at will? I think you have to ignore possessions that don't impact the outcome of the game, but for teams like TCU and Boise, that may mean ignoring 75 percent of the season, which is difficult to accept if one wants an accurate evaluation of a team. Fortunately, on the hoops side we have quite a few more games to work with. Now, granted, I haven't incorporated such features into my system yet. It's only been in the last six months that I've started calibrating win probabilities for such a purpose. But it seems like the sensible way to go.
Connelly: Brian Fremeau and I both have our own ways of deciding what plays/drives should count toward the ratings. I just opened up my criteria a bit for S&P+. Right now, any Q1 play that takes place when the game is within 28 points (so, most of them) counts. Any Q2 play within 24 points, any Q3 play within 21 points, and any Q4 play within 16 points also count. This means that a team is punished, in a way, for not taking care of business quickly against bad teams.
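Connelly's per-quarter thresholds amount to a small garbage-time filter. A sketch of that rule as code (whether the boundaries are inclusive isn't specified in the text, so treating "within 28" as <= 28 is an assumption here):

```python
# Per-quarter score-margin thresholds from the text: a play counts
# toward the ratings only while the game is still competitive.
CLOSE_GAME_MARGIN = {1: 28, 2: 24, 3: 21, 4: 16}

def play_counts(quarter, score_margin):
    """Return True if a play should count toward S&P+-style ratings,
    given the quarter (1-4) and the current score margin."""
    return abs(score_margin) <= CLOSE_GAME_MARGIN[quarter]

print(play_counts(1, 27))  # True: within 28 points in Q1
print(play_counts(4, 17))  # False: beyond 16 points in Q4
```

Note that the margin is taken as an absolute value, so a team trailing by 30 in the first quarter is filtered out just like a team leading by 30.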
I discovered the need to look into some slight adjustments based on what seemed like some odd ratings results. Even post-adjustment, Oregon still ranks only 29th in the current S&P+ rankings. As you have tinkered with your formulas and approaches over the years, how have you tended to react to situations like this, where a team or small handful of teams just doesn't seem right? And what is the most egregious example you can recall, where a team ranked strangely high or low in your basketball rankings?
Pomeroy: I can challenge the Oregon case. In 2006, Gonzaga was thought to be a Top 10, maybe even a Top 5 team by the experts, and they were ranked in the 40s and 50s most of the season in my ratings. Even the casual fan remembers the scene with Adam Morrison crying on the court (actually before the game was completely decided) after Gonzaga lost in the Sweet 16 to UCLA in what could only be described as an epic collapse after the Zags dominated the Bruins for 38 minutes. At that point, I was crying, too. At least on the inside, because Gonzaga's run revealed a fatal error in my system.
There's strong evidence that in college basketball, there is little fundamental difference between a one-point loss and a one-point win when it comes to indicating a team's strength relative to its opponent. Therefore, my system doesn't treat those outcomes much differently. Gonzaga was different though -- they repeatedly coasted against weaker competition only to pull out a close win late. Normally, the system sees this as luck, but in Gonzaga's case it probably wasn't. The thing is, I have not changed my system since then. Gonzaga was a tremendously interesting exception, but an exception nonetheless. Every tweak I made in the offseason to put Gonzaga in its rightful place made the system as a whole worse. That's the thing about making tweaks -- I always rerun the system on past seasons, and when I did that with Gonzaga changes, it made the Zags predictions better, but the predictions were worse for all other games.
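Pomeroy's point that a one-point win and a one-point loss carry nearly the same information suggests a smooth function of margin rather than a hard win/loss split. The logistic form and scale below are purely illustrative assumptions, not his actual method:

```python
import math

def win_credit(margin, scale=6.0):
    """Smooth credit for a game outcome: a one-point win and a
    one-point loss both land near 0.5, while blowouts approach
    1.0 or 0.0. The logistic curve and the scale parameter are
    illustrative choices, not Pomeroy's published formula."""
    return 1.0 / (1.0 + math.exp(-margin / scale))

print(round(win_credit(1), 3))   # barely above 0.5
print(round(win_credit(-1), 3))  # barely below 0.5
print(round(win_credit(25), 3))  # near 1.0
```

Under a scheme like this, Gonzaga's string of narrow escapes earns roughly coin-flip credit each time, which is exactly why a team that is genuinely good at closing tight games breaks the model.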
Cases like that are great learning experiences. They force you to examine what's different about that team from others with similar profiles, like 2003 Dayton and 2010 New Mexico, who also were extremely successful in close games but who appeared to truly benefit from randomness in those instances much more than '06 Gonzaga did. There are opposite cases, too, like '08 Gonzaga and '10 BYU, who both bludgeoned mediocre opponents repeatedly, which is normally an indication of a strong team, but both were significantly overrated by my system. Anyway, the point is that I don't cry about this stuff anymore. I look at my system as providing a framework for understanding the game better, and there's often interesting stuff to be learned from the outliers, provided there aren't many of them.
Connelly: In the end, I felt comfortable making the "close game" change since I had been thinking of doing that anyway. If somehow Oregon wins the national title and ends up 19th in S&P+, however, clearly I will need to investigate my approach further, or whether Oregon was just an all-time statistical oddity. With their down-the-stretch strength of schedule, I am still banking on them being just fine (if they stay undefeated), but we shall see.
One of the most useful aspects of your site during the season is the win probability data on each team's schedule page. It's a great way to monitor what is coming up and which way your team and its future opponents are moving. Brian did something similar in this year's Football Outsiders Almanac. I have a question, however, about momentum.
Atop every ratings page you link to an explanation of your data, in which you explain that you designed your system to be predictive. When I first started, I was coming from more of an evaluative standpoint, but over time I have realized that evaluation and prediction are really quite close to the same thing. To me, the only difference is that while evaluation will absolutely need to look at a full season's data, prediction might need to take momentum into account in one way or another. What kind of role, if any, do you feel momentum plays throughout the course of a season? Picks are derived from full-season data, but if a team gets progressively better or worse over the course of a season, then once the picks are off, they are likely going to continue to be off as time goes on.
Pomeroy: As long as we are defining momentum as a team changing its ability level and not a team gaining some magical feeling of invincibility from a big win, then yes, it's pretty important to account for that. But people get fooled by thinking they see momentum all the time. Last season in my sport, we had a team, South Florida, that looked awful for the first half of January, then turned around and looked fantastic for the last half of January. In reality, the Bulls' true ability was about halfway in between. But reporters have to write stories and TV guys have to talk about stuff and so people were finding reasons for the turnaround, and of course almost all of those reasons were nonsense. South Florida just happened to string a couple of their best performances together consecutively. It happens. Football fans can see what happened to Texas after beating Nebraska or South Carolina after beating Alabama as counterexamples to momentum being real.
But if you are going to do predictions, it stands to reason that more recent information is better information. So I do give each successive game 3 percent more weight for each team. And that goes back to running the system on past seasons and seeing what figure works best. Some teams will outpace that, but it's difficult to tell the South Floridas from the teams that are truly improving/declining over the course of a few weeks. And cranking that figure higher gets into dangerous territory because the last half of the season is almost exclusively conference games, and I need the non-conference games to have some weight in order to properly calibrate the relative strength of each conference. For example, I've tried running the ratings on just the last two months of the season and I get some seriously goofy output because the schedule is so incestuous over that time. So in the spirit of getting as much information as possible into a ratings system, I think it's fairly important to include data from early season games into the mix in some way.
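Pomeroy's 3-percent recency bump is easy to express as a geometric weight series. A minimal sketch (the normalization to weights summing to 1 is my addition, for readability):

```python
def recency_weights(n_games, growth=0.03):
    """Each successive game gets 3 percent more weight than the
    previous one, per Pomeroy's description; weights are then
    normalized to sum to 1."""
    raw = [(1 + growth) ** i for i in range(n_games)]
    total = sum(raw)
    return [w / total for w in raw]

weights = recency_weights(12)
# Over a 12-game season, the finale counts (1.03 ** 11), roughly
# 1.38 times as much as the opener.
print(weights[-1] / weights[0])
```

Cranking `growth` higher makes the system more responsive to recent form but, as Pomeroy notes, starves the non-conference games that calibrate conference strength.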
Connelly: And when it comes to football, you only have 12-14 games to work with, so you can't get rid of games anyway. The main problem I've run into when it comes to momentum is simply that you're playing once a week. To a certain degree, I almost think that you lose most of your momentum, good or bad, by the time the next Saturday rolls around. (In other words, I clearly haven't made much progress here yet.)
All right, rapid fire to finish!
1. What sport do you feel has the most untapped statistical potential? (It has to be tennis or golf, right?)
2. It's January 10, 2011. What are you watching: the BCS championship game, or Notre Dame-Marquette basketball on ESPN2?
3. What's your favorite sports blog (any sport)?
4. How many hours of work per week would you say you put in during college basketball season?
5. Who's the best team in college football right now?
Pomeroy: 1. I started some work on golf last year, but it's collected dust since then. Unfortunately, the PGA Tour is not very friendly with distributing its vast collection of data to the masses. And I was not being fully truthful when I said if there was another me, I would have begun work on college football. It would be a toss-up between that and golf. There's a lot of work to be done there.
2. This is actually great news. Usually college hoops clears its slate for the title game, so I'm glad to see there's an interesting matchup as an alternative. I'll probably monitor both games. I'm guessing the title game will finish after midnight, so there will be room to safely catch both if need be. I'm not going to get stressed.
3. Brian Cook at MGoBlog will always have a place in my heart for his screed on Dave Berri. If only he wouldn't talk about Michigan so much.
4. It's pretty much whatever spare time I have. It's usually 20-30 hours per week.
5. I'll play contrarian and take Alabama even with a loss. I still think they are better than anybody. Not so much better that they can't lose to good teams on the road, of course. The fun thing is that there are 7-8 teams at the top that aren't separated by much, and each of them has looked beatable at some point.
As a reminder, you can see Pomeroy's work at KenPom.com. His College Basketball Prospectus 2010-11 will be available sometime next week. It is essential reading for those with even a passing interest in basketball.
And I'm not going to lie. Put a gun to my head, and I'm probably picking Alabama too.
This week's official F/+ rankings took into account both some interesting FEI and S&P+ shifts and the slight S&P+ formula change I made this week. A two-loss team is no longer No. 1!
You can now see the full 1-120 F/+ ratings on Football Outsiders' stats pages.
F/+ Top 25 (After Eight Weeks)

| Rk | Team | F/+ | Last Wk | +/- | S&P+ | Rk | FEI | Rk |
|----|------|-----|---------|-----|------|----|-----|----|
| 2 | Boise State (6-0) | +26.8% | 3 | +1 | 256.9 | 2 | .251 | 4 |
| 3 | South Carolina (5-2) | +25.5% | 1 | -2 | 251.9 | 3 | .251 | 3 |
| 5 | Ohio State (7-1) | +24.7% | 2 | -3 | 258.2 | 1 | .204 | 17 |
| 9 | Virginia Tech (6-2) | +20.6% | 12 | +3 | 238.2 | 11 | .221 | 9 |
| 13 | Michigan State (8-0) | +19.0% | 9 | -4 | 231.8 | 17 | .220 | 10 |
| 19 | Oregon State (3-3) | +16.6% | 16 | -3 | 229.8 | 22 | .182 | 20 |
| 20 | Oklahoma State (6-1) | +15.0% | 22 | +2 | 233.7 | 13 | .132 | 27 |
26. N.C. State (5-2), 27. Florida State (6-1), 28. Illinois (4-3), 29. Hawaii (6-2), 30. Clemson (4-3), 31. California (4-3), 32. Michigan (5-2), 33. Kentucky (4-4), 34. Mississippi State (6-2), 35. North Carolina (4-3), 36. Nevada (6-1), 37. Florida (4-3), 38. West Virginia (5-2), 39. Central Florida (5-2), 40. Navy (5-2), 41. Texas A&M (4-3), 42. Arizona State (3-4), 43. Notre Dame (4-4), 44. Washington (3-4), 45. Cincinnati (3-4), 46. East Carolina (5-2), 47. Texas (4-3), 48. Ole Miss (3-4), 49. South Florida (4-3), 50. Colorado (3-4).
It is fun that one system (FEI) doesn't count Virginia Tech's loss to James Madison, and the other one (S&P+) does. The Hokies rank almost the same in each.
Well, hello there, Georgia. Long time, no see. Tight losses to two F/+ Top 25 teams (South Carolina and Arkansas), combined with a recent hot streak (average score of their last three SEC games: Georgia 42.7, Opponent 15.0), make the Bulldogs a bit of a hot commodity. Can they continue their momentum against Florida this weekend?
Once again, here is how the BCS standings would look if they were made up of 60 percent AP Poll, 40 percent F/+ rankings.
1. Boise State
7. Ohio State
8. Michigan State
10. South Carolina
We have reached such an odd state of reality that, at the current moment, our computer rankings help Boise State in the standings.
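The 60/40 blend behind that hypothetical standings list is simple arithmetic once both components are on the same scale. A sketch (expressing each component as a 0-1 share of its maximum possible points is my assumption, since the article doesn't spell out the mechanics):

```python
def hybrid_bcs_score(ap_share, fplus_share, ap_weight=0.60):
    """Blend an AP Poll component and an F/+ component, each already
    expressed as a 0-1 share of its maximum possible points, using
    the 60/40 split described in the text."""
    return ap_weight * ap_share + (1 - ap_weight) * fplus_share

# A team holding 90% of possible AP points and 80% of possible
# F/+ points:
print(hybrid_bcs_score(0.90, 0.80))
```

This mirrors the real BCS formula's structure, which likewise averaged normalized poll shares with computer components.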
USC over Oregon. Spread: USC +7 | F/+ Projection: USC by 4.7. We have reached "put up or shut up" time in this ongoing spat between Oregon and the S&P+ ratings. Either the Ducks start to prove their bona fides in a way that even the most curmudgeonly ratings system appreciates, or they start to slump.
Missouri over Nebraska. Spread: Nebraska -7.5 | F/+ Projection: Nebraska by 0.5. The stats have had a good read on Missouri the last couple of weeks, though Nebraska has been all over the place. This was a great rivalry, then a nothing rivalry, and now it is becoming great again. It's a shame the series has to end right now.
Kentucky over Mississippi State. Spread: Mississippi State -6.5 | F/+ Projection: Mississippi State by 3.0. While the SEC East may not have a true national power this season, the conference's depth shines through with the fact that a solid Kentucky squad is actually one of the teams who have been all but eliminated from SEC East contention. The Wildcats are just a couple of plays from being 6-2 right now. Meanwhile, Mississippi State has won five in a row. This is an overlooked but intriguing matchup.
Indiana over Northwestern. Spread: Northwestern -3 | F/+ Projection: Northwestern by 2.9. This says much more about the numbers' lack of appreciation for Northwestern than it does about any respect for Indiana.
Because it's Homecoming season around college football ...
"A Sort of Homecoming," by U2
"Home," by Fighting Gravity
"Home," by John Popper
"Home Again!" by Menahan Street Band
"Home Is Where the Hatred Is," by Gil Scott-Heron
"Homecoming," by Green Day
"Homecoming," by Kanye West
"Homecoming King," by Guster
"Homeland and Hip Hop," by Immortal Technique
"Homeward Bound," by Simon & Garfunkel
Perhaps the first time in history that Guster, Immortal Technique, and Simon & Garfunkel have appeared in succession.
Every evening when I drive home from work, I pass by the Missouri practice fields. Depending on the time, I often drive by as practice is going on. I get a bit scared just looking at the guys manning the cameras in the hydraulic lift, even on a still day. I never truly thought somebody would ever suffer a fatal accident, though. It just didn't seem like something that could actually happen. I ache for the Notre Dame family trying to find its way after the death of Declan Sullivan. That is a very cruel way for a young man to have to die.