Writers of Pro Football Prospectus 2008

16 Dec 2010

Varsity Numbers Cools Off

by Bill Connelly

In analyzing bowl matchups, every one of us does the same thing: We think of teams as they were at the end of the season instead of as they were for the whole season. In terms of injuries or defections, this makes sense. If the personnel is different at the end of the season than it was earlier on, then you have to take that into account. But I have long wondered if we overstate end-of-season momentum. With a break of anywhere between three and six weeks before a team's bowl game, does momentum really play any part in bowl performance, or do all teams return to room temperature? Do hot teams cool off over the December breaks while cold teams warm back up?

The original intent with this column was to determine if momentum is maintained over the holiday break. The result, however, did not exactly match the premise. It turns out that everybody cools off, whether they were hot or not. Bowl breaks typically produce relatively bad football.

Using single-game S&P+, I looked at all bowl teams from 2005-2009 to gauge how they performed on a week-to-week basis. I looked at their overall performance, and I looked at how they did in three sections of the season: early (first five weeks), middle (middle four to five weeks), and late (final five weeks). Did each team's bowl performance more closely resemble its performance as a whole, or its performance in the late portion of the season?

Late performance wins by a nose, but when talking about teams' bowl performances, nobody really wins.

Median Performance Change in Bowl Game, According to Single-Game S&P+ (2005-09)
                                      Off. S&P+   Def. S&P+   Overall S&P+
Change from Late-Season Performance     -4.7%      -12.5%        -8.9%
Change from Full-Season Performance     -6.9%      -14.9%        -9.0%
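The comparison behind the table above can be sketched in a few lines. This is a toy sketch, not the actual Football Outsiders pipeline; the data layout and the sample numbers are invented purely for illustration:

```python
from statistics import median

def pct_change(bowl, baseline):
    """Percent change of the bowl-game score from a baseline average."""
    return (bowl - baseline) / baseline * 100

def median_bowl_regression(teams):
    """teams: list of dicts with 'weekly' (regular-season single-game
    S&P+ scores, in chronological order) and 'bowl' (bowl-game score).
    Returns the median percent change vs. late-season and full-season
    baselines."""
    vs_late, vs_full = [], []
    for t in teams:
        late = t["weekly"][-5:]  # final five weeks of the season
        vs_late.append(pct_change(t["bowl"], sum(late) / len(late)))
        vs_full.append(pct_change(t["bowl"], sum(t["weekly"]) / len(t["weekly"])))
    return median(vs_late), median(vs_full)

# Two made-up teams; both fall short of their baselines in the bowl.
sample = [
    {"weekly": [210, 220, 200, 230, 240, 250, 260], "bowl": 225},
    {"weekly": [180, 190, 200, 195, 185, 205, 210], "bowl": 175},
]
late_med, full_med = median_bowl_regression(sample)
print(round(late_med, 1), round(full_med, 1))
```

Note that a hot-finishing team (like the first one above) can regress more against its late-season baseline than against its full-season one, which is exactly the tension the column is probing.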

Now is a good time for a reminder that S&P+ has no zero-sum component. It is entirely based on opponent averages, so both teams can perform above the norm (>200.0) or below it in a given game. And, as it turns out, both teams can improve or regress at the same time.

According to single-game S&P+ performances, both offenses and defenses regress after the bowl break. Surprisingly, defenses cool off even more. I had long supposed that offenses were more vulnerable to long breaks, but as usual, my inclination was apparently incorrect.

Let's dig a little deeper into the offensive and defensive numbers to see where regression is most common.

Median Offensive Performance Change in Bowl Game (2005-09)
                                      Rushing         Rushing   Passing         Passing
                                      Success Rate+   PPP+      Success Rate+   PPP+
Change from Late-Season Performance   -6.1%           -9.7%     -3.5%           -11.2%
Change from Full-Season Performance   -3.2%           -10.8%    -4.0%           -6.6%

While offenses regress only a little in terms of efficiency, their explosiveness is hampered quite a bit by the bowl layoff. It is the same story with defense, only magnified.

Median Defensive Performance Change in Bowl Game (2005-09)
                                      Rushing         Rushing   Passing         Passing
                                      Success Rate+   PPP+      Success Rate+   PPP+
Change from Late-Season Performance   -9.6%           -17.8%    -9.3%           -20.3%
Change from Full-Season Performance   -12.0%          -20.1%    -12.0%          -22.5%

After a long break, everybody is dumbed down a bit. Explosive (i.e. high-PPP) offenses find it tougher to make big plays, while even excellent defenses are more vulnerable to big plays themselves. In other words, it gets sloppy, and the big plays become much more random.

This makes sense, of course. They say weather is the great equalizer, but really, early December is. Players are studying (in theory) for finals, and when they're not, they're probably a bit bored. Plus, Team A is really excited to be playing in the Random Bank Sponsor Bowl while Team B is disappointed that it was disrespected and passed over for the Regional Fast Food Restaurant Bowl in favor of a rival. You never know who is going to handle the break well and who isn't. And sloppiness is, on average, the result.

There's a reason why the AP used to determine its final rankings before bowl season. Bowls are poor judges of a team's quality, but we now put more weight on teams' bowl performances -- both for final poll results and for next season's expectations -- than ever. I'm not proposing that we go back to the old way, of course; there is already a ridiculous outcry when a Heisman winner struggles in his bowl game and people clamor for a different winner. Plus, I typically lean toward whatever results in a larger sample size for evaluation. There are just drawbacks to doing it, is all. Bowl results typically muddy the evaluation waters more than they clear them.

If you are a fan who is always looking for the latest reason to say, "See? This is why we need a playoff," this might or might not give you ammunition. With a big, 16-team playoff, the layoffs between games would be minimal. You could have the first two rounds take place in December, with the semifinals and finals taking place after a layoff of just a week or two. That could minimize sloppiness. Of course, with a Plus One playoff or a six- or eight-team bracket, you would not really be able to avoid a layoff.

Possible Bowl Projections?

What would happen if we used the information above -- offenses regressing by five to 10 percent from their late-season performance, defenses regressing by 10 to 15 percent -- and applied it to bowl predictions? We'd get some creative results. If I were to use these numbers to make bowl predictions instead of the F/+ ratings that, let's face it, are almost guaranteed to hover right around .500 (I can't wait to make offseason adjustments to the predictions process), there would be some pretty eye-popping projections.

  • BYU over UTEP by 35.9
  • Northern Illinois over Fresno State by 40.7
  • Nebraska over Washington by 1.4
  • Arkansas over Ohio State by 12.7
  • Boise State over Utah by 51.5!
  • Auburn over Oregon by 17.7

For all we know, things will unfold exactly as projected there. But a) we do not yet have enough data to distinguish the effect of two- or three-week breaks from that of six-week breaks, and more importantly, b) the extreme outliers in the data are staggering and scare me off from making any confident projections with this information.

The median performance for bowl teams might be an 8.9-percent S&P+ regression from their late-season play, but the standard deviation is enormous. Of the 320 teams that played in bowl games between 2005 and 2009, 40 regressed by 30 percent or more in their bowl game, and 92 regressed by 20 percent or more. Meanwhile, 30 improved by 30 percent or more while 48 improved by 20 percent or more. For every game that goes according to plan, there will be at least one utterly baffling result. This is college football, of course, and there are always baffling results, but the bowl season draws them out at a much higher level.
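Tallies like the ones above are easy to reproduce. Here is a toy illustration of counting performances past a given threshold; the list of percent changes is invented, not the actual 320-team 2005-09 sample:

```python
def tally(pct_changes, threshold):
    """Count games regressing or improving by at least `threshold` percent.
    pct_changes: per-team percent change in bowl S&P+ vs. a baseline."""
    regressed = sum(1 for c in pct_changes if c <= -threshold)
    improved = sum(1 for c in pct_changes if c >= threshold)
    return regressed, improved

# Made-up sample of bowl percent changes (negative = regression).
changes = [-35, -31, -22, -18, -9, -3, 2, 14, 21, 33]
print(tally(changes, 30), tally(changes, 20))
```

Run on the real sample, this is how you would get the "40 regressed by 30 percent or more, 92 by 20 percent or more" style of figures quoted above.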

If nothing else, these numbers could be used to look at which games might be played at the highest (i.e., highest combined S&P+) and lowest levels of performance:

  • Highest Projected Quality of Play: Wisconsin-TCU, Stanford-Virginia Tech, Oregon-Auburn, Alabama-Michigan State, Georgia-UCF, South Carolina-Florida State, Navy-San Diego State.
  • Lowest Projected Quality of Play: Army-SMU, South Florida-Clemson, Illinois-Baylor, Kansas State-Syracuse, Ohio-Troy, Northwestern-Texas Tech, Mississippi State-Michigan.

Bowl Overachievers and Underachievers

Since we have this single-game performance data, let's look at which teams have typically overachieved or underachieved during bowl season. Below are the teams that have played in at least three bowls in the last five seasons (not a very high standard these days, considering a staggering 81 of 120 FBS teams have played in at least two in that span).

Team             Bowls (Since 2005)   Avg. S&P+
Rutgers                  5              268.0
LSU                      5              267.2
Florida                  5              266.6
USC                      5              253.3
Tulsa                    4              252.8
TCU                      5              252.6
Kansas                   3              243.7
Nebraska                 4              243.2
Georgia                  5              240.1
Utah                     5              233.7
Iowa                     4              230.3
Oklahoma State           4              229.1
Wake Forest              3              228.8
Connecticut              3              228.7
California               5              228.5
BYU                      5              227.9
Southern Miss            5              222.5
Missouri                 5              221.8
South Florida            5              221.2
West Virginia            5              216.9

When it comes to bowl performances, three teams stand ahead of the pack: LSU, Florida, and ... Rutgers? Say this for Greg Schiano and his staff: They know how to motivate their team through the bowl layoff. The same goes for Tulsa's Todd Graham and his predecessor, Steve Kragthorpe. Not that this helps Schiano and his 4-8 Scarlet Knights this time around.

If you are looking for another game to add to the Watch List, Iowa and Missouri are the only two teams on the list above that are facing off this bowl season -- they play in the Insight Bowl on December 28. They have both had the same coach during the entire 2005-09 span, and aside from Missouri's Texas Bowl outlier last season (they were whipped by Navy), these two teams play at a pretty high level each December/January.

So which teams are most predisposed to getting fat and falling out of performance shape over the holidays?

Team               Bowls (Since 2005)   Avg. S&P+
Northern Illinois          3              153.3
Georgia Tech               5              161.6
Nevada                     5              166.5
Memphis                    3              173.7
Arizona State              3              173.7
Central Florida            3              175.1
South Carolina             4              175.7
Tennessee                  3              177.0
Miami                      4              177.8
Cincinnati                 4              182.6

Thankfully, none of these teams play each other.

So what about the BCS Championship Game participants? In four bowl games, Auburn has averaged a semi-miserable 186.3 single-game S&P+, barely missing the bottom 10. In five bowls, Oregon has played at a thoroughly mediocre 204.8 level. This means next to nothing, of course -- both teams had different coaching staffs for three of the five years in the sample, and neither had its current starting quarterback -- but it is interesting nonetheless.

Playlist

Since we are all on break from college football right now (until Saturday) ...

"Break," by Jurassic 5
"Break 'Em On Down," by Big Joe Williams
"Break Free," by Dave Matthews Band
"Break My Heart," by Common
"Break This Time," by Alejandro Escovedo
"Break You Off," by The Roots
"Break Your Heart," by Barenaked Ladies
"Breakdown," by Handsome Boy Modeling School (or Guns N' Roses, or Mos Def, or Tom Petty & The Heartbreakers)
"Breakerfall," by Pearl Jam
"Breakout," by N*E*R*D


15 comments, Last at 17 Dec 2010, 1:29am by Jeff Fogle

Comments

2
by Kevin from Philly :: Thu, 12/16/2010 - 1:25pm

"Jailbreak" by Thin Lizzie?

"These are the Breaks" by Kurtis Blow.

7
by Bill Connelly :: Thu, 12/16/2010 - 2:45pm

Ugh, I feel like less of a music snob for leaving off Kurtis Blow. That is unforgivable.

1
by Joseph :: Thu, 12/16/2010 - 1:22pm

What was Ohio St's #? Since LSU and Florida beat them handily, it doesn't surprise me that they are high.

8
by Bill Connelly :: Thu, 12/16/2010 - 2:47pm

Ohio State has averaged a 194.9 over the past five seasons, good for 42nd out of the 63 teams who have played at least three bowls since 2005. Just ahead of Minnesota and Texas Tech, just behind UCLA and Air Force.

3
by Chappy (not verified) :: Thu, 12/16/2010 - 1:28pm

Dumb question: how much of the decline in team play is simply due to the fact that bowls generally pit higher-quality teams against each other? Don't most teams play a bowl opponent that is better than their average opponent?

I guess my point is that it seems fair to see if late-season or full season performance is more predictive of bowl performance, but it seems like your reasoning for the factors behind poor bowl play might be a bit speculative. Maybe I don't understand the metric or the comparison you are making?

5
by Bill Connelly :: Thu, 12/16/2010 - 2:44pm

The goal behind the S&P+ metric is that you would play at a certain level no matter who you played. If you posted a 250.0 game against Ohio State, you would have played at a 250.0 level against Ohio too (but you'd have likely had much more good fortune on the scoreboard). It is forever being further calibrated to match that sentiment, but that is the goal. The teams responsible for the best bowl game performances did so against a mix of big and small competition. Missouri beating Arkansas by 31 in 2007 made the list, as did Nebraska beating Arizona in 2009. But so did TCU beating Northern Illinois by 30 in 2006 and Tulsa beating Bowling Green by 56 in 2007.

And beyond that, to the extent that there is a rise in level of competition, this isn't necessarily the case when I isolate late-season play. You're playing teams from your own conference late in the year, and you're often playing pretty damn good teams or rivals. (And if you're a MAC or Sun Belt team, the odds are decent that you're playing a team of that caliber in the bowl too, since their bowl slots aren't very impressive.) So there is not a huge change in level of competition overall, but there is still a dropoff in overall performance. Hope that helps.

4
by TimTheEnchanter (not verified) :: Thu, 12/16/2010 - 1:42pm

I had two immediate thoughts when going through this

First and foremost, you seem to be attributing the fact that play declines to the layoff. It is entirely possible that the decline is due to the fact that they are exhibition games rather than the layoff itself. The urgency of the regular season (Every game matters! [rolls eyes] ) in positioning for bowls, conference standings etc., no longer applies. You either win or lose a bowl game and nothing else really depends on it. I wonder if the championship games have suffered similar dropoffs, but they would be too small a sample size to draw conclusions.

Second, my thought was along the lines of Chappy's comment. Is there possibly some bias to the measuring system such that when teams with winning records (often similar winning records) play each other, the ratings tend to be lower than average (thus implying that the ratings would be higher when there is more of a competitive mismatch or both teams are bad)? Not sure why this would be, but it could be a correlated factor when looking strictly at bowl games.

6
by Bill Connelly :: Thu, 12/16/2010 - 2:45pm

I glossed over the motivation factor, but it's certainly real. There's just no good way of isolating which teams played poorly because they were rusty, which didn't care, etc. I lumped it all under "the bowl layoff" because of that.

11
by Jeff Fogle :: Thu, 12/16/2010 - 3:47pm

There might be a way to do this using common sense and Vegas spreads...though I'll grant that's not going to be perfect. Might qualify for "good way" though.

Let's say we assume that "rustiness" is an influence, but not a killer influence. You're not at full speed...but you're not helpless either. And, maybe rustiness goes away by the second quarter once everyone has their sea legs back.

Then, we assume that teams who miss expectations BY A MILE didn't fail because of rustiness (because rustiness is worth only a tenth of a mile or something), but did so because they didn't care. We could exclude those "didn't care" games from the overall sample, and maybe get a better sense of what rustiness means.

I was going to use double digit covers as a test run...but there were actually a bunch of those last year. And, Ohio State's 14-point cover over Oregon in the Rose Bowl obviously wasn't because Oregon didn't care about the Rose Bowl. TCU failed to cover by 14 vs. Boise, and it wasn't because they didn't care about the Fiesta Bowl.

I'm going to use a threshold of 17 points for a favorite missing the spread in a loss, but lift it to 20 points for a dog missing the spread (because some favorites can post powerful performances that aren't related to the dog not caring). And, I'll allow myself a "stopped caring so much when they fell behind" out for a game like Arizona/Nebraska, where you can't assume Arizona didn't care...but you can't attribute their blowout loss to rustiness either. I think there's possibly a "cared for a while but then threw in the towel" sentiment that does sometimes happen in bowl games too.

Anyway...

FAVORITES FAILING BY 17 OR MORE
Fresno State -10 lost to Wyoming by 7 (OT, though, so not a regulation qualifier)
Oregon State -3 lost to BYU by 24
Nevada -11 lost to SMU by 35
Houston -4 lost to Air Force by 27
Missouri -6 lost to Navy by 22
South Carolina -4 lost to Connecticut by 13

UNDERDOGS FAILING BY 20 OR MORE
Arizona +2 lost to Nebraska by 33

Arizona was the only dog that missed by 20 or more. We have six favorites who missed by 17 or more, five of which were in regulation. And, I have to say...that's a pretty good list from my memory of last year where I was thinking, "Man, these guys don't even care." Particularly Oregon State, Nevada, Missouri, and SC.

Covers.com has an easy-to-navigate scoreboard calendar where you can go back over the last few years. They even show cover margin within the game box on the page, so it's easy to find the qualifiers.

Something to think about. Maybe when you take out the disasters, the scope of rustiness changes in a meaningful way (or mostly disappears). Maybe not.
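For concreteness, the exclusion rule above can be sketched in a few lines. This is just an illustrative encoding (spread negative for a favorite laying points, margin negative for a loss); the function name and conventions are assumptions, not anything from Covers.com:

```python
def flag_no_show(spread, margin, overtime=False):
    """Flag games to exclude from the rustiness sample: a favorite
    missing the spread by 17+ in a regulation loss, or an underdog
    missing it by 20+ (the thresholds proposed above).
    spread: team's pointspread (negative = favorite laying points)
    margin: final scoring margin (negative = loss)"""
    if margin >= 0:
        return False                   # the rule only covers losses
    cover_margin = margin + spread     # how far short of the number
    if spread < 0:                     # favorite laying points
        return cover_margin <= -17 and not overtime
    return cover_margin <= -20         # underdog

# The qualifiers listed above:
print(flag_no_show(-10, -7, overtime=True))  # Fresno State: OT, so not flagged
print(flag_no_show(-11, -35))                # Nevada: flagged
print(flag_no_show(2, -33))                  # Arizona (dog): flagged
```

With a rule like this, you could sweep a whole bowl season's worth of lines and margins and pull the "didn't care" games out of the sample automatically.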

PS: I'll still formally protest use of the word "momentum" in all stathead discussions! (lol).

13
by Bill Connelly :: Thu, 12/16/2010 - 4:06pm

A quick objection: I (obviously) watched every second of Missouri-Navy last year, and the problem wasn't even remotely that they didn't care. It's that their gameplan was a massive failure, and all their guys were overthinking and reacting to the flexbone. That's the main problem with generalizing -- even when you could quantify things in a certain way (Mizzou was a big favorite and they lost big, therefore they didn't care), it's so subjective and possibly incorrect. I was tossing around ways to do this the last couple of nights and couldn't come up with anything I really trusted. You're certainly on the right path ... but it just wasn't quite enough for me to pursue it too far.

14
by Jeff Fogle :: Thu, 12/16/2010 - 7:05pm

Maybe we can change the phrasing to "not excited" instead of "didn't care." Or, "not enthused enough to bear down when faced with adversity" instead of "didn't care."

Quotes from ESPN's game summary:

"Ricky Dobbs listened to the chatter from Missouri's defense and knew Navy had the Tigers beat."

"Dobbs sensed that Navy was going to take down another heavyweight when he heard the frustrated Tigers complaining about the Midshipmen's low blocks."

"You could hear them talking all the time, 'Hey, who's trying to end my career?' and stuff like that," Dobbs said.

On the other side of the ball, the offense scored just 13 points. After the superstar receiver broke a big play for a TD on the second play of the game (the only TD Missouri would score), the 25th-ranked passing offense in S+P last year...facing the 70th-ranked passing defense in S+P last year...went punt, punt, punt, punt, fumble, field goal, interception, field goal, turnover on downs, interception.

We change "didn't care" to "not enthusiastic," or "got discouraged after things didn't go there way," and then we can look at the numbers.

Can you at least glance at the S+P scores from the games listed above...see if the negatives were SO bad from the losers that they weren't counteracted by the positives of the winners (maybe Navy played a little better than normal, but Missouri played A LOT worse than normal)...and compare that to the full slate from last year. That could at least rule out concerns that the outlier games are creating some of the illusion of universal rustiness. Or, it could back up the notion.

The article said EVERYBODY gets rusty on both sides of the ball. (You put the word in italics...but I'm not sure how to do that so I used all caps, lol--edit to cut and paste your sentence, "It turns out that everybody cools off"). Seeing the outlier games would help us judge what kind of impact those had last year. If nothing's there, end of pathway. If there is, I can go dig up the games from other recent years and see if outliers were overly influencing the assessment.

Or, maybe pursue something like:
10% of teams were WAY below norms
60% are a little below norms
30% are above norms

Then we get "most teams are rusty, a third are fine, and a few are disasters" in a way that's more accurate than "everybody" gets rusty. And maybe there are insights that can be drawn about certain coaches in one direction or another.

9
by DrewB (not verified) :: Thu, 12/16/2010 - 2:52pm

While none of the worst-performing teams according to S&P+ play each other, Notre Dame's new coach was the coach of Cincinnati for the majority of the time since 2005. Notre Dame vs. Miami in the Sun Bowl could have the potential to be the sloppiest game. Where did that matchup fall in the projected quality of play?

10
by Bill Connelly :: Thu, 12/16/2010 - 3:18pm

That one's actually right in the middle.

It would probably make sense if I just posted the whole list:

1. Wisconsin-TCU (Combined Proj. S&P+: 521.0)
2. Stanford-Va Tech (476.9)
3. Oregon-Auburn (449.7)
4. Alabama-Michigan State (441.8)
5. Georgia-UCF (436.5)
6. South Carolina-Florida State (435.3)
7. Navy-San Diego State (434.9)
8. Ohio State-Arkansas (432.7)
9. West Virginia-N.C. State (429.7)
10. Northern Illinois-Fresno State (429.4)
11. Nevada-Boston College (427.6)
12. BYU-UTEP (426.7)
13. Hawaii-Tulsa (424.5)
14. Oklahoma State-Arizona (422.8)
15. Notre Dame-Miami (421.6)
16. Utah-Boise State (411.0)
17. UConn-Oklahoma (407.6)
18. North Carolina-Tennessee (405.0)
19. Southern Miss-Louisville (404.2)
20. Pittsburgh-Kentucky (402.3)
21. LSU-Texas A&M (395.7)
22. Air Force-Georgia Tech (383.2)
23. Florida International-Toledo (382.6)
24. Nebraska-Washington (381.5)
25. Middle Tennessee-Miami (OH) (380.0 -- can't believe this isn't last)
26. Florida-Penn State (369.4)
27. East Carolina-Maryland (369.1)
28. Missouri-Iowa (365.5)
29. Mississippi State-Michigan (362.9)
30. Northwestern-Texas Tech (361.6)
31. Ohio-Troy (360.1)
32. Kansas State-Syracuse (352.5)
33. Illinois-Baylor (349.7)
34. South Florida-Clemson (342.3)
35. Army-SMU (332.0)

12
by Dad2sca :: Thu, 12/16/2010 - 4:01pm

First, this is great stuff. I think it would be compelling to calculate this by head coach -- variance in team performance by coach is something I have always felt would be revealing. I have searched for coaches' career performance against the spread in bowl games and never found anything. I think that would be good data -- this data by coach would be infinitely better than that. I have to think certain coaches manage the layoff cycle better than others.

15
by Jeff Fogle :: Fri, 12/17/2010 - 1:29am

I finally found it!

Last year we were talking about conference strengths. Over the summer we were talking about how the arrow of time has changed the quality of football. I was remembering something Bill James had said about that, using characteristics of the game outside of stats to help get a sense of things. I could never find the darned reference...because, with Bill James, some important stuff is tagged to a write-up of some guy or team...and there's no way you'll remember where the heck you read it a few years later.

Anyway...reading through the Historical Baseball Abstract for fun again, I finally came across it (page 876 of the revised edition that came out a few years back, tagged to Bob Lemon).

In his words now:

"I have a theory that the quality of play in major league baseball, over time, could be tracked by what we call 'Peripheral Quality Indicia"--PCI for short...

1...Hitting by pitchers
2...The average distance of the players, in age, from 28
3...The percentage of players who are less than 6'0" or greater than 6'3"
4...Fielding percentage and passed balls
5...Double Plays
6...Usage of pitchers at other positions
7...The percentage of fielding plays made by pitchers
8...The percentage of games that are blowouts
9...The average attendance relative to seating capacity of the game location
10...The condition of the field
11...The use of players in specialized roles
12...The average distance of teams from .500
13...The percentage of games which go nine innings.
14...The standard deviation of offensive effectiveness
15...The standard of record-keeping
16...The percentage of managers who have 20 years or more experience in the game.

If you studied that list, you would find that all of these things increased or decreased predictably as the quality of competition improved."

Back to me now...James then talked about improvements when comparing kids playing, to high school baseball, college baseball, the minor leagues, then the major leagues. And he also talked about improvements through the arrow of time in the majors.

Given the lack of connectivity in college football, it seems imperative to me that the stathead world develop Peripheral Quality Indicia to help add context and understanding. I suggested using NFL draft picks as a PQI for conference strength (the SEC has a lot more guys drafted than the WAC or Mountain West to pick an obvious example).

I think many of the items on that list could be adapted for both arrow of time and conference strength applications (particularly size issues, deviations in offensive effectiveness). I'd add in the number of minority players for arrow of time issues. Once they were allowed in, African Americans won a significant percentage of jobs...which wouldn't have happened if they weren't better than the players they were competing with for those positions.

Anyway...wanted to get it down on the record here before I forgot. Took me a year to find it! Something we can think about during the bowls, as we get to see a little more connectivity, though it's polluted some by motivational issues. If there are future efforts at ranking "best ever" like we saw this past summer, we can apply the same logic there.

Hope everyone feels free to toss out ideas for logical peripheral indicators for quality...