30 Dec 2004
by Aaron Schatz
Every couple of weeks, instead of responding to every question in the discussion threads, I put together this mailbag responding to the best questions and comments either on the website or emailed to me. That way good questions, and the answers as well, do not get lost in a sea of comments. (It also helps me refer in the future to answers I've given in the past.) Be aware that I reference plenty of our innovative stats here, not to mention their unfamiliar terminology, so if you are a recent addition to the readership you might want to read this first.
Our first few questions are all related to a measure we call variance, which measures the game-to-game consistency of a team's performance as judged by total DVOA rating for each game. The lower the variance, the more consistent the team. Some of the results are a bit surprising, particularly when compared to conventional wisdom. First, a question about how variance works:
Chuck Oliveros: I'm new here, so forgive me if I'm showing my ignorance but I find myself wondering about the value of some of these statistics. Looking at the variance values, they seem awfully large. For example, Buffalo has a 33.4% variance. Does that mean that the true value of their estimated wins could be 10.5 plus or minus 33.4%? If so, Buffalo's ratings don't tell us a lot about the team except that it could be mediocre or it could be pretty good. In general, I find myself wondering about the small sample size problem. There are only 16 games in the NFL season, so an anomalously good or bad game will have a large impact on a team's final rating. Also, when you have teams who have clinched after Week 15, like Philly and Atlanta, you are going to end up with 12.5% of their season ratings being based on games in which they are, by choice, playing at less than their best. If this issue of small sample size is addressed somewhere, please point me to it.
Chuck, or anyone else new to the site, don't feel bad about asking questions. I try hard to explain the basics of what we're doing with these stats in the simplest language possible, but it is hard to do sometimes. I also try to reference the glossary and the basic methods explanation but without re-explaining every last detail every time I write about DVOA or our other stats, which means that if you are new, things aren't necessarily self-explanatory.
Variance looks much more complicated than it is. I have a spreadsheet with 32 rows, one for each team, listing the DVOA for that team for each game that season. Then I just run the Excel function VARIANCE on those 17 cells (the bye week is an empty cell) and that gives us variance as listed on the team efficiency stats page. It measures how much a team's performance went up and down from game to game over the course of the season. It doesn't represent the "standard error" in DVOA, or anything like that, and it has nothing to do with estimated wins, which is a different animal entirely. (DVOA gives every play equal weight, but estimated wins gives a little extra importance to specific aspects that historically mean more wins, like red zone defense, first quarter offense, and performance in the second half of close games. It's explained more here).
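For readers who want to see the arithmetic, the calculation described above can be sketched in a few lines. This is only an illustration, not the actual spreadsheet: the DVOA values are made up, the function name `game_variance` is hypothetical, and it assumes the spreadsheet function is a sample variance (dividing by n - 1) that skips the empty bye-week cell.

```python
from statistics import variance


def game_variance(game_dvoa):
    """Sample variance of single-game DVOA values, skipping the bye week (None)."""
    played = [g for g in game_dvoa if g is not None]
    return variance(played)  # divides by (n - 1)


# Hypothetical single-game DVOA values as fractions (0.40 means 40%).
# None marks the bye week, like the empty cell in the spreadsheet.
steady_team = [0.05, -0.10, 0.10, None, 0.00, -0.05, 0.15, 0.05]
streaky_team = [0.60, -0.50, 0.40, None, -0.30, 0.70, -0.45, 0.20]

print(f"steady team variance:  {game_variance(steady_team):.3f}")
print(f"streaky team variance: {game_variance(streaky_team):.3f}")
```

The team whose games swing between very good and very bad gets the larger number, regardless of whether its average rating is high or low.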
As far as the sample size problem, I address that somewhat at the start of the essay that describes our methods. Since we're breaking things down play-by-play, we're not talking about 16 events per team in a season, but rather over 1500 events per team in a season. Does football still suffer from a small sample size problem compared to, say, baseball? Sure. Does that mean doing statistical analysis is pointless? No, don't be silly. I say this a lot, but I'll say it again: The goal isn't to come up with a single number that is a perfect measure of a team's ability or an infallible predictor of the future. The goal is just to come up with better analysis than what's been out there before.
And as you'll see below, the problem of teams resting their starters in the final game is a lot smaller than you think it is -- or at least, it was before this season.
Tim: New England ranks fourth in variance. Does this explain why they allow almost every team they play against to stay in the game (excluding the recent Jets performance), or does this explain the counterpoint of "finding a way to win" force-fed to me by every football announcer with a mic?
Jeff: The Patriots are so high in variance because their defense has been murderous in some games (Buffalo, Baltimore, New York), and horrid in others (Cincinnati, Pittsburgh). The offense has been more consistent from week to week, but has had its share of bad weeks too (Miami, Pittsburgh). Sometimes the offense and defense gel together to make for a decimating performance (Buffalo), and once, both fell apart and were dominated the whole way (Pittsburgh).
Actually, no. I mentioned this in Tuesday's DVOA commentary but I'll say it again: New England's high variance this season is almost entirely due to two games. One was the upset loss to Miami. The other was the big win over Buffalo, which rates as the best performance in a single game by any team this season because they mauled a team that will end the season in the top five. 9 of their other 13 games fall within a very close range from 40% DVOA to 80% DVOA. They have a reputation for letting bad teams stay close, but this year that really hasn't been the case. Only three of their wins have been by seven points or less, and two of those were over highly-rated teams, the Jets and Colts (the other one was 35-28 over Cincinnati). Their three wins over NFC West teams were all easy victories, plus they blew out the Browns and the Bills twice.
Jon Riegel: I am absolutely floored at the bottom two teams in variance -- Jacksonville at #31, which just dropped a 21-0 game against Houston, and the enigmatic New Orleans Saints at #32.
This seems extra strange because New Orleans had its best game of the season last week, and Jacksonville had its worst game of the season. As a result, both teams saw their variance increase. But despite this increase, they are still the two most consistent teams of 2004.
I addressed New Orleans in the last mailbag. The conventional wisdom that says they are inconsistent is just dead wrong. When people say "New Orleans is inconsistent" what they mean is "Gee, we always thought Aaron Brooks and Deuce McAllister were so talented, yet they've never really played like superstars, and the Saints keep taking defensive players with high draft picks, but their defense is awful, and there just doesn't seem to be an explanation for why a team with so many allegedly talented players keeps finishing 7-9, so they must be inconsistent." The Saints generally beat worse teams by close margins and lose to better teams by big ones. The last couple weeks have been exceptions, as was an early-season loss to Arizona.
Jacksonville was ridiculously consistent until they laid that egg against Houston. They won some close games with league-average performance, and then they lost some close games with league-average performance. If you want to see the definition of consistency, check out one of my always enjoyable game-to-game DVOA graphs:
As you can see, 11 of Jacksonville's 15 games this year have fallen within a narrow space between -20% DVOA and 20% DVOA, and their San Diego loss is just outside that area. Then all of a sudden this week... (insert sound of crashing airplanes).
Lenny Dee: Looking at the WEIGHTED DVOA Tampa should be favored over Arizona. I'd love to hear your thoughts on the game.
First of all, can anyone tell that I'm not sure whether I should put the word "weighted" in all capitals? I seem to switch back and forth.
I don't normally want to devote the mailbag to game previews, but I'll just make a few comments. Tampa Bay is a clearly better team than Arizona, both on offense and defense. Arizona's poor offensive rating is partly caused by the three games with King and Navarre at quarterback, but Tampa Bay's offensive rating is also lower than their current performance because of Johnson's quarterback struggles early in the season. I would feel pretty comfortable picking the Bucs, despite the fact that Tampa Bay seems to have an uncanny ability to pull a loss from the jaws of victory. The problem is that in Week 17 I have no clue who is going to play and how much. Every year weird things happen in the final week -- remember Lee Suggs going totally insane in Cincinnati last year? This is one of the reasons I'm worried about my appearance in the final round of the Two Minute Warning head-to-head contest this week. How do you use past performance statistics to determine the five best opportunities against the spread when so many teams are going to be sitting their starters? Seriously, I'm going to tell Roland to have this thing end in Week 16 next year.
This next question addresses that problem, since so many teams have nothing to play for this season compared to only a few in past seasons:
FYO: The most interesting stat to look at would be the DVOA of teams playing "garbage" games (i.e. those where they have locked their playoff spot and have nothing more to play for) relative to their average DVOA for the season. How much worse do teams play in these situations, compared to their normal standard of play?
I went back and looked at the Week 17 performance of the eight teams that Mike wrote about in his article on this subject in Pro Football Forecast, plus the Week 16 performance of the Eagles and Falcons this season. Mike's article didn't include the 1999 Colts. Len Pasquarelli writes this week that the Colts mailed in their final game of that season and went on to lose in the playoffs, which is why they may not rest their starters completely this year. But if you go back and look, all the starters did play, just not very well. I don't think that counts as resting your starters.
It turns out that none of these teams played that badly in their final game, despite resting their starters, until last year's Broncos. The total season DVOA listed here includes the "resting starters" game.
*Week 16, not Week 17, for this year's teams.
Note: 1999 Dolphins and Redskins played each other, as did 2001 Eagles and Buccaneers.
So before 2003, there really was no need to adjust DVOA based on what happened in that final, "rest the starters" game. I haven't published the 1999 ratings yet but, trust me, the Rams are the top team despite a below average final game. Philly's backups outplayed Tampa's backups in 2001, which may have helped Philly to have a higher DVOA than Tampa for the season, but I'm not going to argue when Philly is the team that went further in the playoffs. I can't look back before 1999 yet, but it looks like something changed last year: suddenly teams aren't just playing backups, they are giving up entirely.
It remains to be seen what will happen this week -- will the seven teams with their playoff spots assured play their starters, will they play their backups but have the backups play well like the first few teams on this list, or will they play their backups and have the backups get destroyed like last year's Broncos and this year's Falcons and Eagles? It is possible those three games, while stronger in our memories, are actually historical flukes, and next week the good teams that sit their starters will still play reasonably well, or at least average.
Mike Yeomans: I read this in Don Banks' online column in support of Tiki Barber's place on his midseason MVP ballot (5th, by the way): "All told, he has produced first downs on 46 of his 169 touches, or 27.2 percent of the time." Is 27.2 percent good? Have you heard of this stat before? I haven't seen anything like it mentioned at Football Outsiders. It was at the end of a list of self-evident stats (yards/carry, 20-yard runs, 40-yard runs) and to me it smacked of wanting to divide the two numbers because they were next to each other.
I ran some numbers to see what they looked like. Curtis Martin and Priest Holmes were on top, Rudi Johnson and Clinton Portis on the bottom. It seems that bad running backs tend towards the bottom, but the good ones are spread all over. It looks like the ones who don't catch passes (Staley, Dillon, George) have a hard time, but Portis and Tomlinson are at the bottom as well. Importantly, the two people Banks left off his MVP ballot, Martin and Holmes, both beat Barber. And anyone who says Lee Suggs is better at anything than Corey Dillon is named Suggs.
Now, I don't have access to the kinds of resources you might have (my stats class computer program expired when I stopped taking it six months ago), so my own analysis can only go so far. Is there anything to this FD/touch ratio?
This question has been sitting around a while, so the numbers referenced in Mike's question are from midseason. But I was finally able to get back to him a week ago and figured I would share my answer with the class.
On one hand, I'm glad that Don Banks took the time to measure players by first downs. It is amazing how the first down is ignored when we talk about players -- all we talk about are yards and touchdowns. On the other hand, how many first downs a guy gets is often subject to the same biases as yardage -- in particular, the situations in which he gets the ball.
We're a number of weeks past this Banks article so the numbers are different, and to save time I used rushing only, not receiving. I took all the running backs with at least 80 carries through Week 15 (that's when I answered Mike's question, so that's when these numbers were done) and I looked at first downs per carry. These were the leaders:
|Player||Team||Percent of Carries
Then I looked at all the RBs with at least 80 carries this season and I looked at average yards to go when they carried the ball. Here are the 10 guys who average the fewest to go:
|Player||Team||Yards to Go on Average Carry
Look similar? Eight of the guys are on both lists. It turns out that these two stats correspond really closely with each other (-.75 correlation coefficient). In other words, if you get to run the ball with fewer yards to go, you get more first downs. Duh.
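To make the correlation coefficient concrete, here is a minimal sketch of how a Pearson correlation between the two stats could be computed. The six running backs and their numbers are invented for illustration, and `pearson` is a hypothetical helper written out by hand rather than any tool actually used for the figure above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical backs: (average yards to go per carry, first downs per carry).
backs = [(7.2, 0.31), (7.9, 0.27), (8.4, 0.24), (8.8, 0.25), (9.1, 0.20), (9.5, 0.21)]
yards_to_go = [b[0] for b in backs]
fd_rate = [b[1] for b in backs]

r = pearson(yards_to_go, fd_rate)
print(f"correlation between yards to go and first downs per carry: {r:.2f}")
```

A negative value means the two stats move in opposite directions: the more yards a back typically needs, the smaller the share of his carries that end in first downs.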
I guess the moral of the story is that first downs are important, and a stat like FD/carry is as important as yards/carry. You just have to understand the biases involved, and know that of course a guy like Wheatley will rank high on this. That's why I like my numbers, where you compare your 3rd-and-1 carries to other 3rd-and-1 carries, not to 1st-and-10 carries.
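The idea of comparing like with like can be sketched as follows. This is not the DVOA formula, just a simplified illustration under stated assumptions: the carries are invented, only two situations exist, and the helper names (`situation_baselines`, `first_downs_over_expected`, `raw_rate`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical carries: (runner, down, yards_to_go, gained_first_down)
carries = [
    ("A", 3, 1, True), ("A", 3, 1, True), ("A", 3, 1, False), ("A", 1, 10, False),
    ("B", 1, 10, True), ("B", 1, 10, False), ("B", 1, 10, False), ("B", 3, 1, True),
    ("C", 3, 1, False), ("C", 1, 10, False), ("C", 1, 10, False), ("C", 3, 1, True),
]


def situation_baselines(carries):
    """First-down rate across all runners for each (down, yards to go) situation."""
    totals = defaultdict(lambda: [0, 0])  # situation -> [first downs, carries]
    for _, down, to_go, converted in carries:
        totals[(down, to_go)][0] += int(converted)
        totals[(down, to_go)][1] += 1
    return {sit: fd / n for sit, (fd, n) in totals.items()}


def raw_rate(carries, runner):
    """Plain first downs per carry, ignoring the situation."""
    own = [c[3] for c in carries if c[0] == runner]
    return sum(own) / len(own)


def first_downs_over_expected(carries, runner):
    """Per-carry first downs above the baseline for the same situations."""
    base = situation_baselines(carries)
    diffs = [int(c[3]) - base[(c[1], c[2])] for c in carries if c[0] == runner]
    return sum(diffs) / len(diffs)


for runner in ("A", "B"):
    print(runner, f"raw: {raw_rate(carries, runner):.3f}",
          f"vs. situation: {first_downs_over_expected(carries, runner):+.3f}")
```

In this made-up sample, runners A and B have the same raw first-down rate, but A gets mostly 3rd-and-1 carries while B gets mostly 1st-and-10 carries, so B comes out ahead once each carry is compared to the baseline for its own situation.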
Eric: Do you have the numbers for how Philly's offense did in the last 8 games last year, once McNabb got healthy (either mentally or physically). If I remember correctly, they were a pretty dominant team offensively, without wideouts of note. Not sure what the quality of their opponents was.
Yes, you are correct. Philadelphia showed remarkable offensive improvement over the second half of last season. The dividing line actually isn't eight games but six. In the first six games, they had negative offense five times, and an offensive DVOA of -20.4%. Over the final ten games, they had negative offense only once, and an offensive DVOA of 37.5%. So perhaps the Eagles will do better without Terrell Owens than we think. On the other hand, the Eagles were a better rushing team last season. Ironically, the "sit all the starters" thing might help them with that, because in his limited time this season Dorsey Levens has been more successful than Brian Westbrook, and perhaps this will convince the Eagles to go back to a committee structure where they can use both backs to maximize their strengths.
Devin McCullen: Aaron, I know you didn't just pull the weightings out of your ass, but are you really so confident that Weighted DVOA is a superior measure of team quality?
Well, in this great big universe of ours, we can't really be sure of anything. What is God? What is truth? Where did I leave my wallet? These are truly the questions that have plagued the minds of philosophers for generations.
Sarcasm aside, am I confident that WEIGHTED DVOA is a better measure of how well a team is playing right now than total season DVOA? Yes. Am I confident that WEIGHTED DVOA is a perfect measure of team quality, or that the coefficients I use to weight the weeks lead to the most accurate possible results? No, don't be silly. We're improving our formulas all the time.
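As a rough illustration of what weighting recent weeks means, here is a minimal sketch. The actual coefficients are not given here, so the linear weights below are purely hypothetical, as are the function names and the game ratings. The sketch also treats a season rating as a simple average of game ratings, which is a simplification.

```python
def weighted_rating(game_ratings, weights):
    """Weighted mean of single-game ratings; both lists run oldest game first."""
    if len(game_ratings) != len(weights):
        raise ValueError("need exactly one weight per game")
    return sum(r * w for r, w in zip(game_ratings, weights)) / sum(weights)


def linear_weights(n_games, floor=0.2):
    """Hypothetical weights rising linearly from `floor` (oldest) to 1.0 (latest)."""
    if n_games == 1:
        return [1.0]
    step = (1.0 - floor) / (n_games - 1)
    return [floor + i * step for i in range(n_games)]


# Hypothetical team that started badly and has improved steadily.
games = [-0.30, -0.20, -0.10, 0.10, 0.25, 0.35]

unweighted = sum(games) / len(games)
weighted = weighted_rating(games, linear_weights(len(games)))
print(f"unweighted: {unweighted:+.3f}   weighted: {weighted:+.3f}")
```

For an improving team like this one, the weighted figure sits above the plain average because the recent strong games count for more than the early weak ones.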
Pat Chase: Have you ever looked at taking games from the end of the previous season to come up with a weighted DVOA for the earlier weeks in the season? I think teams change less from year to year than we might imagine, and a larger sample size could allow you to come up with a decently predictive weighted DVOA. Also, what's your opinion of home field advantage? From a quick look at the past few weeks it looks like the home team makes up roughly 14 points of weighted DVOA, although I didn't look at nearly enough games to say that with much certainty.
To answer your question about an early season rating that includes part of the end of the previous season, that's one of the many many things on the "golly I hope I get to this in the offseason" list.
To answer your question about home field advantage, you're pretty close to right. I think I went over this in a previous mailbag, and this is another issue where I'm hoping to do more work in the offseason, but in general, here's what I've found:
a) Home field advantage is worth about 17% DVOA.
b) No team generally has more of a home field advantage than any other if you look over 4-5 years instead of just one season. Even Denver. Which I have a feeling is wrong. I mean, DENVER. They have to have a bigger home field advantage, right? They do in every other sport.
Jason: Just out of curiosity, but has there been a DVOA calculation of the Bills offense with McGahee and without?
Nope, but I can do one. I suppose I could try to split up the early games where Henry was still starting, but McGahee came in at certain times, but I'm not sure that's worth the effort. Instead, let's just split the season between games where Henry got the majority of carries (Weeks 1-5) and games where McGahee got the majority of carries (Weeks 6-16):
Buffalo Offense DVOA
The rushing performance didn't improve much, but the passing performance improved a lot. It isn't McGahee as a receiver, since his receiving DVOA is roughly the same as Henry's. If it has anything to do with McGahee at all, perhaps he's better at picking up the blitz. Mostly, I'm guessing it is Lee Evans maturing and the offensive line gelling.
Joshua Kranz: Hey Aaron, enjoy your rankings. Two questions. First, you factor in the quality of the defense in assessing each QB's performance. However, sometimes a certain defense faces an inordinate number of good QB's during the year, while another team might skate by and face a McCown brother every other week. For example, the AFC South has four great QB's: Manning, Leftwich, Carr and McNair/Volek. Okay, maybe Carr is pretty good and not great, but he and Andre Johnson were terrific the first half of the year. Then take a division like the NFC East -- besides McNabb, you have mediocrity at QB, or the AFC North -- Roethlisberger has played well, but Boller, Palmer/Kitna and Garcia/Holcomb/McCown haven't been world-beaters exactly. So how do you factor in a particular defense's difficulty of schedule in determining how good or bad a pass defense they really are?
Second, a defense that is usually behind entering the fourth quarter, i.e. a losing team, often sees one running play after another down the stretch, which might keep their pass defense stats artificially low. Maybe you already factor all these issues in your defense weightings. Just curious.
Let's hit your questions in order. On the first question, I think you are asking me if the defensive adjustment is based on the quality of the defenses (a "first-level" adjustment), or the quality of the defenses adjusted for the offenses they faced themselves (a "second-level" adjustment). The answer is the former, but if we can write a program over the offseason that computes things quickly I'm interested in trying the latter. I will note that if the "first-level" adjustment skews things incorrectly, I think this is a much smaller problem when you add together 16 games than it is for a single game like I am doing at ESPN.com. Things like this tend to even out over 16 games.
As for your second question, there is an adjustment for the fourth quarter score gap, but you would be surprised how little the "running out the clock" effect changes total statistics, for both running and passing. I know I was surprised. I'd also add that my adjustments are based on value per play, not total value, so facing fewer pass plays does not actually keep defensive stats artificially low. (On the other hand, since the QB ratings at ESPN are based on total value, throwing a lot of passes does raise the rating as long as the player is above replacement level. Billy Volek when he threw 60 passes, for example.)
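The difference between a per-play rating and a total-value rating can be shown with a toy calculation. The numbers, the units, and the `total_value` helper are all hypothetical; this only illustrates the general idea of value above replacement accumulating with volume.

```python
def total_value(value_per_play, replacement_per_play, plays):
    """Value above replacement summed over every play (hypothetical units)."""
    return (value_per_play - replacement_per_play) * plays


REPLACEMENT = 0.0  # assumed replacement-level value per play

# Same per-play quality, different volume: the per-play rating is identical,
# but the total grows with the number of passes thrown.
for plays in (30, 60):
    print(f"above replacement, {plays} passes: {total_value(0.20, REPLACEMENT, plays):+.1f}")

# Below replacement level, more volume makes the total worse instead.
for plays in (30, 60):
    print(f"below replacement, {plays} passes: {total_value(-0.10, REPLACEMENT, plays):+.1f}")
```

A per-play measure would rate the 30-pass and 60-pass games the same in each pair, which is why fewer pass plays faced does not by itself make a pass defense look better or worse.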
Jack: If I recall correctly, San Diego last year suffered tremendously from poor passing performance on 3rd downs, resulting in a terrible win-loss record, and of course the chorus of people calling for the head of Drew Brees and a new quarterback. I'm just wondering if anyone this year is in similar circumstances?
Right, when we did the 2004 DVOA projections we got this strange result that said that San Diego would be the sixth-best offense of 2004. We thought it was insane, and yet things turned out pretty close to that. The reason wasn't just poor passing performance on third downs, but poor passing on third downs compared to first and second downs as opposed to poor passing overall. (Lest you think that this projection worked flawlessly, by the way, Miami was projected to rebound in offense for the same reason. Whoops.)
Next year's projection system will be even better, I hope. I'll have more years of DVOA to play with. Not only does that mean more data points, it also means I can include variables that not only represent splits of DVOA, but change from year to year in specific splits of DVOA. But while I'll have a better projection system in a few months, I figured we could find out what the old system says about 2005, so spurred by your question I ran numbers through Week 16 and did projections for 2005, offense only. The team that it projects as "the San Diego of 2005" is actually kind of obvious. Think of a young quarterback who took a step backwards this season, and has had trouble converting third downs compared to last year. I'll let you guess for a second while I list some other interesting results:
That's it for this mailbag. We'll do the last one of the season during the week before the Super Bowl.