FO Mailbag: Do FO's Opponent Adjustments Matter?
Jason Martin: I recently read an article discussing the defense adjustments you make in DVOA. If you look specifically at this graph, you'll see there is a definite correlation (-.66), where ideally there would be none. The original article goes into a few possible explanations, but I was wondering if you had any thoughts or could expand on them. The current explanation for how DVOA is defense-adjusted seems like it would eliminate any correlation with strength of schedule.
Jason, your e-mail specifically addresses the question of how strength of schedule correlates to opponent adjustments. (Remember, DVOA is adjusted for both offensive and defensive opponent strength, even though the acronym just has "D.") However, the article on igglesblog makes two points.
First, it argues that the opponent adjustments in DVOA don't mean anything because there's a ridiculously high .985 correlation between DVOA and VOA. Well, of course there's a high correlation! DVOA essentially is the same thing as VOA with some small adjustments. The quality of play is a lot more important than the opponent adjustments. The opponent adjustments tend to even out over the course of the season -- opponent adjustments mean a lot more when looking at one week's performance in Quick Reads than they do looking at a full-season performance.
In addition, opponent adjustments are generally smaller than conventional wisdom might expect. Even the best defenses have bad days. Even the worst defenses have good days. When you average those in with the regular season, the penalty for playing a bad defense is going to be a little smaller than you would expect, and the bonus for playing a good defense is also going to be a little smaller.
The second argument is that there is a really high correlation between strength of schedule and DVOA in 2011. And there is! Igglesblog graphed this using just the ranks and got a -.66 correlation, but the correlation is even bigger if you use the actual ratings. The correlation between strength of schedule (i.e. average DVOA of opponent) and a team's total DVOA in 2011 was -.72. That's just absurdly huge.
It's also a one-time thing.
If we look at the last ten years, there is only one year where the correlation between strength of schedule and DVOA is even HALF as strong as 2011.
Yes, it is negative more often than it is positive. Perhaps we need to look into improving our opponent adjustments this offseason by using third-order adjustments rather than second-order (or fourth-order, or fifth-order -- honestly, I'm not sure how complex my Excel spreadsheet can get before it starts taking three hours to run each week). However, the massive (negative) correlation between schedule strength and DVOA is entirely unique to the 2011 season. I have no idea why things ended up that way. But they did.
37 comments, Last at 28 Jan 2012, 9:30pm
#1 by dereksarley // Jan 25, 2012 - 12:20pm
Aaron, thanks for the response. The SoS thing is what I get for posting on the cheap these days, rather than digging further into past years.
I think there's a general misconception about the "small adjustments" made in going from VOA to DVOA. I can't count the number of times I've heard -- or even made -- the argument that we can wave away schedule strength issues because "DVOA takes care of all that." Given the small size of the adjustments made, we really can't say that.
Of course, as I hope I made clear in my piece, it's not the FO folks who have been making this mistake, because DVOA without context is never treated as a magic bullet here. It's the rest of us who have made / are making this mistake.
#2 by Kal // Jan 25, 2012 - 1:35pm
I don't think it's that easy. If you look at the Conf Champions thread, you get NE's VOA of something like 10% but their DVOA goes to 32%. That's not a small adjustment; it's 200% more in magnitude and a huge difference overall. Similarly, Baltimore's goes from 2% to 10%. Again, fairly substantial jump.
So even if we're making that mistake, the data is implying it as well
#3 by Joseph // Jan 25, 2012 - 2:24pm
Kal, I personally recommend breaking each of the 3 components (O, D, ST) down. Even as a total, for NE, that's +22%; for BAL, that's +8%. Look at what Aaron mentions--on a one-week basis, there will be some rather large adjustments; over the course of the season, they will somewhat cancel out. I think if you could find one of the recent overall season DVOA graphs, you would see that over the course of the season, VOA & DVOA rarely vary by more than 5 percentage points.
In other words, my bet is that BAL's 8% is mostly from the defense having an "above-average" game against Brady (in other words, making him look solid vs. All-Pro), and for NE, it mostly comes from Brady having a solid game against a great defense instead of making him look like T.J. Yates. (IMO, the other part of NE's positive bump comes from holding Ray Rice in check.)
In other words--the percentages don't MULTIPLY--they ADD.
#11 by Kal // Jan 25, 2012 - 4:09pm
I realize the percentages don't multiply. That doesn't really matter. A 22% jump in percentage points is a huge change compared to what it was and what it represents. We've talked about DVOA and how it sort of relates to things like point spreads, and IIRC, it is something like for every 10 points of DVOA difference it is a 3 point spread. That means the difference between DVOA and VOA is equivalent to a line changing 3 points - and that's conservative.
That's a pretty big change!
I know exactly where and why it comes from, but that doesn't really matter; what matters is the notion that DVOA doesn't change that much from VOA. That's patently false, and this is a good example of that.
#14 by jbird1785 // Jan 25, 2012 - 5:05pm
If I am understanding all this correctly, I think the idea is that over the course of a season, opponent adjustments don't matter as much. The difference between the hardest and easiest schedules in the league this year was 12.1% DVOA. That is the difference between playing Tennessee all year versus the Redskins all year. It's a non-trivial difference, but it won't provide the huge swings in difference between DVOA and VOA that you see on a per game basis.
#4 by MJK // Jan 25, 2012 - 2:33pm
Great response to this question. A couple of comments:
* When you say " The quality of play is a lot more important than the opponent adjustments"... is that because, over the course of a season, a given team tends to not be consistently good or bad, or because over a given season, a team plays roughly the same number of good and bad teams, usually?
Also, I'm not sure that the high correlation between VOAf and DVOA is necessarily indicative that quality of play is more important than opponent adjustments (although I well believe that is true). But the thing is, the D-adjustment to VOAf is NOT something that is applying unbiased perturbations to VOA in both directions...rather, it is a scaling. In other words, if you play well and have a high VOA, the best that the D adjustment can do is say "well, that's less impressive and so we're going to scale it back". If you play badly and have a bad VOA, the best the adjustment can do is say "well, your opponent was tough, so it's not *as* bad as it looks". But the D adjustment will never penalize you for playing well, or reward you for playing bad.
In other words, you have a set of values (VOAf). You're going to apply some operator to them (the D adjustment). That operator can make good values appear better or less good, and make bad values appear worse or less bad, but cannot flip the sign of the value, so to speak (I don't mean the literal sign of the VOA, since that's just a function of a user-selected baseline). It can never say that playing well is bad or playing badly is good. So you will always get a positive correlation between VOAf and DVOA.
* Interesting that the correlation in the strength of schedule and DVOA is negative more often than it is positive. There *is* some machinery in the scheduling to try to make the good teams play harder schedules, so if that machinery is working and if DVOA really does reflect how good a team is rather than how good it appears to be, then we should expect a positive correlation. I mean that three games are picked according to your division placement the previous year. If every team stayed the same goodness from year to year, this would make better teams (higher DVOA) play slightly tougher schedules. Of course, a team can vary a lot year to year, so you get a team like New England getting "punished" for winning their division last year being made to play Indy this year, so it's not surprising that this machinery doesn't really work...
* On going to third order DVOA...I've long suspected that DVOA may not be converged, even with second order. You can gauge how far it is from converged by looking at the deltas. Look how much the ratings change going from VOAf to first order DVOA. You can compute some kind of L1 or L2 norm to do so. Then look at the same metric going from first order DVOA to second order DVOA. Then compute third order DVOA and look at how much the norm changes again.
Hopefully, you'll see evidence of convergence. If the change from 2nd to 3rd is, say, 100 times smaller than the change from 1st to 2nd, that implies you're pretty converged. If it's still changing by about as much, it means you need to iterate further.
* You mention you're taxing your excel spreadsheet. You should really get away from Excel. Matlab, Python, or even custom written software in a real language would all probably serve you better as you continue to evolve what you do.
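The convergence check described above (compare how much the ratings move from one order of adjustment to the next) might be sketched like this in Python. This is only an illustration under stated assumptions: the update rule (raw rating plus the average previous-order rating of the opponents), the five-team round robin, and the raw numbers are all hypothetical, and none of it is FO's actual DVOA calculation.

```python
import math

def adjust(raw, opponents, prev):
    """One round of opponent adjustment (assumed rule): a team's raw rating
    plus the average previous-round rating of the teams on its schedule."""
    return {
        team: raw[team] + sum(prev[o] for o in opponents[team]) / len(opponents[team])
        for team in raw
    }

def l2_change(a, b):
    """L2 norm of the difference between two sets of ratings."""
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a))

def convergence_deltas(raw, opponents, max_order=5):
    """L2 change between each order of adjustment and the one before it."""
    ratings = dict(raw)  # order 0 = unadjusted ratings
    deltas = []
    for _ in range(max_order):
        nxt = adjust(raw, opponents, ratings)
        deltas.append(l2_change(nxt, ratings))
        ratings = nxt
    return deltas

# Hypothetical league: five teams, everyone plays everyone once.
# Raw ratings are in percentage points and sum to zero.
raw = {"A": 12.0, "B": 4.0, "C": 0.0, "D": -6.0, "E": -10.0}
opponents = {t: [o for o in raw if o != t] for t in raw}

deltas = convergence_deltas(raw, opponents)
for order, d in enumerate(deltas, start=1):
    print(f"order {order - 1} -> {order}: L2 change = {d:.3f}")
```

In this toy round robin each extra order moves the ratings a quarter as much as the previous one, so the deltas shrink geometrically. A real schedule is not a round robin, so the shrink rate there would have to be measured rather than assumed.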
#6 by zlionsfan // Jan 25, 2012 - 2:37pm
minor correction: only two games depend on last season's finish, the games against other divisions in your conference besides your own and the one you play in its entirety.
#7 by PaddyPat // Jan 25, 2012 - 2:54pm
The random fluctuation, positive to negative correlation isn't really surprising. With few exceptions, it's pretty hard to predict the strength of divisions year to year, and much of the schedule is a rotating division matchup. Moreover, doesn't it just make sense that teams that get cupcake schedules are going to gel better than teams that have bruiser schedules? Take a great team and throw the kitchen sink at them, and if you knock them down a few times, you can kill their confidence, create disharmony in the locker room, etc. etc. Take a weak team and give them a bunch of huge wins, and they're going to settle down, play with confidence, etc. It actually makes more sense to me that we would see a slight positive correlation.
#13 by Pat (filler) (not verified) // Jan 25, 2012 - 5:01pm
There *is* some machinery in the scheduling to try to make the good teams play harder schedules, so if that machinery is working and if DVOA really does reflect how good a team is rather than how good it appears to be, then we should expect a positive correlation
It's entirely possible that hard schedules are more damaging to a team. Damaging physically - not just because, well, a better opponent might actually hit your quarterback a ton - but possibly also because a weak opponent allows garbage time to have starters avoid injury. That means that a team that plays a weaker schedule could end up being a slightly better team in the end, even after any opponent corrections, because, well, they simply didn't get damaged as much as a team playing a harder schedule.
There's nothing you really need to do to correct for that : you correct for the opponent's strength in assessing performance so that you have a good baseline for comparison between two teams. In other words, if one team is facing Darrelle Revis in the secondary and the other is facing Julian Edelman, well, you expect the team facing the worse corner to be able to abuse him more than a good corner. And so when that happens, you're not surprised, and you know it doesn't mean that when that team goes up against Revis they'll be able to abuse him, too.
However, if one team's QB gets nailed 20 times by the Ravens and the other team never gets touched facing the Packers, you'll let the team's defensive line quality adjust in your head for how good the offensive lines are, and you wouldn't expect the team that faced the Packers to keep the QB clean and upright versus the Ravens... but it doesn't change the fact that the QB, after facing the Ravens, might just perform worse against the *next* team because he's injured.
#19 by Jim Glass (not verified) // Jan 25, 2012 - 10:37pm
Interesting that the correlation in the strength of schedule and DVOA is negative more often than it is positive. There *is* some machinery in the scheduling to try to make the good teams play harder schedules, so if that machinery is working and if DVOA really does reflect how good a team is rather than how good it appears to be, then we should expect a positive correlation.
I don't think so. It's hard to see how a positive correlation could be.
The fact that stronger teams play easier schedules and weaker teams play tougher ones results trivially from the fact that teams cannot play themselves. As teams cannot play themselves, in lieu of doing so the strongest teams must play the weaker and the weakest the stronger.
NFL teams play double round robins in divisions. Stylized division play:
........... W-L ... Opp W-L ... S-o-S
Best team . 6-0 .... 6-12 ..... .333
2nd best ... 4-2 .... 8-10 ..... .444
3rd best .... 2-4 .... 10-8 ..... .556
Weakest ... 0-6 .... 12-6 ..... .667
That's a pretty visible spread from top to bottom in opponent W-L and strength of schedule.
To the extent that teams play the same schedule outside of their division, this will of course carry over. Multiply by eight across the league. It is significant.
Same in divisions with closer spreads from top to bottom. Say team records are 4-2, 3-3, 3-3, 2-4. Then the top team has a s-o-s in the division of .444 and the bottom team of .556, clearly visible too. That's the starting point for each team going into inter-division play. Multiply by eight, that's plenty enough to be visible in full-season numbers, as it is. The actual average divisional case is between these two examples.
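For anyone who wants to check the two stylized divisions above, here is a small Python sketch. The function name and team labels are made up for illustration; the records are the ones given in the examples, and each opponent's record is counted once, as in the table.

```python
def division_sos(records):
    """records: {team: (wins, losses)} inside one division.
    Returns {team: (opp_wins, opp_losses, sos)}, where the opponents'
    record is the combined record of the other teams in the division."""
    out = {}
    for team in records:
        opp_w = sum(w for t, (w, l) in records.items() if t != team)
        opp_l = sum(l for t, (w, l) in records.items() if t != team)
        out[team] = (opp_w, opp_l, opp_w / (opp_w + opp_l))
    return out

# The two stylized divisions from the text.
wide = {"Best": (6, 0), "2nd": (4, 2), "3rd": (2, 4), "Weakest": (0, 6)}
tight = {"1st": (4, 2), "2nd": (3, 3), "3rd": (3, 3), "4th": (2, 4)}

for name, division in (("wide", wide), ("tight", tight)):
    for team, (w, l, sos) in division_sos(division).items():
        print(f"{name:5} {team:8} opp {w}-{l}  SoS {sos:.3f}")
```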
And that is not the end. E.g. 10 teams in other divisions had to play the 16-0 Patriots, but the Patriots didn't have to play themselves even once! The Patriots had an easier-than-average schedule *outside* their division right there. And the converse, of course, for the 1-15 Dolphins. They didn't get even one gimme game against themselves, teams in other divisions had 10 while those in their division had 6 more.
The strength of schedule numbers of *every* sensible team rating system show the effect of this. By Pythagorean -- which has nothing to do with DVOA -- the seven strongest teams this year had opponents with an average 47% winning strength and the seven weakest teams had opponents with an average 53% winning strength. The 6% difference is worth one win in a 16-game season.
That's just a true fact, a system finding such a thing indicates no problem with its strength-of-schedule adjustment, the finding is totally expected. It just means the NFL's "machinery" to make stronger teams play tougher schedules doesn't amount to very much, which is a true fact too. And expected, considering the double round robin format of division play. To fully equalize strength of schedule of strong and weak teams in a short 16-game schedule would require *major* scheduling changes by the NFL.
You should really get away from Excel. Matlab, Python, or even custom written software in a real language would all probably serve you better as you continue to evolve what you do.
Maybe, but I doubt it. With a season of only 16 games (fewer going back in time) NFL stat analysis is doomed forever to suffer the curse of small sample size. The temptation to use the latest power software to parse a small sample size to the N-zillionth degree is powerful, and fun to indulge, but one should be very skeptical about the value of the results one can expect to get. K.I.S.S. can be as wise a policy in statistical analysis as anywhere else, but is seldom encountered there. Over-parsing even just with Excel can lead to strange results (like: "the Jets are the #1 team in the league") when all the cruder KISS-type analysis (such as Pythagorean) strongly warns it probably ain't so.
There's no way around small sample size. I've never seen any meaningful question regarding the NFL that couldn't be handled by Excel and its associated programming language, if it can be handled at all. With MLB and its 300,000 game database, or if trying to compute some sense into plus-minus in basketball, maybe power analysis is useful. But the NFL resides within simple stat bounds. For instance, the fact that stronger teams play easier schedules and vice versa is just a simple forced artifact of teams not being able to play themselves in a short schedule based on a double round robin. What's to compute?
This all IMHO, FWIW, of course. Use Matlab to produce some breakthrough analysis of the NFL and I will send kudos and sing praises.
#22 by Alternator // Jan 26, 2012 - 12:36am
I like this post, and only want to add one thing:
Matlab or similar could allow n-th order DVOA adjustments where Excel cannot, because Matlab handles the math better. That's one of the few things that might genuinely improve the numbers with very little extra 'work.'
#34 by Pat (filler) (not verified) // Jan 27, 2012 - 12:14pm
Your argument regarding the scheduling would be right if the NFL didn't have the "division leaders play division leaders" correction in there.
If you take a look at your silly example, for instance, and assume all divisions are the same, so the 6-0 team goes 2-2 against the other 4 division leaders they play, and beat everyone else (6 teams). So they finish 14-2, team B picks up 4 losses from the division leaders, go 2-2 against the second place teams, and pick up 4 wins versus the bottom two teams, finishing 10-6.
The difference between the 1st and 2nd place team schedules is the in-division difference (8 wins) and the 2 1st place teams versus the 2 2nd place teams (... 8 wins). So, nominally, the schedules equalize out completely. Same argument works with tighter-packed divisions, but *not* if you've got a mix of weak and strong divisions.
However, the conclusion you make is right, because the NFL's scheduling correction only attempts to correct for strength inside a division, not strength across the league.
#23 by NHPats (not verified) // Jan 26, 2012 - 8:16am
I expect the learning-curve and rework penalty for migrating from Excel might be gawdawful high.
An alternative might be to write more efficient functions...perhaps even in other languages...and wrap them in VBA function calls (if you need an in-cell function) or as menu options (from a custom menu) if you need feature-type stuff.
If you haven't read it already, Professional Excel Development, by Bovey et al, is your friend.
#5 by starzero // Jan 25, 2012 - 2:36pm
i know nothing about programming, but i wonder whether someone could develop a program, or even if there is some sort of stats software, that would let you run these equations more easily, without all the excel processing.
#8 by MJK // Jan 25, 2012 - 2:59pm
Well, it's called Matlab.
Or, if you don't want to pay for a Matlab license, use Python and the pylab library (both freely available) that emulates (mostly) the Matlab interface.
#9 by jbird1785 // Jan 25, 2012 - 3:27pm
There is also R at http://www.r-project.org/.
I think it has been mentioned either by Aaron or someone else that while there are better tools, the FO staff isn't particularly familiar with them. You could argue that going away from a tool your whole organization is familiar with and even built upon is not a good idea unless it is hampering progress. Non-converging opponent adjustments may be enough of a hindrance, though.
#12 by zlionsfan // Jan 25, 2012 - 4:11pm
right. I have no doubt that there are plenty of ways to put this info into a system that would be better, faster, stronger ... from a programmer's perspective. If the specs are "give us a system to calculate DVOA, etc. with interfaces that we're comfortable using", then you're basically describing what they already have.
There's also the question of time/money; it's as much a decision about resources as it is about knowledge. If they have to choose between spending money on ROBO-DVOA and on Project X that we don't know about, well, DVOA works right now, doesn't it? And it's one thing for a reader to say "I'll volunteer my time to 'improve' your system" and another to put in the time to make it handle more complex calculations while offering the same familiarity that Excel does, all without interfering with the 2012 Almanac, offseason projects, etc., even if that's what Aaron wanted.
#24 by Aaron Schatz // Jan 26, 2012 - 9:35am
I think I'm going to put zlionsfan's answer here in the site FAQ. So true.
#26 by Jimmy // Jan 26, 2012 - 11:33am
I want ROBO-DVOA! And a pony!
#10 by Aaron Schatz // Jan 25, 2012 - 3:29pm
The problem is that I don't have the time to learn how to use these programs, or to write big long programs in them. I have time to do what I'm comfortable with. And any time someone offers to re-do my database for me, they always want to make all these changes that make it the way they want it instead of the way I want it.
#16 by zenbitz // Jan 25, 2012 - 6:28pm
WHY DON'T YOU JUST TAKE YOUR FILTHY CATHOLIC MATCH GIRL LUCRE AND HIRE A STATISTICIAN!
#18 by omaholic // Jan 25, 2012 - 8:12pm
More importantly, just bring back Catholic Match Girl!
#31 by qsi // Jan 27, 2012 - 2:40am
I have a lot of sympathy for your predicament here, Aaron. Been there, done that, in a way. Obviously I don't know how complex your current spreadsheet is, or how it is set up, but my experience is that at some point you're better off biting the bullet and moving away from a purely cell-based approach in Excel.
We have a set of pretty complex spreadsheets that have evolved over 15+ years, and they all started off with in-cell formulas and computations only. This brought successive versions of Excel to their knees, and more importantly, the spreadsheets became unwieldy and error-prone. Formulas with INDIRECT, OFFSETs, V/HLOOKUPs, MATCHes, array operations, dynamic range names, etc... you get to a point where Excel formulas become write-only.
There were two stages to our remediation of this (under commercial pressure to keep producing results all the while): firstly, we moved a lot of the cell-based calculations to VBA, and secondly (later), to Matlab. This way we maintain the ease of use of Excel as the user interface and front-end, while benefiting from having readable, maintainable, flexible code in a way that cell formulas never can achieve.
In retrospect we should have done this much sooner, but if it ain't broke (yet)... it also helped that we were familiar with both VBA and Matlab, which made the transition less daunting. It looks like you still have some way to go before hitting the point where the spreadsheets become too cumbersome, so my advice would be to start thinking ahead for the next few years so that when the crunch comes you're better prepared. And it will come if you keep innovating at the pace you have been in the past.
(Do the spreadsheets really take hours to calculate? I am hoping that was hyperbole because it would scare me to have pressing F9 result in hours of calculation... :))
#33 by MJK // Jan 27, 2012 - 11:48am
Second this. We've run into similar situations. Excel is a great tool, at first, but you get to a point when it only takes you so far. Complex calculations with masses of data really need the ability to write code, not just have cells and formulas.
I would really recommend seeing if anyone on the FO staff is familiar with Python (or, if you hire anyone in the upcoming years, look for that on their resume). Python is a scripting language that is very easy to code in, can be (but doesn't have to be) object oriented, and has had libraries written for it that allow it to directly interface with Excel with minimal (or possibly no) use of VBA. So it would be backward compatible with your existing spreadsheets, and you could continue to use Excel as a front end to input data (and continue to use whatever scripts or tools you already have for reading play-by-play data into Excel). Best of all, Python is completely open source and free, so you don't have to spring for Matlab licenses costing thousands of dollars (and its capabilities are being enhanced pretty much daily by programmers around the world). It's also (with a little bit of care in how you code) platform independent.
#37 by MurphyZero // Jan 28, 2012 - 9:30pm
And I would add that given the volumes of data that you are likely dealing with, you may need a database program of some kind, potentially with VBA/Matlab/Other calls as well. I too have similar Excel spreadsheets + extensive VBA code to do very complex tasks, but with some of my needs, I need to go straight to Access. Now several of those have become big beasts, and I am currently considering whether or not to switch them to something else...which would likely take months. Meanwhile we need to create another database that will take considerable time...
My point being, from this experience, if you really need the output, sometimes you have to bite the bullet and do it right, (which includes it being the way you need it to do your job--that means you'll need to be involved in its creation) sooner rather than later. But it sounds like you'll need some training as well, either in Excel or database to be able to continue to expand your efforts. So like the prior poster said, start planning and get ahead of the curve.
#15 by AnonymousA (not verified) // Jan 25, 2012 - 5:39pm
Opponent adjustments are actually a collaborative filtering problem, and two rounds of hill climbing (what I suspect DVOA uses) will do a mediocre job at that. It won't be awful -- it'll tell you something. But it won't converge to the point that you could use it in, say, a physics simulation.
That said, collaborative filtering is hard. Netflix gave away $1M for a 10% improvement in their collaborative filtering algorithm relatively recently. I suspect that any of the "better" algorithms would have a cost in Aaron-time which is prohibitive.
Another thing worth noting is that CF problems are notoriously easy to overfit. It's probably better that DVOA not take really easy/hard schedules as seriously as it should, rather than spitting out nonsense because it's discovered a non-existent pattern. The answers to this problem are well known but, once again, have a cost in Aaron-time that is likely prohibitive.
#17 by Jim Glass (not verified) // Jan 25, 2012 - 7:10pm
Folks, the explanation for why the strongest teams play the easiest schedules is trivial: Strong teams can't play themselves.
In 2007 the Pats were 16-0 and Dolphins were 1-15. They were in the same division and played each other twice. So playing an otherwise identical schedule, the Dolphins' opponents had 30 more wins than the Pats' opponents.
In simplest terms, the difference in their schedules was the Pats got to play the 1-15 Fins twice while the Fins were playing the 16-0 Pats twice. Being that they were in the same division *they had to*, it is forced.
The strongest teams *do* have the easier schedules than the weakest teams because no team can play itself. In lieu of doing so the strongest must play the weaker and the weakest must play the stronger.
So there is nothing at all wrong with data and s-o-s adjustments that show the strongest teams play easier schedules and the weaker teams play tougher schedules -- it is a true fact.
"If the opponent adjustments actually worked the way people thought they did -- and which I thought they did until not that long ago -- the correlation here would be much closer to zero"
No, it wouldn't be close to zero. It would be just like what the graph there shows, with the strongest teams playing the easier schedules and vice versa. Because that is the reality of what they do.
The strongest teams get easier schedules with a correlation of about 60% year after year.
I'm really surprised that this can be a topic of any controversy at all on a stat site like this, being this is so simple and obvious.
#25 by MJK // Jan 26, 2012 - 11:20am
"I'm really surprised that this can be a topic of any controversy at all on a stat site like this, being this is so simple and obvious."
The thing about statistics, or mathematical modeling, or science in general, is that it can be really easy to miss the forest for the trees. We're all guilty of it sometimes.
Your explanation is, of course, very simple, plausible, and almost certainly true. And it simply didn't occur to me (and also, obviously, to the author of the blog post).
However, according to the numbers that Aaron lists, there have been a number of seasons recently where SoS *DID* have a positive correlation with DVOA. Which implies that the "can't play themselves" effect obviously does not dominate over random fluctuation of schedule strength.
So it's no doubt an effect that contributes to the fact that negative correlations are (presumably) more common (though the sample size presented is a bit too small to say that with certainty), it's not like we're saying "duh, the sky is blue" here.
#20 by Anonymous23457546548 (not verified) // Jan 25, 2012 - 11:58pm
You guys are dense. Read this and the comments:
Hint: It's a very simple reason
#21 by Anonymous23457546548 (not verified) // Jan 26, 2012 - 12:00am
Sorry, Just saw Jim Glass' comment. He had it covered.
#27 by RickD // Jan 26, 2012 - 2:59pm
I'm fairly sure I talked about this in the past couple weeks. Jim Glass is correct. Until teams start playing against themselves, you're going to expect a negative correlation between DVOA and strength of schedule.
I honestly don't understand what logic would be being used to argue that this is supposed to be a flaw of DVOA, but I'm getting a whiff of innumeracy here.
Consider the following thought experiment. Let's have a room full of people of varying heights. Ask each person to record the heights of a fixed number of people that they look at (selected randomly). You know what? There's going to be negative correlation between the height of the person making the observations and the average height said person records.
This is really obvious.
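The thought experiment is easy to try out. Below is a rough Python sketch, with everything about the setup assumed for illustration (12 people, heights drawn from a normal distribution, a fixed seed). It looks at two cases: everyone records the heights of all the others, and everyone records a random six of the others.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def observed_averages(heights, k, rng):
    """Each person averages the heights of k OTHER people chosen at random."""
    out = []
    for i in range(len(heights)):
        others = heights[:i] + heights[i + 1:]  # nobody can look at themselves
        sample = rng.sample(others, k)
        out.append(sum(sample) / k)
    return out

rng = random.Random(0)      # fixed seed so the run is repeatable
n_people, trials = 12, 500  # illustrative sizes, not taken from the thread

# Case 1: everyone observes all of the other people.
heights = [rng.gauss(170, 10) for _ in range(n_people)]
full = pearson(heights, observed_averages(heights, n_people - 1, rng))

# Case 2: everyone observes only six of the others; average over many rooms.
sampled = []
for _ in range(trials):
    hs = [rng.gauss(170, 10) for _ in range(n_people)]
    sampled.append(pearson(hs, observed_averages(hs, 6, rng)))
mean_sampled = sum(sampled) / trials

print(f"everyone observes everyone else: r = {full:.3f}")
print(f"six random others, mean over {trials} rooms: r = {mean_sampled:.3f}")
```

When each person sees all of the others, the observed average is a straight decreasing function of the observer's own height, so the correlation is -1. With only a random subset, any single room can come out either way, but the average across rooms is negative.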
#28 by Eddo // Jan 26, 2012 - 6:05pm
I think that Burke's example (and others' echoing of it) is a good one, and why you'd expect to see a negative correlation between true team strength and strength-of-schedule when every team plays every other. I'm not sure it holds for an NFL schedule, though.
The first thing I asked was: is that really what DVOA is measuring? Remember what FO is doing. They don't know the true strength of a team.
For this thought experiment, let's accept VOA itself (note: no "D" there) as being a perfect indicator of how strongly a team has played (note: not how strong the team is itself). To get to that team's true strength, or DVOA, we have to apply schedule adjustments based on the VOAs of the team's opponents.
So now, we have three things:
1. How a team has played all year, within the context of its schedule.
2. How well the teams on said schedule have played.
3. A metric that adjusts (1) for (2).
To put some numbers on it, let's take six teams that play three games each. (Down the left hand side is the team, the numbers going across are its VOA rating against each opponent (that is, A went 3-0, F went 0-3). AVG is the average of each team's VOAs. SoS is the average VOA of each team's opponents.)
. _A__ _B__ _C__ _D__ _E__ _F__ | __AVG__ | __SoS__
A .... +10% +20% +10% .... .... | +13.33% | - 0.56%
B -10% .... + 5% .... .... +10% | + 1.67% | + 0.56%
C -20% - 5% .... .... +15% .... | - 3.33% | + 3.89%
D -10% .... .... .... + 5% + 5% | + 0.00% | + 0.56%
E .... .... -15% - 5% .... +10% | - 3.33% | - 3.89%
F .... -10% .... - 5% -10% .... | - 8.33% | - 0.56%
How well the teams played doesn't perfectly negatively correlate with the SoS's.
Now, add the SoS figures to each team's AVG:
A: +12.78%
B: + 2.22%
C: + 0.56%
D: + 0.56%
E: - 7.22%
F: - 8.89%
The overall "true strength" numbers definitely don't have a perfect negative correlation to strength of schedule. In fact, the best team and worst team had the same overall schedule strength!
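The arithmetic in the six-team example can be reproduced with a few lines of Python. The per-game numbers are copied from the table; the variable names are just for illustration, and "AVG plus SoS" is the simple adjustment used in this comment, not the real DVOA formula.

```python
# Per-game VOA in percentage points from the six-team example:
# voa[team][opponent] is how the team rated in its game against that opponent.
voa = {
    "A": {"B": 10, "C": 20, "D": 10},
    "B": {"A": -10, "C": 5, "F": 10},
    "C": {"A": -20, "B": -5, "E": 15},
    "D": {"A": -10, "E": 5, "F": 5},
    "E": {"C": -15, "D": -5, "F": 10},
    "F": {"B": -10, "D": -5, "E": -10},
}

# AVG: a team's average VOA over its own games.
avg = {t: sum(games.values()) / len(games) for t, games in voa.items()}

# SoS: the average AVG of the teams on its schedule.
sos = {t: sum(avg[o] for o in games) / len(games) for t, games in voa.items()}

# The simple adjustment used in the example: AVG plus SoS.
adjusted = {t: avg[t] + sos[t] for t in voa}

for t in voa:
    print(f"{t}: AVG {avg[t]:+6.2f}%  SoS {sos[t]:+5.2f}%  AVG+SoS {adjusted[t]:+6.2f}%")
```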
I think that the whole idea of negative correlation because the teams don't play themselves works, with a big if - if every team plays every other.
If that's not true, I don't think you would expect positive or negative correlation.
#29 by Independent George // Jan 26, 2012 - 6:35pm
Not every team plays each other for 16 games, but they do for 6 of them within the division; over 1/3 of the schedule is a closed system. It won't be a perfect correlation as per the models, but there will be correlation.
#30 by nat // Jan 26, 2012 - 7:54pm
The NFL scheduling is really quite nifty. If, for example, you were last in your division.... Obviously you can't play yourself. So you're short two games against last place teams. But you get to play two extra games against last place conference foes. In the end, every team plays four games against each level of team.
Normally that would make the 'don't play yourself' effect small, since you're just selecting from teams that placed at your level in their division. This year a number of teams changed a lot in quality, possibly due to luck or maybe Luck. So being the Patriots meant you got to play the Colts as well as not playing yourself. That's an eleven game swing in SoS.
#32 by Eddo // Jan 27, 2012 - 10:54am
Yes, that's true. If you ran DVOA and SoS figures for intra-division games only, you would expect negative correlation. Actually, that would be really interesting to see.
However, at the whole-league, whole-schedule level, that doesn't hold true.
#36 by Pat (filler) (not verified) // Jan 28, 2012 - 3:42pm
For the whole league, the strength-of-schedule games correct for the intra-division disparity. Put another way, at the end of the season, every team has played exactly the same number of teams ranked #1, #2, #3, #4 in a division from the previous season.
The reason you expect a negative correlation for the whole league is because they're not perfectly matched divisions. The schedule for a team that goes 16-0 will obviously not be able to play any other 16-0 teams. :)
#35 by db22 (not verified) // Jan 27, 2012 - 8:27pm
I did some matlabbing and I came to the following conclusions.
What should the mean correlation be: -.170
What should the standard deviation in the correlation be: .245
How can it be positive? Those occasional years when really good teams are bunched into the same divisions. To be extreme, if the AFC East and NFC East had the 8 best teams, and the divisions play each other in a given year, then you'd have the 8 best teams playing 10 games apiece against top 8 teams.
What does that mean?
Doing this correlation is a worthless way to criticize the model. All the year by year correlations Aaron posted fall well within two standard deviations of the mean.
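A quick way to place a given season against those simulated numbers is a z-score. The sketch below is only arithmetic on figures already quoted on this page: the -.170 mean and .245 standard deviation from this comment, and the two 2011 correlations from the article. The other years' values are not reproduced here, so they are not checked.

```python
# Simulated mean and standard deviation of the correlation, as quoted above.
mean_r, sd_r = -0.170, 0.245

# The two 2011 correlations quoted in the article.
quoted_2011 = {"ranks (igglesblog)": -0.66, "ratings (article)": -0.72}

# Distance from the simulated mean, in standard deviations.
z = {label: (r - mean_r) / sd_r for label, r in quoted_2011.items()}

# The band two standard deviations either side of the mean.
low, high = mean_r - 2 * sd_r, mean_r + 2 * sd_r

for label, r in quoted_2011.items():
    print(f"{label}: r = {r:+.2f}, z = {z[label]:+.2f}")
print(f"two-SD band: {low:+.2f} to {high:+.2f}")
```

By this arithmetic the band runs from about -.66 to +.32, so the rank-based 2011 figure sits at its lower edge and the ratings-based -.72 is about 2.2 standard deviations below the simulated mean.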