29 Jul 2010
"...Football Outsiders’ predictions are so bad, they are literally worse than having no football knowledge at all. It's like negative information, draining us of insight."
Posted by: billsfan on 29 Jul 2010 | 14 replies, last at 03 Aug 2010, 3:42pm
Brian Burke just seems to have an axe to grind with FO. He's also written articles criticizing the finding that 370 carries predicts RB decline. I think he feels the need to drum up some controversy and content for new articles, since his website doesn't have nearly as much content as FO, and he doesn't have things like game charting data to use in his analysis.
Anyway, it's really poor analysis to judge a prediction system on a single year without comparing it to other real prediction systems (his site doesn't seem to offer its own preseason predictions to compare against, which is convenient).
In fact, one of the things that put FO on the map was a preseason prediction challenge set up by King Kaufman of salon.com, in which several experts offered predictions for division winners and wild card berths. In 2004 FO basically crushed the field:
Long story short: don't trust analysis that relies on obviously cherry-picked data and trumped-up comparisons.
"I think he feels the need to drum up some controversy and content for new articles since his website does not have nearly as much content as FO, and he doesn't have things like game charting data to use for his analysis."
Considering he's the only contributor on his website, it's not surprising he can't match the quantity of content; however, the depth of content he has amassed over the years is staggering, IMHO. If the game charting data is what sets FO apart from Brian's site, I'm concerned (I've said this many times about the Game Charting Project), given that Brian appears to be much more statistically savvy than FO.
"Anyways, it's really poor analysis to judge a prediction system on a single year without comparing to other real prediction systems (his site doesn't seem to offer it's own preseason predictions to compare against, which is convenient)."
Convenient? Brian's assertion has consistently been that pre-season predictions are almost impossible to make with any reasonable degree of accuracy, and until he develops (if he develops or even desires to develop) a system that is accurate, failing to make predictions is consistent with his assertion.
"In fact one of the things that put FO on the map was through a preseason prediction challenge setup by King Kaufman of salon.com, in which several experts offered predictions for division winner and wild card berths. In 2004 FO basically crushed the field:"
RE: cherry picking, hello pot, meet kettle.
I am a fan of both sites. I think this is a shot across the bow to some degree, and I applaud FO for posting this Extra Point, because I think for a long time FO has ignored the statistical work of others. This article does serve as fair warning if you are purchasing FO merchandise to assist with making predictions that you intend to invest money in (i.e., betting). Finally, (a person claiming to be) Ben Riley's comment on Brian's post notes that FO is a collection of writers with an interest in statistics (something Aaron has stated over the years, both on the site and in personal correspondence). My experience is that Brian is a solid statistician with an interest in writing, and both sites have enhanced my knowledge, love, and appreciation of the game of football.
It wasn't FO, it was just me. I think they still have a policy about studiously ignoring this guy.
I'm a fan of both sites, too--I've been reading FO since "The Establishment Clause," and I absolutely love Brian's in-game win-probability graphs. The more people there are doing statistical analysis on football, the better it is for junkies like us. If he had a pre-season book, I'd probably buy that one, too. August is such a painfully long month.
(I also like the Eagles)
Oops, my bad for not paying attention to detail.
I honestly can't recall what got me over to FO - I think it was when they were contributing to FOXSports (c. 2005/6).
A key factor to keep in mind is that FO is in the business of making money, while at present Brian seems to conduct his analyses based on whatever interests him. A semi-comparable comparison is research conducted by pharmaceutical companies vs. research conducted at universities.
I really wish that FO would pay more attention to the statistical-analysis side of things, so that their assertions could be evaluated more critically.
Agree that the in-game win-probability graphs are the coolest thing on the web. Also love his win-probability analysis of various game decisions by coaches.
Burke may very well be more statistically savvy than most of FO, but that just makes picking one season's worth of predictions to "analyze" all the more egregious. He knows it's a completely invalid way of measuring the value of a prediction system, but he goes ahead and does it anyway, heaping lots of inflammatory language on top.
My picking out the 2004 predictions was just an example refuting Burke's assertion, not a proof of why DVOA is the end-all, be-all of NFL predictions. Burke on the other hand took the same number of data points (one), and used it to tear apart an entire system.
If he really wants to demonstrate that DVOA has little or no predictive value, then he needs to compare it over multiple seasons against other prediction systems, which can include human experts. There is plenty of data out there; as far as I know, King Kaufman just selected predictions posted by various well-known experts for use in his contest.
As it is, between Burke's article and Kaufman's that I linked to, all we really know is that DVOA can produce dominant predictions in a good year, and in a bad year it can look worse than a simplified system. Because that simplified system was created after the season, and no doubt chosen specifically to make DVOA look bad, we have no idea how many such systems were actually tested to find one that fit the criteria. Without comparing against true predictions made before the season, the comparison is completely unscientific and invalid.
The idea that preseason predictions are impossible to make is a non sequitur; these predictions are made every year and can easily be compared. Even if they aren't very reliable in an absolute sense, simply being better than the rest provides value.
I would truly be interested in reviewing unbiased comparisons on both human experts and computer based systems, as this is the only way to really see what systems offer value.
"but that just makes picking one season's worth of predictions to "analyze" all the more egregious"
True, except he has analyzed multiple years of predictions, and notes this in his comments section:
Brian Burke said...
Regarding previous years:
Thursday, July 29, 2010
That seems pretty thin. I don't really see a comment on his own site that actually says he looked at 4 years' worth of data (his response to Chip just seems to talk about his simplified win-prediction formula?).
I'd really like to see tables of multi-year data with error metrics for each method and year all laid out. I can understand not using that sort of scientific presentation in a NYT article, but his own website should use a more complete and transparent presentation.
If I'm doing the math right, we have 4 years of data since 2004 (2004, 2005, 2006, 2009), and the 2004 data is in a different metric than Brian's data, so it's kind of an apple to the other three oranges. It is entirely possible that the mean absolute error in 2004 was significantly better than the average mean absolute error of 2.6 for the other three years. In my opinion, and I believe consistent with Brian's point, any system whose mean absolute error is greater than one win is of limited value.
As I've looked at the historical utility of DVOA to predict wins I've set the limit at +/- one game due to the huge value of any single game in the season. The team that is playing at a 10-6 level (DVOA ~ 12.00%) looking to make the playoffs may indeed make it with a 9-7 (-1 win than predicted) record, but almost certainly won't with an 8-8 record.
Additionally, I've conducted analyses of DVOA as a predictor of final season records based on week 8 DVOA, and unfortunately, since 2000, DVOA does not predict wins any better than using the week 8 win percentage as the predictor. If DVOA struggles to predict final season win totals at the halfway point of the season, why would it do better at the beginning of the season?
Well, pre-season and mid-season predictions are completely different animals; pre-season predictions will be less accurate for any system because of the relative lack of information about a team pre-season compared to mid-season. So while DVOA's predictive power improves during the season, so will using things like win percentage. That's the whole reason DAVE was created: to produce a better estimate of team quality early in the season.
Anyway, I haven't studied the numbers in detail myself, so I'll take your word for it; I was mostly pointing out that Burke's articles did not provide multi-year support, whether or not he ran the numbers.
I still find it a little hard to believe that DVOA is really that useless at predicting next season wins though, because that is what it is tuned to do. Every year in FOA (or PFP previously) they have the table in the intro showing correlation of various factors one year to wins the next, here it is for 2000-2008:
Point Differential 0.26
Yards per Play Differential 0.25
Yardage Differential 0.23
None of these are stellar, but DVOA is clearly better than the other factors, and seeing that wins has a positive correlation year to year it sure looks like DVOA would outperform a metric like wins / 4 + 6, unless FO screwed up their methodology to a degree I'd have a hard time believing.
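One note on the wins / 4 + 6 comparison: Pearson correlation is unchanged by a positive linear rescaling, so a regressed baseline like that can never beat raw prior-year wins on year-to-year correlation. A minimal sketch of this point (all win totals below are made up for illustration, not real NFL data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(x, y) / (std(x) * std(y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

prev_wins = [12, 10, 9, 7, 5, 4]           # hypothetical prior-year wins
next_wins = [10, 9, 10, 8, 7, 6]           # hypothetical next-year wins
baseline = [w / 4 + 6 for w in prev_wins]  # Burke-style regressed baseline

# Correlation is invariant under a positive linear transform, so the
# wins/4 + 6 baseline correlates with next-year wins exactly as well
# as raw prior-year wins do -- no better, no worse.
print(pearson(prev_wins, next_wins))
print(pearson(baseline, next_wins))
```

So on a correlation table, wins / 4 + 6 and plain previous-year wins are the same predictor; the regression only changes the scale of the predictions, which matters for absolute error but not for correlation.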
"they have the table in the intro showing correlation of various factors one year to wins the next, here it is for 2000-2008:"
Correct, and extrapolating from what Brian seems to be suggesting (demonstrating), saying every team will go 8-8 would produce a 0.34 correlation.
But that doesn't make sense. Since previous year's wins correlates at 0.25, it would imply that predicting every team to go 8-8 would be a better predictor, which is obviously not true.
And actually, you can't calculate ANY correlation against a set of all-equal values. I just tried a correlation calculator (at http://www.easycalculation.com/statistics/correlation.php) with a small set of test values, and plugging in all the same numbers for either the X or Y values gives an invalid result due to a 0/0 division in the formula. Adding a small amount to one of the values produces a result that is completely sensitive to the sign of the amount added. So at best we could say that predicting every team to go 8-8 produces a zero correlation.
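The 0/0 breakdown is easy to reproduce without a web calculator. A minimal hand-rolled sketch (the actual win totals here are hypothetical):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(x, y) / (std(x) * std(y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        # one variable has zero variance: the formula is 0/0, undefined
        return float("nan")
    return cov / (sx * sy)

actual = [10, 6, 12, 8, 9]    # hypothetical actual win totals
all_8_8 = [8] * len(actual)   # the "every team goes 8-8" prediction

print(pearson(actual, all_8_8))  # nan: the correlation is undefined
```

A constant prediction has zero variance, so the denominator vanishes along with the numerator; no correlation coefficient, 0.34 or otherwise, can be computed for it.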
Obviously correlation may not be the best metric due to these issues, and it would be nice if FO published RMS errors or other metrics in their DVOA comparison table. This also illustrates how a single statistic can be used to muddy the waters. If 8-8 produces nearly as good an RMS error or mean absolute error as DVOA, then maybe those metrics are not very useful in this case. DVOA, for example, will pick at least some division winners (useful), while all-8-8 records will "pick" four-way ties in each division (not useful). That's why King Kaufman's contest results are interesting: they are an example of DVOA doing something football fans care about.
Keep in mind that a prediction system could have terrible RMS error yet still predict the exact division ordering for every team, just by getting the relative strengths of the divisions wrong; e.g., under-predicting every North and South team by 3 wins and over-predicting every East and West team by 3 wins. This would be a case of bad RMS error but good correlation. Maybe that's why Burke is attacking DVOA on poor average errors: because it has been tuned to produce good correlations, which are more useful for predicting division standings (but less useful for predicting absolute record)? This is all speculation, but worth thinking about.
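The constant-offset scenario described above is easy to construct: shift a perfect prediction by a few wins and RMS error gets terrible while correlation (and the standings ordering) stays perfect. A minimal sketch (the division win totals are hypothetical):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(x, y) / (std(x) * std(y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rmse(xs, ys):
    """Root-mean-square prediction error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

actual = [12, 10, 8, 6]              # hypothetical division: actual wins
predicted = [w - 3 for w in actual]  # every team under-predicted by 3 wins

print(rmse(actual, predicted))     # 3.0 wins per team: terrible accuracy
print(pearson(actual, predicted))  # 1.0: ordering of the teams is perfect
```

The offset prediction misses every team by 3 wins, yet it nails the division standings, which is exactly the split between average error and correlation being debated here.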
Here is a post at another sports stats blog comparing FO projections to the Vegas preseason over/under for the past 4 years*. FO and Vegas had been about equally accurate from 2006-2008 (similar average error, and a regression equation predicting wins gave FO slightly more weight). But FO's 2009 predictions were terrible, and looking at the four years of data Vegas has a significant edge (smaller average error and much more weight in the regression equation).
*This blogger is more interested in using football predictions to assist in monetary investments, but he does seem to know his statistics.
Not surprising given small (very) sample size.
NFL sample size is small on multiple levels (4 years of NFL data does not even equal one NBA/NHL/MLB season):
1. For comparison, 1 NFL Game is roughly equal to 5 NBA and/or NHL games, and 10 MLB games.
2. A single random event (a missed field goal, a turnover, an injury, etc.) can have a huge impact; one turnover in the NBA is relatively minimal, while one turnover in an NFL game can be huge.
All of this is related to the point Brian is trying to make: predicting a season is hard, really, really hard. Pete Rozelle is thrilled, btw.
edited for poster drunk math error
I just got a good chance to look at this. The interesting thing is that for 2006-2008, FO and Vegas were basically in a dead heat. Then in 2009 FO basically stayed the same (though technically their worst year, only by 0.02 wins per team compared to 2008), but Vegas improved dramatically.
Overall Vegas is better on average but less consistent than FO: FO's average miss varies only by 0.32 from worst year to best, Vegas varies by 0.73.
FO also noted that 2009 was unusual in that team qualities changed significantly less than for other years. If we assume that Vegas predictions are more heavily based on previous year W/L performance than FO's numbers this would support a great year for Vegas. Given Vegas' volatility though, I wouldn't expect such an advantage most other years (Vegas, like FO, also got worse every year during 2006-2008).
© Football Outsiders, Inc. // Site powered by Stein-Wein // Partner of USA TODAY Sports Digital Properties