Writers of Pro Football Prospectus 2008

16 Oct 2013

Bayes Theorem and the New York Giants

In all honesty, I could link to three or four of Chase Stuart's columns per week, so there's a non-trivial amount of discretion going into linking to this one.

If you were to ask me, "What is the one thing that the football analytics community (at least in the public domain) could do right now to advance our cause without requiring more robust data?" I'd answer, "Apply more advanced statistical techniques to the data we currently have." Keith Goldner did this when he developed his Markov model of football, but nothing's really been done since in that regard until Stuart applied Bayesian statistics in the linked column. Granted, Bayes' Theorem has been around since 1763, and none other than Nate Silver has applied it to baseball, but we seem to be lagging behind. Given my background in measurement methodology, I have ideas for sure, but let's hope Stuart's column is a jumping-off point. The time to move on from correlation and ordinary least squares regression is nigh.

Oh, and since I should probably say something about the column's findings, it turns out that the posterior distribution of this 0-6 Giants team says they're about a 4-12 team in terms of "true" quality, which is slightly worse than the 5-11 that Brian Burke predicts, but better than both their 2-14 Pythagorean expectation and 1-15 estimated wins expectation. We can reconvene in two months to see which of these four prediction methods was right (this time).
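
For readers who want to see the mechanics, here is a minimal sketch of this kind of update, not a reproduction of Stuart's calculation. It assumes each game is an independent trial with a fixed win probability, and it puts a normal prior on that probability; the prior mean of .500 and the prior SD of 0.15 are illustrative assumptions, so the output should not be expected to match the column's 4-12 figure.

```python
import math

def grid_posterior(wins, losses, prior_mean=0.5, prior_sd=0.15, steps=1001):
    """Posterior over a team's per-game win probability p, computed on a grid.

    prior_mean and prior_sd are illustrative assumptions, not values
    taken from the column.
    """
    grid = [i / (steps - 1) for i in range(steps)]
    weights = []
    for p in grid:
        # Unnormalized normal prior density; the grid restricts it to [0, 1].
        prior = math.exp(-0.5 * ((p - prior_mean) / prior_sd) ** 2)
        # Binomial likelihood kernel for the observed record.
        likelihood = p ** wins * (1 - p) ** losses
        weights.append(prior * likelihood)
    total = sum(weights)
    return grid, [w / total for w in weights]

grid, post = grid_posterior(wins=0, losses=6)
posterior_mean = sum(p * w for p, w in zip(grid, post))
prob_500_or_better = sum(w for p, w in zip(grid, post) if p >= 0.5)
remaining_games = 10  # a 16-game season with 6 games already played

print(f"posterior mean win probability: {posterior_mean:.3f}")
print(f"P(true win probability >= .500): {prob_500_or_better:.3f}")
print(f"expected wins over the rest of the season: {posterior_mean * remaining_games:.1f}")
```

Widening or narrowing `prior_sd` changes how hard the 0-6 start gets pulled back toward .500, which is where most of the disagreement between prediction methods would come from.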

Posted by: Danny Tuccitto on 16 Oct 2013

31 comments, Last at 18 Oct 2013, 11:10am by TomC


by IrishBarrister :: Thu, 10/17/2013 - 1:51am

I used to put together a Bayesian rating model for NFL and FBS teams based on margin of victory, strength of schedule, and home field advantage (similar to what guys like Sagarin run). You know, assume team performance is a normal distribution around a central mean and use games as Bayesian inferences. But I've always wondered whether such a method would lend any additional predictive power to advanced stats like S&P+ and DVOA. Might be an interesting offseason project (I'd be willing to help).

by Brad M (not verified) :: Thu, 10/17/2013 - 3:14am

"which assumes that the talent level of NFL teams is normally distributed, an assumption I will make throughout this post"

Yeah, but it's not.

by Brad M (not verified) :: Thu, 10/17/2013 - 3:19am

Actually, I may have jumped the gun. I'd at least like some kind of evidence that it is - even if it's just citing NFL standings from the past 30 years. I guess I could do that myself, but that's a hell of an assumption that doesn't necessarily need to be just taken on faith in this instance.

by David :: Thu, 10/17/2013 - 3:44am

Admittedly, this is after only two minutes' thought, but I have no idea how you would prove that hypothesis. The biggest issue that I see is how we measure the level of talent on a team. My initial thought, like yours, was to assume a correlation to win-loss records, but DVOA, SRS, Pythagorean scores and others are all attempts to show the limits of win-loss records as a measure of team quality.

And that's assuming team quality is directly related to player quality (I mean, it probably is, but it's still an assumption).

Given the number of assumptions involved in trying to define the hypothesis, I'm pretty happy that normal distribution can be taken as the default, simply on the law of large numbers (53x32 is a pretty big number, after all).

by Anonymousse (not verified) :: Thu, 10/17/2013 - 9:36am

"And that's assuming team quality is directly related to player quality (I mean, it probably is, but it's still an assumption)"

Seeing the difference between last year's Chiefs and this year's, I'd say that coaching is a pretty large factor (also between both years' Eagles)... Which makes this difficult.

Football just has so many moving parts, and so few ways to isolate the contributions of those parts.

by sundown (not verified) :: Thu, 10/17/2013 - 12:36pm

It'd be a pointless exercise. Not even sure how you'd define "evenly distributed." Teams will always be better or worse at various positions... you'd have to average it all out somehow and probably weight various positions. And there'd be no provably correct way to do any of that. But even if you got that far, you've got coaching coming into play, the motivation of players, etc. (Because even if the talent was identical, the motivations wouldn't have to be and that'd make a huge difference.)

by dan s (not verified) :: Thu, 10/17/2013 - 9:40am

I agree with you that it's the best baseline assumption absent a way of measuring talent. But I'm curious whether the normal distribution isn't the norm when you're talking about freakishly talented people who are already a few standard deviations out from average.

Maybe the normal distribution still works at this extreme--there's an average NFL talent level, and even this far out, there's a bell curve around the new normal. But it'd be interesting if there were a different pattern going on...maybe distributions get funkier this far out from the mean.

by EricL :: Thu, 10/17/2013 - 4:50pm

I took this to mean not "talent levels in the NFL are normally distributed" (they shouldn't be - they're the far right edge of the bell curve), but the talent _within the league_ is normally distributed _among the teams_.

So, your hypothetical peak-of-the-bell-curve 8-8 team has a mean talent level, and it's normally distributed from there. Teams like the Seahawks (generally accepted as having one of the deepest rosters in the league) would be at one end of the curve, and the Jaguars at the other.

by dansvirsky :: Thu, 10/17/2013 - 9:50am

You know, Andrew Gelman writes a lot about how assuming a normal distribution actually isn't hugely important for running statistical tests. See: http://andrewgelman.com/2013/08/04/19470/

That said, now that I reread it, he does caution that lack of normality is an issue for predicting individual data points, so maybe we're still in hot water here. I wish I understood the nuts and bolts of this stuff more.

by Danny Tuccitto :: Fri, 10/18/2013 - 1:36am

I actually think this is one of the bigger issues with how the NFL stats community deals with data. It's true that the normality assumption -- whether univariate or multivariate -- is usually not that big a deal, but sometimes it's a huge deal, and people need to take that into account. And the good news is that there are plenty of statistical techniques that can adjust an analysis to account for non-normality. For instance, this journal article I got published back in graduate school used a weighted least squares estimator that's robust to non-normal data.

by DrP (not verified) :: Thu, 10/17/2013 - 10:09am

6% is about 4x larger than 1.5%, but focus on the actual magnitude instead of the relative one and it is still in the "see you next year" range.

They have a theoretical chance, but that still requires a huge amount of change for them.

by whckandrw (not verified) :: Thu, 10/17/2013 - 12:18pm

How do you guys still not have a thread for Josh Freeman being named the starter in Minnesota?

by Raiderjoe :: Thu, 10/17/2013 - 12:24pm

Not sure. Maybe if tema was more poplar there would be post about J. Freeman being named stating qb Minn. Vikes.

by sundown (not verified) :: Thu, 10/17/2013 - 12:42pm

Don't ask. It'll only bring pain. There's probably something buried in the FAQ section about how they never post about new starting quarterbacks until he's taken his first snap as a starter, or something.

by Rivers McCown :: Fri, 10/18/2013 - 1:27am

Yes, we actually throw people who ask questions like that on the torture rack. ROBOPUNTER runs it -- he's sadistic.

Honestly I'd love to see you guys have the ability to just nominate your own links so we could run things that way once we actually upgrade the website. Most of the reason why I, personally, did not deem a "Josh Freeman is a starter" thread a worthy enterprise is because we already had threads on him getting cut and him getting signed. How much of the Josh Freeman audience are we trying to pander to? If you guys really wanna branch off on that take -- we do have an open injury thread now that could be utilized. Just pretend it says "lineup changes" instead of "injuries." Hey, you can discuss Case Keenum there too! I think he'll be terrible!

by RickD :: Thu, 10/17/2013 - 1:26pm

Not every personnel decision gets its own thread. In fact, I'm hard-pressed to think of a situation where a change in the starting QB did get its own thread.

by apk3000 :: Thu, 10/17/2013 - 2:12pm

Did the change to Kaepernick last season get one? Otherwise, "bad team changes QB" isn't much of a headline.

by jds :: Thu, 10/17/2013 - 3:51pm

Tanier already did that:


Or rather, he told you why you don't need a thread for that (no thread for Thad Lewis either).

That is, unless you are trying to start the Joe Webb controversy.

by Andrew Potter :: Thu, 10/17/2013 - 4:33pm

Or Case Keenum, for that matter.

Best of luck to 'em.

by Pat (filler) (not verified) :: Thu, 10/17/2013 - 12:44pm

The fly in the ointment in the article here is that he's assuming a flat prior. In the first part, where you try to figure out the likelihood that the Giants are a 0.500 team, that part is mostly OK, although not perfect: the chance of drawing a team from the NFL that's 0.500 or better doesn't have to be 50%. In the past few years, the number of teams 0.500 or better was 17, 20, 16, 20, 21, 17, which is an average of 18.5. There's a slight upward bias there because the NFL plays an even number of games, but that's not that much. There's no reason that the true strength distribution has to be symmetric about 0: naively you'd expect it *not* to be, because it's much harder to be perfect than it is to be terrible, since the other team is always trying to win.

But the second part, where he calculates the expected value of the posterior distribution, doesn't consider the true strength probability distribution at all, which means he's assuming a flat prior - just as likely for a team to go 8-8 as 0-16.
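
To put rough numbers on the flat-prior point, here is a minimal sketch using the conjugate Beta-binomial update. It assumes independent games with a fixed win probability; the informative prior's strength (a = b = 5.5, roughly "11 games of .500 football") is a hypothetical choice for illustration and is not taken from the column.

```python
def beta_posterior_mean(prior_a, prior_b, wins, losses):
    """Mean of the Beta posterior after observing a binomial win/loss record."""
    return (prior_a + wins) / (prior_a + prior_b + wins + losses)

# Flat prior: Beta(1, 1) treats every win probability as equally likely.
flat = beta_posterior_mean(1, 1, wins=0, losses=6)

# Informative prior centered on .500. The strength (a = b = 5.5) is a
# hypothetical choice for illustration, not a value from the column.
informed = beta_posterior_mean(5.5, 5.5, wins=0, losses=6)

print(f"flat prior:        {flat:.3f}")
print(f"informative prior: {informed:.3f}")
```

Under these assumptions the flat prior gives a posterior mean of 0.125, while the informative prior gives about 0.324, so the choice of prior moves the estimate a lot after only six games.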

The next fly, of course, is assuming that NFL games are a coin flip. They're not: they're contests, which means that your team's likelihood of winning is not the true strength value - it's the output of some function (often called a 'game output function') which takes the true strength of your team, and the true strength of your opponent, and outputs the likelihood of winning.

by Pat (filler) (not verified) :: Thu, 10/17/2013 - 12:54pm

Just realized how to take out the bias from the even number of games: you just assign an 8-8 team as half a team above 0.500. If you do that, you end up with an average number of teams above 0.500 for the past 10 years as 16.15. So the assumption of the probability for a 0.500 team being 0.500 is probably close enough to perfect that any bias doesn't matter.

Rest of the comment stays the same, though.

by RickD :: Thu, 10/17/2013 - 1:52pm

A lot of this stuff is just formulated badly.

For example:

"What you want to know is the likelihood that the Giants are actually a .500 or better team. "

No, we know that the Giants are not "actually" a .500 or better team. They are actually a .000 team.

What he wants to say is "Suppose we use a model where any team has a pure 'win probability'. What is the probability that the Giants' 'win probability' is .500 or more?"

It's not clear that this is a reasonable way to model all the outcomes of the games of a football season. Football teams have schedules where the number of wins is determined by games they play against each other. The presumption that each team has an independent "win probability" and that the outcome of any game can be determined by said "win probability" presents theoretical difficulties. What happens when Team A with "win probability" p_A meets Team B with "win probability" p_B? It's clear that the win probability of the game is conditional on both teams playing, not just one.

So there must be more constraints in the system than are being allowed for here. We cannot simply sample 32 "winning probabilities" from a normal distribution and say that is a feasible set of win probabilities.

I feel like there are some flaws here. I would be happier with a model that allowed for an extra variable, "team strength", which itself could be normally distributed. And then the outcome of any given game could be expressed as a function of the relative strengths of the two teams. I don't think the implied model Chase Stuart is using can work. He wants to go backward from summary statistics to model parameters. I'm not sure that's really possible here.
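
A minimal sketch of the kind of model described here, under loud assumptions: latent strengths are drawn from a normal distribution (the SD of 1.0 is arbitrary), the game outcome uses a logistic function of the strength gap (a Bradley-Terry-style choice, one of many possible), and the schedule is a single round robin rather than a real NFL schedule.

```python
import math
import random

def win_probability(strength_a, strength_b):
    """Logistic function of the strength gap (an assumed link, not the only option)."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b)))

def simulate_round_robin(strengths, rng):
    """Play every pair of teams once and return each team's win total."""
    wins = [0] * len(strengths)
    for a in range(len(strengths)):
        for b in range(a + 1, len(strengths)):
            if rng.random() < win_probability(strengths[a], strengths[b]):
                wins[a] += 1
            else:
                wins[b] += 1
    return wins

rng = random.Random(2013)  # fixed seed so the run is repeatable
# 32 latent strengths from a normal distribution; the SD is an assumption.
strengths = [rng.gauss(0.0, 1.0) for _ in range(32)]
wins = simulate_round_robin(strengths, rng)
print(sorted(wins, reverse=True))
```

Because every game hands out exactly one win, the win totals are automatically consistent with each other, which is the constraint that independently sampled "win probabilities" do not respect.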

by Pat (filler) (not verified) :: Thu, 10/17/2013 - 3:08pm

The "true strength" model of a player in a 2-player contest is pretty much used in all game-theory models of sports: to that you have to add a "game output function" - something that takes 2 'true strengths' and outputs a probability (e.g. GOF(p_A, p_B) = probability for A to win the game). The true strengths can be normalized any which way, so it makes sense to scale them such that GOF(p_A, 0.5) = p_A: that is, the probability of a team to win against a team with 'true strength' (p_B) = 0.5 is equal to their 'true strength', and 'true strength' = 0.5 is equal to an average team. Sports don't always do this, of course: the Elo rankings in chess range from [100, infinity], for instance, although the practical upper bound is probably 3000-ish, but those are perfectly usable "true strength" measures as well.

So you're absolutely right that at first glance it seems goofy: however you could make his statement a bit clearer if you said "Suppose we use a model where teams have a true strength, normalized such that 0.500 is an average team, and the true strength is equal to your win probability over an average opponent. Then, if we assume that the Giants have played 6 'average opponents' so far, what's the probability that their 'true strength' is 0.500 or more?"

With that, it's clear what the limitations are in the test: with only 6 opponents, the likelihood that you've averaged things out to an 'average opponent' is really, really low.
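
One game output function that satisfies the normalization GOF(p_A, 0.5) = p_A is the log5 formula popularized by Bill James. The sketch below uses it purely as an example of such a function; nothing in the column or this thread says it is the right one for the NFL, and the function name `game_output` is made up for illustration.

```python
def game_output(p_a, p_b):
    """log5-style probability that A beats B, for 'true strengths' strictly between 0 and 1."""
    a_edge = p_a * (1 - p_b)
    b_edge = p_b * (1 - p_a)
    return a_edge / (a_edge + b_edge)

# Against an average (0.5) opponent, a team's win probability equals its strength.
print(f"{game_output(0.7, 0.5):.3f}")
# Against a weak (0.3) opponent, the same team is a heavier favorite.
print(f"{game_output(0.7, 0.3):.3f}")
```

The gap between the two printed values shows why six games against opponents of unknown quality say less about "true strength" than six games against known average opponents would.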

by TomC :: Thu, 10/17/2013 - 2:33pm

FO already does Bayesian statistics; that's what DAVE is.

by Danny Tuccitto :: Fri, 10/18/2013 - 1:38am

In spirit, yes. In application, no. It does apply the general idea of a "prior belief," but doesn't apply the Bayesian math.

by TomC :: Fri, 10/18/2013 - 11:10am

Perhaps not explicitly, but I can definitely write down a Bayesian expression for the probability distribution for the quality of Team X that returns DVOA as the peak of the (log) likelihood and DAVE as the peak of the posterior.

(edit: that is, if I knew the DVOA and preseason projection formulae, which I don't. Really. I swear.)
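
As a generic illustration of that idea, and emphatically not FO's actual DVOA or DAVE formulae, a normal prior combined with a normal likelihood gives a posterior whose peak is a precision-weighted average of the prior peak and the likelihood peak. All numbers and variable names below are hypothetical.

```python
def normal_posterior(prior_mean, prior_sd, observed, observed_sd):
    """Posterior mean and SD for a normal prior combined with a normal likelihood."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / observed_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * observed) / post_precision
    return post_mean, post_precision ** -0.5

# All four numbers below are hypothetical, chosen only to show the mechanics.
preseason_projection = 0.10   # peak of the prior
early_season_rating = -0.20   # peak of the likelihood
post_mean, post_sd = normal_posterior(preseason_projection, 0.10,
                                      early_season_rating, 0.20)
print(f"blended rating: {post_mean:+.3f} (sd {post_sd:.3f})")
```

As more games are played, the likelihood's SD would shrink, and the blended rating would drift from the preseason projection toward the in-season number.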

by grady graddy (not verified) :: Thu, 10/17/2013 - 6:15pm

It should be noted that Bayesian work in sabermetrics predated Silver's linked article by many years. See the many discussions by mgl, tangotiger etc. on BTF and the InsideTheBook website.

by Danny Tuccitto :: Fri, 10/18/2013 - 1:39am

Oh yeah, totally agreed.

by akn :: Thu, 10/17/2013 - 8:05pm

I'm familiar with and use Bayesian methods now and then, but this article (and the baseball-based version it's based on) does a really poor job of explaining the concepts, and a really poor job of presenting the math. In this day and age, at least make a token attempt to show larger equations/calculations formatted properly instead of some quadruple parentheses monster. It takes literally no effort to use something like the Wolfram-Alpha computational knowledge engine to more coherently present your work.

by Danny Tuccitto :: Fri, 10/18/2013 - 1:41am

Hehe. I can sympathize with that critique, just understand that a) Chase isn't a mathematician/statistician by trade, and b) the fact that someone is delving into Bayesian statistics with respect to football is far more important in the long run than that they presented it in a clunky way.

by bubqr :: Fri, 10/18/2013 - 6:47am

I really enjoyed Nate Silver's book in that regard.