07 Aug 2006

*Guest Column by Patrick Allison*

It's often said that NFL preseason is pointless. The players themselves often say they would rather just have the season start, and coaches constantly worry about someone getting injured. People complain it's not real football, because coaches don't gameplan the same, starting players don't play the entire game, and probably most importantly, the teams don't actually care about winning, for the most part.

But on the other side, it's not like offensive linemen protect the quarterback any less just because it's preseason. If a tackle misses a blocking assignment, and the QB goes down, that could doom the entire season. They have to play as if it's real. And no QB will actively try to throw an interception, even in preseason, unless you're Chad Hutchinson.

As for the fact that some of the players aren't the same, well, to be honest, the regular season faces those problems as well. The Buffalo Bills that you face in Week 1 are probably not the same Bills you face in Week 17. Plus, over the season, every team needs to draw from its backups as players get injured. The quality of your backups eventually becomes the quality of your team. Depth is every bit as important as the ability of your first string.

That being said, can we find an actual correlation between preseason and regular season performance? Well, it's difficult; there are only four games in the preseason. Does the number of wins in the first four games of the regular season correlate with the number of wins in the remaining 12? Not well. The Patriots and Eagles of 2003 know this from pleasant experience. Plus, what if your preseason opponents in 2004 were four very bad teams? It's easy to see that this approach can be very heavily biased.

Thankfully, there is a way to deal with this bias. Occasionally, though not often, NFL teams play an opponent in the preseason and then face it again in the regular season. So we can look for a correlation between the point differential in the preseason game and the point differential in the regular season game. Let's say Denver beats San Francisco in the preseason by 30 points. What happens if they meet again during the regular season?

First, let's look at the distribution of point differentials from all NFL games from 2000-2005:

It's a nice, smooth distribution, with a width (RMS) of about 15 points. That is, about 70 percent of all games in the NFL are decided by 15 points or less. So if preseason games had no bearing on regular season performance, a 30-point Denver win over San Francisco in the preseason would give us no reason to expect another 30-point Denver win in the regular season -- a margin that large is too uncommon in the NFL to happen twice by chance if the first occurrence meant nothing. By looking at the correlation between the two point differentials, then, we can see whether the preseason game says anything about the regular season game's result.
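The 70 percent figure is roughly what a Gaussian predicts: about 68 percent of a normal distribution lies within one RMS of the mean. A minimal Python sketch of that arithmetic, and of how unlikely two 30-point wins would be if the first one meant nothing, assuming a continuous normal curve with a 15-point standard deviation (which ignores scoring quantization) and treating the two games as independent:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

SIGMA = 15.0  # assumed: the ~15-point RMS width, treated as a Gaussian sigma

# Fraction of games decided by 15 points or less, if margins were Gaussian.
within_one_rms = normal_cdf(SIGMA, sigma=SIGMA) - normal_cdf(-SIGMA, sigma=SIGMA)

# Chance that a given team wins any single game by 30 or more.
p_win_by_30 = 1.0 - normal_cdf(30.0, sigma=SIGMA)

# If the first game carried no information, a second 30-point win is an
# independent event, so the joint probability is just the product.
p_twice = p_win_by_30 ** 2

print(f"within 15 points: {within_one_rms:.1%}")
print(f"win by 30+: {p_win_by_30:.2%}, twice in a row: {p_twice:.3%}")
```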

This is going to be a sloppy, sloppy method. Scoring in football comes in awkward chunks (3 points for a field goal, 7 for a touchdown), so a 7-point differential may be easier to get than a 1-point differential. But we still might be able to see something, though smoother measures like FO's DVOA stats would obviously be better here.

So we know that this is going to be sloppy. But can we learn anything from it? Well, the first question to ask is what does this look like in a situation where we know that the games mean something? That we can answer easily. Teams in the same division meet each other twice in the regular season, so we just have to look at those games. If Philadelphia beats Washington by 30 points, what does that mean for the next game? So here's a plot of first game regular season point differential vs. second game regular season point differential for the past six years.

The grey line is just to guide the eye -- it's a correlation of 1, that is, the first game point differential equals the second game point differential. The pink line is a linear regression fit. While the fit looks pretty flat -- it's a correlation coefficient of 0.20 -- given the number of points, the trend is quite significant (P-value of 0.008, for those who care). Note how few teams lost their first game by 14 points or more and won the second game by 14 points or more - only four. In fact, only once did a team lose by 20 or more points in their first meeting and win by 20 or more (for real) in their second: the New England Patriots, who lost 31-0 to the Buffalo Bills in 2003, and then beat them 31-0 at the end of the season. (The other point was the Vikings beating the Bears in a meaningless Week 17 game.) That's what those two far outliers are (one for the Patriots, one for the Bills). So if you thought that was weird, you should have. In contrast, there were 17 situations where a team won their first game by 14 or more points, and won their second game by 14 or more points.

This result is exactly what we expect. That's good. It means that people who try to figure out what game results mean (like, anyone at FO) aren't out of their minds -- if a better team beats a worse team once, they're likely to do it again. Game results aren't random. To illustrate this a little more clearly, let's look at that distribution of point differentials -- but now, let's look at the distribution of point differentials from the second game, after the first game has been won by 14 or more points.

It looks similar to the first distribution, with a spread of about 15 points, but the mean is now shifted by about 5 points. So if a team wins their first game by 14 or more points, the average outcome of the second game is a win by 5 points. Unless you're the Buffalo Bills.
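The conditional distribution can be summarized with a few lines of Python. This is a sketch: `second_game_summary` is a hypothetical helper, the example pairs are made up, and each pair is taken from the point of view of one team (first-meeting margin, second-meeting margin). The spread is reported as the RMS deviation from the mean, i.e. the population standard deviation.

```python
import statistics

def second_game_summary(pairs, threshold=14):
    """Summarize second-meeting margins for teams that won the first
    meeting by at least `threshold` points.

    `pairs` holds (first_margin, second_margin) from one team's point of view.
    """
    followups = [second for first, second in pairs if first >= threshold]
    if len(followups) < 2:
        raise ValueError("need at least two qualifying rematches")
    return {
        "n": len(followups),
        "mean": statistics.fmean(followups),   # the shift of the distribution
        "rms": statistics.pstdev(followups),   # its width
        "won_again": sum(1 for m in followups if m > 0),
    }

example = [(21, 10), (14, -3), (30, 7), (3, 20), (-14, -7)]  # hypothetical
print(second_game_summary(example))
```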

Now that we know what the regular season games look like (where we all agree they mean something), let's go to the preseason since 2000. Let's be a little more intelligent about it, though. Let's look at only the first half differential in the preseason versus the full differential in the season (or postseason -- both are included). The correlation is significantly reduced if we use the full game, which is a good thing -- this is what our intuition would expect, since the players in the second half sometimes are on different teams during the season.

This time around, the correlation is actually stronger (correlation coefficient of 0.31), though the significance is lower (P-value of 0.04) because there are fewer games; it is still significant. No team that trailed by more than 20 points at the half went on to win its regular season (or postseason) rematch by more than 20 points.

Thanks to Michael David Smith for pointing out why the correlation could be stronger: in the regular season, every intradivisional pairing is played once at home and once away, whereas that's not the case for preseason rematches. Since we know that home field advantage exists, we actually expect the regular season correlation to be weaker. As a stupid example, imagine that home field advantage is worth 5 points to a team. If the Packers play at Chicago and lose 10-25, then when Chicago plays at Lambeau, you might expect the Packers to lose 15-20. This would put a point on the previous chart at (-15, -5), which would tend to flatten the correlation.
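The same arithmetic as a small Python sketch. `HOME_EDGE` is the deliberately round 5 points from the example (a real analysis might use a smaller value, such as the 3 points mentioned in the comments), the function names are hypothetical, and margins are from the Packers' point of view:

```python
HOME_EDGE = 5.0  # assumed home-field edge in points (the round example value)

def neutral_margin(margin, at_home):
    """Remove the assumed home edge from a margin (team's point of view)."""
    return margin - HOME_EDGE if at_home else margin + HOME_EDGE

def expected_margin(neutral, at_home):
    """Add the assumed home edge back for the rematch venue."""
    return neutral + HOME_EDGE if at_home else neutral - HOME_EDGE

# Packers lose 10-25 at Chicago: a -15 margin on the road.
neutral = neutral_margin(-15, at_home=False)      # -10 on a neutral field
rematch = expected_margin(neutral, at_home=True)  # -5 at Lambeau
print(neutral, rematch)  # -10.0 -5.0
```

Because the expected rematch margin is 10 points smaller in magnitude than the first one, venue swaps alone pull divisional pairs toward a flatter line.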

There aren't enough data points to make the same distribution as before, but we can lower the threshold a little and look at all games where the point differential at the half was more than 7 points. This distribution's got a bit more of a tail than the first one.

However, it's still shifted significantly toward positive values. So if you are more than 7 points ahead of an opponent at the half, on average, you beat them by about 6 points in the regular season. To put things in perspective, of the 44 teams in this sample, 28 beat the same opponent a second time. Only 16 teams lost, and only six by more than 10 points. The tail here is Tennessee, San Diego, and Washington, who must've ticked off Oakland, San Francisco, and Pittsburgh, losing by 27, 28, and 21 points, respectively. (It took teams a while to remember Ryan Leaf is god-awful.)

So what exactly does this mean? Well, it's difficult to say. The sample size is small, that's for sure, but the effect is pretty significant. It looks like preseason first halves do have predictive power on the regular season.
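One rough way to put a number on "small but significant" for the 28-of-44 split is a one-sided sign test: if first-half preseason leads meant nothing, each rematch would be a coin flip. The sketch below assumes the 44 rematches are independent and ignores home field and ties; `sign_test_upper` is a hypothetical helper, and this is a back-of-the-envelope check rather than the analysis used above.

```python
from math import comb

def sign_test_upper(wins, games):
    """One-sided binomial tail P(X >= wins) when every rematch is a coin flip."""
    return sum(comb(games, k) for k in range(wins, games + 1)) / 2 ** games

p = sign_test_upper(28, 44)
print(f"P(28 or more of 44 | coin flip) = {p:.3f}")
```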

These are the common preseason games this year, and there are an unusual number of them:

- Buffalo-Detroit, preseason Week 4, rematch Week 6.
- Cincinnati-Indianapolis, preseason Week 4, rematch Week 15.
- Denver-Arizona, preseason Week 4, rematch Week 15.
- Jacksonville-Miami, preseason Week 1, rematch Week 13.
- Kansas City-St. Louis, preseason Week 3, rematch Week 9.
- Oakland-San Francisco, preseason Week 3, rematch Week 5.
- Oakland-Seattle, preseason Week 4, rematch Week 9.
- San Diego-San Francisco, preseason Week 4, rematch Week 6.
- Arizona-Chicago, preseason Week 3, rematch Week 6.
- Carolina-Pittsburgh, preseason Week 4, rematch Week 14.
- Chicago-San Francisco, preseason Week 1, rematch Week 8.
- Dallas-New Orleans, preseason Week 2, rematch Week 14.

(Ed. note: The number of preseason/regular season rematches is high this year because the AFC West and NFC West tend to schedule each other for preseason games to cut down on travel costs -- and this year, they are also scheduled for interconference play.)

Certainly the preseason is less important than the regular season -- after all, these games don't actually count. Plus, coaches don't usually put backups in when the game is close, and we know they gameplan differently in the regular season. But it seems like the data are trying to indicate that preseason does, in fact, mean something -- at the very least, fans should be concerned when teams are blown out in the first half. Dismissing the games outright is probably a little irrational.

For another take on this which shows that preseason does in fact matter, check out TwoMinuteWarning.com. This is an updated version of an article TMW first did in 2004, and it looks at what the preseason can tell us about total wins and losses in the upcoming regular season.

One final point: there's not enough data in four years of the preseason to see if the preseason/regular season correlation gets weaker as the number of weeks separating the games increases, but we know from weighted DVOA that the predictive power of older games drops significantly after about 13 weeks. So take the common games where the second game is much later in the season with a grain of salt.

*Patrick Allison is a graduate student in physics at Ohio State, who is thankful for free time on airplanes to work with random football statistics. If you are interested in writing a guest column for Football Outsiders, something with a unique take on the NFL, please e-mail info-at-footballoutsiders.com.*

56 comments, Last at 16 Aug 2006, 5:45pm by Pat

## Comments

Kibbles (not verified):: Mon, 08/07/2006 - 1:52pm

Umm... I think those charts are out of order. When you're talking about a pink line or a grey line, I'm seeing a bar chart with a tail, and when you're talking about the chart's tail, I'm seeing a scatterplot with a pink line and a grey line.

Pat (not verified):: Mon, 08/07/2006 - 1:58pm

They are - I emailed Aaron about it. They should go 1, 4, 2, 5, 3 rather than the order they're in.

I blame myself.

admin:: Mon, 08/07/2006 - 2:00pm

Should be fixed now.

bmw1 (not verified):: Mon, 08/07/2006 - 2:27pm

Pat-

Why didn't you run a regression on the preseason-regular season rematches? It could give you a pretty nifty prediction equation, which would be interesting from a fan and statistical view, but might also help those who wager on the games.

Pat (not verified):: Mon, 08/07/2006 - 2:52pm

bmw1: I did - it's mentioned in the article (that's what the 4th figure is). It's a correlation of about 0.31, plus or minus 0.1 or so.

But: Take a look at the 4th figure. You'll note that the pink line (the regression) looks like it's above most data points at the early part, and below for the late part, and it is - that's what point quantization (i.e. 7 points for a touchdown) does for you. The close games (wins within, say, 7 points) flatten the regression a lot. If I had just done a regression with victories of, say, 10 points or more, the correlation would be higher.

So I wouldn't read too much into preseason games where the score at the half is, say, 7 points or less. But if Chicago comes out and is up by 21 on Friday over San Francisco, it's going to be another long year in the Bay Area.

bmw1 (not verified):: Mon, 08/07/2006 - 3:04pm

Pat-

I'm confused; is .31 the Beta coefficient or the Pearson (or some other) correlation from the regression represented by the 4th figure?

If it's just the correlation, and if you did run a full regression, you should throw the full equation on at the end of the article.

bmw1 (not verified):: Mon, 08/07/2006 - 3:08pm

Oh yeah, and what is the multiple R squared from the regression?

thad (not verified):: Mon, 08/07/2006 - 4:23pm

You know, I did not look at the writer's name, I just started reading. Halfway thru I thought, good God, Pat is the only one who is gonna understand all this.

Well I get the gist of it, and without doing all the math I had assumed it was true.

great article Pat

thad (not verified):: Mon, 08/07/2006 - 4:33pm

2 questions

1. What is RMS

2. Why didn't you use standard deviation?

Todd T (not verified):: Mon, 08/07/2006 - 4:55pm

It's great that the author took a shot at this topic that everyone wonders about. But I'm not sure there was anything very meaningful in the findings. There's statistical significance (as reflected here by p-values etc.) and then there's actual significance. The statistical significance that the author found indicates that there is a non-zero correlation, but to really draw any useful conclusions, that correlation would need to be larger than 0.3. Visually, this is borne out by the large fraction of points that are in the "northwest" and "southeast" quadrants of the scatter plots. These are the game pairs where the results were reversed from one game to the next. These are barely outnumbered by the non-reversals, but because there were a lot of observations, the statistical test indicated good confidence in that small difference.

I wonder how much of the statistical significance arises from using each game as two separate data points, one for each team. IMHO this is not really appropriate. Doubling the sample size makes it much easier to be confident about a small observed effect being real. But one game pair is really one observation, even though two teams are involved.

Pat (not verified):: Mon, 08/07/2006 - 4:59pm

I'm not that fond of the way most standard statistical measures are used - they usually get taken well out of context. The correlation seen is significant (p-value of ~0.04), and given the sample size compared to the full-season data, this tells you that teams repeat their preseason first-half performance in the regular season at least as often as they repeat their regular season performance.

The R-squared values of the regular season/regular season and preseason/regular season correlations are small: about 0.04 and about 0.1, respectively. The majority of the variation you see is due to point quantization.

> If it's just the correlation, and if you did run a full regression, you should throw the full equation on at the end of the article.

It's not bias-free - it'll overpredict the small margins, and underpredict the large margins. The actual equation is less important than the fact that inasmuch as regular season games predict future regular season games, preseason games predict future regular season games.

If you do wager on games (and I don't), this is just a guide. Say you're trying to guess how Dallas will do versus Philly in December. Well, you'd look at how they did the previous game they played in, right? In exactly the same way, if you're trying to see how Chicago will do against San Francisco in Week 8, look at how they did against them in the first half this Friday.

> What is RMS/Why didn't you use standard deviation

RMS means Root-mean-square. The RMS listed here is actually "RMS deviation from the mean", or standard deviation. It's not called standard deviation because physicists like to sound better than statisticians.
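To make that concrete: the RMS deviation from the mean is the population standard deviation, which Python's `statistics.pstdev` computes directly (`statistics.stdev` divides by N-1 instead, a difference that is negligible with hundreds of games). A minimal sketch with hypothetical margins and a hypothetical `rms_deviation` helper:

```python
import math
import statistics

def rms_deviation(values):
    """Root-mean-square deviation from the mean (a physicist's 'RMS width')."""
    mean = statistics.fmean(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

margins = [3, -3, 14, -14, 7, -7, 24, -24]  # hypothetical, one entry per team
print(rms_deviation(margins), statistics.pstdev(margins))
```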

Pat (not verified):: Mon, 08/07/2006 - 5:06pm

The correlation coefficient isn't that important. The importance is whether or not preseason game scores are related to regular game rematches just like regular season rematches are. And they are. This tells you that preseason games are as predictive as regular season games are of performance in a rematch.

The correlation coefficient and R-squared aren't going to get that high because it's a game. :)

> I wonder how much of the statistical significance arises from using each game as two separate data points, one for each team.

Only one data point per game is used to calculate the p-value. I've taken statistics. :)
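A sketch of why the one-point-per-game rule matters. Each plot shows every game twice (once per team, mirrored through the origin), and a p-value computed from the mirrored points would treat that as twice the independent evidence. The helpers, the sample games, and the count of 44 games are illustrative assumptions; the p-value uses the Fisher z-transform approximation.

```python
import math

def fisher_p_value(r, n):
    """Approximate two-sided p-value for Pearson r on n independent pairs."""
    if n <= 3:
        raise ValueError("need more than three independent pairs")
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

def mirror(games):
    """One (x, y) point per game -> one point per team, as in the plots."""
    return games + [(-x, -y) for x, y in games]

games = [(10, 3), (-7, -14), (21, 6), (3, -10)]  # hypothetical margins
print(len(games), len(mirror(games)))            # the plotted count doubles

n_games = 44  # hypothetical number of rematch pairs
print(fisher_p_value(0.31, n_games))      # one pair per game
print(fisher_p_value(0.31, 2 * n_games))  # overstated: mirrored points counted
```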

Fnor (not verified):: Mon, 08/07/2006 - 7:25pm

Hopefully at PSU and not OSU... I remember OSU's stats professors....

Pat (not verified):: Mon, 08/07/2006 - 7:55pm

Actually, the only class I've taken at OSU was a class specifically created to generate a GPA for me. I transferred mid-grad school, after I had finished all of my classes at Penn State, but when I got here, I was told I needed to take at least one class because I couldn't graduate without a GPA.

So yes, it was at Penn State. :)

GlennW (not verified):: Mon, 08/07/2006 - 8:00pm

The math department should understand the limited statistical significance of only one class towards GPA. Next thing you know they're making you class valedictorian...

Jason Scheib (not verified):: Mon, 08/07/2006 - 8:50pm

Do you know if it makes a difference if you only look at the score at the end of the first quarter? Only because, especially early in preseason, the starting units are only on the field about that long if that, and so it seems like that may possibly correlate better. I understand that doesn't give a team much time to build a significant point differential, but just a thought.

Pat (not verified):: Mon, 08/07/2006 - 9:19pm

I tried doing that, and you're exactly right - I don't think it gives enough time. The correlation's significantly weaker (which is wrong - it should be stronger, because one thing I didn't say in the article is that part of the increase is also due to the fact that I'm using only half a game - obviously with a full game a team has more time to build a much larger point differential. That's not the entire reason, though, the rest is what MDS pointed out).

Basically, the closer the RMS average point differential gets to zero, the more point quantization comes into play. Take last night, for instance - Philly was up 7-0 at the end of the first quarter and completely dominated the first-team Raiders. But you could imagine a much harder-fought first quarter with the teams much closer, and still ending up with a 7-0 score.

Preseason VOA might be interesting to look at, though, since it's smoother. You definitely would've been able to tell that Philly smothered the Raiders. In Aaron's infinite free time, of course.

Incidentally, I also tried excluding the last week of the preseason. That reduces the significance. That's fairly interesting to me. My personal guess on that, though, is that the performance of the second-string backups somewhat echoes the performance of the first string, which kinda makes sense. If your first string sucks, the second string is probably going to suck more. Teams usually don't keep better players on the second string. Most likely that tapers off for third and fourth string players.

David (not verified):: Tue, 08/08/2006 - 1:36am

Okay guys. Let's not be ridiculous here. You are showing us a scatterplot with a correlation of .20.

What you do not tell your fine readers is that your plot is a scattershot, which indicates no correlation. Correlations read between -1 and 1. The closer you get to zero, the less of a correlation there is. Any statistician would tell you that this result is a meaningless one. That regression line is pretty worthless too.

James G (not verified):: Tue, 08/08/2006 - 8:56am

David - I don't think it's that bad. First, as Pat says, he is comparing preseason correlation to regular season correlation. So the Q is, how do those correlations compare? Second, some of the things that are in PFP such as DVOA to winning % the next year have pretty low correlations, too (although often better than 0.2). Third, I've seen presentations in neuroscience (although that's not my field) with even worse correlations than that with actual conclusions drawn.

Shannon (not verified):: Tue, 08/08/2006 - 9:08am

Did you consider normalizing the point differentials?

You mentioned home field advantage, and looked at points greater than 7 points; but that eliminates a lot of data points.

I recommend subtracting the home field advantage from the second game when the field is switched, and doing nothing when it is not.

IIRC in the Pats/Bills outlier both teams lost at home, so this would push them further out. But on average if both effects are real it should increase the correlation significantly.

Duck in MA (not verified):: Tue, 08/08/2006 - 9:22am

Patrick, are those plots done in ROOT? I was staring at them and thinking to myself I've seen a million like them, only with muons. Great article, I guess I'll have to watch the preseason games now. Well, I was going to anyway, but now I can argue with my wife that they are statistically correlated to the regular season and thus MUST watch them.

Pat (not verified):: Tue, 08/08/2006 - 10:42am

> Any statistician would tell you that this result is a meaningless one.

Any statistician would tell you that the important value for significance is p-value, not the correlation coefficient. If I take a set of numbers, X, and plot them versus (0.2*x) and do a regression, I'll get a correlation coefficient of 0.2. The p-value will be zero, which means there's no chance that it's random whatsoever. That means "these numbers are related." The fact that the correlation coefficient is 0.2 just means they're related in a certain way.

The correlation coefficient itself doesn't really matter here. I know that the regression doesn't explain the majority of the spread. Of course not. It's a game. We wouldn't watch it if it did. But the important point is what James G pointed out: the preseason correlation is essentially identical to the regular season one.

Plus, as I've mentioned above, the correlation coefficient gets flattened by the points in the center. This is just because I technically should've given each point an error bar of +/- 7, because that's essentially the point quanta you're dealing with. But this isn't important, because I do the same thing for both groups.

Have I mentioned I don't like the fact that statistics terms get used out of context? :)

> Patrick, are those plots done in ROOT?

Yup. Histograms in a spreadsheet program are beyond awful.

turbohappy (not verified):: Tue, 08/08/2006 - 12:38pm

You would think they would be related in some way, interesting article. After the Colts went 0-5 in the preseason and then started the season 13-0 I started to wonder though ;o)

Pat (not verified):: Tue, 08/08/2006 - 1:06pm

Thing is, Indy's preseason slate in 2005 wasn't pretty - the easiest team they faced was Buffalo. Atlanta, Chicago, Denver, Cincinnati were the other teams. So that's 1 bad team, 1 average team, and 3 playoff teams. Heck, Indy faced almost as many playoff teams (3) in the preseason as they did in the regular season (4)!

The only game that really was an outlier game was the Buffalo game, and anyone who actually saw the Buffalo-Indy preseason game realizes why.

That's part of the bias I mentioned early on. Had Indy actually played Atlanta, Buffalo, Chicago, Denver, and Cincinnati early in the regular season in a row like that, I think there's probably a good chance they would've lost a few.

So when Indy went 0-5, that didn't mean they were going to be bad. It meant that the teams they played were going to be on par with them. And in 3 out of the 5 cases, that was true.

Todd T (not verified):: Tue, 08/08/2006 - 2:21pm

Re 22 and misuse of statistical terms, to be fair, the poster claimed that the results don't have useful meaning, not that you'd somehow goofed statistical significance. Reading your response to my statement along the same lines (10), I can buy that you have shown that the preseason correlation is on par with the regular season correlation. The reason I brought up the double-observation issue is that the plots show each point twice. Glad that that's the only place they show up twice.

RZHawk (not verified):: Tue, 08/08/2006 - 5:03pm

You might add the Seattle-San Diego games on 8/26 and 12/24 to the preseason and reg. season rematches.

Nice article!

Pat (not verified):: Tue, 08/08/2006 - 6:27pm

RZHawk:

Woah, woah woah: how did I miss that one?

Dangit. That will be an important game to watch, considering both teams could be in the playoff race.

Dan (not verified):: Tue, 08/08/2006 - 6:47pm

Nice article, Pat. It would be really interesting to see how well you can predict the results of the first few games of the regular season using preseason stats (like DVOA) and the stats from the previous weeks of the season, and to see how long you had to go into the regular season until there was no added value to taking preseason stats into account (like the 13-week mark for regular season games).

I have one statistical nitpick (on a comment, no less). This doesn't sound right:

> If I take a set of numbers, X, and plot them versus (0.2*x) and do a regression, I'll get a correlation coefficient of 0.2. The p-value will be zero, which means there's no chance that it's random whatsoever. That means "these numbers are related." The fact that the correlation coefficient is 0.2 just means they're related in a certain way.

Won't r=1, and p be some number greater than 0 (with p smaller when the set of numbers is larger)? Fahrenheit temperature and Celsius temperature, for instance, have a correlation of one, because they have a perfect linear relationship.

Pat (not verified):: Tue, 08/08/2006 - 8:02pm

> It would be really interesting to see how well you can predict the results of the first few games of the regular season using preseason stats (like DVOA)

I agree. It was fairly obvious to me that the limiting factor in this study was the granularity of the score. I mean, those full plots are essentially -5 to 5 in terms of touchdowns, and if you think of football scores as being +/- 1 touchdown, you can see that quantization will just kill any correlation. To be honest, I'm actually amazed there was much of one at all.

That's why I don't really understand people who ignore preseason games.

You're right that r will be 1. I was trying to find a good example and I mixed up correlation coefficient and just the linear coefficient. The coin flip overtime example is correct, though. I thought of that later, and it's a much more appropriate example: winning the coin flip influences overtime victory, but it isn't the sole cause of victory, and the simple counting nature of the statistic means that the correlation is only ever going to be a small portion of the spread. Likewise, in football, being better than your opponent previously in the season influences the likelihood of you winning again, but it isn't the sole cause of the score of the next game.

p will be strictly zero, though, in that example, if you've got zero error (i.e. the linear coefficient will be some number with purely zero error - i.e. the chi-squared is zero).

Dan (not verified):: Tue, 08/08/2006 - 8:43pm

You're right about p.

Something else that would be really neat, and would provide some context for judging how meaningful preseason games are, would be to run the same type of analysis that you did here, except looking at teams that play each other:

- twice in the same season (those would be teams in the same division)

- in consecutive seasons (again, that's mostly division games. You'd probably look at the correlation between the second regular season game one year and the first regular season game the next year)

- in the regular season and the playoffs in the same year (although you might not be able to get the sample size there)

You'd want to be sure to make the 3-point home field advantage correction, since in one of those cases every pair of games is at different fields.

Pat (not verified):: Tue, 08/08/2006 - 8:48pm

Regular season/playoffs are in the regular season/regular season plots here. I didn't distinguish when they played. There's only a handful of them, and a few of them are those throwaway ones like Denver/Indy from 2004.

Mark (not verified):: Wed, 08/09/2006 - 6:48am

I don't think this study proves much until it is demonstrated that:

a) A team's 1st-half pre-season result vs. a given opponent is more strongly correlated to the reg. season re-match result than it is to reg. season results against similar-caliber opponents. i.e., yes, the pre-season means something, but does it truly mean anything unique with regards to a specific matchup?

or

b) the predicted outcome for a reg. season game (using some standard power rating system) is more accurate when a pre-season (1st half) result is factored in than when it is left out.

In short, I am not a believer in assigning special value to previous matchups. I have no affiliation with the following site, but it has convincing evidence that using past matchups (more than past games overall) to predict re-match outcomes is not effective:

http://frappe.dolphinsim.com/ratings/notes/matchups.html

James G (not verified):: Wed, 08/09/2006 - 8:26am

Mark - I love that site (I can't really claim neutrality - the guy who runs it was 2 years ahead of me in undergrad school and lived just a few doors down from me), and I think what he is saying is also reflected in the DVOA stats here - Accounting for all games is better than giving more weight to rematches. However, I still think Pat's study is valid in that it shows that there is some correlation for rematches (granted, more games is better because the team's actual strength is better represented), and that preseason first half score is at least as correlated to rematch as a regular season rematch is.

Pat (not verified):: Wed, 08/09/2006 - 9:37am

See, James understands! Woo!

Of course, using only rematches isn't going to be the best way to predict the outcome of a game - there's more information available, and using less information to predict an outcome is always a failure.

The only reason I'm using rematches alone here is because it removes the strength-of-schedule bias. See the comment regarding Indy's preseason last year: 0-5, followed by 13-0, doesn't look like a good predictor. Except for the fact that that 0-5 was against very good teams, and so while 0-5 is definitely low for the quality of the team, it's not crazy.

thad (not verified):: Wed, 08/09/2006 - 11:15am

P(sa,sb|ma,mb) = C ma^(sa/F) e^(-sa/F) mb^(sb/F) e^(-sb/F)

A formula from the site James G loves.

This is just waaayyyyyy beyond my understanding.

Doesn't it seem like the more analysis that is done, the more intense the math gets? And the understanding of win probability does not increase proportionately.

That was brutal, I am gonna go read espn now

Enkidu (not verified):: Wed, 08/09/2006 - 12:26pm

If "we know that home field advantage exists," then why is the mean point differential zero?

Pat (not verified):: Wed, 08/09/2006 - 12:40pm

There's a point in there for each team. Team A beats Team B, gets point differential +X. Team B gets point differential -X. Mean point differential is zero (exactly).

You'd see home field advantage if you plotted a histogram of all of the point differentials for the home teams.
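A small Python sketch of that bookkeeping, with hypothetical scores: counting each game once for each team makes the mean exactly zero, while the home teams' margins alone can show an edge.

```python
import statistics

# Hypothetical (home_points, away_points) final scores.
games = [(24, 17), (10, 13), (31, 20), (17, 17), (27, 24), (14, 21)]

home_margins = [h - a for h, a in games]
# One entry per team: the home side gets +X, the visitor gets -X.
per_team = home_margins + [-m for m in home_margins]

print(statistics.fmean(per_team))      # 0.0 by construction
print(statistics.fmean(home_margins))  # any home-field edge shows up here
```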

Pat (not verified):: Wed, 08/09/2006 - 12:47pm

Oh, and I should note that the website listed above was looking at win probability, not score correlation. Given the huge significance (p less than 1%) of the regular season-regular season matchup, I can't agree with those results. Even though the correlation coefficient is small (so only ~4-5% of the variation is explained by the correlation) that should still lead to more than what's quoted at that website.

Just looking at the plot by eye you can see there's far more games along the diagonal than off of it.
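How a small correlation can still be highly significant: R^2 is just r squared, and the usual test of zero correlation uses t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below assumes r = 0.21 (about 4.4% of variance explained, inside the range Pat quotes) and a few hypothetical sample sizes, since the actual number of game pairs isn't given here. It also uses a normal approximation to the t distribution, which is only reasonable for large n.

```python
import math

def variance_explained(r):
    """Fraction of the variance in Y accounted for by a linear fit on X (R^2 = r^2)."""
    return r * r

def correlation_p_value(r, n):
    """Two-sided p-value for 'no correlation', from t = r*sqrt(n-2)/sqrt(1-r^2).

    Uses a normal approximation to the t distribution (fine for large n).
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return math.erfc(abs(t) / math.sqrt(2))

r = 0.21  # assumed: about 4.4% of variance explained
for n in (50, 100, 200, 400):  # hypothetical numbers of game pairs
    print(n, round(variance_explained(r), 4), round(correlation_p_value(r, n), 4))
```

Under these assumptions the p-value falls below 1% somewhere between 100 and 200 pairs, so "small but significant" is entirely consistent.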

James G (not verified):: Wed, 08/09/2006 - 1:04pmI think there's a difference between Andrew's data and Pat's data and a difference in Q asked. In Andrew's site, the Q is: Given the large set of data he has, should particular importance be given to head-to-head games. He also has another article on head-to-head (very math intensive, linked to my name) showing similar results.

Pat's question is: do head-to-head games correlate with each other? I am not surprised at either answer.

This is the way I see it: In isolation, if A beats B, A is likely better than B, and the correlation in Pat's study suggests as much.

If A beats B, C, and D, and loses to E, but E has lost to B, C, and D, it's likely that A is better than E despite the head-to-head loss. This is where the full set of data comes in. Computer rankings would show this and no special emphasis should be placed on E beating A. And that's what the Dolphin study is showing.

In the two-team scenario, A beating B means A is some % likely to be better than B, and that would show up in correlations in rematches. And in fact, by including point gap, we can even see what that likelihood is. However, the same principle can be used for teams that don't play each other, but are connected through other teams' head-to-head results.
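James G's A-through-E example can be sketched with a simple least-squares (Massey-style) rating, where ratings are chosen so that rating differences match game margins as closely as possible. This is a swapped-in illustration, not the method used in the Dolphin study, and the margins are hypothetical.

```python
from collections import defaultdict

def massey_ratings(games, iterations=1000):
    """Least-squares ratings: rating[winner] - rating[loser] ~ margin for every game.

    games: list of (winner, loser, margin). Ratings are centered to average zero.
    """
    schedule = defaultdict(list)  # team -> [(opponent, margin from this team's side)]
    for winner, loser, margin in games:
        schedule[winner].append((loser, margin))
        schedule[loser].append((winner, -margin))

    ratings = {team: 0.0 for team in schedule}
    for _ in range(iterations):
        # Jacobi step: a team's rating is the average of (margin + opponent rating)
        updated = {
            team: sum(m + ratings[opp] for opp, m in played) / len(played)
            for team, played in schedule.items()
        }
        center = sum(updated.values()) / len(updated)
        ratings = {team: r - center for team, r in updated.items()}
    return ratings

# Hypothetical margins: A beats B, C, D; E beats A; B, C and D each beat E.
games = [
    ("A", "B", 7), ("A", "C", 7), ("A", "D", 7),
    ("E", "A", 3),
    ("B", "E", 7), ("C", "E", 7), ("D", "E", 7),
]
ratings = massey_ratings(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 2))
```

With these margins the ratings settle at about +3.6 for A and -3.6 for E, so A rates well ahead of E despite the head-to-head loss: the rest of the schedule outweighs that single result.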

Pat (not verified):: Wed, 08/09/2006 - 1:20pmOh, duh, I misunderstood that. You're right. He's looking for an excess factor. Right.

That's not what I'm interested in. I'm just looking to correct for the schedule bias. That is, we know that if team A beats team B in the regular season, they're likely better, and are more likely to win that game again. Thus, the regular season "matters". We now also know that if team A beats team B in the preseason, they're likely better, and are more likely to win that game again. Thus, the preseason matters as well, in the same way.

Enkidu (not verified):: Wed, 08/09/2006 - 2:13pmRe: 37.

Then why is the distribution asymmetric about zero?

Jim A (not verified):: Wed, 08/09/2006 - 3:21pmThis is a very nice study, perhaps not of much practical value by itself, but it implies that preseason games are indeed ripe for more meaningful analysis if you can filter out the noise.

BTW, Andy Dolphin is one of the more brilliant sabermetricians around and has just co-authored a highly-regarded baseball analysis book. Too bad James G couldn't have convinced him to do more work on football analysis.

Pat (not verified):: Wed, 08/09/2006 - 5:20pm"Then why is the distribution asymmetric about zero?"

I think that's because the binning wasn't symmetric about zero on that one, but I'm not sure. That's a mistake, though. It should be symmetric. The mean and standard deviation are independent of the binning, though, and the general shape is correct.

James G (not verified):: Thu, 08/10/2006 - 10:35amI've been thinking about Jim A's comment - what Pat's study suggests is that we should measure DVOA in the first halves of preseason games and see if that's as good a predictor of the first 4 games of the season as DVOA from the first 4 games is of the next four games. The important point, however, is that it has to be DVOA and not VOA, and I wonder if there is enough interconnectivity between teams in the preseason for that value to be legitimate.

By only looking at rematches, Pat has, in fact, corrected for this, but at the expense of removing a number of data points.

I think a more layman's way of putting the Dolphin study would be to say that there is no such thing as "team X owns team Y" beyond team X normally being superior to team Y. In 2004, when the 8-8 Rams beat the 9-7 Seahawks twice in the regular season and once in the playoffs, it was as much a product of two even teams with "game noise" favoring the Rams as it was of the Rams owning the Seahawks.
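The "game noise" argument is easy to quantify. Assuming each meeting is an independent coin flip between even teams (a simplification that ignores home field and any real difference in strength), one specific team sweeps three meetings one time in eight:

```python
def sweep_probability(p_win, games=3):
    """Chance that a team wins all `games` meetings, assuming each meeting is
    independent with the same per-game win probability p_win."""
    return p_win ** games

even = sweep_probability(0.5)                  # one specific team sweeps
either = even + sweep_probability(1 - 0.5)     # either team sweeps
print(even, either)  # 0.125 0.25
```

Even a slight underdog, say a hypothetical 45% per game, still sweeps roughly 9% of the time.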

charlie (not verified):: Thu, 08/10/2006 - 11:31amthis is football not math!! a lot can happen in a player or a whole tm's heart that has nothing to do with your stats. thats only one of the many other factors that could and do play a role .

Jim A (not verified):: Thu, 08/10/2006 - 1:25pmI agree that adjusting for strength of schedule is pretty important for a small sample size of 4 games, but I'm not sure it's absolutely critical. One idea that might be fairly easy to do is to compile a rating system based on only the first halves of preseason games, then compare it to the same rating system based on the first halves of the first 4 weeks of the regular season. Is one more predictive than the other? Ideally, you'd want a metric more fine-grained than points to avoid the "quantization" issues Pat describes, but this still might be revealing.

turbohappy (not verified):: Thu, 08/10/2006 - 2:28pmRe: 34

There is definitely more to the Indy story. In every preseason game last year they played their starters less than their opponents. In the Buffalo game it was a crazy amount, like 1 drive for Indy vs. more than a half for Buffalo. If both agreed and played starters vs. starters for the first half and scrubs vs. scrubs in the second half you could more easily judge things.

Pat (not verified):: Thu, 08/10/2006 - 4:40pm"There is definitely more to the Indy story. In every preseason game last year they played their starters less than their opponents. In the Buffalo game it was a crazy amount, like 1 drive for Indy vs. more than a half for Buffalo."

I don't think there is much more, actually, except maybe in the Cincinnati game, although there, like I said, it still showed Cincinnati wasn't likely to get crushed by Indianapolis. And they weren't.

The disparity in the Buffalo game was due to relatively unlikely events: a blocked punt returned for a touchdown and an interception.

I mean, would it really have been surprising to see "CHI 14 IND 7" in the regular season? The Chicago game looked a lot like their regular season efforts, too: offensive ineptitude, great defense and a bunch of turnovers.

As for the differences in playing time? That only existed in the Buffalo and Cincinnati games. In the others, the Colts and their opponents played their starters for roughly the same length of time.

Todd T (not verified):: Fri, 08/11/2006 - 11:48pmRe 37: Then each game-pair IS in the dataset twice, as I thought back in comment 10. I don't think that's the right thing to do statistically. The two games between New England and Buffalo are the same two games, the same pair, whether the counting is from New England's perspective or Buffalo's. Both observations are the same pair of events with signs reversed, as opposed to the second pair being a whole new pair of games between the same two teams. Therefore the second pair, from the second team's perspective, is not providing any new information, and is not a new random draw. The apparent power of the test goes way up - the p-value goes down - with these extra observations, but they aren't additional observations. But I'll shut up now about statistical practice.

Re: 41: They're symmetric around the ordered pair (0,0), not 0. For every (-10,+20) there's a (+10,-20).
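Todd's concern can be illustrated with a short sketch on hypothetical differentials (not the article's data). Adding each pair again from the opponent's side, with both signs flipped, makes the point cloud symmetric about (0,0) and doubles n. When the original values already average zero, r is unchanged, but the naive t statistic grows by a factor of sqrt((2n - 2)/(n - 2)), roughly sqrt(2) for large n, which is the overstated significance he describes.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def naive_t(r, n):
    """t statistic for r, treating all n points as independent."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical (first meeting, rematch) differentials from one team's side.
# Both lists sum to zero, so mirroring leaves r exactly unchanged.
xs = [10, -3, 7, -14, 3, -7, 17, -13]
ys = [3, -7, 14, -3, -6, 7, 10, -18]

# The same games seen from the opponent's side: both signs flip.
xs2 = xs + [-x for x in xs]
ys2 = ys + [-y for y in ys]

r1, r2 = pearson_r(xs, ys), pearson_r(xs2, ys2)
print(round(r1, 3), round(r2, 3))  # same correlation
print(round(naive_t(r1, len(xs)), 2), round(naive_t(r2, len(xs2)), 2))  # t inflated
```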

Pat (not verified):: Sat, 08/12/2006 - 1:53pm"Then each game-pair IS in the dataset twice, as I thought back in comment 10. I don't think that's the right thing to do statistically."

No, it isn't. It's in the histograms (which it has to be), and it's in the plots (to make them look nice), but it isn't in the regression.

And even if they were in the regression, I do actually know how to correct for two data points that aren't independent.

The reason they have to be in the histograms is because what I'm really using the histograms for is to put a cut on them - when one team beats another by 14 or more, for instance. Since both teams can't beat each other by 14 points in the same game, each game is only in the histogram once.

The only histogram that actually contains both data pairs in the plot is the first one, and it doesn't make a difference there.

Enkidu (not verified):: Sat, 08/12/2006 - 2:52pmRe: 43.

It's not obvious to me how integers can be mis-binned.

Re: 49.

You're correct to point out that the scatter plots have an inversion symmetry thanks to the duplication. But then the first histogram must be symmetric about zero, since it's effectively the histogram over either dimension of the (symmetric) first scatter plot.

Re: 50.

In what way did the scatter plots look un-"nice" without the duplication?

Pat (not verified):: Sun, 08/13/2006 - 3:58pm"It's not obvious to me how integers can be mis-binned."

Huh? They can be mis-binned just like anything else. I think the bins might be 5 from -50 to 0, and 5 from 0 to 45. I'm not sure that's the issue, though. I think it's dumber than that.

It's only the first histogram that it would affect, though. With the others, I'd know something is wrong because the mean was calculated multiple ways.

"In what way did the scatter plots look un-'nice' without the duplication?"

Didn't look as linear. Of course it was, as the correlation/regression showed, but it just looked a little weird. Personal preference, probably.

Pat (not verified):: Sun, 08/13/2006 - 4:06pmRegarding the binning: It's 10 bins from -50 to 50, which seems like each bin should be 10 points wide, but there are 101 integer values from -50 to 50 inclusive, not 100 (because of zero). So 10 bins over 101 values is 10.1 values per bin. The rounding switches around zero, which makes the bins asymmetric about zero: the bin just below zero gets -10 to -1, and the bin just above gets 0 to 9. Given that no games in the sample ended with a zero-point differential, there's the asymmetry.
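The binning problem can be reproduced with integer margins. Below, floor-style bins of width 10 (..., [-10, -1], [0, 9], [10, 19], ..., as Enkidu summarizes them) are compared with one possible mirror-symmetric scheme that gives zero its own bin. The symmetric scheme is an assumption of this sketch, not what the article used.

```python
import math
from collections import Counter

def floor_bin(diff, width=10):
    """Floor-style bins: ..., [-10, -1], [0, 9], [10, 19], ... (not mirror-symmetric)."""
    return diff // width

def symmetric_bin(diff, width=10):
    """Mirror-symmetric bins: 0 alone, then [1, 10], [11, 20], ... and their negatives."""
    if diff == 0:
        return 0
    sign = 1 if diff > 0 else -1
    return sign * math.ceil(abs(diff) / width)

# Hypothetical margins, each entered from both teams' sides (+d and -d).
margins = [3, 10, 7, 14]
diffs = margins + [-m for m in margins]

print(sorted(Counter(floor_bin(d) for d in diffs).items()))
print(sorted(Counter(symmetric_bin(d) for d in diffs).items()))
```

With floor bins, -10 lands in the innermost negative bin while +10 lands in the second positive bin, so perfectly mirrored data still produces a lopsided histogram.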

Enkidu (not verified):: Mon, 08/14/2006 - 3:30pmRe: 52, 53.

Oh, so the bins are -50 to -41, -40 to -31, . . . , 30 to 39, and 40 to 49. I thought they were -50.5 to -40.5, etc. Makes sense, thanks.

And thanks for a great article.

Hey FO editors! How about more articles like this one with data plotted instead of tabulated!!

Kevin (not verified):: Wed, 08/16/2006 - 5:08pmI might be a little late to the party, but I can tell you that the R^2 between regular season meetings is 0.04, and for the preseason-regular season meetings it is 0.09.

Also, note, the gray line does NOT mean a correlation of one. It means a slope of one. A correlation of one can happen with any slope...as long as the points fall directly along the line.

And we expect the slope of the fit line to be less than one... it's a VERY common phenomenon called "regression to the mean." Flip the Y and X, and the slope will still be less than one (I bet).
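Kevin's two points can be checked directly. Points lying exactly on y = 2x have correlation 1 but slope 2. For noisy data, the slope of Y on X times the slope of X on Y equals r^2, so when both axes have similar spread, both slopes come out below one. The data below are hypothetical.

```python
import math

def centered_sums(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, syy, sxy

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    sxx, _, sxy = centered_sums(xs, ys)
    return sxy / sxx

def pearson_r(xs, ys):
    sxx, syy, sxy = centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

# Points exactly on y = 2x: correlation 1, but slope 2.
line_x = [-14, -7, 0, 7, 14]
line_y = [2 * x for x in line_x]
print(pearson_r(line_x, line_y), ols_slope(line_x, line_y))  # 1.0 2.0

# Hypothetical noisy pairs: regress both ways.
xs = [10, -3, 7, -14, 3, -7, 17, -13]
ys = [3, -7, 14, -3, -6, 7, 10, -18]
b_yx = ols_slope(xs, ys)  # Y on X
b_xy = ols_slope(ys, xs)  # X on Y ("flip the Y and X")
r = pearson_r(xs, ys)
print(round(b_yx, 3), round(b_xy, 3), round(b_yx * b_xy, 3), round(r * r, 3))
```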

Pat (not verified):: Wed, 08/16/2006 - 5:45pm"Also, note, the gray line does NOT mean a correlation of one. It means a slope of one. A correlation of one can happen with any slope...as long as the points fall directly along the line."

Yup. That should've said slope. I have a nasty habit of mixing the two up. That's the only place in the article it's screwed up, though (although I did it again in comments).

"And we expect the slope of the fit line to be less than one... it's a VERY common phenomenon called 'regression to the mean.'"

Kind of. I think what you're saying is what I said above: it's due to poorly-characterized errors. Basically, every data point should have a +/- 7 point error bar on it - that's quantization error. So everything from -7 to +7 on X and Y should be essentially ignored by the fit, because it's basically indistinguishable from zero. I didn't do the regression with errors because I'm lazy, and it really wasn't that important, as this was just a comparison study. I don't really care what the actual correlation is. Just that it's the same or better.

If you only look at data outside of the +/- 7 point range, the slope goes *way* up - to about 0.7. If X and Y errors were added, it'd likely be basically 1. Your eye can see that already.
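Pat's errors-in-variables argument can be sketched with a simulation. The assumptions are all illustrative: a true strength gap with a standard deviation of 10 points, and independent Gaussian noise with a standard deviation of 7 points (standing in for his +/- 7 point error bar) on both the first meeting and the rematch. Ordinary least squares then attenuates the slope toward var(gap) / (var(gap) + var(noise)), while Deming regression, a standard fit that allows for error in both X and Y (here assuming equal error variances), recovers a slope near 1.

```python
import math
import random

def moments(xs, ys):
    """Variances of xs and ys and their covariance (dividing by n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return sxx, syy, sxy

def ols_slope(xs, ys):
    sxx, _, sxy = moments(xs, ys)
    return sxy / sxx

def deming_slope(xs, ys, delta=1.0):
    """Deming regression slope; delta = (error variance in y) / (error variance in x).
    delta = 1 assumes both axes are equally noisy."""
    sxx, syy, sxy = moments(xs, ys)
    d = syy - delta * sxx
    return (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)

random.seed(2006)
N = 20000
true_gap = [random.gauss(0, 10) for _ in range(N)]   # assumed true strength gap
x = [t + random.gauss(0, 7) for t in true_gap]       # first meeting: gap + noise
y = [t + random.gauss(0, 7) for t in true_gap]       # rematch: same gap, fresh noise

# Expected OLS slope is 100 / (100 + 49), about 0.67; Deming should be near 1.
print(round(ols_slope(x, y), 2), round(deming_slope(x, y), 2))
```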