Writers of Pro Football Prospectus 2008

25 Oct 2005

32 Teams, One Chart

Build a better power ranking and the world will beat a path to your door. Using something called "unambiguous beatpaths," FO reader Tunesmith has an interesting way of ranking all 32 teams. I like the way the chart gives an extremely simple way of seeing the entire NFL season, and I like how Tunesmith cheerfully describes his power rankings as "a vastly inferior system" to DVOA.

Posted by: Michael David Smith on 25 Oct 2005

52 comments, Last at 13 Nov 2005, 12:27am by tunesmith

Comments

1
by Pat (not verified) :: Tue, 10/25/2005 - 11:37am

Heh. What's nice about this graph is that, except for one outlier, there's a really nice division between teams that suck and teams that don't. Just draw a line below TB/ATL/CHI/NYG/CAR/SF/SD. Everyone above this line is good (save SF, the outlier). Everyone below this line is not.

SF is only there because it beat STL, which is returning to true 2004 form as being an awful team masquerading as a mediocre team.

2
by JonL (not verified) :: Tue, 10/25/2005 - 11:42am

I don't think these rankings give Denver enough credit.

3
by PatsFan (not verified) :: Tue, 10/25/2005 - 11:56am

That's really interesting! As MDS said in one of the comments on your blog, definitely keep it going!

4
by princeton73 (not verified) :: Tue, 10/25/2005 - 12:06pm

Just draw a line below TB/ATL/CHI/NYG/CAR/SF/SD. Everyone above this line is good (save SF, the outlier)

and the Jets

5
by Parker (not verified) :: Tue, 10/25/2005 - 12:10pm

I'm not sure I understand the graph part. At first glance it looks like the teams should be in some kind of descending order of goodness, with Indy at the top and Houston at the bottom. It would then stand to reason that teams at the same 'level' on the chart would be fairly equal in performance. But then I see the Jets next to Pitt and Seattle and Atlanta grouped with SF, so that can't be what is happening.

That being said, I love it. There's just a hint of insanity to it. The squiggly lines and multiple arrows leave one with the idea that the author has gone a bit mad trying to make sense of a senseless thing.

Brilliant.

6
by Vash (not verified) :: Tue, 10/25/2005 - 12:47pm

Teams that appear in the top 3 tiers despite not having a single win on the chart:
New England
Oakland
N.Y. Jets

Seems pretty subjective to me, especially since the Jets officially blow, Oakland is 2-4, and New England is 3-3 with some ugly losses.

7
by zlionsfan (not verified) :: Tue, 10/25/2005 - 12:48pm

This is very cool. Tunesmith, have you gone back and applied this to any previous seasons to see what the graph would look like?

8
by MJK (not verified) :: Tue, 10/25/2005 - 1:08pm

Hmmm, this is very interesting. Except, doesn't this system tend to weight certain games very heavily? Every team plays the same schedule as their division mates except for two games. So there's a very high probability of short beat loops between two division mates and teams they both play. But the two extra games will tend to create very long beat loops, if any, because there is little connectivity between the schedules of the teams. Hence these games would be very important in determining a team's rating. This is the OPPOSITE of the current NFL tiebreaker schemes, which de-emphasize out-of-division games. It seems like interdivisional and interconference games will end up being far more important to a team's ranking than intradivisional games (this is the same criticism that Pat brought up when I was talking about ranking teams using Maximum Likelihood methods).

For example, through the first few weeks New England was very heavily penalized for losing to Carolina, far more so than they were rewarded for beating Pittsburgh, because very few of their opponents played Carolina and the loss wasn't removed.

9
by JonL (not verified) :: Tue, 10/25/2005 - 1:36pm

RE: several

Beatpaths rely on a listing of preferences, so by definition there is an element of subjectivity. My own opinion is that in general, the term "quality wins" is incredibly subjective, to the point where, except in extreme examples (a win against Houston vs. a win against Indianapolis), it's of little utility.

10
by Scott de B. (not verified) :: Tue, 10/25/2005 - 2:03pm

I don't think there is any subjectivity about the graph -- the procedure is entirely mechanistic. The power rankings, yes.

As for determining the best teams using the graph, I don't think height on the graph is the best method (though a rough rule of thumb). Rather look for teams that a) are ahead of several good teams (somewhat circular, I know) and b) have long beat paths.

11
by tunesmith (not verified) :: Tue, 10/25/2005 - 2:29pm

Hey everyone. Thanks for the link, MDS. :) Yeah, if you read back a few entries on my site, you'll see that I came up with the idea while I was a tad delirious from a bad cold, but it's actually proved pretty useful since then.

Scott's right - the height of the beatpath matters more than the placement of the team. Teams rise to the top as they've beaten more good teams that have in turn beaten other good teams. Meanwhile, teams like San Diego have had a tendency to lose to teams that have lost to other teams San Diego has beaten.

MJK has an interesting point. I think there might be a couple of things that will make up for it, though. First, the "rare opponent" beatpath loops will tend to be longer, involving more teams. The system always finds small beatpath loops first and takes them out, which tends to obliterate longer beatpaths. Second, intra-division matches actually help. If there's a split between two teams, they cancel out any three-team loop both were in. If there's a sweep, previous beatpath loops will remain, but the victor will still get credit over the team they beat.

In a sense, NE is getting credit for beating PIT, because they're not being penalized for their loss to SD.

If you want to keep up, be sure to keep checking the main url of the site (http://thunderthumbs.org/) . It's my personal weblog so you might have to put up with me writing random other articles, at least until I spin off the nfl stuff. :-)

12
by Pat (not verified) :: Tue, 10/25/2005 - 2:35pm

All of SF's losses (save Arizona) have come against very good teams. This doesn't take into account that San Francisco looks like crap, and St. Louis uses voodoo dolls to win games against good teams.

tunesmith: One thing you could do is allow strength of victory to lengthen the lines of the bottom dwellers. This is arbitrary for Houston, but for the other teams, it would help align them.

If you take the average loss margin of SF versus Dallas and Philadelphia, for instance, it's something like 28 points (by some miracle, Dallas didn't blow them out). That line could be stretched and would move SF away from SD.

The other possibility is moving low-ranked teams to their lowest closed beatpath, assuming that teams belonging to a circular beatpath are roughly equivalent. This would move SF to the STL/SF/ARI level. Makes sense to me.

13
by tunesmith (not verified) :: Tue, 10/25/2005 - 2:43pm

And yes, I successfully tuned the generation software last night so that it was able to chug all the way through week 21 of the 2004 season. So soon, I'll have graphs up for 2004. Maybe rankings after that. I'm working on an automatic way to generate the power rankings that rely less on my subjective judgments, since my subjective judgments suck. :-)

14
by Larry (not verified) :: Tue, 10/25/2005 - 2:57pm

Very interesting. The more I think about it the more sense it makes. That it's nontrivial is also an appealing factor.

Parker - the height of each team is, I'm pretty sure, unambiguous, because that is determined by who beat you (and who beat them) as well as who you've beaten. If you start with the longest path, and require that all victories have a height difference, every other team is in some way attached to some point on the initial path and has its height determined. It is very cool.
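Larry's reading of the chart can be sketched as a longest-path layering: a team's level is one more than the deepest team that has an unambiguous win over it. This is only an illustration under that assumption. The function name `levels` and the teams A, B and C are hypothetical, the edge list must already be free of loops, and the layout software that draws the real chart may assign heights differently.

```python
from functools import lru_cache


def levels(edges):
    """Map each team to its level; 0 means nobody has a surviving win over it.

    edges is an iterable of (winner, loser) pairs and is assumed to be acyclic.
    """
    winners_over = {}
    teams = set()
    for winner, loser in edges:
        teams.update((winner, loser))
        winners_over.setdefault(loser, []).append(winner)

    @lru_cache(maxsize=None)
    def depth(team):
        above = winners_over.get(team, [])
        # A team sits one level below the deepest team that beat it.
        return 0 if not above else 1 + max(depth(w) for w in above)

    return {team: depth(team) for team in teams}


# Hypothetical three-team example: A beat B and C, B beat C.
print(levels([("A", "B"), ("B", "C"), ("A", "C")]))
```

Because every win forces a height difference, C lands two levels below A even though A also beat it directly.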

15
by tunesmith (not verified) :: Tue, 10/25/2005 - 3:04pm

I should make the point that I don't actually even control the placement of the bubbles graphically. I use the open-source graphing package "graphviz", which is written by the guys at AT&T, who are all, I'm sure, a hell of a lot smarter than I am. :-)

I do think the length of the arrows can be adjusted through the software, so I am continuing to look at ways to do that consistently. So far though, no approach I can think of strikes me as elegant enough. As for the teams whose bubbles appear too high up at this stage, that'll probably fix itself as the season goes on, when they lose to other mediocre teams.
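For readers who want to try the same approach, here is a minimal sketch of turning a list of wins into Graphviz input. It is not tunesmith's generation software; the function name `to_dot` and the graph name are assumptions made for illustration. The output uses the DOT `digraph` syntax, with one arrow from each winner to each loser.

```python
def to_dot(edges, name="beatpaths"):
    """Return DOT source with one winner -> loser arrow per surviving win."""
    lines = [f"digraph {name} {{"]
    for winner, loser in sorted(edges):
        lines.append(f'    "{winner}" -> "{loser}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("IND", "JAX"), ("JAX", "PIT")]))
```

Saving the output to a file such as beatpaths.dot and running `dot -Tpng beatpaths.dot -o beatpaths.png` lets Graphviz choose the bubble placement.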

16
by Richie (not verified) :: Tue, 10/25/2005 - 3:14pm

Can we just give Carl his own section on the board for his random ranting?

17
by Richie (not verified) :: Tue, 10/25/2005 - 3:31pm

But then I see the Jets next to Pitt and Seattle and Atlanta grouped with SF, so that can’t be what is happening.

I think the point to note is that it's possible that Oakland is just as good as Pittsburgh, due to the ambiguity of each team's wins and losses. There just isn't enough info yet.

18
by admin :: Tue, 10/25/2005 - 3:56pm

Hello. Last time I checked, this is not a discussion thread for issues of racism in football. This is a discussion thread for Tunesmith's team beatpaths graph. There's a contact form up above and if you want us to link something for discussion in Extra Points you are welcome to send it in. To be honest, hijacking a thread is a big middle finger to both Tunesmith and MDS, no matter who the hijacker is. Knock it off.

19
by Pat (not verified) :: Tue, 10/25/2005 - 3:58pm

tunesmith:

That's why I liked using the minimum of the lowest circular beatpath and the unambiguous beatpath level - because it uses information you already have and doesn't add anything new.

This pushes Oakland down to Dallas/Philadelphia level, leaves SD where it is, lowers SF to STL/ARI level, leaves GB where it is (no circulars), leaves BAL where it is (CLE/CHI are higher).

One tweak would be "drop to the level where you have the largest number of members of a circular beatpath I belong to" - so SF drops to ARI/STL because that has 2/3 of it, but NYJ drops to ATL/CAR level because it has 2 members and the BUF level only has 1. This keeps NE from falling all the way down to BUF, for instance.

20
by Carl (not verified) :: Tue, 10/25/2005 - 3:59pm

Fair enough, Outsiders. I took it to another site.

21
by Carl (not verified) :: Tue, 10/25/2005 - 4:00pm

As for Tunesmith, there's actually better social-linking software out there that is used to explain connectivity.

This is kind of a meaningless jumble of subjectivism. Which is fine, just not as interesting as Markov Chain functions.

22
by Sophandros (not verified) :: Tue, 10/25/2005 - 4:09pm

Maybe I'm just slightly touched by madness, but the term "beatpath" totally appeals to me.

23
by tunesmith (not verified) :: Tue, 10/25/2005 - 4:30pm

Carl, there's no subjectivity in the beatpath graph. There's all sorts of it in the power rankings, because there's a huge number of possible power ranking permutations that respect the beatpaths. But I'm working on a way to remove the subjectivity from the power rankings, by instead deferring to the rankings of the previous week. I should be able to generate a power ranking history back through opening day of 2004.

As for the fun term "beatpath" - it actually comes from my past research into high-tech voting methods that are better than the crazy majority-rules system we have now. There's a family of voting methods called "Condorcet" methods, where the winner of an election/contest is the one that would beat every other candidate in a head-to-head matchup. In some of its variants, they use "beatpath methods", which are mathematically related. Since sports is all about head-to-head matchups, it seemed like a good fit.
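The Condorcet idea mentioned above can be sketched in a few lines: a candidate (or team) wins if it beats every other one head-to-head, and there may be no such winner when the results form a loop. The function name `condorcet_winner` and the sample data are hypothetical, and this covers only the basic definition, not the beatpath variants.

```python
def condorcet_winner(candidates, beats):
    """Return the candidate that beats every other one head-to-head, else None.

    beats is a set of (x, y) pairs meaning x wins the matchup against y.
    """
    for c in candidates:
        if all((c, other) in beats for other in candidates if other != c):
            return c
    return None


print(condorcet_winner(["A", "B", "C"], {("A", "B"), ("A", "C"), ("B", "C")}))
print(condorcet_winner(["A", "B", "C"], {("A", "B"), ("B", "C"), ("C", "A")}))
```

The second call has no winner because A over B over C over A is a loop, which is the situation the beatpath methods are meant to resolve.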

24
by DavidH (not verified) :: Tue, 10/25/2005 - 4:32pm

This is kind of a meaningless jumble of subjectivism. Which is fine, just not as interesting as Markov Chain functions.

I thought the whole point of the graph is that it is completely objective. Maybe not the best way to look at the wins, but at least objective. ... I guess you could be talking about the rating, though, now that I think about it.

Also, how is this related to Markov chains? I have seen those used to deal with game state transitions in baseball - are you saying someone should be doing a Markov Chain model with all the possible downs, distances, scores, and times represented? I'm sure that's not it, but I can't figure out what else you might mean.

25
by DavidH (not verified) :: Tue, 10/25/2005 - 4:33pm

tunesmith beat(path) me to it

26
by tunesmith (not verified) :: Tue, 10/25/2005 - 4:45pm

Pat: Denver's got some pretty impressive beatpaths, but they're also in some loops that are fairly low on the graph. That's the situation I can't reconcile with the suggestion you're making. Or are you saying that this is only for teams that don't have any (unambiguous) beatpath wins?

That might be possible. And... it might be possible in reverse, too, to raise up teams that are "undefeated" (in an unambiguous beatpath sense). Although it's unclear what to do with NE.

(I seriously have to come up with a better term than "unambiguous".)

27
by tunesmith (not verified) :: Tue, 10/25/2005 - 4:50pm

Yeah, I don't understand the Markov Chain comment either. Markov Chains work by coming up with a probability of a future result by analyzing past actions, and they're linear. These graphs don't predict future behavior, and the beatpaths have multiple parents and children.

I mean, Go Broncos!

I'm betting this will probably end up more accurate in reflecting team quality than the NFL's W-L records. For 2004, it shows pretty clearly that Philadelphia wasn't really the league's second-best team.

28
by Michael David Smith :: Tue, 10/25/2005 - 5:10pm

Tunesmith, I wonder if you could explain how big of a shift on the graph one big upset can make. For instance, how different would the graph look if Houston had beaten Indianapolis?

29
by Bowman (not verified) :: Tue, 10/25/2005 - 5:50pm

MDS,

From what I can tell, it would have created a Ind => Tenn => Hou => Ind circular beatpath, which would be eliminated. The Jax win is not affected. The only change to the graph would be to remove the arrow from Tenn to Hou.

In English, there is no big change, because Ind. beat Jax, (who has played a stronger schedule to date.)

30
by Bowman (not verified) :: Tue, 10/25/2005 - 5:53pm

In addition, Ind. would still be unambiguously better than Hou., because Ind => Jax => Pit => Cin => Chi => Min => NO => Buf => Hou.
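Checking whether one team is "unambiguously better" than another amounts to asking whether a chain of surviving wins leads from the first to the second. A minimal sketch using a breadth-first search over the chain quoted above; the function name `has_beatpath` is an assumption, and the edge list holds only that one chain rather than the full week's graph.

```python
from collections import deque


def has_beatpath(edges, start, target):
    """Return True if a chain of wins leads from start to target."""
    beaten_by = {}
    for winner, loser in edges:
        beaten_by.setdefault(winner, []).append(loser)
    seen = {start}
    queue = deque([start])
    while queue:
        team = queue.popleft()
        for loser in beaten_by.get(team, []):
            if loser == target:
                return True
            if loser not in seen:
                seen.add(loser)
                queue.append(loser)
    return False


chain = ["IND", "JAX", "PIT", "CIN", "CHI", "MIN", "NO", "BUF", "HOU"]
edges = list(zip(chain, chain[1:]))
print(has_beatpath(edges, "IND", "HOU"), has_beatpath(edges, "HOU", "IND"))
```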

31
by tunesmith (not verified) :: Tue, 10/25/2005 - 6:04pm

Good question. I just reran the simulation with their score flip-flopped, and if Houston had beaten Indy, the graph looks almost identical. The reason is that Indy has so many different beatpaths to Houston. The victory creates a beatpath loop for only one of them - through TEN and HOU. So the only change in the graph is that Tennessee doesn't get credit for its win over Houston anymore.

In fact, this is really amusing. In order for Houston to have fought through all of Indy's beatpaths to it, Houston would have had to have beaten Indy SIX TIMES on Sunday in order to be ranked ahead of them.

32
by DavidH (not verified) :: Tue, 10/25/2005 - 6:05pm

And to expand on Bowman's last post, the Ind->Jax->Pit->Cin->Chi->Min->NO->Buf->Hou chain would not connect back around and become a circular path, because the Hou->Ind path would have already been erased due to the Ind->Ten->Hou->Ind loop. I think.

33
by tunesmith (not verified) :: Tue, 10/25/2005 - 6:07pm

Heh. What Bowman said. Houston needing to beat Indy six times still cracks me up, though.

34
by tunesmith (not verified) :: Tue, 10/25/2005 - 6:11pm

Yes, DavidH is right. I always remove all the (tied-for) smallest beatpath loops first, and then recalculate.

The reason is that if

DEN over KC over MIA over DEN

then Miami shouldn't get credit for defeating Denver, because they got beaten by a team that Denver is apparently better than.

But if KC then beats DEN (to split the series), then it gives the lie to the idea that DEN is better than KC.

So at that point, MIA shouldn't be kept from their victory over DEN anymore. Their victory over DEN would reappear. As would KC's victory over MIA, for similar reasons.
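The rule described here (remove every tied-for-smallest loop at once, then look again) can be sketched as follows. This is an illustration, not the actual generation software: the function names are hypothetical, and repeated results between the same two teams are collapsed into a single win, which is a simplification of how sweeps are treated.

```python
def cycles_of_length(edges, k):
    """Return every directed loop with exactly k wins, each as a list of edges."""
    beaten_by = {}
    for winner, loser in edges:
        beaten_by.setdefault(winner, set()).add(loser)
    found = []

    def extend(path):
        for nxt in beaten_by.get(path[-1], ()):
            if nxt == path[0] and len(path) == k:
                found.append(list(zip(path, path[1:] + [path[0]])))
            elif nxt > path[0] and nxt not in path and len(path) < k:
                # Only walk through teams that sort after the starting team,
                # so each loop is reported once (from its smallest name).
                extend(path + [nxt])

    for start in sorted(beaten_by):
        extend([start])
    return found


def remove_beatloops(games):
    """Drop all wins that sit on a loop, handling the shortest loops first."""
    edges = set(games)
    teams = {team for game in edges for team in game}
    for k in range(2, len(teams) + 1):
        # All loops of the current length are removed together, so the
        # result does not depend on the order in which loops are found.
        for loop in cycles_of_length(edges, k):
            edges.difference_update(loop)
    return edges


three_way = [("DEN", "KC"), ("KC", "MIA"), ("MIA", "DEN")]
print(remove_beatloops(three_way))
print(remove_beatloops(three_way + [("KC", "DEN")]))
```

In the first call all three wins cancel. In the second, the DEN/KC split is a two-team loop and goes first, so KC over MIA and MIA over DEN survive, matching the description above.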

35
by John P (not verified) :: Tue, 10/25/2005 - 6:33pm

Re #2

Almost made cola come out of my nose, dammit. Post a warning before you do stuff like that!

36
by Pat (not verified) :: Tue, 10/25/2005 - 7:47pm

Or are you saying that this is only for teams that don’t have any (unambiguous) beatpath wins?

Yah. I was suggesting that teams with unambiguous beatpath wins should go one above that team. Teams with no more unambiguous beatpath wins should go to the level that they have the most circular beatpath members in common with.

Subjectively, it seems to put teams in the right regions.

37
by Pat (not verified) :: Tue, 10/25/2005 - 9:01pm

Yeah, I don’t understand the Markov Chain comment either. Markov Chains work by coming up with a probability of a future result by analyzing past actions, and they’re linear.

You can use Markov chains to rank teams. It's functionally similar to the way the PageRank algorithm works for Google.

It's also essentially the same as the way that the Colley matrix (or any of the other computer rankings) for the BCS works (math explanation here). Short answer is that you're trying to find the win probability for each team that best explains the data that you have. (Take this example and imagine calculating the steady state vector given a set of data of weather on days).

I think both of the methods have advantages. In a large enough set of data, both of them give the same results. I think your method might be a little more biased by single game results, but it also places more weight on the games themselves.

As with all automated rankings, more methods are better than fewer, because each one has its own (understandable) bias. That's why, for instance, Sagarin uses multiple ranking systems and averages the two.
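To make the Markov chain idea concrete, here is a rough PageRank-style sketch: an imaginary fan repeatedly switches allegiance from a team to one of the teams that beat it, and the long-run share of time spent on each team is its rating. The function name `markov_rank`, the damping value of 0.85 and the three-team example are assumptions for illustration; this is not the Colley matrix or any specific BCS system.

```python
def markov_rank(games, damping=0.85, iters=200):
    """Approximate the steady-state vector of a loser -> winner random walk."""
    teams = sorted({team for game in games for team in game})
    n = len(teams)
    index = {team: i for i, team in enumerate(teams)}
    lost_to = [[] for _ in teams]  # lost_to[i]: teams that beat team i
    for winner, loser in games:
        lost_to[index[loser]].append(index[winner])

    rank = [1.0 / n] * n
    for _ in range(iters):
        # With probability 1 - damping the fan jumps to a random team.
        new = [(1.0 - damping) / n] * n
        for i, winners in enumerate(lost_to):
            if winners:
                share = damping * rank[i] / len(winners)
                for w in winners:
                    new[w] += share
            else:
                # Undefeated team: nowhere to move, so spread its weight evenly.
                for j in range(n):
                    new[j] += damping * rank[i] / n
        rank = new
    return dict(zip(teams, rank))


print(markov_rank([("A", "B"), ("B", "C"), ("A", "C")]))
```

The ratings sum to 1, and a team gains rating by beating teams that themselves hold a lot of rating.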

38
by DavidH (not verified) :: Tue, 10/25/2005 - 9:07pm

Cool.

39
by Pat (not verified) :: Tue, 10/25/2005 - 9:24pm

And now, to add a completely complicated comment:

One thing that was mentioned in a previous Extra Points thread was the possibility of an unbiased ranking system. Nobody likes bias, and so this always sounds way cool, but it's just another ranking system, so don't get too impressed.

"Bias" in this case could be viewed as "how much are my results affected by one single game?" because, as Aaron has mentioned, certain events in football seem to be random and should be random, like the bounce of Quintin Mikell's kick block.

In a mathematical sense, the way you'd do this is start off by assuming that teams lose fluke games, and assume that the probability of losing a fluke game follows some structure. Then find some way of calculating your rankings while minimizing the total deviation that each game produces.

As a stupid example, imagine ranking all teams using tunesmith's method (putting equal level teams at an equal rating). Then recalculate it, removing each game one by one, and calculating the deviation from the final rankings. You'll now know how much each game affects your rankings (as tunesmith pointed out, the INDY-HOU game is unlikely to be a fluke, since it, well, doesn't really change anything).

Then assume a "fluke percentage", and calculate how many fluke wins you should get in the number of games you've seen. Remove that number of "fluky" games, and poof, you've got your (now unbiased) ranking.

Smart people will say that the problem is that you assumed the fluke percentage. Which is true. But you can derive the "fluke percentage" from the data itself.

(Note that I'm *not suggesting this method!!* It's arbitrary and silly. It's just an example. tunesmith's removal of circular beatpaths is actually similar to removing bias, and better given the rest of the method).
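The "remove each game one by one" step in that example is a leave-one-out calculation, which can be sketched generically. The stand-in rating here is plain wins minus losses, chosen only to keep the block short; it is not the beatpath ranking, and the names `net_wins` and `game_influence` are hypothetical. Any function that maps a list of games to ratings could be passed in instead.

```python
def net_wins(games):
    """Stand-in rating: wins minus losses for each team."""
    score = {}
    for winner, loser in games:
        score[winner] = score.get(winner, 0) + 1
        score[loser] = score.get(loser, 0) - 1
    return score


def game_influence(games, rate=net_wins):
    """For each game, total rating shift caused by leaving that game out."""
    base = rate(games)
    influence = []
    for i, game in enumerate(games):
        without = rate(games[:i] + games[i + 1:])
        # A team that drops out of the data entirely is treated as rated 0.
        shift = sum(abs(base[t] - without.get(t, 0)) for t in base)
        influence.append((game, shift))
    return influence


toy_games = [("IND", "HOU"), ("IND", "JAX"), ("JAX", "PIT")]
for game, shift in game_influence(toy_games):
    print(game, shift)
```

With this stand-in rating every game has the same influence; the comparison only becomes interesting with a rating, like the beatpath graph, where some results change nothing and others change a lot.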

40
by Paul (not verified) :: Tue, 10/25/2005 - 11:53pm

Very cool stuff, Tunesmith. But is your graph actually portraying what you say it's portraying, or have you done a great job of creating the perception that it is? Either way it is compelling and you have me intrigued.

41
by tunesmith (not verified) :: Wed, 10/26/2005 - 1:19am

Paul, I don't know what you mean. I certainly wouldn't spend hours drawing bubbles and pushing them around on a screen. (Instead, I'd spend many MORE hours writing a program to do it for me.)

However y'all may be interested in knowing that I have graphs for the 2004 season, including playoffs, up on the site. Come on over and draw your own conclusions of what it portrays. ;-)

42
by Parker (not verified) :: Wed, 10/26/2005 - 10:57am

I think having an 8x4 foot version of that full beatpath picture mounted on a wall would be cool. Or at least an interesting conversation piece. More interesting than pictures of Campbell's soup cans, in my opinion.

43
by Becephalus (not verified) :: Wed, 10/26/2005 - 12:10pm

Tunesmith, I wrote a seminar paper on voting methods and beatpaths. Small world :)

44
by Larry (not verified) :: Wed, 10/26/2005 - 4:07pm

I've always preferred Bradley-Terry as a win-loss only based system. It has the 'disadvantage', in its pure form, of reporting all undefeated teams as equal. I don't personally think that's a disadvantage, since what information would you have from wins alone to say which undefeated team was better?

Essentially, each team i is assigned a rating value Pi that is used in predicting the expected result between it and its opponent j, with the likelihood of i beating j given by:

Pi / (Pi + Pj)

The probability P of all the results happening as they actually did is simply the product of multiplying together all the individual probabilities derived from each game. The rating values are chosen in such a way that the number P is as large as possible. This is the Wolfe rating used in the BCS (the description is from his site). Wolfe somehow manages to order his undefeated teams, which can be done by adding results of fictitious games, but I find that distasteful to say the least.
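As a rough illustration of how such ratings can be found, here is a sketch using the standard fixed-point iteration for the Bradley-Terry model: each rating is repeatedly set to a team's wins divided by the sum, over its games, of 1 / (Pi + Pj). The function name `bradley_terry` and the two-team example are assumptions, and this is not claimed to be Wolfe's implementation. Note that an undefeated or winless team has no finite best rating, which is the limitation mentioned above.

```python
def bradley_terry(games, iters=500):
    """Fit ratings P so that P[i] / (P[i] + P[j]) models i beating j."""
    teams = sorted({team for game in games for team in game})
    wins = {team: 0 for team in teams}
    met = {}  # met[(i, j)]: number of games played between i and j
    for winner, loser in games:
        wins[winner] += 1
        met[(winner, loser)] = met.get((winner, loser), 0) + 1
        met[(loser, winner)] = met.get((loser, winner), 0) + 1

    p = {team: 1.0 for team in teams}
    for _ in range(iters):
        new = {}
        for i in teams:
            denom = sum(n / (p[i] + p[j]) for (a, j), n in met.items() if a == i)
            new[i] = wins[i] / denom
        total = sum(new.values())
        p = {team: value / total for team, value in new.items()}  # ratings sum to 1
    return p


# Hypothetical series: A beat B twice, B beat A once.
ratings = bradley_terry([("A", "B"), ("A", "B"), ("B", "A")])
print(ratings["A"] / (ratings["A"] + ratings["B"]))
```

For this series the fitted chance of A beating B matches A's record of two wins in three games.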

Anyway, care to run your program on college football, tunesmith? It'd be pretty interesting, actually.

45
by Michael David Smith :: Wed, 10/26/2005 - 5:07pm

Running this on college football would be so incredibly complex that I don't know how much use we'd get out of it. With 32 teams, it's not hard to eyeball the chart and have a sense for what teams have accomplished. College would have to include every I-A team, plus every I-AA team that played a I-A team, plus every team they played, etc.

46
by Pat (not verified) :: Wed, 10/26/2005 - 5:22pm

It wouldn't be hard to do it on the individual conferences themselves, which are extremely well connected and have small numbers.

47
by tunesmith (not verified) :: Wed, 10/26/2005 - 6:29pm

I've thought about college football, actually. You could run it on the Top 20 every week, and let it extend down to the other teams the Top 20 has played. It would still be pretty big, but it might be manageable.

48
by Larry (not verified) :: Wed, 10/26/2005 - 6:34pm

Yes, the result would be an enormous chart. You could eliminate I-AA games for simplicity, I suppose. You certainly couldn't include every college football team, that'd be crazy huge. But just I-A would be ok. You could also just include all I-AA teams as 1 'pseudo-team,' though that obviously isn't right either, but it'd allow you to capture the Stanford loss. Did any other I-A team lose to a I-AA team?

I think it would be illuminating to see what relationships between conferences survive the removal of intransitive wins (my shot at an alternative name for ambiguous beatpaths). Is the SEC better than the Big 10? This'd be an interesting way to say something like, "Yes, the 4th best SEC team is better than the 2nd place Big 10 team" or some such thing.

Plus, if the initial 2004 graph is cool (and it is), then a college graph can only be more fun, right?

49
by bengt (not verified) :: Thu, 10/27/2005 - 6:44am

Just a quick thought:
Imagine two beatpath loops A-B-C-A and A-B-D-A. When the first loop is removed, the A-B link is removed and the second loop is not a loop anymore. Therefore either C or D will be awarded an (otherwise ambiguous) win over A, depending on which loop is considered first. The algorithm would be dependent on the order of evaluation, which is probably not a good (but at least not un-objective) thing.
At least that is what I would expect. Maybe Tunesmith has had that idea before and circumvented the problem?

50
by DavidH (not verified) :: Thu, 10/27/2005 - 5:21pm

#49:

In that situation both loops would get taken out. He explains that in the comments section of one of the posts on his site.

51
by tunesmith (not verified) :: Thu, 10/27/2005 - 5:26pm

Yeah. I basically just see the relationships as equally ambiguous.

I'll be making a post soon to illustrate some common problems, and what I do to solve them. Some scenarios are knotty and maybe a bit controversial. Keep following the site to see. Also, I may be moving this all to its own expanded website. If so, I'll announce it there.

52
by tunesmith (not verified) :: Sun, 11/13/2005 - 12:27am

Whoops - not sure if anyone is still reading this thread, but I did mention that if I launched a new site, I'd announce it here. That new site is http://beatpaths.com/ .