24 May 2004
By Michael David Smith
Take two running backs. One is already in the Hall of Fame. Let's call him Larry Csonka. The other, you're sure, deserves to be in the Hall of Fame. Let's call him Terrell Davis. You're certain the two players are similar: You point out that both were Super Bowl MVPs and that their career numbers look somewhat alike. But you're having trouble convincing the guy next to you at the bar, and you'd have even more trouble convincing those 39 guys on the Hall of Fame selection committee. What can you do?
An answer might come in the form of similarity scores. Like much in the world of sports statistical analysis, similarity scores are stolen from Bill James, who developed them for baseball. If two players had identical career numbers, they would have a similarity score of 1,000. Of course, two players never have identical career numbers. So what we do is start at 1,000 and deduct a certain number of points for each statistical category in which the two players differ. For example, with running backs, we could subtract:
1 point for a 100-yard difference in rushing yards
1 point for a 100-yard difference in receiving yards
1 point for a difference of 50 carries
1 point for a difference of 50 catches
1 point for a difference of one rushing touchdown
1 point for a difference of one receiving touchdown
1 point for each difference of one-hundredth of a yard per carry
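To make the arithmetic concrete, here is a minimal Python sketch of the deductions listed above. The names `RushingLine`, `POINT_UNITS` and `similarity_score` are invented for illustration, and the two stat lines are made up rather than taken from real players. It also assumes that partial differences cost partial points (a 50-yard gap costs half a point), since the list doesn't say whether differences should be rounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RushingLine:
    """Career totals for one running back (hypothetical container)."""
    carries: int
    rush_yards: int
    rush_td: int
    catches: int
    rec_yards: int
    rec_td: int

    @property
    def yards_per_carry(self) -> float:
        # Guard against a back with no carries.
        return self.rush_yards / self.carries if self.carries else 0.0


# Size of the gap that costs one point in each category, per the list above.
POINT_UNITS = {
    "rush_yards": 100,
    "rec_yards": 100,
    "carries": 50,
    "catches": 50,
    "rush_td": 1,
    "rec_td": 1,
    "yards_per_carry": 0.01,
}


def similarity_score(a: RushingLine, b: RushingLine) -> float:
    """Start at 1,000 and subtract points for each category's gap.

    Assumption: deductions are proportional, with no rounding.
    """
    deduction = sum(
        abs(getattr(a, name) - getattr(b, name)) / unit
        for name, unit in POINT_UNITS.items()
    )
    return 1000 - deduction


# Made-up stat lines, not real players.
back_a = RushingLine(carries=2000, rush_yards=8000, rush_td=60,
                     catches=100, rec_yards=800, rec_td=4)
back_b = RushingLine(carries=1500, rush_yards=6000, rush_td=50,
                     catches=150, rec_yards=1300, rec_td=6)

print(similarity_score(back_a, back_b))  # 952.0
```

In this made-up pair both backs average 4.0 yards per carry, so the whole 48-point deduction comes from the counting stats: 20 for rushing yards, 5 for receiving yards, 10 for carries, 1 for catches, 10 for rushing touchdowns and 2 for receiving touchdowns.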
So let's look at Csonka, the Hall of Famer, and Davis, the former MVP who probably won't get into the Hall:
|Player||Carries||Rushing Yards||Rushing TD||Catches||Receiving Yards||Receiving TD||YPC||Similarity to Csonka|
This doesn't prove anything, but it does at least give you a statistical basis for your argument about Larry Csonka and Terrell Davis. A similarity score of 942 means Davis's career is quite similar to Csonka's. When we add in the other similarities that statistics don't measure (both were Super Bowl MVPs and won two rings, both played with Hall of Fame quarterbacks, both played behind good offensive lines), we can say that Davis and Csonka had very similar careers, and that's good ammunition for the people who want to see Davis in the Hall of Fame.
We can go through every quarterback, running back and receiver in the Hall of Fame and find the most similar players statistically, and that will give us an idea of which other players are similar to Hall of Famers and therefore worthy of induction themselves. With receivers we would use only receptions, yards, touchdown receptions and yards per catch, while with quarterbacks we'd add interceptions and completion percentage to yards and touchdowns. Unfortunately, we'd probably also have to normalize against the league average because of the stat inflation that has happened with the NFL's passing numbers.
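One way that normalization could work, purely as a sketch: rescale each raw total by the ratio of a chosen baseline league average to the league average of the player's own era, then feed the rescaled totals into the similarity calculation. The function name `normalize_to_baseline` and every number below are hypothetical, and this simple ratio method is an assumption, not something this article has settled on.

```python
def normalize_to_baseline(player_value: float,
                          league_avg: float,
                          baseline_avg: float) -> float:
    """Rescale a raw stat so the player's era average maps to the baseline.

    Assumption: a plain ratio adjustment. A player who was 25 percent
    above his own league's average stays 25 percent above the baseline.
    """
    if league_avg <= 0:
        raise ValueError("league_avg must be positive")
    return player_value * baseline_avg / league_avg


# Hypothetical numbers: a 3,000-yard passing season in an era when the
# average starter threw for 2,400, restated for a 3,600-yard baseline era.
print(normalize_to_baseline(3000, league_avg=2400, baseline_avg=3600))  # 4500.0
```

The adjusted totals are no longer real yards, so they would only be useful for comparing players across eras, not for quoting as career numbers.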
Similarity scores require a huge amount of research, and football research is in its infancy, but let's start here with running backs, who are more similar through the years than quarterbacks or receivers. And among running backs we'll start with Jim Brown, who's generally recognized as the gold standard of running backs. So what you see below is all the running backs in NFL history whose similarity score compared to Jim Brown is at least 800. I also added the NFL's all-time leading rusher, Emmitt Smith.
|Player||Carries||Rushing Yards||Rushing TD||Catches||Receiving Yards||Receiving TD||YPC||Similarity to Brown|
I want to make it clear here that I'm not equating similarity to Jim Brown with greatness. Faulk's score, for instance, is lower because he did significantly more than Brown as a receiver. That's certainly not a bad thing. Payton and Smith chose to play past their primes and therefore moved up and away from Brown in career totals while simultaneously moving down and away from Brown in yards per carry. If Smith and Payton had retired on top the way Brown and Sanders did, they'd be more similar to Brown (though still not as similar to Brown as Sanders is), but I'm not going to criticize Smith and Payton for choosing to play as long as they could. Similarity scores aren't intended to show the best players; they're intended to show the most similar players. Sanders and Perry had short, great careers, just like Brown, so seeing Sanders and Perry on top of the similarity list indicates that the system works. It's striking to see how similar the career totals of Riggins and Brown are, and that's why it's important to use yards per carry as one of the similarity categories. We can see that Riggins finished within 1,000 yards rushing, 500 yards receiving and 10 total touchdowns of Brown, but it's not until we look at yards per carry that we realize just how far short Riggins falls.
If I could add another statistic to similarity scores for running backs, it would be fumbles. But so far I haven't been able to find fumble stats for the old-timers. Fumbles are as important a statistic for running backs as interceptions are for quarterbacks, yet for some reason the NFL doesn't seem to want to reveal how many times its past running backs fumbled. (Although in researching this article I did find another similarity between Csonka and Davis: Csonka had 21 career fumbles and Davis had 20.)
One of the biggest differences between football and baseball is the way the number of games played in a season has affected season and career totals. Baseball gave Roger Maris's home run record an asterisk when he had the audacity to use eight extra games to top Babe Ruth. But eight games represents only 5 percent of the baseball season. NFL seasons are 33 percent longer now (16 games) than they were in 1960 (12 games), when Jim Brown was in the middle of his career. Combine that fact with offense-friendly rule changes, and skill position players have much better numbers now than they did in the past. That poses a problem for similarity scores, but not a fatal problem. After all, when we examine the two players most similar to Brown we get one modern player and one who preceded Brown.
This is, to the best of my knowledge, the first attempt at using similarity scores to get a historical perspective on football. (I have seen similarity scores used for fantasy football, but those compared only modern players and made no attempt at comparing current players to the stars of the past.) This is only a first try, and I hope readers will give their thoughts on how it can be improved.