Writers of Pro Football Prospectus 2008

FOOTBALL OUTSIDERS SIMILARITY SCORES

Similarity scores were first introduced by Bill James to compare baseball players to other baseball players from the past. The general idea was to start at 1000 points and subtract for the various differences between two players; the players closest to 1000 were the most similar. The method is all over the great Baseball Reference website and, just as UNIVAC eventually led to your Palm Pilot, can be seen as the ancient predecessor to advanced baseball projection methods like Nate Silver's PECOTA.

It was only natural that the idea would catch on elsewhere as statistical analysis spread to other sports. NBA analyst John Hollinger has created his own version to compare basketball players, and we have created our own version to compare football players.

Similarity scores have a lot of possible uses, and we aren't the only football analysts who use them. Doug Drinen of the website footballguys.com has his own system that is specific to comparing fantasy football performances. The major goal of our similarity scores, however, is to compare career progressions and try to determine when players have a higher chance of a breakout, a decline, or -- due to age or usage -- an injury. Therefore we compare not only numbers like attempts, yards, and touchdowns, but also age and experience. We are often looking not for players who had similar seasons, but for players who had similar two- or three-year spans in their careers.

Similarity scores have some important weaknesses. The method compares standard statistics like yards and attempts, which are of course subject to all kinds of biases from strength of schedule to quality of receiver corps. The database for player comparison begins in 1978, the year the 16-game season began and passing rules were liberalized (a reasonable starting point to measure the "modern" NFL). We also project statistics for 1982 and 1987 as if the strikes did not happen, although we cannot correct for players who crossed the 1987 picket line to play more than 12 games.

The method is subject to change in the future; we want to tweak it and perfect it. But here is how it works right now:

FOR ALL POSITIONS

  • Subtract 15 points for each year difference in age between the two players
  • Subtract 15 points for each year difference in career experience between the two players
  • Subtract 10 points for the first game of difference in games played
  • Subtract 20 points for each additional game of difference beyond the first

(For example: If Player A was in 16 games and Player B was in 13 games, the difference is three games, so that's 10 + 20 + 20 = 50 points.)
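
As a rough illustration, the all-positions rules above could be sketched in Python like this. The function names are hypothetical, and the sketch assumes age and experience are measured in whole years and that every difference is taken as an absolute value.

```python
def games_penalty(games_a: int, games_b: int) -> int:
    """10 points for the first game of difference, 20 for each game after that."""
    diff = abs(games_a - games_b)
    if diff == 0:
        return 0
    return 10 + 20 * (diff - 1)


def common_penalty(age_a: int, age_b: int,
                   exp_a: int, exp_b: int,
                   games_a: int, games_b: int) -> int:
    """Points subtracted for age, experience, and games played (all positions)."""
    return (15 * abs(age_a - age_b)
            + 15 * abs(exp_a - exp_b)
            + games_penalty(games_a, games_b))


# The 16-game versus 13-game example from the text.
print(games_penalty(16, 13))  # 50
```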

QUARTERBACKS

  • Subtract 0.45 points for each difference of 1 pass attempt
  • Subtract 1 point for each difference of 10 passing yards
  • Subtract 1 point for each difference of 0.1% in completion percentage
  • Subtract 4 points times the difference in passing touchdowns
  • Subtract 5.5 points times the difference in interceptions
  • Subtract 40 points times the difference in yards per pass attempt
  • Subtract 3 points for each difference of 4 rushing attempts
  • Subtract 1.5 points for each difference of 10 rushing yards
  • Subtract 3 points times the difference in rushing touchdowns
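
One way to read the quarterback list is as a table of per-unit weights. The sketch below makes two assumptions the text does not spell out: partial differences are pro-rated (a difference of 5 passing yards costs half a point), and "0.1% in completion percentage" means 0.1 percentage points. The dictionary keys and both stat lines are made up for illustration.

```python
# Points subtracted per one unit of difference in each statistic.
QB_WEIGHTS = {
    "pass_att": 0.45,
    "pass_yds": 1 / 10,      # 1 point per 10 passing yards
    "comp_pct": 10,          # 1 point per 0.1 percentage points
    "pass_td": 4,
    "interceptions": 5.5,
    "yds_per_att": 40,
    "rush_att": 3 / 4,       # 3 points per 4 rushing attempts
    "rush_yds": 1.5 / 10,    # 1.5 points per 10 rushing yards
    "rush_td": 3,
}


def qb_penalty(a: dict, b: dict) -> float:
    """Total points subtracted for the quarterback-specific statistics."""
    return sum(w * abs(a[k] - b[k]) for k, w in QB_WEIGHTS.items())


# Two hypothetical quarterback seasons.
qb_a = {"pass_att": 500, "pass_yds": 3500, "comp_pct": 62.0, "pass_td": 25,
        "interceptions": 12, "yds_per_att": 7.0,
        "rush_att": 40, "rush_yds": 150, "rush_td": 1}
qb_b = {"pass_att": 480, "pass_yds": 3600, "comp_pct": 60.0, "pass_td": 22,
        "interceptions": 15, "yds_per_att": 7.5,
        "rush_att": 32, "rush_yds": 100, "rush_td": 2}

print(round(qb_penalty(qb_a, qb_b), 2))  # 104.0
```

In this made-up pair, the two-point gap in completion percentage and the half-yard gap in yards per attempt each cost 20 points, more than the 100-yard gap in passing yards (10 points).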

RUNNING BACKS

  • Subtract 4 points for each difference of 5 carries
  • Subtract 1.5 points for each difference of 10 rushing yards
  • Subtract 4 points times the difference in rushing touchdowns
  • Subtract 100 points times the difference in yards per carry
  • Subtract 1 point for each difference of 2 receptions
  • Subtract 1.5 points for each difference of 10 receiving yards
  • Subtract 1.5 points times the difference in receiving touchdowns
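
For running backs, the yards-per-carry term carries by far the largest weight, so it is worth seeing how it interacts with the raw totals. This sketch derives yards per carry from carries and yards (an assumption; the text does not say how the rate is computed or rounded) and pro-rates partial differences. The field names and both stat lines are hypothetical.

```python
def yards_per_carry(p: dict) -> float:
    """Rushing yards divided by carries; 0.0 for a back with no carries."""
    return p["rush_yds"] / p["carries"] if p["carries"] else 0.0


def rb_penalty(a: dict, b: dict) -> float:
    """Total points subtracted for the running back statistics."""
    return (4 / 5 * abs(a["carries"] - b["carries"])         # 4 per 5 carries
            + 1.5 / 10 * abs(a["rush_yds"] - b["rush_yds"])  # 1.5 per 10 yards
            + 4 * abs(a["rush_td"] - b["rush_td"])
            + 100 * abs(yards_per_carry(a) - yards_per_carry(b))
            + 1 / 2 * abs(a["rec"] - b["rec"])               # 1 per 2 receptions
            + 1.5 / 10 * abs(a["rec_yds"] - b["rec_yds"])
            + 1.5 * abs(a["rec_td"] - b["rec_td"]))


# Two hypothetical running back seasons: 4.0 versus 4.5 yards per carry.
rb_a = {"carries": 300, "rush_yds": 1200, "rush_td": 10,
        "rec": 40, "rec_yds": 300, "rec_td": 2}
rb_b = {"carries": 250, "rush_yds": 1125, "rush_td": 8,
        "rec": 30, "rec_yds": 200, "rec_td": 1}

print(round(rb_penalty(rb_a, rb_b), 2))  # 130.75
```

Here the half-yard difference in yards per carry costs 50 points on its own, more than the 50-carry difference in workload (40 points).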

WIDE RECEIVERS and TIGHT ENDS

  • Subtract 3 points times the difference in receptions
  • Subtract 1 point for each difference of 2 receiving yards
  • Subtract 8 points times the difference in receiving touchdowns
  • Subtract 10 points times the difference in yards per catch
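
Putting the pieces together for a single season, a receiver comparison might look like the sketch below. It assumes the score starts at 1000, as in the Bill James approach described earlier, and that partial differences are pro-rated. The class, its field names, and the three players are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReceiverSeason:
    name: str
    age: int
    experience: int
    games: int
    receptions: int
    rec_yds: int
    rec_td: int

    @property
    def yds_per_catch(self) -> float:
        return self.rec_yds / self.receptions if self.receptions else 0.0


def similarity(a: ReceiverSeason, b: ReceiverSeason) -> float:
    """Single-season similarity score for wide receivers and tight ends."""
    score = 1000.0
    # Penalties that apply to all positions.
    score -= 15 * abs(a.age - b.age)
    score -= 15 * abs(a.experience - b.experience)
    game_diff = abs(a.games - b.games)
    if game_diff:
        score -= 10 + 20 * (game_diff - 1)
    # Receiver-specific penalties.
    score -= 3 * abs(a.receptions - b.receptions)
    score -= 0.5 * abs(a.rec_yds - b.rec_yds)   # 1 point per 2 yards
    score -= 8 * abs(a.rec_td - b.rec_td)
    score -= 10 * abs(a.yds_per_catch - b.yds_per_catch)
    return score


target = ReceiverSeason("Target", 25, 3, 16, 80, 1200, 8)
candidates = [
    ReceiverSeason("Candidate X", 26, 4, 16, 75, 1050, 6),
    ReceiverSeason("Candidate Y", 25, 3, 14, 60, 900, 5),
]

# Rank candidates from most to least similar.
for c in sorted(candidates, key=lambda c: similarity(target, c), reverse=True):
    print(c.name, round(similarity(target, c), 1))
```

In this example Candidate X scores 854 and Candidate Y scores 736: matching the target's age and experience does not make up for Candidate Y's 300-yard gap in receiving yards.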

When measuring a two-year or three-year span, we use a mathematical method called the "harmonic mean." For a given average, the harmonic mean is higher when the items being combined are closer together. We count the most recent year twice, then add either the previous year (for a two-year span) or the previous two years (for a three-year span). For example, over a two-year span, Player B with similarity scores of 900 and 900 will come out as more similar to Player A than Player C with similarity scores of 950 and 850.
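
The multi-year calculation can be sketched as a harmonic mean in which the most recent season appears twice. The function name is hypothetical, and because the text does not say which of Player C's two scores belongs to the more recent season, this sketch assumes it is the 850.

```python
def span_similarity(yearly_scores: list[float]) -> float:
    """Combine single-season similarity scores over a two- or three-year span.

    yearly_scores must be ordered most recent season first. The most recent
    score is counted twice, then the harmonic mean of all entries is taken.
    """
    if len(yearly_scores) not in (2, 3):
        raise ValueError("expected scores for a two- or three-year span")
    if any(s <= 0 for s in yearly_scores):
        raise ValueError("harmonic mean needs positive scores")
    weighted = [yearly_scores[0]] + list(yearly_scores)
    return len(weighted) / sum(1.0 / s for s in weighted)


player_b = span_similarity([900, 900])   # 900 in both seasons
player_c = span_similarity([850, 950])   # assumed: 850 is the most recent

print(round(player_b, 1), round(player_c, 1))  # 900.0 880.9
```

Under that ordering Player B comes out ahead, as the example says. The ordering matters, though: if the 950 were Player C's most recent season, its double weight would lift the combined score to about 914.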