Similarity scores were first introduced by Bill James to compare baseball players to other baseball players from the past. The general idea was to start at 1000 points and subtract for the various differences between two players; the players closest to 1000 were the most similar. The method is all over the great Baseball Reference website and, just as UNIVAC eventually led to your Palm Pilot, can be seen as the ancient predecessor to advanced baseball projection methods like Nate Silver's PECOTA.
It was only natural that the idea would spread to other sports along with statistical analysis itself. NBA analyst John Hollinger has created his own version to compare basketball players, and we have created our own version to compare football players.
Similarity scores have a lot of possible uses, and we aren't the only football analysts who use them. Doug Drinen of the website footballguys.com has his own system that is specific to comparing fantasy football performances. The major goal of our similarity scores, however, is to compare career progressions and try to determine when players have a higher chance of a breakout, a decline, or -- due to age or usage -- an injury. Therefore we compare not only numbers like attempts, yards, and touchdowns, but also age and experience. We are often looking not for players who had similar seasons, but for players who had similar two- or three-year spans in their careers.
Similarity scores have some important weaknesses. The method compares standard statistics like yards and attempts, which are of course subject to all kinds of biases from strength of schedule to quality of receiver corps. The database for player comparison begins in 1978, the year the 16-game season began and passing rules were liberalized (a reasonable starting point to measure the "modern" NFL). We also project statistics for 1982 and 1987 as if the strikes did not happen, although we cannot correct for players who crossed the 1987 picket line to play more than 12 games.
The method is subject to change in the future; we want to tweak it and perfect it. But here is how it works right now:
QUARTERBACKS
Subtract 15 points for each year difference in age between the two players
Subtract 10 points for each year difference in career experience between the two players (based on the year the player came out of college, not necessarily his first year in the NFL)
Subtract an additional 15 points for each year of difference in career experience as a starting quarterback between the two players, based on the first year in which a quarterback started at least six games
Subtract 5 points times the difference in games played
Subtract 20 points times the difference in games started
Subtract 0.225 points for each difference of 1 pass attempt
Subtract 2.5 points for each difference of 10 passing yards
Subtract 1.6 points for each difference of 0.1% in completion percentage
Subtract 3 points times the difference in passing touchdowns
Subtract 2 points times the difference in interceptions
Subtract 2 points times the difference in sacks
Subtract 150 points times the difference in yards per pass attempt
Subtract 1 point for each difference of 4 rushing attempts
Subtract 1 point for each difference of 10 rushing yards
Subtract 1 point for each difference in rushing touchdowns
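The quarterback rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production code: the dictionary keys and the function name `qb_similarity` are made up for the example, differences are taken as absolute values, and rules phrased as "each difference of 10 yards" are prorated to a per-unit rate rather than rounded.

```python
# Points subtracted per one unit of absolute difference, taken from the list above.
# "Per 10 yards" style rules are converted to per-unit rates (an assumption: the
# text does not say whether partial steps are prorated or rounded).
QB_WEIGHTS = {
    "age": 15.0,
    "career_years": 10.0,    # years since coming out of college
    "starter_years": 15.0,   # years since first season with 6+ starts
    "games": 5.0,
    "starts": 20.0,
    "pass_att": 0.225,
    "pass_yds": 0.25,        # 2.5 points per 10 yards
    "comp_pct": 16.0,        # 1.6 points per 0.1 percentage point
    "pass_td": 3.0,
    "interceptions": 2.0,
    "sacks": 2.0,
    "yds_per_att": 150.0,
    "rush_att": 0.25,        # 1 point per 4 attempts
    "rush_yds": 0.1,         # 1 point per 10 yards
    "rush_td": 1.0,
}


def qb_similarity(a: dict, b: dict) -> float:
    """Start at 1000 and subtract the weighted absolute differences."""
    penalty = sum(w * abs(a[k] - b[k]) for k, w in QB_WEIGHTS.items())
    return 1000.0 - penalty


# Hypothetical seasons, invented purely for illustration.
qb_a = {
    "age": 28, "career_years": 6, "starter_years": 4, "games": 16, "starts": 16,
    "pass_att": 500, "pass_yds": 3500, "comp_pct": 62.0, "pass_td": 22,
    "interceptions": 14, "sacks": 30, "yds_per_att": 7.0,
    "rush_att": 40, "rush_yds": 150, "rush_td": 1,
}
qb_b = dict(qb_a, age=29, pass_td=24, interceptions=12)

print(qb_similarity(qb_a, qb_a))  # 1000.0
print(qb_similarity(qb_a, qb_b))  # 975.0 (15 for age, 6 for touchdowns, 4 for interceptions)
```

Nothing in this sketch stops a score from dropping below zero for very different players; whether the real system applies a floor is not stated here.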
RUNNING BACKS
Subtract 15 points for each year difference in age between the two players
Subtract 10 points for each year difference in career experience between the two players
Subtract 5 points for each year difference in NFL experience between the two players
Subtract 15 points times the difference in games played
Subtract 5 points times the difference in games started
Subtract 6 points for each difference of 5 carries
Subtract 1 point for each difference of 5 rushing yards
Subtract 10 points times the difference in rushing touchdowns
Subtract 100 points times the difference in yards per carry
Subtract 1 point times the difference in receptions
Subtract 1.5 points for each difference of 10 receiving yards
Subtract 3 points times the difference in receiving touchdowns
Subtract 3 points for each inch difference in height
Subtract 10 points times the difference between the two players in Body Mass Index
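For running backs, two of the inputs are derived rather than counted: yards per carry and Body Mass Index. The sketch below shows one way to derive them before applying the weights. The helper names (`bmi`, `rb_features`, `rb_similarity`) and dictionary keys are hypothetical, the BMI formula is the standard imperial one (the text does not say which formula is used), and per-5 and per-10 rules are prorated to per-unit rates as an assumption.

```python
def bmi(height_in: float, weight_lb: float) -> float:
    """Standard imperial BMI formula (assumed; the text gives no formula)."""
    return 703.0 * weight_lb / height_in ** 2


# Points subtracted per one unit of absolute difference, from the list above.
RB_WEIGHTS = {
    "age": 15.0,
    "career_years": 10.0,
    "nfl_years": 5.0,
    "games": 15.0,
    "starts": 5.0,
    "carries": 1.2,          # 6 points per 5 carries
    "rush_yds": 0.2,         # 1 point per 5 yards
    "rush_td": 10.0,
    "yds_per_carry": 100.0,
    "receptions": 1.0,
    "rec_yds": 0.15,         # 1.5 points per 10 yards
    "rec_td": 3.0,
    "height_in": 3.0,
    "bmi": 10.0,
}


def rb_features(season: dict) -> dict:
    """Add the derived stats (yards per carry, BMI) to the raw season line."""
    feats = dict(season)
    carries = season["carries"]
    feats["yds_per_carry"] = season["rush_yds"] / carries if carries else 0.0
    feats["bmi"] = bmi(season["height_in"], season["weight_lb"])
    return feats


def rb_similarity(a: dict, b: dict) -> float:
    fa, fb = rb_features(a), rb_features(b)
    return 1000.0 - sum(w * abs(fa[k] - fb[k]) for k, w in RB_WEIGHTS.items())


# Hypothetical season, invented for illustration.
rb_a = {
    "age": 25, "career_years": 3, "nfl_years": 3, "games": 16, "starts": 14,
    "carries": 200, "rush_yds": 800, "rush_td": 6,
    "receptions": 30, "rec_yds": 240, "rec_td": 1,
    "height_in": 70, "weight_lb": 215,
}
rb_b = dict(rb_a, rush_td=8)

print(rb_similarity(rb_a, rb_b))  # 980.0 (two touchdowns at 10 points each)
```

Note that weight itself carries no penalty in the list; it enters only through BMI, so the sketch keeps `weight_lb` out of the weight table.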
WIDE RECEIVERS and TIGHT ENDS
Subtract 15 points for each year difference in age between the two players
Subtract 10 points for each year difference in career experience between the two players
Subtract 5 points for each year difference in NFL experience between the two players
Subtract 3 points times the difference in receptions
Subtract 1 point for each difference of 2 receiving yards
Subtract 8 points times the difference in receiving touchdowns
Subtract 12 points times the difference in yards per catch
Subtract 1 point times the difference in carries (wide receivers only)
Subtract 3 points for each inch difference in height
Subtract 5 points times the difference between the two players in Body Mass Index
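Wide receivers and tight ends share one list, except that carries count only for wide receivers. A minimal sketch of that switch follows; the function name `receiver_similarity`, the `position` argument, and the dictionary keys are assumptions made for the example, and it assumes both players play the same position and that yards per catch and BMI have already been computed.

```python
# Points subtracted per one unit of absolute difference, from the list above.
RECEIVER_WEIGHTS = {
    "age": 15.0,
    "career_years": 10.0,
    "nfl_years": 5.0,
    "receptions": 3.0,
    "rec_yds": 0.5,          # 1 point per 2 yards
    "rec_td": 8.0,
    "yds_per_catch": 12.0,
    "height_in": 3.0,
    "bmi": 5.0,
}
WR_ONLY_WEIGHTS = {"carries": 1.0}  # applied to wide receivers only


def receiver_similarity(a: dict, b: dict, position: str) -> float:
    """Similarity for two players at the same position ("WR" or "TE")."""
    if position not in ("WR", "TE"):
        raise ValueError("position must be 'WR' or 'TE'")
    weights = dict(RECEIVER_WEIGHTS)
    if position == "WR":
        weights.update(WR_ONLY_WEIGHTS)
    return 1000.0 - sum(w * abs(a[k] - b[k]) for k, w in weights.items())


# Hypothetical seasons, invented for illustration.
rec_a = {
    "age": 26, "career_years": 4, "nfl_years": 4, "receptions": 70,
    "rec_yds": 980, "rec_td": 7, "yds_per_catch": 14.0,
    "height_in": 73, "bmi": 27.0, "carries": 5,
}
rec_b = dict(rec_a, rec_td=8, carries=9)

print(receiver_similarity(rec_a, rec_b, "WR"))  # 988.0 (8 for the touchdown, 4 for carries)
print(receiver_similarity(rec_a, rec_b, "TE"))  # 992.0 (carries ignored)
```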
DEFENSIVE SIMILARITY SCORES
The defensive similarity scores system is a bit too complicated to explain fully. The coefficients are different for each position. The similarity scores measure a number of stats, basically split into three categories:
1) Biographical facts, such as age, experience, height, weight, and BMI.
2) Standard stats, such as sacks and interceptions.
3) FO advanced individual defense stats, such as Stop Rate and Defeats. These stats come from play-by-play (PBP) data, not game charting, so they do not include defensive coverage metrics for defensive backs.
FOR ALL SIMILARITY SCORES
When measuring a two-year or three-year span, we use a mathematical method called the "harmonic mean." For a given average, the harmonic mean is higher when the items being compared are closer together. We count the most recent year twice, then add either the previous year (for a two-year span) or the previous two years (for a three-year span). For example, over a two-year span, Player B, with similarity scores of 900 and 900, will come out as more similar to Player A than Player C, with similarity scores of 950 and 850.
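One way to read the multi-year rule is as a harmonic mean over a list in which the most recent year's score appears twice. The sketch below follows that reading using Python's standard `statistics.harmonic_mean`; the function name `span_similarity` is hypothetical, and the double-counting scheme is an interpretation of the description above rather than a confirmed formula.

```python
from statistics import harmonic_mean


def span_similarity(most_recent: float, *previous: float) -> float:
    """Harmonic mean of yearly scores, with the most recent year counted twice."""
    return harmonic_mean([most_recent, most_recent, *previous])


# Plain harmonic mean of the two example scores, no double counting.
print(round(harmonic_mean([950.0, 850.0]), 1))   # 897.2

# Two-year spans with the most recent year counted twice.
print(round(span_similarity(900.0, 900.0), 1))   # 900.0
print(round(span_similarity(850.0, 950.0), 1))   # 880.9 (850 is the most recent year)
print(round(span_similarity(950.0, 850.0), 1))   # 914.2 (950 is the most recent year)
```

Under this reading, the 900/900 player beats the 950/850 player with a plain harmonic mean, and also with double counting when 850 is the more recent year. If 950 is the more recent year, the double counting lifts the second player above 900, so the outcome depends on which season the example treats as most recent, which the text does not specify.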
Similarity scores for offensive players are based on stats back to 1978. Similarity scores for defensive players are based on stats back to 1997.