QB Ratings Reconsidered
by Brian Hook, with additional writing by Aaron Schatz and Anthony Brancato
Football fans like to argue about quarterbacks, and invariably statistics come into play, since they're so handy for proving just about any point you want to make.
For this reason there have been numerous ways to judge a quarterback. Some use the "win is a win" argument; others use gross values such as career touchdowns or yardage or completion percentage; and yet others use the rather arbitrary (and rather unwieldy) NFL system called "passer ratings."
In this article, we discuss three new systems for rating quarterbacks. First is the Football Outsiders Value Over Average (VOA) system, which attempts to quantify the particular value of a quarterback by comparing him to the league average, and its cousin, Points Above Replacement (PAR), which compares a quarterback to an estimated "replacement level." Second, Brian Hook has come up with a simple system known as the Hook Quarterback Value (HQV), which offers a high-level assessment of a quarterback based on drive efficiency. Third, Anthony Brancato has developed what he calls the Quarterback Rating System (QBR), which differs from the NFL's passer rating because it evaluates the "total package" and takes into account other skills a quarterback may possess besides just pure passing.
Trying to take something as intangible as "good quarterbacking" and assigning a single number to it will never be perfect. There will always be the intangibles to argue -- one quarterback benefited from a great running game, another was hurt by a poor defense, another had a great running back to keep defenders honest, and yet another had a bad offensive line. Was that interception the result of a bad pass or because a receiver slipped? The excuses and rationalizations are legion, and with that we must accept that no rating system will be perfect. The important thing is to understand what is being measured and how, and then we can at least interpret or prioritize the results as we see fit.
This article examines the relative merits of the NFL passer rating; TFO's PAR and VOA ratings; and HQV, all in terms of single-game ratings. Then we'll bring in Anthony Brancato's QBR to discuss the 2003 season as a whole.
NFL PASSER RATING
The NFL's own quarterback metric, known as "passer rating," was adopted in 1973. You can read more about it at the NFL's official website. The basic gist of the formula is that it measures a QB against historical averages in four key categories: completion percentage, yards/attempt, touchdowns per attempt and interceptions per attempt. It then scales and adjusts each of these values and creates a number from 0 to 158.3.
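For readers who want to see the arithmetic, here is a short Python sketch of the standard published formula. The function and variable names are our own, chosen for illustration; each of the four components is clamped to the range 0 to 2.375 before the sum is scaled.

```python
def _clamp(x, lo=0.0, hi=2.375):
    """Each passer-rating component is limited to the range [0, 2.375]."""
    return max(lo, min(hi, x))


def nfl_passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """Return the NFL passer rating (0 to 158.3) for one stat line."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    a = _clamp((completions / attempts - 0.3) * 5)       # completion percentage
    b = _clamp((yards / attempts - 3) * 0.25)            # yards per attempt
    c = _clamp(touchdowns / attempts * 20)               # touchdowns per attempt
    d = _clamp(2.375 - interceptions / attempts * 25)    # interceptions per attempt
    return (a + b + c + d) / 6 * 100


if __name__ == "__main__":
    # A 20-for-25, 341-yard, 3-TD, 0-INT stat line maxes out all four components.
    print(round(nfl_passer_rating(20, 25, 341, 3, 0), 1))
    # The same line with one extra incompletion (for example, a throwaway).
    print(round(nfl_passer_rating(20, 26, 341, 3, 0), 1))
```

Note that sacks never enter the calculation, while every incompletion adds an attempt, so the second call returns a lower number than the first.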
Looking at this we can identify several problems. The first is that the scale is rather unintuitive. How bad is a 65.0 passer rating? Or a 90.3 passer rating? Is a 158.3 that much better than a 132.0?
The second problem is that it double-penalizes downfield passers. A quarterback that throws down the field a lot will likely have a lower completion percentage, but since yards per attempt is measured, this means that a downfield QB will suffer twice during the calculation -- a lower completion percentage in turn translates to fewer yards per attempt, even if yards per completion are high.
The third problem is that it rewards touchdown passes, often a function of play calling and not necessarily passer ability.
The fourth problem is that passers that would rather throw the ball away than get sacked or force a ball into coverage, or who spike the ball to stop the clock, are penalized. According to the passer rating formula, you're a better quarterback if you get sacked for a 13-yard loss than if you throw the ball away.
The final problem is that it does not take into account at least two types of plays that really do make a difference: sacks and fumbles. When New Orleans hosted Tampa Bay in 2003, Aaron Brooks had a very good 101.8 passer rating - despite only getting seven points on the board, fumbling four times, and losing three of them.
The NFL site specifically mentions that they are rating players as passers and not quarterbacks. The NFL passer rating makes no attempt to measure anything other than the question, "When a quarterback throws the ball, does a good thing happen or does a bad thing happen?"
Their disclaimer is laudable, but the statistic is used so often to describe a QB's prowess (or lack thereof) that the disclaimer is pretty much ignored. That said, as a rough measure of a quarterback's passing efficiency, it's okay. But it leaves a lot to be desired. Since it does not grade on a scale (other than the original scale established by examining quarterback performance since 1963), nor does it bias for opposing defenses, it's an absolute, instead of relative, quantification of a quarterback.
VALUE OVER AVERAGE (VOA) and POINTS ABOVE REPLACEMENT (PAR)
Football Outsiders, in an attempt to overcome the situational biases inherent in NFL statistics, have devised two separate ratings: Points Above Replacement (PAR) and Value Over Average (VOA). You can find a complete description of the Football Outsiders system here. Here is a summary:
VOA breaks down the NFL season play-by-play to see how much success offensive players achieved in each specific situation compared to the league average. Each play is then given a success number based on a system originated in the book Hidden Game of Football, and tweaked a little bit since. The value of each play is compared to the league average based on a number of variables, including down, distance, location on field, current score, and time of game. If the listing is DVOA, the strength of the opponent against the passing game is also taken into account. The player then has his VOA (or DVOA) calculated based on the sum of his relative performance, divided by the number of plays.
The problem with VOA is that, by comparing to average performance per play, it suffers wide swings in value for players who don't play much. That led to the development of PAR, Points Above Replacement, which represents the total number of points scored due to plays where this QB passed or carried the ball, compared to a replacement-level QB in the same game situations. As a total number, this rating is better for measuring performance over a short number of plays, like one game instead of a whole season. The development of PAR is discussed more here.
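To illustrate why a per-play average swings wildly on a handful of plays while a running total does not, here is a toy Python sketch. It is not the Football Outsiders formula: the `Play` record, its fields, and all the numbers are hypothetical, and the real system derives its baselines from play-by-play data we do not reproduce here.

```python
from dataclasses import dataclass


@dataclass
class Play:
    value: float        # success value earned on the play (hypothetical units)
    league_avg: float   # assumed league-average value in the same situation
    replacement: float  # assumed replacement-level value in the same situation


def value_over_average(plays):
    """VOA-like number: average value over the league baseline, per play."""
    if not plays:
        raise ValueError("need at least one play")
    return sum(p.value - p.league_avg for p in plays) / len(plays)


def value_above_replacement(plays):
    """PAR-like number: total value over the replacement baseline."""
    return sum(p.value - p.replacement for p in plays)


if __name__ == "__main__":
    # A backup with two big plays versus a starter who is steadily above average.
    backup = [Play(3.0, 1.0, 0.8)] * 2
    starter = [Play(1.2, 1.0, 0.8)] * 200
    print(value_over_average(backup), value_over_average(starter))
    print(value_above_replacement(backup), value_above_replacement(starter))
```

In this made-up example the backup looks ten times better per play, but the starter has produced far more total value, which is the distinction the two ratings are meant to capture.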
Both VOA and PAR grade on a curve, since they are based on data collected from an entire season and compared to other players in the same situation. (Note: This article was written prior to the recent upgrade to VOA that combined the 2002 and 2003 seasons to create slightly improved league-average benchmarks.)
HOOK QUARTERBACK VALUE (HQV)
At a high level, HQV is sort of the anti-PAR/VOA. It makes no attempt to take into account opposing defenses, nor does it look at situations or comparisons against similar players. It only asks two fundamental questions: When a quarterback is given the opportunity, does he score? And when given that opportunity, does he get a lot of passing yardage? It doesn't bother getting to the whys and hows of it; it just assumes that it will all come out in the wash.
In this context, "an opportunity" roughly correlates to a drive. Using the statistics generated by Football Outsiders, I ran a filter across the data to extract the total number of "assignable" drives. This means any drives where there was a play that involved a quarterback. A drive is defined as any sequence of plays differentiated from another sequence either by a change of possession; end of half; end of game; or end of regulation period.
A drive that does not involve the quarterback directly is considered "unassigned" for purposes of my computation. For example, if a team is running out the clock and has three hand-offs and a punt, then the quarterback is not assigned that drive.
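The drive filter can be sketched in a few lines of Python. This is only an illustration, not the actual script: the play records and their keys (`drive`, `qb`, `nullified`) are hypothetical, and treating penalty-nullified plays as not counting is an assumption based on the game examples later in this article.

```python
from collections import defaultdict


def assignable_drives(plays, qb):
    """Return the ids of drives with at least one counted play involving `qb`.

    `plays` is an ordered list of dicts with hypothetical keys:
      "drive"     - drive identifier
      "qb"        - quarterback directly involved in the play, or None (e.g. a hand-off)
      "nullified" - True if a penalty wiped the play off the books
    """
    by_drive = defaultdict(list)
    for play in plays:
        by_drive[play["drive"]].append(play)
    return [
        drive_id
        for drive_id, drive_plays in by_drive.items()
        if any(p["qb"] == qb and not p["nullified"] for p in drive_plays)
    ]


if __name__ == "__main__":
    plays = [
        {"drive": 1, "qb": "Green", "nullified": False},  # completed pass
        {"drive": 2, "qb": None, "nullified": False},     # hand-off
        {"drive": 2, "qb": "Green", "nullified": True},   # pass wiped out by penalty
        {"drive": 3, "qb": "Collins", "nullified": False},
    ]
    print(assignable_drives(plays, "Green"))
```

With this sample data only the first drive is assigned to Green: the second contains nothing but a hand-off and a nullified pass, and the third belongs to the backup.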
The only two statistics I use are the number of points scored on a drive, and the number of passing yards on a drive. The questions are, can he get points on the board, including PATs and field goals, and does he pass a lot to achieve those points? I normalize these against reference points generated by the 2003 season (no, not quite as good as using stats from 1963, but I don't have those...) and then weight their resulting values using 60% contribution from points and 40% from passing yards. If you would like to know how this ratio was developed for the HQV, that information is located in an appendix to this article.
You can argue until you're blue in the face as to the relative merit of points over passing yards, but I'm comfortable with a 60/40 split. It rewards quarterbacks that get points on the board, but it still requires a pretty significant amount of passing yards to be considered great.
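As a rough sketch of how the 60/40 weighting works, consider the Python below. Only the two inputs (points per drive and passing yards per drive), the weights, and the existence of a cap come from the description in this article. The reference values passed in are hypothetical placeholders, and the simple linear scaling is an assumption; the real reference points and normalization are covered in the appendix.

```python
def hqv(points, passing_yards, drives,
        ref_points_per_drive, ref_yards_per_drive,
        points_weight=0.6, yards_weight=0.4):
    """Illustrative HQV-style score on a 0 to 100 scale.

    The two reference values are hypothetical inputs standing in for the
    season-based reference points; the linear scaling is an assumption.
    """
    if drives <= 0:
        raise ValueError("drives must be positive")
    # Scale each per-drive figure against its reference and cap it at 1.0.
    points_score = min(max(points, 0) / drives / ref_points_per_drive, 1.0)
    yards_score = min(max(passing_yards, 0) / drives / ref_yards_per_drive, 1.0)
    return 100 * (points_weight * points_score + yards_weight * yards_score)


if __name__ == "__main__":
    # Hypothetical reference points: 5.0 points and 45.0 passing yards per drive.
    print(hqv(points=25, passing_yards=300, drives=10,
              ref_points_per_drive=5.0, ref_yards_per_drive=45.0))
```

Because each component is capped, a quarterback who is already at the ceiling on points per drive cannot raise his score with more points; he can only raise or lower it through passing yards per drive.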
HQV's ultimate goal is to take a few steps back and just look at the end result of a quarterback's drive management, not to look at every little detail. The interesting effect is that any Bad Thing or Good Thing is automatically taken into account by this system. Quarterback fumbled the ball or threw a pick? Well then, that's one less possession he gets to score points with. Quarterback get sacked? Well, odds are that the drive will stall as a result. It normally takes care of itself, but, of course, there will always be situations where it's not the quarterback's fault that a drive died. I believe that every system that tries to quantify performance will suffer from that same problem.
All that said, here is a list of the top five games by HQV for the 2003 regular season:
|Player||HQV||Game||Drives||Comp/Att||Pass Yds||Pass TD||INT||Points|
|Green, Trent||100.0||Week 15 vs. DET||7||20/25||341||3||0||38|
|Brooks, Aaron||100.0||Week 7 vs. ATL||8||23/30||352||3||0||42|
|Bledsoe, Drew||96.8||Week 2 vs. JAC||8||19/25||314||2||0||35|
|McNair, Steve||92.3||Week 3 vs. NOR||8||22/33||252||2||0||27|
|Johnson, Brad||92.2||Week 6 vs. WAS||7||22/30||268||4||0||27|
Green's performance against Detroit is pretty much benchmark material. The Chiefs officially had 10 drives in that game, but the last two were Todd Collins's drives, and another consisted, effectively, of all Priest Holmes (3 runs for 29 yards along with an encroachment penalty that nullified Green's only pass of the drive, thus not giving him credit for the drive). If he had received credit for that drive, his HQV would have dropped to 99.7, even with the touchdown, since his points/drive is already capped and it would have lowered his passing yards/drive number.
Brooks's performance suffers from one minor anomaly. Technically there were 11 Saints drives in that game, but the last two were with backups (Bouman), so Brooks should only be credited with nine drives. But he's actually credited with eight, which boosts his HQV a bit. The reason that there's a "missing" drive is that, like Green, there were some circumstances that nullified his passes -- but to an extreme. That drive had eight plays, but all three passing plays were discounted as the result of penalties - defensive pass interference; illegal use of hands; and intentional grounding. So officially, there were only four rushing plays, and Brooks thus didn't get credit. If he had been credited with the drive, his HQV would have dropped to 95.8. Not a massive difference.
Bledsoe's game against Jacksonville had 10 drives, but the last two, again, were with Alex Van Pelt in the game, so Bledsoe was credited with 8 drives. Of those drives, five ended in touchdowns and three ended either in a punt or turnover on downs.
McNair against New Orleans is interesting for at least one reason -- there were only 16 drives in the whole game, well below the average of 24 or so. Other than that, McNair had a pretty good day as a quarterback. He was credited with all eight Titans drives, and he scored on five of those drives. His points per drive and passing yards per drive are both extremely good, the latter in part due to several very long, sustained drives. McNair had one critical fumble in the New Orleans red zone, but even with that, he managed to pull off one of the top five games of the year.
Johnson's game against Washington had nine official drives, but only seven were credited to him since one drive was a single play end-of-half kneel and another was an end-of-game all-rushing sequence. Discounting those, he was passing for a lot of yardage for each scoring drive, and he was also scoring consistently well, with four touchdowns and three drives leading to punts.
Looking at this data it is apparent that HQV rewards consistent scoring, which, by extension, also punishes turnovers, sacks and the inability to gain first downs. Passing yards and long drives also help raise HQV.
By comparison, there were some notable games that didn't score as well on HQV (but still scored well, mind you). McNair's 422-yard day against Houston rated an 84.3 (#20 overall for the season), but he "only" scored 3.1 points per drive and had 10 drives to do his magic. Favre's game against Oakland was an 87.2 and #18 for the season, but he benefited from having 11 drives at his disposal and, while his gross passing yards were phenomenal, from a per-drive point of view they were merely "pretty damn good". By comparison, Green had 30 percent more passing yards per drive against Detroit than Favre did against Oakland.
Like the NFL passer rating, the Hook Quarterback Value (HQV) is an absolute measurement, not a relative one. An HQV computation is unaffected by the performance of other quarterbacks in the same season, nor is it altered by defensive rankings. It is not a rating that tries to get down and dirty and figure out exactly what happened and when and why -- VOA and PAR are far superior for that -- but for a general assessment of "has a quarterback taken advantage of his opportunities?" it seems to do the trick.
COMPARISON OF BEST ONE-GAME PERFORMANCES
Here is a table of the top ten quarterback games of the year according to DPAR, along with the corresponding ratings according to Hook Quarterback Value and the official NFL passer rating. We've also included three performances that would make the top PAR list but were penalized due to easier opponents. "Pass plays" here includes sacks, and the two games with asterisks are the only games listed in which the quarterback threw an interception (one in each).
The first game on this list, Peyton Manning in Week 11 against the Jets, gives a good example of how Football Outsiders' statistics differ from both the official NFL passer rating and HQV. This game rates as the highest of the year in DPAR and PAR despite the fact that Manning only threw for one touchdown, because he threw 16 of his 36 passes for first downs, plus the touchdown (37 pass plays are listed above because of one sack). Meanwhile, he put Edgerrin James in such good position that Edge ran in for three touchdowns -- two from the one-yard line, and one from the four-yard line.
For fun, here's a list of the worst games of the year by DPAR, along with HQV and official NFL passer rating:
Yes, this means Tommy Maddox had one of the season's best games against Baltimore, once you adjust for their great pass defense... and then had one of the season's worst games against Baltimore, even after you adjust for their great pass defense.
BRANCATO QUARTERBACK RATING (QBR)
PERCENTAGE OF COMPLETIONS - Subtract 30 from the completion percentage, and multiply the result by 0.8333. Unlike in the NFL's Passer Rating, there is no lower limit, meaning that this step can result in a negative value if the completion percentage is less than 30.0, and no upper limit, meaning that the maximum value possible (58.3333) is awarded for a completion percentage of 100, not 77.5. Why shouldn't a quarterback get a higher rating if he completes 90 percent of his passes?
AVERAGE YARDS GAINED PER PLAY - Obtain combined net yardage by taking the total yards passing, subtracting the number of yards lost on sacks, and adding the number of yards gained rushing. Then obtain the total number of plays by adding together the number of pass attempts, plus the number of times sacked, plus the number of rushes (i.e., carries). Divide the combined net yards by the total number of plays. Then subtract 2.5 from it and multiply the result by 4.9167. As in Percentage of Completions above, a negative value is possible, in this case if the yards gained per play is less than 2.5; if the result is lower than minus 34.6667 (yards per play lower than minus 4.55), award minus 34.6667 points. There is also no upper limit, meaning that the maximum value possible (474.4583) is awarded for an average gain per play of 99 yards, not 12.5. This change removes the incentive for a quarterback to take sacks rather than throw the ball away. In addition, leaving out rushes and yards gained rushing from the official NFL system punishes a mobile quarterback and rewards a one-dimensional, immobile pocket passer.
PERCENTAGE OF NET TOUCHDOWNS - Obtain net touchdowns by adding the number of touchdowns passing plus the number of touchdowns rushing, then subtracting the number of interceptions returned by the opponents for touchdowns and the number of lost fumbles returned by the opponents for touchdowns. Then divide the result by the number of total plays (not the number of passes attempted as in the Passer Rating System) to get the net touchdown percentage, and multiply that percentage by 3.9583. A negative result will occur if the number of touchdowns returned by the opponents on interceptions and lost fumbles is greater than the number of touchdowns passing and rushing; if the result is lower than minus 200 (net touchdown percentage lower than minus 50.52), award minus 200 points. There is no upper limit to the value that can be awarded in this step, meaning that the maximum value possible (395.8333) is awarded for a net touchdown percentage of 100, not 11.875. This gives another reward for the rushing quarterback, and penalizes him for "allowing" a touchdown by throwing an interception that gets run back for a TD.
PERCENTAGE OF GROSS TURNOVERS - Obtain the number of gross turnovers by adding the number of interceptions plus the number of lost fumbles. Then divide the sum by the number of total plays, as in the Percentage of Net Touchdowns step above. Then multiply the turnover percentage by 3.9583 and subtract the number from 39.5833. A negative result will occur if the gross turnover percentage is greater than 10.0; if the result is lower than minus 200 (gross turnover percentage greater than 60.52), award minus 200 points. If we're going to count rushing as well as passing, shouldn't lost fumbles count as well as interceptions?
The sum of the four steps is the Quarterback Rating; there is no need to divide the sum by six and then multiply by 100, as in the Passer Rating System. A demonstration of how the two systems work on comparable scales is located in an appendix to this article.
COMPARISON OF SEASON PERFORMANCE
Here's a comparison of the top 33 quarterbacks in the NFL based on the various quarterback rating systems being introduced in this article. This list includes the 32 players who qualified for the NFL's QB Rating standings, with a minimum of 256 plays required, as well as Rich Gannon of Oakland, who just missed with 248 plays. They are ranked in order of the NFL's official passer rating, with ranks also given using the Brancato QBR, the Hook HQV, and Football Outsiders VOA and DVOA (the only opponent-adjusted number on this table).
Aaron adds: I wanted to make some comments on the two players whose ranking using the Football Outsiders numbers most differs from their ranking using the other metrics, Brett Favre and Kerry Collins. Our ratings for Favre, like our ratings for Green Bay's offense as a whole, have been the subject of much debate over on this other thread. You can see from the difference between Favre's VOA and DVOA that part of the issue is the easier schedule of pass defenses faced this year by Green Bay (and, you'll notice, the entire NFC North).
Another factor is that Brett Favre succeeded in part because he was put in the right situations to succeed. Of the 33 quarterbacks listed above, Favre had the highest average "VALUE AVAILABLE" per play. This is the part of the VOA formula that measures the expected league-average success on a certain play given its down, distance, etc., and Favre's high number means that he was often in good passing situations because of his team's quality defense and running game. (In case you are curious, last among these 33 quarterbacks is Stewart, and he took that bad average situation and sucked extra hard.) To give just one example, Favre faced fewer second-and-long and third-and-long situations than average -- he threw 43 percent of his passes in these situations (defined as six yards to go or more) compared to the average quarterback who faced these situations 46 percent of the time. It doesn't sound like a big difference, but over an entire season it is.
Collins is, in many ways, the anti-Favre. He faced a very tough schedule (the NFC South and AFC East being packed with good pass defenses) and he ranked 26th out of the 33 quarterbacks listed above in value available because of the Giants' poor defense and special teams as well as Tiki Barber's down season. His problem wasn't a lot of third-and-longs -- he actually faced that situation less often than Favre -- but rather being stuck in his own end of the field with his team losing, stuck with no alternative but to pass and pass some more.
The upshot of this is that both Collins and Favre performed roughly at the level you would expect of an average quarterback given the situations they faced. But Favre's statistics are far superior because he played against easier opponents and in easier situations. Of course, Favre also has a good excuse for his "league-average" performance: the massive four-game dip in DVOA from Weeks 10-13, coinciding with the thumb injury.
Each ranking system has its pros and cons, and really the choice of which is "best" comes down to what you value and what you would like to measure. The NFL passer rating measures how efficient a quarterback is when he throws the ball; it does not take into account the events leading up to the pass, nor does it take into account scrambling ability or "clutch" play or leadership. It's pretty simple - when the ball leaves the quarterback's hand, does something good or bad happen?
Brancato's QBR provides a number that can be easily compared to the official NFL passer rating, while at the same time taking into account non-pass related skills that the official number leaves out. That means it's an improvement in many ways, while at the same time suffering from the same "What the heck is a 90.3" problem.
PAR and DPAR measure, with and without defensive adjustment, how good a player is compared to a random quarterback off the bench (i.e. a "replacement player") based on the specific situations faced in each game. VOA and DVOA do likewise, but by comparing the quarterback to average rather than replacement level they do not take into account how often the quarterback has played. PAR/DPAR is probably the better measure for performance in a specific game; VOA/DVOA, as long as you are only counting quarterbacks who meet a minimum requirement of plays, is probably the better measure for performance over a season.
All four of these stats are accurate, but given the number of them and the complexity of their computation, they suffer primarily from a difficulty of explanation and intuitiveness. The average fan can't compute these values since they require play-by-play data that is not widely available.
HQV provides an easy-to-understand score by offering a 100-point scale with an average of 50. But it suffers from the opposite problem of PAR and VOA. It doesn't care enough about the details to even look into them, so while you can get a general understanding of a quarterback's value to a team, you don't really learn enough to qualitatively assess why he's good or bad.