FO Basics: Different Kinds of Stats

by Aaron Schatz

Over the next couple weeks, we're going to run a series of articles we're calling FO Basics. We get a lot of questions about our work, but there are also a lot of readers who don't ask questions. We hope this series will help answer some questions and clarify some confusing things for even those readers who don't respond on the message boards.

The schedule:

  • August 30: Where our stats come from, and the difference between charting stats and play-by-play stats.
  • August 31: A summary of research from our first seven years.
  • September 1: Our college stats, how they differ from our NFL stats and from each other.
  • September 6: The importance (and limitations) of watching games on tape.
  • September 7: Regression towards the mean -- what it means, and how we use it.

So let's start with a look at our stats. My goal here is not to fully explain how our formulas work -- we have a separate page you can read with a lot of those explanations -- but rather to try to clear up some common misconceptions about where our stats come from and whether or not they count as "subjective" or "objective."

I'll start the discussion by separating FO stats into four different categories: play-by-play, game charting, historical stats, and projections.


Play-by-play statistics themselves come in two different categories: counting stats and formulas.

Play-by-Play Counting Stats are the simplest type of statistics: the numbers that come directly from the official NFL play-by-play. That starts with all the standard statistics: yards, carries, passes, receptions, touchdowns, sacks, interceptions, and fumbles. This category also includes the defensive statistics that are technically "unofficial" but are tracked by the league and included in the play-by-play: tackles, passes defensed, pass targets, and quarterback hits.

Even though these stats come straight out of the play-by-play, the numbers you find at Football Outsiders, or in Football Outsiders Almanac, aren't always the same as the numbers you will find over on pro-football-reference. For example, if you look at our stats page for quarterbacks, the column for "passes" does not include clock-killing spikes and the column for "runs" does not include kneeldowns. Nonetheless, these stats on FO are still essentially play-by-play stats, as they are counting plays as officially reported by the league. (The official gamebooks will usually note a kneeldown, and often mark a clock-killing spike differently from other incomplete passes.)
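To make the spike and kneeldown example concrete, here is a minimal Python sketch of how the same play-by-play can produce two different pass and run counts. The play records and field names (`type`, `is_spike`, `is_kneel`) are hypothetical, made up for illustration; they are not the official gamebook format or our actual code.

```python
# Hypothetical play records; the field names are illustrative only and do not
# reflect the official gamebook format or FO's actual data.
plays = [
    {"qb": "QB A", "type": "pass", "is_spike": False, "is_kneel": False},
    {"qb": "QB A", "type": "pass", "is_spike": True,  "is_kneel": False},  # clock-killing spike
    {"qb": "QB A", "type": "run",  "is_spike": False, "is_kneel": False},
    {"qb": "QB A", "type": "run",  "is_spike": False, "is_kneel": True},   # kneeldown
    {"qb": "QB A", "type": "pass", "is_spike": False, "is_kneel": False},
]


def count_plays(plays, play_type, exclude_clock_plays):
    """Count plays of one type, optionally dropping spikes and kneeldowns."""
    total = 0
    for play in plays:
        if play["type"] != play_type:
            continue
        if exclude_clock_plays and (play["is_spike"] or play["is_kneel"]):
            continue
        total += 1
    return total


# Counting every play as officially recorded.
raw_passes = count_plays(plays, "pass", exclude_clock_plays=False)
raw_runs = count_plays(plays, "run", exclude_clock_plays=False)

# Counting with spikes and kneeldowns left out.
filtered_passes = count_plays(plays, "pass", exclude_clock_plays=True)
filtered_runs = count_plays(plays, "run", exclude_clock_plays=True)

print(raw_passes, filtered_passes, raw_runs, filtered_runs)
```

Both versions are still counting stats. The only difference is which officially recorded plays get included.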

Many of our individual defensive stats also qualify as play-by-play counting stats. Total Defensive Plays, Stops, Stop Rate, Defeats, and Average Yardage of Tackles are all based solely on the information in the official gamebooks, not the game charting.

Finally, I would include in this category all the "average stats" that don't adjust numbers or give different weights to different stats in order to create some kind of new rating: yards per carry, yards per reception, touchdown-to-interception ratio, completion percentage, and so forth.

Play-by-Play Formulas take the official NFL play-by-play data and use math to create new ratings that try to adjust for context or measure multiple skills with a single, simpler number. The most common play-by-play formula is NFL passer rating. It isn't common around here, but it gets used everywhere else. Win Probability statistics used on various sites are essentially play-by-play formulas, since they analyze the last few years of play-by-play to determine each team's chances to win based on the game situation and time remaining. P-F-R's Simple Rating System and the stuff Jeff Sagarin does would also fall into this category.
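For readers who have never seen a play-by-play formula written out, NFL passer rating is a convenient example because it combines four counting stats into one number. The sketch below follows the standard published formula, in which each of the four components is capped between 0 and 2.375. The stat lines passed to it are made up.

```python
def passer_rating(attempts, completions, yards, touchdowns, interceptions):
    """Standard NFL passer rating computed from five counting stats."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clamp(value):
        # Each component is limited to the range 0 to 2.375.
        return max(0.0, min(value, 2.375))

    completion_part = clamp((completions / attempts - 0.3) * 5)
    yardage_part = clamp((yards / attempts - 3) * 0.25)
    touchdown_part = clamp((touchdowns / attempts) * 20)
    interception_part = clamp(2.375 - (interceptions / attempts) * 25)

    total = completion_part + yardage_part + touchdown_part + interception_part
    return total / 6 * 100


# A made-up stat line: 20 of 30 for 300 yards, 3 touchdowns, 1 interception.
print(round(passer_rating(30, 20, 300, 3, 1), 1))
```

Note that the weights are fixed constants applied to season or game totals, with no adjustment for down, distance, or opponent.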

Most of the stats you find at Football Outsiders, especially during the NFL season, fall into this category. That includes DVOA, DYAR, Adjusted Line Yards, and all our special teams stats. This is a very important point about DVOA: It is based entirely on the official play-by-play. It does not incorporate game charting except in a couple of very minor ways. Unfortunately, this limits us at times. For example, DVOA and DYAR for receivers are not adjusted for dropped passes, since drops are not marked in the official play-by-play. However, I've seen various comments on the Internet mentioning that DVOA is "subjective" because it is based on the information we get through game charting, and it is important for people to understand that the play-by-play stats are very different from the charting stats.

DVOA is an objective metric, unless you have a very strict definition of "subjectivity." It looks at every single play and the yardage gained, then compares to a baseline based on down, distance, and other elements of the situation. Then we add the opponent adjustments, which are based on all the plays the opponent has had during that season. Nearly everything involved in DVOA is strictly based on the official NFL play-by-play logs.* Plays are scored on "success points," but we've spent a lot of time working on the formula for "success points" to make it correlate as closely as possible with wins. Those success points are compared to baselines which aren't chosen out of thin air; they represent the average performance of every NFL team over the last few years in the situation being measured. Likewise, the cutoffs for measuring Adjusted Line Yards are based on regression analysis from multiple years' worth of NFL data.
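The general shape of "compare each play to a situational baseline" can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the DVOA formula: the success-point values, the situation buckets, and the baseline numbers are all hypothetical, and the opponent adjustment step is left out entirely.

```python
# Hypothetical league-average success points per situation, keyed by
# (down, distance bucket). Real baselines come from years of league data.
BASELINES = {
    (1, "long"): 0.80,
    (2, "medium"): 0.90,
    (3, "short"): 1.20,
}

# Hypothetical plays for one team, each already scored in success points.
team_plays = [
    {"situation": (1, "long"), "success_points": 1.0},
    {"situation": (2, "medium"), "success_points": 0.0},
    {"situation": (3, "short"), "success_points": 2.0},
    {"situation": (1, "long"), "success_points": 1.0},
]


def value_over_average(plays, baselines):
    """Return total success points relative to the situational baseline.

    0.0 means exactly average; 0.10 means 10 percent better than average.
    """
    actual = sum(play["success_points"] for play in plays)
    expected = sum(baselines[play["situation"]] for play in plays)
    return (actual - expected) / expected


voa = value_over_average(team_plays, BASELINES)
print(f"{voa:+.1%}")
```

Every input here is something that could be read off the play-by-play: the down, the distance, and the result of the play.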

We work very hard to make sure that our stats do not specifically favor any team or player. All the calculations that we've done to create the DVOA formula are based on trying to improve the accuracy for every team in the league over a multiple-year period. This is one reason why we tend to dismiss suggestions or complaints that revolve around the rating for one specific team. The goal is to create a rating that does the best job of measuring all 32 teams, and it is unlikely that some flaw in the system is affecting one team but none of the other 31. However, we tend to seriously consider any patterns showing that multiple teams in multiple years share a certain quality and are all overrated or underrated. We're always looking to be more accurate, because, well, nobody likes to be wrong. Of course, a few subjective decisions had to be made during the development of DVOA and other metrics. We had to decide exactly where to draw the baselines for an "average play," and what adjustments to make or not make. These decisions were made as carefully as possible.

We are often asked if DVOA is supposed to measure how well teams have played in the past or how well they are likely to play in the future. The answer is "sort of both." The idea is to try to measure some kind of platonic ideal of how good a team is right now. 

When I try to improve DVOA, I'm fiddling with what splits I use to create baselines, or what the added adjustment is for touchdowns, or various other things. In comparing each version of DVOA to the others, I'm trying to maximize (and balance) two things:

  • How well we are measuring the long-term quality of the team. That means looking at the correlation of each team's DVOA from year to year. Sometimes I also look at comparing the first half of the season to the second half, or comparing odd weeks to even weeks, to try to get the most consistent rating that filters out the effects of circumstance and random chance.
  • How well we are measuring winning. This means trying to get the best correlation between wins and the non-adjusted rating (i.e. no changes based on opponent strength, and fumbles are only negative events if they end in a turnover rather than recovery by the offense).
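The two checks in the list above can be sketched as a pair of correlations. The following Python is a minimal illustration, assuming a plain Pearson correlation and a simple average to combine the two numbers; the combining rule, the function names, and all of the ratings and win totals are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def score_version(adjusted_y1, adjusted_y2, unadjusted_y1, wins_y1):
    """Score one candidate version of a rating on the two criteria."""
    # Criterion 1: does the rating carry over from one year to the next?
    consistency = pearson(adjusted_y1, adjusted_y2)
    # Criterion 2: does the non-adjusted rating track wins in the same year?
    win_fit = pearson(unadjusted_y1, wins_y1)
    # Hypothetical way to balance the two: a plain average.
    return consistency, win_fit, (consistency + win_fit) / 2


# Made-up ratings for six teams (one entry per team, same order in each list).
adjusted_y1 = [25.0, 10.0, 3.0, -2.0, -12.0, -20.0]
adjusted_y2 = [18.0, 12.0, -5.0, 4.0, -8.0, -15.0]
unadjusted_y1 = [22.0, 12.0, 1.0, 0.0, -10.0, -22.0]
wins_y1 = [13, 10, 9, 7, 6, 3]

consistency, win_fit, combined = score_version(
    adjusted_y1, adjusted_y2, unadjusted_y1, wins_y1
)
print(round(consistency, 3), round(win_fit, 3), round(combined, 3))
```

Comparing candidate versions then amounts to running the same scoring on each and checking that a change helps across the whole league rather than for one team.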

The resulting metric doesn't perfectly measure how well a team has played in the past, or how well they will play in the future, but it does a good job of balancing the two. For example, we know that red zone performance tends to revert to the mean over time, so if we wanted a rating that was specifically meant to predict future results, we would make red zone plays less important. In reality, however, DVOA and DYAR consider red zone plays as more important than plays on the rest of the field, because better red zone performance means the team is going to win more games. We give a bonus on plays that score touchdowns... but at a certain point, we stopped raising that bonus because the correlation to winning was no longer improving and the year-to-year correlation started to get weaker.

One thing I've said in interviews, and I'll say it again here, is that there is a pretty good chance that DVOA is not the absolutely, positively most accurate power rating out there. P-F-R's Simple Rating System gets similar results and -- not a surprise, given the name -- is a lot simpler. However, obsession with comparing the small differences in accuracy between DVOA and other power ratings misses the point. The beauty of DVOA is that it is derived from play-by-play, and therefore can be broken up into any grouping of plays you want: by down, by player, by location on the field, and so forth. That kind of matchup analysis is very important to what we do at Football Outsiders.
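Because a play-level rating is just a sum over plays, splitting it is a matter of grouping the plays first. Here is a small Python sketch of that idea. The play records, the `success_points` and `baseline` values, and the `split_rating` helper are hypothetical and stand in for the real formula.

```python
from collections import defaultdict

# Hypothetical plays, each carrying its own score and situational baseline.
plays = [
    {"down": 1, "success_points": 1.0, "baseline": 0.8},
    {"down": 1, "success_points": 0.0, "baseline": 0.8},
    {"down": 2, "success_points": 1.0, "baseline": 1.0},
    {"down": 3, "success_points": 2.0, "baseline": 1.2},
    {"down": 3, "success_points": 1.0, "baseline": 1.2},
]


def split_rating(plays, key):
    """Group plays by any field and rate each group against its baseline."""
    totals = defaultdict(lambda: [0.0, 0.0])  # group -> [actual, expected]
    for play in plays:
        totals[play[key]][0] += play["success_points"]
        totals[play[key]][1] += play["baseline"]
    return {
        group: (actual - expected) / expected
        for group, (actual, expected) in totals.items()
    }


by_down = split_rating(plays, "down")
for down in sorted(by_down):
    print(down, f"{by_down[down]:+.1%}")
```

Swapping `"down"` for a player, a field zone, or any other field on the play record gives a different split from the same data, which is something a rating built only from final scores cannot do.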

Finally, I should point out that DVOA is not the only metric we use around here. Economists who get into sports analysis often try to drill player value down to a single uber-stat, because only a single stat that you can use for all players allows you to compare player value to dollars. However, the philosophy of Football Outsiders is the exact opposite. Our goal is more stats, not fewer. Each stat we use tells part of the story about why a team is playing well or playing poorly. DVOA gives you an overall picture but to see the details you also have to use Adjusted Line Yards, Adjusted Sack Rate, the defensive stats like Stops and Defeats, and the game charting stats.

The special teams statistics are a good example of a place where some people might get confused by terminology. We turn the total value of special teams into a DVOA rating so that we can combine it with offense and defense, but the individual ratings for each aspect of special teams are not based on "success points" like DVOA (because they don't have to measure both progress towards a first down and progress towards the goal line) and they are based on totals, not percentages.

*We do incorporate game charting into DVOA in a few small ways: to determine whether an aborted snap was a bad handoff or a blown pass play, to mark squib kicks, and occasionally to change a backwards pass, which is officially a running play, so it counts as a pass play in our stats. There's one other somewhat subjective element in the FO play-by-play formulas, which is that we mark end-of-half interceptions (and a few almost-end-of-half interceptions) as "Hail Mary" and do not count them as turnovers. People might have different opinions about, for example, whether a 40-yard interception thrown on third down from midfield with 20 seconds left should count as a "Hail Mary" or not.
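In data terms, the Hail Mary exception is a manual flag that overrides how one kind of play is counted. The Python sketch below shows that override under stated assumptions: the `hail_mary` flag is set by a person, not computed, the field names are hypothetical, and treating a flagged interception as an ordinary incompletion is an assumption of this sketch.

```python
# Hypothetical play records. The hail_mary flag represents a human judgment
# call; nothing in this sketch tries to compute it.
plays = [
    {"result": "interception", "hail_mary": False},
    {"result": "interception", "hail_mary": True},   # end-of-half heave
    {"result": "incomplete", "hail_mary": False},
    {"result": "fumble_lost", "hail_mary": False},
]

TURNOVER_RESULTS = {"interception", "fumble_lost"}


def effective_result(play):
    """Return the result used for rating purposes.

    Assumption for this sketch: a flagged Hail Mary interception is treated
    like an incomplete pass instead of a turnover.
    """
    if play["result"] == "interception" and play["hail_mary"]:
        return "incomplete"
    return play["result"]


official_turnovers = sum(p["result"] in TURNOVER_RESULTS for p in plays)
rated_turnovers = sum(effective_result(p) in TURNOVER_RESULTS for p in plays)
print(official_turnovers, rated_turnovers)
```

The subjective part is deciding which plays get the flag, which is exactly where two people might disagree.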


Game charting statistics are the numbers we get from an armada of volunteers watching games on tape and then marking down things that aren't tracked by standard play-by-play. We're not the only people doing this, of course, but as of right now all the various game charting projects out on the Interwebs are separate from each other, which means the stats you see from FO may be different from the stats as tracked by K.C. Joyner, or Stats Inc., or others. We're all working off the same limited television camera angles, so we're all making mistakes. For the time being, there's no alternative.

Are the game charting stats objective or subjective? Well, sort of both. Everything we're asking charters to mark has a specific definition. A screen pass is a screen pass. A scramble is a scramble. A dropped pass, theoretically, should be easy to identify. The problem is that a lot of the events we're marking in the game charting end up somewhere in between one designation and another. If a wide receiver has to jump for a pass, and gets his hands on it but loses control, is that pass overthrown or dropped? When a linebacker comes late on a blitz, does he count as a pass rusher on a delayed blitz? Or did he just notice that the running back he was covering was blocking, in which case the play called for him to rush the passer? Unfortunately, we have no choice but to ask our volunteers to make decisions like this. Some charting stats involve more subjectivity than others -- for example, identifying a draw or a screen is a lot more cut and dried than deciding on a quarterback hurry.

What is important, however, is that the game charting project is an attempt to measure events. It is no different from the official scorers assigning tackles or intended receivers on each play, two items in the official play-by-play which can be tough to discern. No players are graded, and we try not to ask the game charters to assign blame on plays unless there is a specific negative event: a broken tackle, a blown block, or a dropped pass. If the offensive line blows the blocking call and leaves a pass rusher unblocked, we don't try to figure out which lineman had the assignment; we just mark "Rusher Untouched."

For the most part, game charting stats are counting stats. A defensive player will have a certain number of hurries, a certain number of dropped interceptions, a certain number of broken tackles, and so forth. There are a few game charting formulas as well. We take game charting stats and adjust them for context, much like we try to adjust the standard play-by-play stats for context. An example would be "Adjusted Yards per Pass," where we adjust the yards allowed by defensive players in coverage based on the quality of the receiver involved.
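As a rough picture of what adjusting a charting stat for context can look like, here is a Python sketch that scales the yards a defender allows by the quality of the receiver he was covering. The scaling rule, the league-average figure, and the per-receiver numbers are all hypothetical; this is not the actual Adjusted Yards per Pass formula.

```python
# Hypothetical league-average yards per target.
LEAGUE_YARDS_PER_TARGET = 7.0

# Hypothetical charted targets against one defender: the yards allowed on the
# play and the covered receiver's own season yards per target.
targets = [
    {"yards_allowed": 14, "receiver_yards_per_target": 10.0},  # strong receiver
    {"yards_allowed": 0,  "receiver_yards_per_target": 7.0},   # average receiver
    {"yards_allowed": 7,  "receiver_yards_per_target": 5.0},   # weak receiver
]


def raw_yards_per_pass(targets):
    """Plain counting-stat average: yards allowed per charted target."""
    return sum(t["yards_allowed"] for t in targets) / len(targets)


def adjusted_yards_per_pass(targets, league_average):
    """Scale each play by how the receiver compares to league average.

    Yards allowed to a better-than-average receiver are discounted, and yards
    allowed to a worse-than-average receiver are inflated.
    """
    adjusted_total = sum(
        t["yards_allowed"] * league_average / t["receiver_yards_per_target"]
        for t in targets
    )
    return adjusted_total / len(targets)


raw = raw_yards_per_pass(targets)
adjusted = adjusted_yards_per_pass(targets, LEAGUE_YARDS_PER_TARGET)
print(round(raw, 2), round(adjusted, 2))
```

The charted inputs stay the same counting stats; only the context applied on top of them changes.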


The category of "historical statistics" has a lot of crossover with the category of "play-by-play statistics," of course. However, when we think of play-by-play stats around here, we tend to think of stats from the years for which we have play-by-play: 1993-2010. Historical stats, of course, would include yardage and touchdown totals going all the way back to the start of the NFL. It also includes stuff like draft information, game scores, and "official" (i.e. team-reported) heights and weights.

There are also formulas developed from historical statistics, of course. That would include Adjusted Games Lost, Draft Value (as based on the infamous "draft value trade chart"), and P-F-R's Approximate Value.


Projections are the formulas that try to predict how well a team or player will do in the future. In general, these projections are based on the idea that the best team in the league doesn't always finish with the best record, and the 20th best team in the league doesn't necessarily finish with the 20th best record. There are a lot of random elements in a football season, and there are a lot of intangibles that we can't project. So the projections represent a range of possible likely results for each team or player. The numbers you see in the KUBIAK fantasy football projections spreadsheet are the average of those possibilities, not a definite prediction of the exact numbers we expect from a player. That's why we say that a player's specific rank in our projections isn't as important as the overall sense of whether KUBIAK thinks he will be better or worse than the past, and whether KUBIAK thinks he is overrated or underrated by conventional wisdom.
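The idea that a projection is the average of a range of outcomes can be illustrated with a simple simulation. The Python sketch below assumes a team with one fixed, hypothetical chance of winning each game over a 16-game season; real projection systems use many more variables, and the function names here are made up.

```python
import random


def simulate_season(win_prob, games, rng):
    """Play one simulated season and return the number of wins."""
    return sum(rng.random() < win_prob for _ in range(games))


def project_wins(win_prob, games=16, n_seasons=10_000, seed=0):
    """Simulate many seasons and summarize the spread of results."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    seasons = [simulate_season(win_prob, games, rng) for _ in range(n_seasons)]
    mean_wins = sum(seasons) / n_seasons
    return mean_wins, min(seasons), max(seasons)


# Hypothetical team that wins 60 percent of its games on average.
mean_wins, worst, best = project_wins(0.6)
print(f"mean {mean_wins:.1f} wins, range {worst} to {best}")
```

The single number that ends up in a spreadsheet is the mean; the spread between the worst and best simulated seasons is the part that a lone number hides.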

The projections we produce before the season are not "DVOA." They are "DVOA projections." People do sometimes get confused between the two. I've seen comments to the effect of "FO's preseason projections are based on an analysis of every play." That's partly true, since various cuts and splits of DVOA go into the projection system, but the projections also consider a lot of other variables such as age and experience at various positions, team pace, recent drafts, free agent additions, and so on.

It's also important to note that there are two main projection systems around here for the NFL: the team projections, and KUBIAK. KUBIAK is the term we use to refer to the fantasy football projections, and the team projections are among the variables used in KUBIAK.


As I often say, intangibles are called intangibles because they are intangible. We don't do stats that measure leadership or team chemistry. That doesn't mean these things don't exist, just that we can't measure them. Leadership and chemistry can develop over time and will affect other teammates, and it is hard for us to guess how. "Heart," on the other hand, is just another element of a player's performance, no different from strength, speed, or ability to learn the playbook. Contrary to popular belief, there is a stat that measures heart. There are a lot of them, in fact. They are called "stats." Most of the guys who have "heart" also have pretty good stats. Fred Biletnikoff used to smoke a pack of cigarettes, throw up for 20 minutes, and then go out and shred every defense he faced. He had great numbers. Anquan Boldin literally took the field three weeks after a broken face. His numbers are pretty good too. But if badly-rated Player X has so much heart, why didn't he use it to maybe get a few more first downs last year?


#1 by Danish Denver-Fan // Aug 31, 2010 - 9:09am

This is a somewhat appropriate time to ask: Why KUBIAK? Why not DEL RIO (Another "obscure" small-team coach) or BRISTER (another Broncos backup QB)? Or is it some sort of clever acronym?

#8 by dbostedo // Aug 31, 2010 - 10:26am

It could have been BRISTER, except that he was a starter on the Steelers for a while. Anyway, per Wikipedia, which I think is correct in this case:

"The name is derived from the current head coach of the Houston Texans and former offensive coordinator for the Denver Broncos, Gary Kubiak. The name is an homage to PECOTA, the player forecasting system developed by Nate Silver of Baseball Prospectus. Kubiak's name was chosen because he had been a relatively obscure backup quarterback for the Denver Broncos, similar to the role played by MLB player Bill Pecota."

#9 by Aaron Schatz // Aug 31, 2010 - 10:42am

I actually came up with it in Cambridge Corner Clubhouse one Sunday in 2004, the year before we released the first KUBIAK projections. I think it was Week 3, I remember Rex Grossman getting hurt that week and there was a great LT-to-Brees option touchdown. I was with my friend Jordy Singer who is a Denver fan, and we were talking about the Denver offense and joking about how Kubiak sounded like he should be one of those big mainframe computers from the 50s, like ENIAC and UNIVAC. I knew at that point we would be doing PFP at the end of the season, and I put the computer thing together with Bill Pecota and came up with the idea of calling the projection system KUBIAK.

Kubiak then f'ed up the joke by getting hired as a head coach. At the time we created the system, the conventional wisdom said he would be a coordinator lifer, like Jim Johnson or Monte Kiffin.

#2 by Will Allen (not verified) // Aug 31, 2010 - 9:13am

Your refusal to pay me royalties, for the data generated by my finely calibrated swaggerometer, and my precisely tuned justwannawinometer, clearly shows that you have never played the game. I scoff in your general direction!

#5 by nat // Aug 31, 2010 - 9:59am

If you exposed your justwannawinometer to recordings of Peyton Manning talking about Mike Vanderjagt, it would explode.

#4 by drobviousso // Aug 31, 2010 - 9:28am

"How well are we measuring the long-term quality of the team. That means looking at the correlation of each team's DVOA from year to year. Sometimes I also look at comparing the first half of the season to the second half, or comparing odd weeks to even weeks, to try to get the most consistent rating that filters out the effects of circumstance and random chance."
Can you talk about how you avoid over-fitting, especially when, for example, a team like Washington gets a new HC, defensive scheme, and QB (and maybe offensive scheme, not sure)?

#10 by MJK // Aug 31, 2010 - 11:42am

Same thought occurred to me when Aaron was talking about how they try to improve the stats by trying to improve consistency within a year and from year to year.

Yes, it's implausible that a team should be first ranked in DVOA one year and 26th in the next...but some variance is expected.

I think the fundamental problem is small sample size. Normally, when you're dealing with a noisy dataset, there are plenty of data points that allow you to discern temporally-shifting trends from noisy outliers that you want to avoid overfitting to. However, in football, there simply isn't enough data to discern this.

I, too, am curious where Aaron and company draw the line between trying to come up with stats that allow temporal variance but that also try to maintain consistency.

#11 by Thomas_beardown // Aug 31, 2010 - 11:48am

I'm going to guess they make sure any changes improve correlation for the league as a whole, and not just for single teams.

One team dropping from 1st to 26th shouldn't be a huge deal, as long as every team isn't moving that much.

#6 by Raiderjoe // Aug 31, 2010 - 10:10am

Good artocle. Tells readers old and new abiut site. So if new you should reaf above thing. Then you knoe how dvoa and dave work and then can post abiut all types of football topics. Also buy fottball Outsiders Almanac 2010. Book good. Contains many ibteresting stats and articles and projections for your fatasy football dtatft

#7 by Scott P. (not verified) // Aug 31, 2010 - 10:25am

And if you're new around here, you should also read the above comment. Tells readers old and new about Raiderjoe. Then you know how he works, and then can post about all types of FO readers.

#12 by MJK // Aug 31, 2010 - 11:49am

"We work very hard to make sure that our stats do not specifically favor any team or player."

This statement is a little inaccurate, and is likely to open the door to people that will say things like "then why is Philadelphia ranked so highly every year".

Obviously, because the stats here are designed to correlate to wins, they will tend to favor a team or player that does things that tend to contribute to winning. What they do not do is favor any specific team or player because of the creator's subjective opinion over what's important or valuable. It's a subtle distinction, but an important one to make, especially to people that aren't really into math or statistics, and it's one of the key differences between something like the Aikman efficiency ratings or QB rating and DVOA. In the first two (I think, although I'm not actually sure), things like completion percentage or TD:INT ratio are assigned arbitrary weights because they seemed like a good idea to the people inventing the rating system. I could hypothetically create a rating system, and if I'm a fan of the Colts or the old K-gun, give an increased weighting to teams that emphasize a no-huddle offense. In DVOA, I believe the weights and splits are carefully derived by regression analysis to maximize correlation to wins.

Hence you will pick up things like, if a team tends to do things well that lead to more wins most of the time, DVOA will tend to favor that team even if it has some bad breaks and doesn't win a lot of games in a given season.

#13 by ReiDeBastoni (not verified) // Aug 31, 2010 - 3:14pm

I have a question: you mentioned Hail Mary interceptions don't count negatively, because, I'm assuming, the other team doesn't have an opportunity to make the team pay for the "mistake." Are there currently any plans to make Successful Plays near the end of a half worth more than usual? I think it would make sense that they contribute towards wins at a higher weight than the same successful plays made in the first quarter. Thanks.

#14 by Anonymous1 (not verified) // Aug 31, 2010 - 3:40pm

I can see how those types of plays would make the team more likely to win that specific game, but would they have any predictive value?

#15 by ReiDeBastoni (not verified) // Aug 31, 2010 - 3:44pm

I'm sure that certain teams *cough* Colts *cough* are consistently better at 2 minute drills than others.

#16 by Thomas_beardown // Aug 31, 2010 - 4:00pm

2 minute drills are still counted, just not hail marys

#17 by Joseph // Aug 31, 2010 - 5:22pm

Hail Mary INTs are just counted as incompletions, as 99% of the time the predictive value of the two plays is the same. The only difference is whether the DB caught or swatted away the pass. The only reason for a Hail Mary is the (lack of) time left on the clock.

Aaron, I mentioned this in some other thread, but have you checked the possibility of counting red zone plays as worth "more" than other plays? At the least, the retrodictive part of DVOA should go up, right?

