30 Aug 2004

Zebras on Parade

by Ryan Wilson

Way back in April Peter King wrote in Monday Morning Quarterback about the inconsistencies that exist among NFL officiating crews in terms of how many flags each crew throws per game. His primary concern was that Walt Anderson's crew, over the course of the 2003 season, had 91 more accepted penalties than Gerry Austin's crew. King was quite clear about why he found this problematic:

"Maybe I pick weird things to get upset about, but I think this is ridiculous. One or two games you can explain away, but on only one weekend that both of these crews worked did the Austin crew have more accepted penalties than Anderson's. Highest numbers of penalties called by Anderson and Co.: 33, 26, 19, 19, 18, 18 and 18. Austin's men had one game with more than 14 called fouls in a game -- they called 21 in Week 2."

Now this says nothing about which crew made more correct calls, but it speaks volumes about the disparity in officiating across crews -- and that's a problem.

Or is it? When I first read this article I was convinced that having Anderson's crew throw 57% more flags than Austin's crew would lead to some incongruous results over the course of the season. After poring over the data for the 2003 NFL season, that's not what I found. There doesn't seem to be much of a relationship between the number of penalties called and the final score. On average, the officiating crew has minimal impact on the outcome of the game. Obviously there are games that stick in our minds for no other reason than the officials blowing a major call, but in general, the officials have very little influence over wins and losses.

Now that I have your attention, let's take a look at how I came to these conclusions, and then you can decide for yourself if it seems plausible.

Table 1 lists officiating crews from the most penal to the least penal. The percent difference variable is, not surprisingly, the percent difference between a particular crew's average penalties per game and the league average (the NFL average was 13.2 penalties per game). For example, Johnny Grier's crew called 20.6% more penalties than the average crew (15.9 penalties versus 13.2 penalties per game). The per-game figures here differ slightly from those in the MMQB article because the data sources differ, and they count accepted penalties only.
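For readers who want to check the arithmetic, here is a minimal sketch of the percent difference calculation. The function name `percent_difference` is just an illustrative label, not something from the original analysis. With the rounded inputs quoted above, the result comes out a shade under the 20.6% cited, presumably because the original calculation used unrounded averages.

```python
def percent_difference(crew_avg: float, league_avg: float) -> float:
    """Percent by which a crew's penalties per game differ from the league average."""
    return (crew_avg - league_avg) / league_avg * 100.0


# Rounded figures quoted in the text: Grier's crew vs. the league average.
grier = percent_difference(15.9, 13.2)
print(f"Grier's crew: {grier:+.1f}% versus the league average")
```

A positive value means a crew throws more flags than average; a negative value means fewer.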

Table 1: Officiating Crews Ranked by Penalties Called
Crew Chief   Total Penalties   Yards   Penalties Called per Game   Pct. Difference
Anderson     257               2193    17.1                        30.6%
Grier        238               2001    15.9                        18.4%
Winter       213               1666    14.2                        8.4%
White        211               1817    14.1                        7.7%
McAulay      224               1864    14.0                        6.9%
Triplette    210               1813    14.0                        6.9%
Morelli      217               1972    13.6                        3.1%
Nemmers      197               1550    13.1                        0.0%
Blum         194               1592    12.9                        -1.5%
Corrente     193               1617    12.9                        -1.5%
Hochuli      190               1610    12.7                        -3.8%
Carollo      188               1944    12.5                        -5.3%
Coleman      179               1472    11.9                        -12.9%
Carey        173               1457    11.5                        -9.1%
Leavy        172               1400    11.5                        -12.2%
Kukar        167               1315    11.1                        -17.5%
Austin       166               1394    11.1                        -18.3%

The first thing that sticks out is that Anderson's crew has 31% more accepted penalties than the average crew -- that works out to four more penalties a game and 63 more penalties a season. Statistically, both Anderson's and Grier's crews call more penalties than the average crew -- enough that it raised some flags (pardon the pun) in the office of Mike Pereira, the director of officiating.

I'm all for individuality in sports, but I'm not sure officiating is where it should show itself. There should be some uniformity in the way games are called, especially when you're talking about one crew calling 30% more penalties than the average crew. When King brought this to Pereira's attention, here's what he said:

"'That certainly sets off an alarm in my head... The one thing we strive for in our 17 crews is consistency. And that's going to be a big point of emphasis for our crews and our officiating department this offseason.'  To that end, Pereira said the league would emphasize again the same definition of every penalty with each crew. That began with a four-day session this past weekend with the 17 crew chiefs. It will continue with the annual three-day officials clinic in July for all 120 zebras, then in their trips to training camps this summer."

I had to read this next paragraph twice, because on the surface it's both funny and disconcerting.

"In addition, Pereira thinks... high-tech tools will help make penalties more uniform. Officials have a private Web site now that has video capabilities and allows each official to go online at any time in or out of season to check the league's catalog of calls. 'If an official wants to see all the illegal contact calls, for instance, he can go to the site and watch them all, one after another,' Pereira said. Last season, 119 of 120 officials used the site."

Let me get this straight, officials have the capability to view video footage of what illegal contact looks like? Shouldn't that be part of the pre-screening process -- I mean, really, is it too much to ask potential officials to actually know the rules before they're hired?

But seriously, I think it's important that Pereira recognizes this as a problem and is doing something to address it. If he wants to take it one step further, maybe he can print up some flashcards with each infraction printed on one side and the severity of the penalty on the back -- that way officials can whip them out during the game, consult the appropriate flashcard and make the correct call. I'm kidding of course (unless you think it'll really help), but doesn't there really need to be more structure in how officials call games if the number of penalties has some effect on the outcome? Let's take a look.

I first set out to see how penalties per game varied with points scored per game. Surprisingly, there was virtually no relationship: the correlation between total points scored in a game and total penalties in a game was just 0.07. Graphically, it looked like this:

Let me explain why I thought there might be a relationship between total points scored and the number of penalties called. It seems reasonable that a lot of penalties in a game might lead to more points if these penalties were weighted in favor of the offense (they might sustain drives, move a team into field goal position, etc.). It could also be the case that a crew that calls very few penalties might actually miss some calls (pass interference, defensive holding, etc.), and the result would then be lower scores. After some thought, I realized one could also weave a story that favored the defense in a heavily-flagged game and favored the offense in a lightly-flagged game. Of course, all this did was further cloud my thinking about the relationship between penalties and performance.

Despite my growing confusion, I trudged on. Next I decided to try and break down the numbers further. We've already established that certain crews call more penalties than other crews. But to truly appreciate these differences across teams, we'll need to manipulate the data a bit. I came up with a measure that looks at how many penalties an officiating crew calls in the games they work when compared to how many penalties other crews call in games involving the same teams over the course of the season. I disaggregated total penalties per game into penalties called against the winning team and penalties called against the losing team. I called these new measures WPR and LPR (cleverly standing for Winning team Penalty Ratio and Losing team Penalty Ratio).  At the risk of confusing nearly everybody, let's look at an example.

Let's look at the Week 10 game between the Falcons and the Giants worked by Walt Anderson's crew. For the year, the Falcons averaged 13.8 penalties a game and the Giants averaged 15.3 penalties a game; in the Week 10 game 16 penalties were called. By dividing the total penalties in this game (16) by the Falcons' average penalties per game for 2003 (13.8), we have a measure of how many more (or fewer) penalties are a result of a particular officiating crew (and we do the same calculation for the Giants).

Winner   Loser   Total Penalties   ATL avg. penalties   NYG avg. penalties   WPR    LPR
ATL      NYG     16                13.8                 15.3                 1.16   1.04

In English, this means that Anderson's crew called 16% more penalties in this game than what the Falcons averaged over the course of the season. The Giants had an LPR of 1.04, so they experienced a 4% increase in penalties in this game when compared to their season average.
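The same example can be sketched in a few lines of code. The helper name `penalty_ratio` is an assumption made for illustration, and the inputs are the rounded season averages quoted above. With those rounded inputs the Giants' ratio lands between 1.04 and 1.05 and would round to 1.05 rather than the 1.04 shown, presumably because the original figures were computed from unrounded averages.

```python
def penalty_ratio(game_penalties: int, team_season_avg: float) -> float:
    """Penalties called in one game divided by a team's season average per game."""
    return game_penalties / team_season_avg


# Week 10, Falcons (winner) vs. Giants (loser), worked by Anderson's crew.
wpr = penalty_ratio(16, 13.8)  # Winning team Penalty Ratio
lpr = penalty_ratio(16, 15.3)  # Losing team Penalty Ratio
print(f"WPR = {wpr:.2f}, LPR = {lpr:.2f}")
```

A ratio above 1.00 means the crew called more penalties than that team typically saw over the season; a ratio below 1.00 means fewer.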

WPR and LPR (as well as a number that averages them together) now give us some idea about how many more (or fewer) penalties a crew calls against a team when compared to how many penalties other crews called against the same team over the course of the season.  WPR and LPR were applied to every game of the 2003 season based on the officiating crew and here are the results:

Table 2: Penalties, Specific Crews vs. Teams' Season Averages
Crew Chief   WPR    LPR    Average   Crew Chief's Avg. Pen./G
Anderson     1.39   1.18   1.29      17.1                       6.7   6.7   13.3
Grier        1.09   1.14   1.12      15.5                       7.1   6.9   13.9
Winter       0.99   1.13   1.06      14.2                       6.7   6.7   13.4
White        1.12   0.99   1.05      14.1                       6.7   6.7   13.4
Triplette    0.94   1.17   1.05      14.0                       6.8   6.6   13.3
Morelli      1.13   0.96   1.05      13.5                       6.5   6.4   12.9
McAulay      0.96   1.13   1.04      14.0                       6.6   6.8   13.4
Nemmers      0.82   1.15   0.99      13.1                       6.5   6.7   13.2
Blum         0.91   1.07   0.98      12.9                       6.4   6.7   13.1
Corrente     0.88   1.07   0.98      12.9                       6.6   6.6   13.2
Hochuli      0.94   0.98   0.96      12.6                       6.4   6.8   13.1
Carey        1.02   0.89   0.95      11.9                       6.1   6.4   12.5
Carollo      0.90   0.98   0.94      12.4                       6.6   6.6   13.2
Leavy        0.88   0.87   0.87      11.5                       6.3   6.9   13.2
Coleman      0.84   0.86   0.84      11.4                       6.8   6.7   13.5
Austin       0.69   0.91   0.80      10.7                       7.0   6.5   13.4
Kukar        0.78   0.80   0.79      10.8                       6.8   6.9   13.6

One thing that sticks out immediately is the disparity between WPR and LPR among crews. For example, Anderson's crew called 39% more penalties against the winning teams than what these teams averaged over the course of the season. They called only 18% more penalties against the losing teams than what these teams averaged over the course of the season.

Austin's crew was at the opposite end of the spectrum. They called 31% fewer penalties against the winning teams than what these teams averaged over the course of the season. But Austin's crew called only 9% fewer penalties against the losing teams than what these teams averaged over the course of the season.

What's interesting about this observation is that maybe the question isn't how many penalties a crew calls in aggregate; what might matter instead is the disparity in penalties called between the winning and losing teams.

To test this last theory, I looked at the overall winning percentage of teams by officiating crews. For example, Anderson's crew officiated 15 games in 2003 and the average overall winning percentage of the 15 winning teams was 0.588. Likewise, the average overall winning percentage of the 15 losing teams was 0.392. (Just to be clear, Anderson's crew officiated 15 games with 15 different winners. The 0.588 is the average winning percentage of these 15 teams over their respective 16-game schedules. The 0.588 is not the winning percentage of the winning teams when Anderson's crew worked their games -- that winning percentage would be 1.000!)

Table 3: Season Records vs. WPR/LPR
Crew Chief                  Winners' Win Pct.   Losers' Win Pct.   WPR     LPR
Anderson                    0.588               0.392              1.390   1.180
Grier                       0.520               0.391              1.090   1.140
Winter                      0.538               0.496              0.990   1.130
White                       0.558               0.479              1.120   0.990
McAulay                     0.602               0.426              0.960   1.130
Triplette                   0.488               0.484              0.940   1.170
Morelli                     0.629               0.400              1.130   0.960
Nemmers                     0.604               0.421              0.820   1.150
Blum                        0.604               0.388              0.910   1.070
Corrente                    0.467               0.425              0.880   1.070
Hochuli                     0.592               0.474              0.940   0.980
Carollo                     0.586               0.465              0.900   0.980
Coleman                     0.632               0.489              0.840   0.860
Carey                       0.598               0.430              1.020   0.890
Leavy                       0.571               0.438              0.880   0.870
Kukar                       0.625               0.438              0.780   0.800
Austin                      0.629               0.430              0.690   0.910
Correlation with win pct.                                          -.164   -.198

The thinking is that perhaps there would be a relationship between the team winning percentages and penalty ratios across officiating crews. For example, what if the winning teams in the games Gerry Austin's crew worked had an overall average winning percentage of 0.350? That would immediately raise some concerns that Austin's crew was throwing too few flags per game -- and that may have in part been responsible for the weaker teams winning an unusually large proportion of those games.

Following that logic, if we assume that officiating crews are randomly assigned to games, and that WPR and LPR are relatively similar, then we should expect the eventual winning teams' winning percentage (and the losing teams' winning percentage) to also be relatively constant across officiating crews.

Well guess what? Statistically, there is no difference between winners' (and losers') winning percentage when considering WPR and LPR. Specifically, the correlation coefficient between the winning team's winning percentage and WPR was -.164; the correlation coefficient between the losing team's winning percentage and LPR was -.198.
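As a rough check, the first of those correlations can be recomputed from the crew-level figures in Table 3. The sketch below uses a hand-rolled Pearson correlation. The t statistic at the end is an addition for illustration, not part of the original analysis, and it assumes the 17 crews can be treated as independent observations. It should give a correlation of about -.16, and a t statistic well inside the roughly ±2.13 cutoff for 15 degrees of freedom at the 5% level.

```python
from math import sqrt

# Winners' season winning percentage and WPR for each crew, copied from Table 3.
winners_win_pct = [0.588, 0.520, 0.538, 0.558, 0.602, 0.488, 0.629, 0.604, 0.604,
                   0.467, 0.592, 0.586, 0.632, 0.598, 0.571, 0.625, 0.629]
wpr_by_crew = [1.39, 1.09, 0.99, 1.12, 0.96, 0.94, 1.13, 0.82, 0.91,
               0.88, 0.94, 0.90, 0.84, 1.02, 0.88, 0.78, 0.69]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(winners_win_pct, wpr_by_crew)
n = len(wpr_by_crew)
# Illustrative significance check (an assumption, not the author's stated method).
t_stat = r * sqrt(n - 2) / sqrt(1 - r * r)
print(f"r = {r:.3f}, t = {t_stat:.2f} on {n - 2} degrees of freedom")
```

Swapping in the losers' winning percentages and the LPR column from Table 3 runs the same check for the second correlation.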

Stated differently, teams' winning percentages remain stable across officiating crews, despite the fact that some crews throw more flags than other crews. And the differences in WPR and LPR across crews are too small to have any effect on whether teams win or lose more often than they would otherwise.

Of course any analysis has some shortcomings and this one is no different. For starters, I didn't consider the type of penalties called (I hope to address that in the next iteration). Maybe it's the case that Austin's crew makes a disproportionate number of pass interference calls while Anderson's crew has a predilection for holding calls. And while WPR and LPR tell us how many flags were thrown against the winning and losing teams, we don't know what the specific penalty was. Another concern might be when the penalty occurred during the game -- and maybe more importantly, whether the penalty took points off the board. Accounting for these factors should give a clearer representation of the relationship between penalties and the outcome of the games. And while there is still a lot of work to be done, I think this is a good first step -- and at the very least it sheds some light on the role (or perhaps non-role is a better word here) penalties have in team performance. You also might want to read a previous Football Outsiders article showing that winning teams often have more penalties than losing teams.

Despite Peter King's concerns that the number of penalties called per game varies widely by crew, the actual effect on the final game results (at least statistically) is not significant. In fact, this should be good news for those critics who felt the point of emphasis on the pass interference rules might result in an unfair offensive advantage. And while it might lead to an overall increase in total points scored, this new enforcement of the rule should have minimal effect on which teams will win and lose.

I don't think these findings excuse the fact that several crews seem to have a different interpretation of the rules than the majority of crews, but if it's any consolation, overzealous officiating has had minimal impact on the outcome of games.  Of course, don't try to convince those Giants fans still fuming over the officiating faux pas from the 2002 playoffs against the 49ers that penalties don't matter.

Posted by: P. Ryan Wilson on 30 Aug 2004