by Zach Binney
Introduction and Motivation
When I first sat down to analyze injuries in the NFL a couple years ago, my goal was to immediately build a fancy model that would let teams see into the future, predicting exactly who would get hurt and when. It would refine the use of the broad "injury prone" tag, letting smart teams (read: those who would listen to me) find great value on guys who had been written off by other teams as made of Gatorade-soaked tissue paper. It would take into account player-level clustering, team-level effects, and time trends. It would tell us in a single color-coded number who was going to go down, allowing teams to approach free agency and the draft with robust historical data rather than the subjective opinions of (best case) a training staff and team physician or (worst case) a general manager with no medical training.
What I quickly realized was that most people can't quote the overall risk of an injury in the NFL. How many players is a team likely to lose in any given year? They have no idea. They certainly can't break it down by position. They might have a vague sense that injury risk rises with age, but how much? They probably know (if they read the Adjusted Games Lost column) that injuries have increased over time, but they don't know what sorts of injuries are driving that rise. They might have a sense that ACL tears are bad, and if they pay a bit more attention they might know to tear their hair out when their team announces any injury with the word "triceps." But how nervous should they really be about a hamstring, or a high ankle sprain?
My background is in epidemiology -- most people think we're skin doctors, and the rest imagine us as the "virus hunters" who go out into West Africa and search for Ebola. To be clear, I am way, way too cowardly for that. At a higher level epidemiologists want to do two things:
1. Describe the distribution of diseases (for example, injuries) in a population (football players) and,
2. When we see differences within populations (variation by position or team or year), ask and analyze why these differences exist.
On my first day I was gunning to ask and analyze why injuries happen -- and, by extension, predict them -- but skipped right over the describing injuries phase, mostly because I figured that had already been done. It really hasn't.
Describing injuries seems basic, and it is. This post is going to be nothing but simple counts and averages, but it's incredibly important work, and I hope it's interesting as well. It's exciting to have a deeper understanding of what's likely to happen when your star linebacker -- or your opponent's stud left tackle -- goes down with a "knee" or "ankle" injury. Also, with the (justified) focus on concussions lately, the attention on the toll professional football takes on the human body has never been higher. It's a great time to get smarter about that toll. So, read on… or not, I've already got your page view.
One more thing: none of the below would have been possible without FO's foresight in collecting detailed injury data for the past 15 seasons. It's a basic point, but in analytics we're always limited in the questions we can answer by the data that we have. FO has some amazing data, and the years of interns and data managers who collected it are the unsung heroes of this piece. It's good to remember that.
(Ed. Note: I should point out that we didn't start collecting this data until 2007 or so; we then went back and filled in past data going back to 2000. Proper gratitude goes to Bill Barnwell, who came up with the idea of starting the injury database in the first place back when he worked for Football Outsiders, and did a lot of the initial work to go back and fill in those older years. Since he left for Grantland, the injury collection has been managed first by Danny Tuccitto and then by Scott Kacsmar. -- Aaron Schatz)
This data was gathered primarily from injury reports. Scott Kacsmar and others have written about this in AGL columns and elsewhere, but we rely on teams to truthfully and correctly report injuries. We know there's a lot of variation in how teams report -- both whether they even put a guy on the list, and the level of detail they're willing to give out (a true MCL sprain could be a "Knee," "Knee Sprain," "Knee -- MCL," or "Knee -- MCL sprain" in our database) -- so all these analyses are contingent on that data being valid. Garbage in, garbage out.
Additionally, there are some player injuries not included in these statistics because the players never played an NFL snap or were simply deemed too irrelevant. For example, an undrafted rookie who sprains his knee in training camp and gets cut from the 75-man roster isn't counted in these analyses. Thus, our total number of injuries is probably an underestimate of the true number of injuries and the toll professional football takes on the human body.
Finally, the week that an injury "occurred" is the first week the player is listed on the injury report. So if a player appears with a groin injury in Week 4 in our dataset, we do not know exactly when he suffered that injury: it could have been in the Week 3 game or in practices leading up to Week 4. We can only pinpoint injury timing to within about a week (two weeks if the player was on bye), and we can't tell you at all the situation in which it occurred.
For my money, while I admit this data is far from perfect, I think it captures the 80/20 and gets us pretty close to where we need to be.
Organizing and Categorizing NFL Injuries
Now that that's out of the way, how can we think about injuries in the NFL? With something as complex as injuries, one of the first hurdles you hit is how to even organize the data so we can get information and insights out of it. Even after data cleaning, there were almost 380 different injury types in our database. That's not pretty to look at. Mimicking injury reports, I've wrapped these up into higher-level locations in the body: head, foot, wrist, etc. For some of the most common injury locations -- knees, ankles, and shoulders among them -- I've made a few additional splits. I also made a handful of additional on-the-fly splits (for example, separating out broken legs and arms from other leg and arm injuries). These categorizations are subjective, but I've tried to strike a balance between using enough detail to capture subtle variations in injury types while not creating too many categories to visualize or cutting down to sample sizes that are too small for drawing reasonable conclusions. My final scheme includes 50 categories, listed below in Figure 2. (Fifty is still a lot of categories, so Figure 2 actually has two parts.)
If any experts out there have a problem with my categories, it's easy enough to change! This is just a start. Even if we had perfect categories, though, there would still be some significant misclassification of injuries due to variations in how teams put players on the injury report and the amount of information they give out.
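To make the roll-up idea concrete, here is a minimal sketch of how raw injury-report labels could be mapped to higher-level categories. The keyword rules and category names are illustrative assumptions, not the actual scheme behind Figure 2; the function name `categorize` is likewise hypothetical.

```python
# Hypothetical roll-up rules. The keywords and category names are
# illustrative assumptions, not the real categories used in Figure 2.
# Order matters: more specific keywords are checked first.
RULES = [
    ("acl", "Knee - ACL"),
    ("high ankle", "Ankle - high sprain"),
    ("knee", "Knee - general"),
    ("ankle", "Ankle - general"),
    ("concussion", "Concussion"),
]


def categorize(raw_label: str) -> str:
    """Return the first category whose keyword appears in the label."""
    label = raw_label.lower()
    for keyword, category in RULES:
        if keyword in label:
            return category
    return "Unknown"


# The four ways a team might report the same MCL sprain (from the text above)
# all land in the same bucket under these assumed rules.
for report in ["Knee", "Knee Sprain", "Knee -- MCL", "Knee -- MCL sprain"]:
    print(report, "->", categorize(report))
```

Because the rules live in one list, changing the categories is a matter of editing that list. It also shows the limit of any scheme: a torn ACL that a team reports only as "Knee" would still be filed under the general knee bucket.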
Distribution of Weeks Missed due to Injury
A good first question for our data is: How many regular-season weeks do NFL players miss due to injury every year? (Note that we're ignoring playoffs in all these analyses.) Let's look at Figure 1 below. The number of player-seasons with one or more weeks missed comes from the FO injury database.
The first thing that jumps out of this chart is that a majority of players (61 percent) won't miss any time in a given year. Another thing that jumps out is that this data is heavily right-skewed (that is, there are way more players who miss little or no time than who miss extended periods). Sneak preview: this relationship holds pretty much no matter how you cut the data by age or position. Not rocket science, right? But this has important implications when trying to predict injuries:
- Trying to give a projection for the average (mean) number of weeks a player will miss is pretty much meaningless with this kind of distribution since the data is horribly asymmetric. Plus, have you ever seen a coach's or trainer's eyes glaze over when you tell him his guard is likely to miss 0.73 weeks? It makes no sense.
- WARNING: I am still going to use average weeks missed to compare the severity of different injury types. The average is still useful for that, so don't yell at me (as previously noted, I'm a coward). I'm just saying it's not useful for projecting how much time a player is likely to miss.
- If we try to use the median instead -- a typical approach when dealing with skewed data -- for the vast majority of players we'd say their median weeks missed is going to be 0 since way over half of similar players missed no time. But does that mean they're very unlikely to miss any time? Not at all. They might have a one-third chance of missing significant time, but because of this distribution neither the average nor median will capture that.
- A better statistic might instead be the risk of missing any time vs. no time, or the risk of missing more than four weeks, vs. one to four weeks, vs. no weeks. That does a good job of dealing with the skewed data while providing useful projections.
Toll of Injuries -- Total Weeks Missed and Injury Severity
From 2000 to 2014 (15 seasons), 30,186 injury reports have been filed, leading to 51,596 regular-season weeks missed, an average of 1.71 weeks missed per injury. That's a pretty hefty toll!
I made a big deal about categories above, so let's look at which kinds of injuries cause the most damage in the NFL regular season. There are two dimensions to this question: damage at the population level (total weeks missed), and the individual level (injury severity).
Figure 2 is a busy chart -- actually, two of them -- but it conveys a ton of information. First a note on how to read it: the blue bars are the total player-weeks missed over the last 15 years for each injury type on the x-axis. The red bars are the average weeks a player was out with each type of injury (i.e., the severity). If you want to calculate the number of injuries of a given type that were sustained since 2000, you can divide the blue bar by the red bar. For example, ACLs led to about 4,800 weeks missed, at about 10.5 weeks per injury, so 4,800 weeks divided by 10.5 weeks per injury gives roughly 450 ACL injuries since 2000.
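The chart-reading arithmetic is simple enough to check in a few lines. This sketch uses only figures quoted in this article; the function name `implied_injury_count` is hypothetical, and the ACL inputs are approximate readings from the chart, so the result is approximate too.

```python
def implied_injury_count(total_weeks_missed, avg_weeks_per_injury):
    """Number of injuries implied by a total (blue bar) and an average (red bar)."""
    return total_weeks_missed / avg_weeks_per_injury


# ACL figures as quoted in the text: about 4,800 weeks at about 10.5 weeks each.
acl_count = implied_injury_count(4800, 10.5)
print(f"Implied ACL injuries: {acl_count:.0f}")

# The same relationship run the other way for the whole database:
# 51,596 weeks missed across 30,186 injury reports.
overall_average = 51596 / 30186
print(f"Average weeks missed per injury: {overall_average:.2f}")
```

The ACL division comes out near 457, which is why the text rounds it to about 450 given how roughly the bars can be read. The overall average reproduces the 1.71 weeks per injury quoted earlier.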
Now, what can we learn from this chart?
Population level damage (Total weeks missed):
- At the population level (all NFL players), general knee injuries (non-ACLs or other tears) do the most damage, by virtue of their sheer frequency (about 4,500 since the 2000 season). They have cost players almost 7,600 weeks over the last 15 years. Although their severity is about average (1.7 weeks missed per injury), they happen so often their damage builds up quickly. No surprise here.
- Some other common culprits round out the top 10 in all-player damage: ACL, hamstring, shoulder (non-tears), ankle (non-breaks or sprains), foot (non-breaks or Lisfranc injuries), groin, and Achilles injuries.
- I was a little surprised to see back injuries at No. 8. I didn't think of them as that common until looking at the data.
- Coming in at No. 10? Concussions. Yes, even Marcia Brady knows they are a pretty big deal.
- Most of the top 10 cause so much damage due to their frequency, but ACL and Achilles injuries appear high up due to a mix of their frequency and severity (about 10.5 weeks out for an ACL, about 7.0 for an Achilles).
We all know ACL tears are season-enders, which brings up an important point: "severity" here is not necessarily a return time estimate. ACL tears occurring in Week 16 are coded as two weeks missed in this analysis, while an ACL tear in training camp is 16 weeks. It might be interesting to re-do this chart with severity as the percent of time left in the season the player missed, but we'll save that for another time. Figure 2 still provides a good comparison of the average seriousness of injuries, since (with some exceptions) injuries don't grow more or less common relative to each other as the season progresses.
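As a rough sketch of that alternative severity measure, one could divide the weeks a player missed by the number of games that were left when he first hit the injury report. The function name and the `games_remaining` input are assumptions for illustration; deriving that input properly would require schedule and bye-week data not described here, and the hamstring line is a made-up example.

```python
def share_of_remaining_season(weeks_missed, games_remaining):
    """Fraction of the remaining regular season a player missed, capped at 1.0."""
    if games_remaining <= 0:
        raise ValueError("games_remaining must be positive")
    return min(weeks_missed / games_remaining, 1.0)


# The two ACL cases from the text: coded as 2 and 16 weeks missed,
# but each one wipes out everything that was left of the season.
late_acl = share_of_remaining_season(weeks_missed=2, games_remaining=2)
camp_acl = share_of_remaining_season(weeks_missed=16, games_remaining=16)

# Made-up comparison: a hamstring that costs 2 of the 13 games remaining.
hamstring = share_of_remaining_season(weeks_missed=2, games_remaining=13)

print(late_acl, camp_acl, round(hamstring, 2))
```

Under this measure both ACL tears score the same, while the hypothetical hamstring scores far lower even though it is also coded as two weeks missed.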
All this is a nice segue into…
Individual-level damage (Severity):
If I read about one of my team's key players on the injury report, what words should make my heart sink? Let's consult Figure 2, shall we?
- There are some no-brainers here: ACLs, spine injuries, and "the dreaded" (I always see it prefixed with these two words) Lisfranc injury in the foot. All result in an average of nine to 11 weeks out.
- Other things with the word "tear" in them generally aren't good: knee (non-ACL) and shoulder tears (labrum, rotator cuff) mean on average six to nine weeks missed (again, not a return timetable).
- Speaking of tears: "Pectoral," "triceps," and "biceps" are very bad words to see on an injury report. They often, but not always, refer to a tear in one of these muscles, which at least in the case of triceps and biceps often scratches a player for the rest of the year. Luckily, these injuries aren't more frequent: about 100 of each of these injuries since 2000.
- Even less frequent but hugely concerning are heart problems -- clots or arrhythmias, most frequently. If you pop up with one of those you're probably toast, but fortunately we have only logged 10 such injury-report appearances in the last 15 years.
- Anybody who has broken a bone can tell you that fractures are not good: foot, leg, or arm fractures in particular are going to sideline you for a while, leading to an average of six to eight weeks missed.
- Achilles injuries are typically nasty business, too, as noted above.
- On the slightly-less-severe end, high ankle sprains lead to an average of 3.5 weeks missed. That's much worse than regular old ankle sprains, which clock in at about 1 week missed on average.
One thing that seems strange to me is what on earth "Unknown" is doing up so high here. I wish I had more for you, but as the category name implies, we simply don't know much about these injuries. My suspicion is that most of these represent lingering offseason or prior-year problems that stuck with players all year and often caused them to miss significant time or even get placed on IR during training camp or preseason for "undisclosed" reasons. Fortunately, of the thousands of injuries in our database we only have 77 unknowns, a testament to the great work of the interns and data managers over the last 15 seasons.
Next Steps and Comments
We have looked at some very basic data about NFL injuries. In epidemiology, after we take an overall look at a topic, we like to describe how the data varies in different dimensions. One classic framework is to inspect variation by person, place, and time. That seems good, so let's roll with it. In future articles, we'll take a look at how injuries vary by time, and then by person (position and age). Place is already partially covered by the AGL columns, so we'll leave that be for now.
There's a ton of data here, and there are a zillion ways to cut and inspect it. In the subsequent articles to be run the next few Fridays, we'll be publishing information split up by calendar year, age, position, and week of season. In the comments below, though, I'd love to hear the readers' thoughts on the next direction.
Zach Binney is a freelance injury analyst and a Ph.D. student in epidemiology focusing on predictive modeling. He consults for an NFL team and loves Minor League Baseball. He lives in Atlanta.