Writers of Pro Football Prospectus 2008

14 Oct 2005

Football's Hilbert Problems

By now you know the name Ben Alamar: he helped with some of the statistical formulas in PFP 2005 and is involved in PROTRADE. He's also the editor of a new scholarly journal from Berkeley Electronic Press, the Journal of Quantitative Analysis in Sports. I'm on the editorial board along with some other names you might recognize, and I have a piece in the first edition called Football's Hilbert Problems. It's a quick look at some of the many issues that the nascent science of football research eventually needs to study, inspired by a similar baseball-related article that Keith Woolner wrote in Baseball Prospectus 2000. Also in the first edition: a review of K.C. Joyner's book by San Francisco 49ers Director of Football Operations Paraag Marathe, and an article on ranking college football teams.

Posted by: Aaron Schatz on 14 Oct 2005

15 comments, Last at 16 Oct 2005, 2:48pm by Zac


by geoff (not verified) :: Fri, 10/14/2005 - 3:57pm

I guess I wasn't reading too closely, I thought this said "Football's Hibbert Problems" and was going to be about team doctors that chuckle at inappropriate times.

by Parker (not verified) :: Fri, 10/14/2005 - 4:11pm

I thought it said Dilbert Problems and would be about how NFL players often read the comic with a look of puzzlement on their faces and remark, "I don't get it."

by Aaron (not verified) :: Fri, 10/14/2005 - 4:15pm

Actually, the original version of this had a Julius Hibbert joke, but it got cut. Not the thing for scholarly journals, I guess.

by Trogdor (not verified) :: Fri, 10/14/2005 - 4:25pm

How many of these questions do you expect will begin to be answered by the game charting project?

by Scott de B. (not verified) :: Fri, 10/14/2005 - 4:28pm

There's a problem with the link -- it says the file is a PDF, but it doesn't download as a PDF.

by Pat (not verified) :: Fri, 10/14/2005 - 5:14pm


Just a quick comment: the scorer in Buffalo also breaks down catches into point of catch, and yards after the catch. And location of incomplete pass.

In my mind, the Buffalo play-by-plays are probably the best ones out there. I've been toying with actually trying to write a program to parse those and then try to calculate a few statistics (like average catch percentage as a function of distance of throw).

I only mention this because, well, the Superdome play by plays are unlikely to be available.
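The statistic floated above (catch percentage as a function of throw distance) could be sketched roughly as follows. This is only an illustration: the line format, the `air=` field, and the names `PASS_LINE`, `SAMPLE_LINES`, and `catch_pct_by_distance` are all invented here and do not reflect the actual layout of the Buffalo play-by-play.

```python
import re
from collections import defaultdict

# Invented line format for illustration only; a real play-by-play
# would need its own pattern.
PASS_LINE = re.compile(r"pass (complete|incomplete) .*?air=(-?\d+)")

# Made-up sample plays, not real game data.
SAMPLE_LINES = [
    "Q1 pass complete to WR1 air=4 yac=7",
    "Q1 pass incomplete to WR2 air=22",
    "Q2 run left for 3",
    "Q2 pass complete to TE1 air=12 yac=0",
    "Q3 pass incomplete to WR1 air=15",
    "Q4 pass complete to WR2 air=8 yac=3",
]


def catch_pct_by_distance(lines, bucket_size=10):
    """Return {bucket_start: (catches, targets, catch_pct)} by air-yard bucket."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [catches, targets]
    for line in lines:
        match = PASS_LINE.search(line)
        if match is None:
            continue  # not a pass play, or a format this sketch does not recognise
        result, air_yards = match.group(1), int(match.group(2))
        bucket = (air_yards // bucket_size) * bucket_size
        totals[bucket][1] += 1
        if result == "complete":
            totals[bucket][0] += 1
    return {b: (c, t, c / t) for b, (c, t) in sorted(totals.items())}


if __name__ == "__main__":
    for bucket, (catches, targets, pct) in catch_pct_by_distance(SAMPLE_LINES).items():
        print(f"{bucket:>3}+ yds: {catches}/{targets} ({pct:.0%})")
```

The bucket width is a free choice; ten-yard bins are used here only to keep the toy sample readable.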

by Dave (not verified) :: Fri, 10/14/2005 - 5:21pm

It's inevitable, isn't it, that between the Internet, cable channels and the growth of on-demand, viewer-controlled programming, viewers will have the option to switch views of football games to game film cams rather than just the big dumb network camera, and to see and freeze replays rather than be subjected to the whims of the big dumb network producer. Once it's out there like that to a smart mob of loyal FO readers with indulgent spouses, it makes possible some of the more herculean data-logging tasks.

This concludes your day's technotopian interlude.

by Vern (not verified) :: Fri, 10/14/2005 - 5:36pm

Another big challenge is attempting to measure player performance against not just the results, but the responsibilities of the given play call or system.

Examples are many: how often does a linebacker fill the "right" gap, even if the run went the other way (maybe the run didn't go to that gap because he was there)? A defender in a passing lane could cause an INT. A receiver making the wrong hot read could cause an INT...

Seems to me this is not just a problem of tracking, but being able to guess or name the play that was called. In the missed hot read example, can you really be sure who was wrong if you're not the Offensive Coordinator?

by Jerry P. (not verified) :: Fri, 10/14/2005 - 6:10pm

"Just a quick comment: the scorer in Buffalo also breaks down catches into point of catch, and yards after the catch. And location of incomplete pass."

I thought that was something new the league was doing for all games. Didn't realize it was just in the Buffalo play by play because I had only viewed the Buffalo play by play.

If you're trying to make a program I'd suggest becoming familiar with curl and nawk. The curl command will pull all source from the page then you can use nawk to parse it. Take the parsed output then stuff it into a database then call the database from a PHP web page or something.

I advocate the same approach for FO as well. You should never have to sift through play by play using pivot tables or whatever to get just the plays to fit your criteria. Just set the criteria, let the DB find all the plays, then write a function that calculates DVOA from the query results.
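A rough sketch of that "set the criteria, let the DB find the plays" idea, written in Python with its built-in sqlite3 module rather than the curl, nawk and PHP stack suggested above. Everything here is an assumption made for illustration: the `plays` table, its columns, the sample rows, and the `load_plays` and `conversion_rate` helpers. The metric is a plain conversion rate used as a stand-in; it is not the DVOA formula.

```python
import sqlite3

# Made-up rows for illustration: (down, to_go, play_type, yards gained).
PLAYS = [
    (1, 10, "run", 4),
    (1, 10, "pass", 12),
    (2, 6, "run", 2),
    (3, 4, "pass", 5),
    (3, 4, "pass", 0),
    (3, 8, "run", 3),
]


def load_plays(rows):
    """Create an in-memory database with a hypothetical plays table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE plays (down INTEGER, to_go INTEGER, play_type TEXT, yards INTEGER)"
    )
    conn.executemany("INSERT INTO plays VALUES (?, ?, ?, ?)", rows)
    return conn


def conversion_rate(conn, down, play_type):
    """Share of matching plays that gained at least the yards to go.

    A deliberately simple stand-in metric, NOT the DVOA formula.
    Returns None when no plays match the criteria.
    """
    attempts, conversions = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(yards >= to_go), 0) "
        "FROM plays WHERE down = ? AND play_type = ?",
        (down, play_type),
    ).fetchone()
    return conversions / attempts if attempts else None


if __name__ == "__main__":
    conn = load_plays(PLAYS)
    print("3rd-down pass conversion rate:", conversion_rate(conn, 3, "pass"))
```

The criteria live in the WHERE clause, so changing what counts as a matching play means changing the query parameters and not re-sifting the raw play-by-play.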

by zlionsfan (not verified) :: Fri, 10/14/2005 - 6:23pm

I don't agree that the relationship between college and pro football is significantly different in terms of the use of stats for predictive purposes than the relationship between the minors and MLB.

Minor-league teams generally keep less-extensive stats than MLB teams and play shorter schedules (and shorter games as well at times - seven-inning games during doubleheaders). There is a wide range of talent in AAA (MLB players on rehab assignments, top draft picks groomed for the majors, career minor leaguers). And with all that, there seems to be a clear relationship between minor-league stats and MLB ability.

It certainly won't be easy, but to borrow from a point Aaron made, twenty years ago, if you wanted to watch college football on Saturday, you got some games from the BCS conferences, and that was about it. How many games are televised now? In BCS conferences (at least during conference play), having your team not televised is the exception, rather than the rule, and non-BCS conferences are steadily getting more air time. More televised games = more opportunities to track plays and more detailed statistics. It may require a charting project as well, but it is certainly a possibility.

by fyo (not verified) :: Fri, 10/14/2005 - 6:47pm

#5: The link works, it's just that the file comes with a .cgi extension for some lame reason. (Lame reason probably being that it was "meant" to be opened within the browser). Fortunately, the mime-type is correct and that's what your browser should be using to decide what to do with the content.

I renamed it Football\'s\ Hilbert\ Problem.pdf and it works just fine.

by Backup Copy Editor (not verified) :: Fri, 10/14/2005 - 7:06pm

Both the article and the journal are well conceived. Somebody needed to write a Hilbert article for football eventually, and I'm glad Aaron was first in line to tackle this thorny topic. Nice job.

My copy-editing eye caught a few inconsequentially small errors in the article:

* page 3, second paragraph from bottom: "who blocked who" should be "who blocked whom"

* pg. 4, last par.: "see (Schatz 2004))" => "see (Schatz 2004).)"

* Also, there are several instances in the article of a double hyphen ("--") or an en dash ("–") where an em dash ("—") is called for; these should be replaced with em dashes for the sake of stylistic integrity.

by NF (not verified) :: Fri, 10/14/2005 - 10:12pm

Nice essay.

by Joon (not verified) :: Sat, 10/15/2005 - 8:25am

just a correction--the baseball-stats keeping organization Retrosheet is retrosheet.org, not retrosheet.com. retrosheet.com is some kind of other website which i've accidentally visited on more than one occasion while attempting to find retrosheet.org. despite having been there a few times, i still have no idea what that site is about.

by Zac (not verified) :: Sun, 10/16/2005 - 2:48pm

Retrosheet.com is just a copycat url, like whitehouse.com (which at one time was a porn site) or goooogle.com. They put it up and hope to get some accidental traffic (to generate ad revenue).