Writers of Pro Football Prospectus 2008

10 Jul 2013

SITE NEWS: FO Needs Programming Help

Football Outsiders is looking for a programmer who can write a parsing program that will help us compile Audibles at the Line each week during the season. We're looking for some sort of code that will go through the Gmail thread, pick out everybody's comments, and then paste them all in one big document so that Rivers McCown can then go through and edit. If you think you can program something like this, please e-mail us at info-at-footballoutsiders.com with subject line AUDIBLES PARSER. Thanks.

Posted by: Aaron Schatz on 10 Jul 2013

12 comments, Last at 12 Jul 2013, 11:10am by jbird1785


by Karl Cuba :: Wed, 07/10/2013 - 12:02pm

Why don't you just make all the comments in real time on a Google document? That way it would all be there the whole time: no opening and sending emails, no chance of missing what someone said, and it would all be there for Rivers to edit.

Feel free to tell me that this is a stupid idea for reasons I haven't considered.

by dcaslin :: Wed, 07/10/2013 - 1:35pm

Programmer here. That's what I would do. Note that you can also set notification rules on the document to be notified when changes are made.


I suspect the downside might come up if people want to just fire off a quick note from their phone though...

by matt h (not verified) :: Wed, 07/10/2013 - 2:08pm

Notifications are only for spreadsheets, not documents. You can put in comments that tag the person you want to notify, which will email them and let you respond to the comments.

by Rivers McCown :: Wed, 07/10/2013 - 2:31pm

Yeah, a lot of guys aren't watching from a computer. Some people are at sports bars, some are at actual games, etc.

I do think the Google Docs idea is worth using, at least as part of a hybrid solution, but it'd definitely be hard for people who are out and about.

by sundown (not verified) :: Wed, 07/10/2013 - 6:57pm

Couldn't all that be done on a smart phone? And while it might be a bit more cumbersome at the game, taking a laptop into a sports bar wouldn't be hard to do.

by dcaslin :: Thu, 07/11/2013 - 10:07am

I played around with this a bit last night. Gmail seems to like using the leading >>>>> chars on all previous emails; I had one chain go 24 chars deep. Is the use case basically to get rid of all the junk text while retaining who wrote what and when? (Things like printing to a PDF *won't* do that.)
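To make the "get rid of the junk text" idea concrete, here is a minimal Python sketch that drops quoted lines (the leading > chars, at any depth) and the "On ... wrote:" attribution line from one message body. The function name, the regex, and the sample text are assumptions for illustration; real Gmail threads may wrap the attribution line across two lines, which this does not handle.

```python
import re

# Assumed shape of a Gmail-style attribution line, e.g.
# "On Wed, Jul 10, 2013 at 12:02 PM, Someone <someone@example.com> wrote:"
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")


def strip_quoted(body: str) -> str:
    """Return only the lines the sender typed, dropping quoted history."""
    kept = []
    for line in body.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(">"):
            continue  # quoted text from an earlier message, at any depth
        if ATTRIBUTION.match(stripped):
            continue  # the line that introduces the quoted block
        kept.append(line)
    return "\n".join(kept).strip()


# Hypothetical sample message body
sample = """Great catch by the receiver there.

On Wed, Jul 10, 2013 at 12:02 PM, Someone <someone@example.com> wrote:
> That drive stalled badly.
>> Agreed, the line is struggling.
"""

print(strip_quoted(sample))
```

Who wrote the message and when would still have to come from the message headers, since this only cleans the body.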

Is reverse sorting also a goal, since the Audibles are usually done in chronological order?
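If ordering does turn out to be a goal, sorting on the Date header is probably enough. A small Python sketch, assuming each comment has already been pulled out as a (date header, author, text) tuple; the tuple layout and the sample data are made up for illustration:

```python
from email.utils import parsedate_to_datetime


def sort_chronologically(comments):
    """Sort (date_header, author, text) tuples oldest first.

    parsedate_to_datetime handles differing UTC offsets, but every Date
    header needs a real offset: mixing offset-aware and naive datetimes
    raises TypeError when they are compared.
    """
    return sorted(comments, key=lambda c: parsedate_to_datetime(c[0]))


# Hypothetical comments, deliberately out of order
comments = [
    ("Sun, 08 Sep 2013 13:45:00 -0400", "Writer B", "Second note"),
    ("Sun, 08 Sep 2013 13:05:00 -0400", "Writer A", "First note"),
]

for date, author, text in sort_chronologically(comments):
    print(f"{author} ({date}): {text}")
```

Passing `reverse=True` to `sorted` would give newest-first instead.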

by Reinhard (not verified) :: Wed, 07/10/2013 - 12:29pm

I agree with Karl that your best bet is to identify a platform or technique, rather than doing a custom programming/parsing job.

by trueparallels (not verified) :: Wed, 07/10/2013 - 12:38pm

I'm fairly sure SBN and CBS both use Campfire. http://campfirenow.com/

by JimZipCode :: Wed, 07/10/2013 - 5:11pm

At the end of the email thread, forward it to a user account on a *nix server, with a .forward to cat it to a text file. Or! Just cc every email to that same account, with a .forward in place. Then fire up perl! Perl is written just for this kind of stuff.

A real perl wizard could knock this out with great elan. My own clunky code to do something similar started a new email when the previous line in the file began with "From " and the current line began with "Return-Path: ". I'm sure there are more elegant ways to do it.
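For anyone who would rather not write Perl, the same boundary rule can be sketched in Python. This is only an illustration of the rule described above (a line starting with "From " followed by a line starting with "Return-Path: "), not the original code; the function name and the sample text are assumptions.

```python
from typing import List


def split_messages(text: str) -> List[str]:
    """Split a concatenated mail file into individual messages.

    A new message starts at a line beginning with "From " when the line
    right after it begins with "Return-Path: ".
    """
    messages: List[str] = []
    current: List[str] = []
    prev = ""
    for line in text.splitlines():
        if prev.startswith("From ") and line.startswith("Return-Path: "):
            # prev was already appended to the old message; move it over
            current.pop()
            if current:
                messages.append("\n".join(current))
            current = [prev]
        current.append(line)
        prev = line
    if current:
        messages.append("\n".join(current))
    return messages


# Hypothetical two-message file
sample = (
    "From a@example.com Wed Jul 10 12:02:00 2013\n"
    "Return-Path: <a@example.com>\n"
    "Subject: one\n"
    "\n"
    "body one\n"
    "From b@example.com Wed Jul 10 12:29:00 2013\n"
    "Return-Path: <b@example.com>\n"
    "Subject: two\n"
    "\n"
    "body two\n"
)

print(len(split_messages(sample)))
```

If the file is a standard mbox, Python's built-in `mailbox.mbox` class is likely a sturdier choice than hand-rolled splitting.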

If you cc a mailbox on each email, then there's no need to retain any of the included text from prior emails. That might make the whole thing easier: parse the span between the internet headers and the forwarded message, and just cat that to a new text file.

Might be even nicer to run a dedicated mail server (which is a smaller deal than it sounds) that can handle each piece of mail as a programming object (not just text), with a separate header and a mail body. Cat sender, date and mail body to the growing file.
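Even without a dedicated mail server, a scripting language's mail library can treat each message as an object with separate headers and body. A rough Python sketch using the standard `email` package; the function names, the output format, and the sample message are assumptions for illustration:

```python
from email import message_from_string
from email.policy import default

# Hypothetical raw message as it might arrive in the cc'd mailbox
raw = """From: Writer A <writer.a@example.com>
Date: Sun, 08 Sep 2013 13:05:00 -0400
Subject: Audibles
Content-Type: text/plain; charset="utf-8"

That was a terrible challenge.
"""


def format_entry(raw_message: str) -> str:
    """Return 'sender (date):' followed by the plain-text body."""
    msg = message_from_string(raw_message, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content().strip() if body is not None else ""
    return f"{msg['From']} ({msg['Date']}):\n{text}\n"


def append_entry(raw_message: str, out_path: str) -> None:
    """Append one formatted entry to the growing text file."""
    with open(out_path, "a", encoding="utf-8") as out:
        out.write(format_entry(raw_message) + "\n")


print(format_entry(raw))
```

Calling `append_entry` once per incoming message would build up the single document for editing.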

My regex-fu is not up to the challenge of doing this elegantly. There is plaintext email and html-formatted email to handle, etc. But this is way doable with standard tools.
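The plaintext-versus-HTML issue can be handled without regexes. As a hedged sketch in Python: prefer the plain-text part when a message has one, and fall back to stripping tags from the HTML part otherwise. The helper names and the sample message are assumptions, and the tag stripping is deliberately crude (it collapses all whitespace, so paragraph breaks are lost).

```python
from email import message_from_string
from email.policy import default
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces
    return " ".join("".join(parser.chunks).split())


def message_text(raw_message: str) -> str:
    """Return readable text from a plain-text or HTML message."""
    msg = message_from_string(raw_message, policy=default)
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    content = part.get_content()
    if part.get_content_type() == "text/html":
        return html_to_text(content)
    return content.strip()


# Hypothetical HTML-only message
raw_html = (
    "From: a@example.com\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<div>Nice <b>throw</b>.</div>\n"
)

print(message_text(raw_html))
```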

by sabw8217 :: Thu, 07/11/2013 - 12:38am

Ummmm....really? Regexes? Running your own mail server? Please tell me you have a beard and use BSD.

Anyhow, Gmail and Google Docs have APIs, all programming languages have libraries to handle email, this is not a big job. I might bang it out for you and throw it up on Heroku just for fun.

But if a PDF works for you, you can follow the instructions in this thread, which I found when I went looking for the Gmail Labs tool that let you do this. Unfortunately it's been discontinued, but there are instructions here for converting any Gmail thread to a PDF in Google Docs.


by JimZipCode :: Thu, 07/11/2013 - 11:43am

Not really a full beard, more of a partial / goatee kind of thing....

by jbird1785 :: Fri, 07/12/2013 - 11:10am

The problem with the email scraping is one change to Gmail and it has to be reprogrammed.

I know people like tools that fit seamlessly into their current workflow, but I think the best bet might be a group chat like the previously mentioned Campfire, or HipChat or Google Hangouts. They should all allow access by browser or phone app and give you history to copy and paste, or an API to download the chat history if you still want automatic formatting. I think Campfire doesn't have an Android app though.