Introducing Lewin Career Forecast v2.0

by Aaron Schatz

Five years ago, Football Outsiders unveiled our first college quarterback projection system. It came to be known as the Lewin Career Forecast, since it was created by a college kid named David Lewin who now works for the Cleveland Cavaliers. The elements were simple: The LCF did a surprisingly good job of projecting the success of first- and second-round quarterbacks using just college games started and college completion percentage. It was so popular that references to the Lewin Career Forecast started showing up all over the media, sometimes even "referencing" entire paragraphs of my writing.

There's only one problem: In the last couple years, the LCF hasn't done so well. The formula predicted success for a number of flops including Kellen Clemens, Brady Quinn, Brian Brohm, and Matt Leinart. I detailed these issues in an ESPN Insider piece last week, but let me summarize here for those of you who don't get ESPN Insider. From 1997 through 2005, there were 11 quarterbacks who:

  • were chosen in the first two rounds
  • had at least 33 games started in college
  • completed at least 58 percent of passes in college.
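That screen is simple enough to write down as a filter. Here is a minimal sketch in Python, assuming a hypothetical record layout (the field names are made up for illustration, and the sample rows are placeholders, not real prospects):

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    name: str
    draft_round: int
    college_starts: int
    completion_pct: float  # career college completion percentage, e.g. 61.7


def passes_original_screen(p: Prospect) -> bool:
    """Baselines quoted in the article: rounds 1-2, 33+ starts, 58+ percent."""
    return (
        p.draft_round <= 2
        and p.college_starts >= 33
        and p.completion_pct >= 58.0
    )


# Placeholder rows for illustration only; these are not real quarterbacks.
sample = [
    Prospect("QB A", 1, 40, 61.0),  # clears all three bars
    Prospect("QB B", 2, 25, 63.0),  # too few starts
    Prospect("QB C", 3, 45, 65.0),  # drafted too late
    Prospect("QB D", 1, 33, 58.0),  # sits exactly on both thresholds
]
qualifiers = [p.name for p in sample if passes_original_screen(p)]
print(qualifiers)
```

The thresholds are inclusive ("at least"), so a quarterback sitting exactly on 33 starts and 58 percent still qualifies.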

Out of these 11 quarterbacks, the worst was Byron Leftwich, who was good enough to lead a 12-4 team to the playoffs in 2005. However, the same baselines between 2006 and 2009 produce this list of quarterbacks: Matt Leinart, Brady Quinn, Kevin Kolb, John Beck, Brian Brohm, Chad Henne, Josh Freeman, and Pat White. OK, maybe we don't consider White as a player who was drafted as a "conventional quarterback," but still, that list has four flops, one success (Freeman), and two guys who we're not sure about yet (Kolb and Henne). It's a huge change from 1997-2005.

With these problems in the last couple years, there have generally been two criticisms of LCF. The first is that completion rates don't clearly indicate NFL-level accuracy anymore because of the rise of the college spread option. However, this really isn't as big an issue as some readers seem to believe. Despite a slight rise in completion rates across college football due to the spread offense, the real issue is number of games started. Before 2005, games started were a strong clue as to whether scouts got it right or wrong on the top prospects. Since 2005, many quarterbacks with plenty of experience washed out while similarly accurate, but much less experienced quarterbacks like Aaron Rodgers and Joe Flacco have become successful NFL starters.

The phrase "before 2005" gets to the second criticism, which is that LCF is more descriptive than it is predictive. It describes the quarterbacks from the years that David Lewin used in his original data set, but a high number of games started only correlates to NFL success for the quarterbacks specifically in that data set. That data set has a small sample size and is "cherry-picked" by only using a small subset of years. That's not necessarily true, however. Two points:

1) When we first ran LCF in Pro Football Prospectus 2006, not every quarterback drafted between 1997 and 2005 was part of the research. Philip Rivers is perhaps the best example of a quarterback who gets a high projection because of collegiate games started; he had 49 starts at North Carolina State. But he wasn't part of the data set used to create LCF, because as of PFP 2006 Rivers had only 30 NFL pass attempts and zero games started. Based on the performance of other quarterbacks, LCF projected that Rivers would be an MVP-level superstar, and he has been.

2) Games started may not seem like an important variable if we go forward from the introduction of the LCF, but it is definitely important if we go backwards. From 1990 through 1997, games started are a hugely predictive variable for first- and second-round quarterbacks. Only two of the top quarterbacks drafted during this period were four-year starters: Steve McNair and Brett Favre. Those are also the most successful quarterbacks drafted during that eight-year period. There were also two quarterbacks drafted with only one year of starting experience: Dan McGwire and Matt Blundin. Unless you read my ESPN Insider piece last week, I'm guessing you have never even heard of Matt Blundin, and McGwire is a well-known flop. The further we go back, the harder it is to get exact college stats, and sometimes we have to guess whether a player started all the games he played in, but it looks like these quarterbacks also started fewer than 24 games in college: Browning Nagle, Todd Marinovich, Dave Brown, David Klingler, Tommy Maddox, Heath Shuler, and Tony Banks. Again, not a Hall of Fame list.

Therefore, we need to accept that any quarterback projection system that is based on past performance is going to value collegiate games started. For more than 15 years, it was far and away the most important variable in determining the success of highly-drafted quarterbacks. However, analysis of quarterbacks drafted between 1998 and 2008 showed that we could add some more variables to the Lewin Career Forecast to make it more accurate. Thus, I present to you Lewin Career Forecast v2.0.

I put together LCF v2.0 with a regression that attempted to forecast total DYAR for these quarterbacks in years 3-5 of their NFL careers. In order to include a larger data set, I did look at 2007 draftees (DYAR in years 2-4), and 2008 draftees (DYAR in years 2-3, multiplied by 150 percent). In his first LCF, David Lewin included only quarterbacks drafted in the first two rounds; for this new version, I included quarterbacks chosen in round three as well. In addition, many of the variables have upper or lower boundaries in order to try to limit the importance of extremes like Colt McCoy's 53 games started or Cam Newton and Tim Tebow's rushing statistics.
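For anyone who wants to see how that dependent variable is assembled, here is a rough sketch in Python. The function name and the input layout are hypothetical, and treating a missing season as zero DYAR is an assumption on my part, not something the regression write-up specifies:

```python
def target_dyar(draft_year: int, dyar_by_career_year: dict) -> float:
    """Total DYAR over the window described for each draft class.

    dyar_by_career_year maps career year (1 = rookie season) to DYAR.
    Missing seasons are counted as 0.0 (an assumption for this sketch).
    """
    if draft_year == 2008:
        # Only two usable seasons: years 2-3, scaled up by 150 percent.
        return 1.5 * sum(dyar_by_career_year.get(y, 0.0) for y in (2, 3))
    if draft_year == 2007:
        window = (2, 3, 4)
    else:
        window = (3, 4, 5)
    return sum(dyar_by_career_year.get(y, 0.0) for y in window)


# Placeholder season totals, not a real quarterback's numbers.
seasons = {1: 100.0, 2: 200.0, 3: 300.0, 4: 400.0, 5: 500.0}
print(target_dyar(2005, seasons), target_dyar(2007, seasons), target_dyar(2008, seasons))
```

The 150 percent multiplier simply stretches two seasons of data to the same three-season scale as everyone else.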

The new LCF has seven factors.

  • Career college games started. This is still the most important variable in the equation. Uses a minimum of 20, a maximum of 48.
  • Career completion rate; however, this is now a logarithmic variable. As a quarterback's completion percentage goes down, the penalty for low completion percentage gets gradually larger. As a result, the bonus for exceedingly accurate quarterbacks such as Tim Couch and Brian Brohm is smaller than the penalty for inaccurate quarterbacks such as Kyle Boller and Tarvaris Jackson.
  • Difference between the quarterback's BMI and 28.0. This creates a small penalty for quarterbacks who don't exactly conform to the "ideal quarterback size." This year, that would include both Colin Kaepernick (BMI: 26.8) and Cam Newton (BMI: 29.4).
  • Run-pass ratio in the quarterback's final college season, with a maximum of 0.5.
  • Total rushing yards in the quarterback's final college season, with a minimum of 0 and a maximum of 600.

These two variables work together. Remember, there are two ways to have a high run-pass ratio in college football. Either you are a quarterback who relies a lot on his legs, or you are a quarterback who takes a lot of sacks, because sacks count as runs in college football. So with these two variables, both of those types of quarterbacks end up penalized, while pocket quarterbacks who are successful when they do run (and therefore have positive rushing yards) get a bonus. A good example here is Andrew Luck. Last year, Luck had a very low run-pass ratio of 0.15 -- among this year's top prospects, only Ryan Mallett had a lower ratio -- but when he did run, he gained an excellent 8.2 yards per carry.

  • For quarterbacks who come out as seniors, the difference in NCAA passer rating between their junior and senior seasons.

This variable was a bit of a breakthrough when it came to explaining many of the failures of LCF v1.0. Quarterbacks who struggle as seniors often see their draft stock fall, but apparently not far enough. Obviously passer rating has its issues, but it was a good proxy for figuring out when a quarterback saw his improvement stagnate. There are nine quarterbacks in our data set whose NCAA passer rating fell by more than 10 points in their senior seasons: Rex Grossman (an astonishing 49.3-point collapse), Brodie Croyle, Drew Stanton, Quincy Carter, Trent Edwards, Chad Henne, Brady Quinn, Marques Tuiasosopo, and Patrick Ramsey. Brian Brohm's passer rating fell by 7.3 points. The quarterbacks with the largest senior-year improvements were Jason Campbell, John Beck, Kevin Kolb, Philip Rivers, Chad Pennington, Carson Palmer, and Eli Manning. Obviously this variable isn't foolproof -- besides Beck, guys like Joey Harrington and Kellen Clemens also had significant senior-year improvements, while Jay Cutler and Matt Schaub saw their passer ratings drop slightly as seniors. Still, this variable did a lot to improve results.

What does it mean? This variable could show that quarterbacks who don't keep improving as seniors aren't going to improve as professionals either. Or perhaps, it shows that certain players have flaws in their games that opponents figured out in their senior years.

For quarterbacks who come out as juniors or redshirt sophomores, this variable is always 5.0, which is the average increase for the seniors in our data set.

  • Finally, a binary variable that penalizes quarterbacks who don't play for a team in a BCS-qualifying conference. We counted Notre Dame here as a BCS school, even though that actually lowered the accuracy of the projections. However, this variable applies only to Division I-A quarterbacks, not Division I-AA quarterbacks. Perhaps this means that scouts do a better job of identifying the few Division I-AA quarterbacks who can translate their games to the NFL. (The data set has only three of these players: Josh McCown, Tarvaris Jackson, and Joe Flacco.)
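To make the boundaries concrete, here is a sketch of how the seven inputs could be prepared before they go into the regression. This is only an illustration: the function and key names are hypothetical, the natural log and the absolute BMI gap are my assumptions about the exact transforms, and the regression coefficients are not given here, so nothing below produces an actual projection.

```python
import math


def clamp(value, low, high):
    """Restrict value to the closed range [low, high]."""
    return max(low, min(high, value))


def lcf2_inputs(starts, completion_pct, bmi, run_pass_ratio, final_rush_yards,
                senior_rating_change=None, non_bcs_division_1a=False):
    """Return the seven bounded inputs described in the list above."""
    return {
        "starts": clamp(starts, 20, 48),
        # Assumed form of the logarithmic completion-rate variable.
        "log_completion": math.log(completion_pct / 100.0),
        # Assumed absolute gap, since both lighter and heavier QBs are penalized.
        "bmi_gap": abs(bmi - 28.0),
        "run_pass_ratio": min(run_pass_ratio, 0.5),
        "rush_yards": clamp(final_rush_yards, 0, 600),
        # Juniors and redshirt sophomores get the average senior change of 5.0.
        "senior_rating_change": 5.0 if senior_rating_change is None else senior_rating_change,
        "non_bcs_penalty": 1 if non_bcs_division_1a else 0,
    }


# Cam Newton's listed stats: 14 starts, 65.4 percent, 29.4 BMI, 0.94 run-pass ratio.
# The rushing total is a placeholder above the 600-yard cap.
newton = lcf2_inputs(14, 65.4, 29.4, 0.94, 1000)
print(newton)
```

With those inputs, the starts are raised to the floor of 20, the run-pass ratio is cut to 0.5, and the rushing yards are capped at 600, which is exactly the kind of outlier the boundaries are meant to rein in. The log transform also reproduces the asymmetry described above: dropping from 55 to 50 percent moves the variable more than rising from 65 to 70 percent.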

How does this new, more complex version of LCF change our projections? To figure that out, I also created a formula that used the same data set (including the third-round picks) with the same dependent variable, but only used the same two factors as the original LCF: just games started and completion percentage. The old LCF had an R-square of .24. The new LCF has an R-square of .58. Here's a list of the best and worst projections from 1998 through 2008 using both the first LCF and the newer version. (Since the newer version is more accurate and has more variables, it's also going to give you higher highs and lower lows, which is why the best and worst projections are more extreme with LCF version 2.0.)

LCF v1.0 Top 10       | LCF v2.0 Top 10      | LCF v1.0 Bottom 10      | LCF v2.0 Bottom 10
Chad Pennington  1778 | Philip Rivers   2476 | Marques Tuiasosopo -506 | Alex Smith            -782
Philip Rivers    1671 | Drew Brees      2190 | Michael Vick       -473 | Brodie Croyle         -736
Kevin Kolb       1626 | Carson Palmer   1973 | Akili Smith        -413 | Marques Tuiasosopo    -621
Charlie Frye     1615 | Peyton Manning  1784 | Ryan Leaf          -326 | Trent Edwards         -611
Daunte Culpepper 1396 | Chad Pennington 1678 | Tarvaris Jackson   -195 | Ryan Leaf             -407
Peyton Manning   1379 | Brady Quinn     1518 | Joey Harrington     -14 | Quincy Carter         -336
Chad Henne       1349 | Jason Campbell  1506 | Shaun King           54 | Josh McCown           -311
Brady Quinn      1348 | Jay Cutler      1444 | J.P. Losman          64 | David Carr            -299
Carson Palmer    1198 | Chad Henne      1411 | Brodie Croyle        97 | Patrick Ramsey        -223
Donovan McNabb   1163 | Matt Ryan       1403 | Quincy Carter       122 | J.P. Losman/Tim Couch -195
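For readers less familiar with the R-square figures quoted above (.24 for the two-factor formula, .58 for the seven-factor one), the statistic is the share of variance in the actual results that the projections explain. A minimal sketch of the calculation in Python, using placeholder numbers rather than the real data set:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Placeholder values for illustration only, not real DYAR totals.
actual_dyar = [0, 500, 1000, 1500]
projected_dyar = [100, 400, 1100, 1400]
print(r_squared(actual_dyar, projected_dyar))
```

Keep in mind that adding variables to a regression can never lower the in-sample R-square, so part of the jump from .24 to .58 comes simply from having more factors.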

Here's a look at which quarterbacks improved the most from version 1.0 to version 2.0, and which quarterbacks declined the most. The new formula does a good job of improving the projections for a lot of quarterbacks who became stars, although it now misses even more egregiously on Kellen Clemens and Brian Brohm. The list of the quarterbacks who declined the most seems like a good list of players who were overrated coming out of school, with the exception of Daunte Culpepper and Donovan McNabb. Those guys both appear on the "biggest decline" list because of the new BMI variable, as they are two of the three quarterbacks in the data set with BMI over 30. (The other is JaMarcus Russell.)

Biggest Increase for Projection in LCF v2.0 | Biggest Decrease for Projection in LCF v2.0
Player          LCF v1.0  LCF v2.0          | Player            LCF v1.0  LCF v2.0
Drew Brees           835      2190          | Charlie Frye          1615       117
Matt Ryan            473      1403          | Alex Smith             221      -782
Philip Rivers       1671      2476          | Trent Edwards          223      -611
Carson Palmer       1198      1973          | Brodie Croyle           97      -736
Kellen Clemens       532      1248          | Daunte Culpepper      1396       663
Vince Young          576      1059          | Donovan McNabb        1163       472
Eli Manning          818      1292          | Patrick Ramsey         420      -223
Brian Brohm          846      1290          | Tim Couch              445      -195
Joe Flacco           305       732          | Rex Grossman           472      -124
Peyton Manning      1379      1784          | David Carr             275      -299
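The ordering in that table is just the change from one version to the other. As a quick check, here is a short Python sketch that recomputes the differences from the projections listed above (the variable names are only for illustration):

```python
# (LCF v1.0, LCF v2.0) projections copied from the table above.
projections = {
    "Drew Brees": (835, 2190),
    "Matt Ryan": (473, 1403),
    "Philip Rivers": (1671, 2476),
    "Carson Palmer": (1198, 1973),
    "Kellen Clemens": (532, 1248),
    "Vince Young": (576, 1059),
    "Eli Manning": (818, 1292),
    "Brian Brohm": (846, 1290),
    "Joe Flacco": (305, 732),
    "Peyton Manning": (1379, 1784),
    "Charlie Frye": (1615, 117),
    "Alex Smith": (221, -782),
    "Trent Edwards": (223, -611),
    "Brodie Croyle": (97, -736),
    "Daunte Culpepper": (1396, 663),
    "Donovan McNabb": (1163, 472),
    "Patrick Ramsey": (420, -223),
    "Tim Couch": (445, -195),
    "Rex Grossman": (472, -124),
    "David Carr": (275, -299),
}

# Change in projected DYAR when moving from v1.0 to v2.0.
changes = {name: v2 - v1 for name, (v1, v2) in projections.items()}
ranked = sorted(changes.items(), key=lambda item: item[1], reverse=True)

for name, delta in ranked:
    print(f"{name:<18}{delta:+6d}")
```

Drew Brees gains the most (+1,355) and Charlie Frye loses the most (-1,498), matching the top row of each half of the table.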

Now, let's look at the projections for quarterbacks outside of our data set. First, we'll look at the projections for the quarterbacks chosen in rounds 1-3 of the past two drafts. The number listed is projected total DYAR for career years 3-5.

  • Colt McCoy: 2,092
  • Josh Freeman: 1,367
  • Sam Bradford: 1,345
  • Jimmy Clausen: 1,062
  • Tim Tebow: 925
  • Matthew Stafford: 714
  • Mark Sanchez: 151

As you might expect, LCF v2.0 loves Colt McCoy. So did LCF v1.0 -- although McCoy wouldn't have been considered by LCF v1.0 because he was a third-round pick. McCoy had 53 college games started with a career completion rate above 70 percent. The new boundaries added to try to limit the importance of outlier variables do tamp down the McCoy excitement slightly. (Not to mention that without those boundaries, Sanchez's projection would actually be negative.) Still, McCoy has the third-highest projection of any quarterback since 1997. Philip Rivers and Drew Brees are the only other quarterbacks projected above 2,000.

Five of these seven quarterbacks have significantly higher projections using the new version of LCF. Only Tebow and McCoy are lower with LCF v2.0, and the difference with Tebow is pretty small.

It's important to understand that LCF is meant to be a tool used alongside the scouting reports, not instead of the scouting reports. Sam Bradford was still the proper number one overall selection in the 2010 draft. What's important is not that his projection is lower than Colt McCoy's projection -- instead, what's important is that he has a very good projection, which should give the Rams confidence that their scouts got it right. We don't claim to believe that the Lewin Career Forecast is a foolproof way of figuring out which quarterback an NFL team should draft. This is an interesting regression analysis, not Moses bringing the tablets down from Sinai. Still, we think that LCF v2.0 is valuable as a crosscheck device and should be part of the conversation about quarterback draft prospects.

With that in mind, let's look at the projections for this year's quarterbacks.

Andy Dalton, TCU: 1,616 DYAR

Important stats: 48 games started, 61.7% completion rate, senior passer rating improved 14.7 points.

Dalton is LCF's favorite prospect for 2011. He's also a great example of where LCF might go wrong. Our own Doug Farrar did a good job of running down Dalton's problems in this post on Yahoo's Shutdown Corner blog. Dalton played in a college spread offense where routes were generally designed to clear out specific spots in the defense. Plays didn't include a lot of receiver progressions. He has problems with arm strength, particularly on those intermediate-length throws that an NFL quarterback has to stick into very tight windows. Still, both his pros and his cons sound a lot like the pros and cons of last year's LCF favorite, Colt McCoy -- and McCoy had a more successful rookie year in the NFL than anyone expected. Dalton is a good example of how the LCF doesn't tell you that a quarterback is definitely going to be a star. It tells you "if your scouts determine that Andy Dalton fits your offensive scheme despite his weaknesses, he is very unlikely to be a complete bust."

Ricky Stanzi, Iowa: 1,305 DYAR

Important stats: 35 games started, 59.8% completion rate, senior passer rating improved 26.0 points, 48 carries for -6 yards.

Stanzi gets an asterisk. I don't think he's going in the first three rounds. He's another guy scouts have to do their due diligence on. Still, he did improve a lot as a senior and could be a nice fourth- or fifth-round sleeper. Rushing numbers suggest he may take too many sacks.

Colin Kaepernick, Nevada: 1,044 DYAR

Important stats: 48 games started, 58.2% completion rate, .482 run-pass ratio, 26.8 BMI.

Kaepernick of course played in a somewhat "gimmicky" offense in college, and a lot of his value was based on his running ability. He didn't have the greatest completion rate across his entire college career, although he's been a four-year starter so there's a lot of film to break down here. He had a moderate improvement as a senior, 11.3 points of passer rating. I don't have much of an opinion on him past these numbers.

For those curious, 6-foot-4, 220 pounds is the same size as Eli Manning, Joey Harrington, Chris Simms, and Tim Couch.

Blaine Gabbert, Missouri: 656 DYAR

Important stats: 26 games started, 60.9% completion rate.

Here is where maybe you get the sense that this isn't the best year for low-risk quarterback prospects. From Gabbert on down, every quarterback prospect for 2011 is lower than every quarterback prospect from 2009-2010 except for Mark Sanchez. Gabbert is a little low on games started, a little high in completion rate, and basically average on all the other variables in the system, so LCF v2.0 thinks he's going to be a very average quarterback. His projection is close to the average projection for all the players in the data set used to create LCF v2.0, which is 604. An average quarterback can be a very useful thing on the right team, but it is not something you want to get with a top ten pick.

Jake Locker, Washington: 569 DYAR

Important stats: 40 games started, 53.9% completion rate, senior passer rating dropped 5.5 points.

I just don't think Jake Locker is ever going to be accurate enough to be an above-average NFL quarterback.

Ryan Mallett, Arkansas: 471 DYAR

Important stats: 29 games started, 57.8% completion rate, 26.8 BMI, 44 carries for -74 yards.

Perhaps you have heard that Ryan Mallett has some mobility issues? In three years of college football, he has a total of 135 carries for -141 yards. There are a lot of sacks in there. Maybe you don't think the rushing yardage thing is a big deal, but here's the list of players in our data set with fewer than -50 rushing yards in their final college season: Brodie Croyle, Tim Couch, Chris Simms, Carson Palmer, Patrick Ramsey, Andrew Walter, Kyle Boller, and Rex Grossman. I count one successful quarterback out of eight. Mallett's downside is Dan McGwire. His upside is "What if Drew Bledsoe was kind of a dick."

Christian Ponder, Florida State: 413 DYAR

Important stats: 33 games started, 61.8% completion rate, senior passer rating dropped 12.0 points.

Maybe somebody reaches for him because so many teams have quarterback needs this year, but Ponder just seems to me like a classic third-round pick. How high is his ceiling, really? Isn't he basically just Drew Stanton? I would be scared of how his improvement stagnated in his senior year.

Cam Newton, Auburn: 175 DYAR

Important stats: 14 games started, 65.4% completion rate, 29.4 BMI, 0.94 run-pass ratio.

I thought Tim Tebow was the most unique prospect in recent times, but Cam Newton may have surpassed him. You get most of the same questions, but you take out the questions about throwing motion and replace them with questions about character and inexperience. Nobody doubts that Newton is an amazing athlete who was a supremely valuable college football player. In the NFL, he is a massive risk-reward candidate. I just happen to think that the risk is larger than the reward. I would not take him with the first overall pick in the draft unless a) there was absolutely no other player worth that top pick, and b) I knew for certain that the post-lockout CBA would include a rookie salary slotting system that would go into effect immediately.

Let's throw in one more guy, because I know some people will be curious.

Andrew Luck, Stanford: 1,604 DYAR

Important stats: 25 games started, 64.4% completion rate, 453 rushing yards with only 0.15 run-pass ratio.

This would be Andrew Luck's projection if he had come out after his sophomore year. If he puts up the same stats as a junior, he'll come out with the second-highest projection of any quarterback since 1997, behind only Philip Rivers.

Comments

123 comments, Last at 11 Apr 2012, 1:10am

98 Re: Introducing Lewin Career Forecast v2.0

Why would you hate to be the Panthers? There are several players in this draft worth the #1 overall pick (Peterson, Dareus, possibly Von Miller). True, they're probably going to blow it and grab a QB, but if you were the Panthers you could just choose to make a much wiser selection.

2 Re: Introducing Lewin Career Forecast v2.0

who did BMI excessively flag outside of Russell? My first intuition is that his egregious failure was such an outlier that it could break the entire system due to not having a large enough sample to work with.

3 Re: Introducing Lewin Career Forecast v2.0

How can I tell this system was created through over-use of regression and under-use of statistical understanding?

Reason one: because it attempts to predict games started. What?

Seriously. WHAT?

Games started is affected by quarterback ability, other quarterbacks available to the team, quarterback health, quarterback compatibility with the team's system/personnel, and other external variables. The only one of these that could possibly be predicted by college information is quarterback ability. So yes, games started will correlate with college stats to some extent, but to attempt to predict it off them is a clear sign that when you have a linear-regression hammer, everything looks like a linearly-related nail.

Reason two: because it's full of cut-offs. If the data isn't linear, it isn't linear -- stop trying to make it linear by chopping out chunks of it. You can get away with this as an approximation technique, but the number of times it's done in Lewin 2.0 seems dangerously high.

Reason three: 20, 48, 28.0, 0.5, 0, 600. Those are the constants mentioned in LCF2. Where did these numbers come from? Careful study of the problem indicating that these were the best choices? Or data-fiddling and over-use of regression? As examples, the 600 looks like a "woah, people with more than this many yards are weird outliers. Chop those bastards out, and watch R-squared go up!" and 28.0 looks like "regress DYAR against BMI. Round off. Hey, jackpot!" Unless there's some reason heavy quarterbacks shouldn't be as good as one would expect from past success, this is purposeful over-fitting.

Prediction: LCF2.0 fails *even harder* than LCF1.0 because it is even more over-fitted.

26 Re: Introducing Lewin Career Forecast v2.0

600 kind of makes sense.

Since 1990, 8 times has a QB in the NFL rushed for 600 or more yards, and 4 of those times was Michael Vick. (Cunningham, McNabb, Culpepper, and McNair each did it once)

Indeed, you could probably cap it at 540 if the QB is white. (Gannon and Young each broke 500 once)

http://www.pro-football-reference.com/play-index/psl_finder.cgi?request=1&match=single&year_min=1990&year_max=2010&season_start=1&season_end=-1&age_min=0&age_max=99&draft_round_min=0&draft_round_max=99&league_id=&team_id=&is_active=&is_hof=&pos_is_qb=Y&c1stat=rush_yds&c1comp=gt&c1val=500&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds

83 Re: Introducing Lewin Career Forecast v2.0

How can I tell this post was crafted through pre-existing bias against the subject and without any actual comprehension of same? Because he thinks that the Games started stats under the prospects (and presumably all those other stats there) are projections instead of, you know, their college data.

And with that "peculiar" reading I'd have far more problem with a projection for BMI...

Seriously, WTF?!?!

- Alvaro

4 Re: Introducing Lewin Career Forecast v2.0

AnonymousA

Games started is a proxy, not a variable. I mean that in a kind way, as a positive for the projection. If a guy can't convince Nick Saban, Pete Carroll, or Steve Spurrier that he's the best quarterback on a college team, on what planet is he going to be successful in the NFL? Games started shows the ability of the player to convince smart football people that he can play football, which is important predictively because (1) It shows that he has the tools college scouts and coaches look for, which are different but highly correlated to NFL skills; and (2) The player will have to do EXACTLY THE SAME THING to win an NFL starting job and accumulate DYAR.

I'm hopeful, but unfortunately this business is a crapshoot; a lot depends on the willingness of a player to work hard after they've earned $25 Million guaranteed for how well they played in college.

33 Re: Introducing Lewin Career Forecast v2.0

Two more words.

Seventh Round

A team took a flyer on him. Teams do that with 7th round picks or undrafted free agents. Graham Harrell wasn't drafted, but he's on an NFL roster and you know it's possible he could end up being a serviceable NFL starter. Of course there are tons of guys taken that late that don't pan out. How did the careers of Ja'Juan Seider, Tim Rattay, Joe Hamilton, Josh Heupel, Seth Burford, and Ken Dorsey turn out?

Sometimes you get Matt Cassel or Bart Starr; Starr was taken 200th overall in the 17th round (fewer teams back then), which compares well to the 230th pick used on Cassel. Most of the time you get Joe Hamilton or Tim Rattay.

82 Re: Introducing Lewin Career Forecast v2.0

Yeah, I get that.

My post was a reaction to the sentence: "If a guy can't convince Nick Saban, Pete Carroll, or Steve Spurrier that he's the best quarterback on a college team, on what planet is he going to be successful in the NFL?".

I agree that college starts are probably a useful indicator of NFL ability. But I think it's silly to imply that a non-starter in college could never have a successful NFL career when Matt Cassel presents such an obvious and current contradiction.

84 Re: Introducing Lewin Career Forecast v2.0

Which would be a wonderful rebuttal if he were complaining about Games Started as a variable. He actually thinks Games Started is an output!

Apparently AnonymousA is so hell-bent on leveling criticism and so self-assured that he can't be bothered to actually read past the GS stat, or he's so blinded he thinks they're also projecting the BMI of several prospects, their run-pass ratio and, most impressive of all, Ricky Stanzi to have 48 carries for -6 yards, which would be quite the astonishing stat-line in the NFL, where sacks count as passing yards, not rushing...

- Alvaro

5 Re: Introducing Lewin Career Forecast v2.0

I would really like to see some out-of-sample validation runs from this regression. It certainly has the appearance of a potentially over-parameterized regression.

89 Re: Introducing Lewin Career Forecast v2.0

So he made a bad point in amongst an overall good argument. I think his other points stand; namely, that V2.0 has been arrived at via regression analysis rather than truly predictive statistics. I am not as convinced as he is that that makes it likely to fail, but I agree this is an experiment to see whether regression-based formulas like this turn out predictive or not.

I'd say the jury is still out on DVOA (though it's easily the best descriptive statistic out there), and I still think it was a grave mistake to switch from DPAR to DYAR.

6 Re: Introducing Lewin Career Forecast v2.0

I like trying to create a metric for ability to avoid sacks. This seems like an underrated ability even in the nfl today.

I also find interesting the idea of penalizing players who regress statistically as seniors. I wonder if it'd be possible to temper this with info on teammates lost between those two seasons though, such as losing a future nfl receiver or left tackle and having them replaced by a much worse player.

the BMI thing strikes me as purely descriptive.

Is the data set too small to adjust the games started variable based on the reasons why a player didn't start? ie injury. plus considering whether a player was kept from starting because of another star a year or two ahead of him? I almost feel like it's more interesting to consider how many games a player didn't start and why vs how many games they did start. of course, this seems like it probably runs into sample size issues.

27 Re: Introducing Lewin Career Forecast v2.0

I think the BMI metric is crap.

Starting NFL QBs have been right around a mean of 28 since 2000 with a standard deviation of about 1.4, (Meaning both Newton and Kaepernick are within a SD of the mean) but that's only true back to 2000. For a data set which goes back to 1990, though, those baselines are no longer true. Starters in 1990 averaged a BMI of 26.7 with a SD of 1.2 -- meaning a baseline QB (BMI of 28) would have been considered obese within the timeline of the background data. (from 1990-2000, QBs went from 26.7 to 28, and have been flat at 28 since)

38 Re: Introducing Lewin Career Forecast v2.0

BMI in general is crap. It can't tell if you're totally ripped or have a huge beer belly, it thinks Michael Jordan was overweight and that Andre Agassi in his prime was within 10 pounds of being so. The only reason it lives on is it provides a measurement device to people who really want to have one--even one that's horribly inaccurate.

8 Re: Introducing Lewin Career Forecast v2.0

Let me point out that a lot of the complaints about the statistical methods here are accurate, but there's really not much we can do about them. We're stuck with a small sample size. When you are stuck with a small sample size, you have to accept imperfection. If we tried to use out-of-sample runs to validate the formula, there wouldn't be enough runs left to create the formula. If we didn't try to correct for outliers, we would end up with some nutty-looking results. So we do what we can, and we say things like "this is not a guarantee" and "we are not perfect."

51 Re: Introducing Lewin Career Forecast v2.0

Classic statistics headslap. When you're stuck with a sample size that's too small, you stop what you're doing. There is no correction for a fundamental flaw.

If you wanted to know whether a particular variable predicted success in the NFL, you could test it. If you go on a data mining expedition to find what multiple variables you can feed into a model, you're going to come up with spurious correlations, regardless of sample size. And the more variables you add in an attempt to 'improve' your model, the worse it gets. And every revision you attempt is just creating a 'chasing your tail' problem.

If you want to say "We do this for fun - so don't worry about it," then fine. Just don't pretend you're doing something serious here.

54 Re: Introducing Lewin Career Forecast v2.0

The whole article basically said "we sure hope these numbers are good, but we really don't know."

To a certain extent, every revision is a chasing tail problem, but they also have multiple more years of data to work with. Slowly but surely the sample size should be approaching the point where you can take it seriously, at least in part.

Also, they've got seven regression variables. Considering that they're looking at better than 10 QB's per year, and have a sample size of at least a decade, I'm inclined to think that overfitting isn't THAT big of a deal. It probably rears its ugly head in terms of variable selection, and maybe that's material, but I would think that at this point it should at least be slightly useful.

And given that there really isn't any other tool out there which does this type of analysis (or at least I don't know of any), I think it's reasonable to allow them to potentially reach a bit in terms of putting something out there that may not have the data it really ought to have.

102 Re: Introducing Lewin Career Forecast v2.0

I admit it's been over 20 years since I did a mosh pit. But the few I did had anything from 110# girls to guys weighing 250#. You knew the rules when you went in. I can't criticize anyone for stage-diving into a mosh pit, even someone the size of Meat Loaf. And just because you dive, doesn't mean they'll catch you.

103 Re: Introducing Lewin Career Forecast v2.0

I was the little kid in the pit in the 80s and I'm the old man in the pit now. The rules haven't changed any. About the only difference is nowadays you get douchebags who think the pit is somehow a place to impress people with their karate skills. You knock them on their ass a couple times and they get the hint - then they usually get the hell out.

Most places banned stage diving in the 90s just because it was such an insurance nightmare. Hell, in a lot of clubs you can't even crowd surf anymore. Of course, these days, they'd just drop me, but like you said, thems the breaks.

10 Re: Introducing Lewin Career Forecast v2.0

Excellent article. The changes make sense to me, and the paragraph putting it in proper context in regards to scouting reports and game film seems like the type of thing Phil Simms skims over in these kinds of articles. Also that last line about Ryan Mallett cracked me up. Do you think him standing up the Panthers today because he was "out late on the town" is a sign of his immaturity or a calculated move showing he's willing to have his draft stock take a hit to avoid going to Carolina?

12 BMI... causation or correlation?

How do we know whether a QB sucks because of a high BMI, or whether he has a high BMI because he sucks?

14 A couple ideas

not sure if they'd be useful, but maybe they would:

1) Interceptions per attempt. I would think that this would tell you something, maybe even more than pure passer rating, since picks are death to a QB's future.

2) Difference between own team's rating (using F/+) and average opponent's rating. If this is really skewed, then it's probably a good indication the QB's stats are massively padded by easy opponents (I suspect that if this is material, it'd hurt Dalton in LCF v3.0). And if it's negative, then it's an indication the QB would have had better stats if he hadn't been on such a crappy team (good example: Jay Cutler at Vandy)

19 Re: A couple ideas

I would like to use some of Bill Connelly and Brian Fremeau's stats, but unfortunately we don't have them for before 2005.