12 Apr 2012
by Mike Tanier
The Wonderlic is a joke. What the NFL needs is a test designed by someone who understands football, the draft process, and standardized testing. As if someone like that really exists.
Hey, wait a minute!
I hereby publicly ask the NFL to allow me to bid on the rights to create a new pre-draft aptitude test. I shall call it The Walkthrough Lick.
My qualifications: a decade of professional NFL draft analysis, 17 years of public education experience, five years of part- and full-time employment in the standardized testing field (including stints as a project manager for test development and scoring), and a profound love of both the Combine and No. 2 pencils. Match that, Charles F. Wonderlic Junior!
Sample Question: In the diagram below, identify the three-technique tackle and the Will linebacker.
a) 2 and 3
b) 2 and 4
c) 3 and 4
d) 3 and 5
e) 3 and 6
Figure 1: Sample Walkthrough Lick
The Walkthrough Lick will be designed to assess the draftee’s ability to process football-related information, understand coach’s instructions, and make the types of decisions NFL players must make on and off the field. Notice how explicitly those goals are stated. Good standardized tests set very specific parameters about what they are designed to measure. Bad standardized tests are designed to "measure general mental ability, widely accepted as being one of the single best predictors of job success." General mental ability? What the hell is that? Why, it’s what the Wonderlic measures, according to the Wonderlic website.
Here is more elaboration on the Wonderlic: "It helps measure a candidate’s ability to understand instructions, learn, adapt, solve problems and handle the mental demands of the position." What position? Banker? Cornerback? This is one-size-fits-all testing, and it would not fly in the public education field: the designers would have to develop an elaborate set of clusters, benchmarks, and guidelines, then expend a great deal of energy pretending that their one-size-fits-all test meets all of those clusters.
The NFL deserves better than an off-the-rack test designed for middle managers. More importantly, so do the draftees.
Sample Question: While having dinner at a restaurant that serves alcohol, a patron begins to bother you. After a brief exchange, he makes a scene, claiming that you shoved him. Other patrons notice the incident and begin to crowd around you. What is the best course of action?
a) Offer to buy the man a drink and autograph some memorabilia if he settles down.
b) Loudly proclaim your innocence so the other witnesses can hear your side of the story.
c) Tweet about the incident and call your agent so you can get the truth into the mass media immediately.
d) Excuse yourself as calmly as possible and contact the team’s public relations department.
e) Lock and load.
The Walkthrough Lick will be a 20-question, 30-minute, multiple choice test. At 1.5 minutes per question, it will be much more in line with contemporary standardized tests, like the SAT and high-school graduation requirements, than the Wonderlic, which asks takers to blaze along at 14.4 seconds per question.
(The SAT allows over one minute per question in the math section, and much longer per question in sections where the taker must read written passages. Even stringent post-graduate tests allow well over one minute per response. The LSAT, for instance, gives 35 minutes for an average of 25 questions in its logic and reading comprehension sections.)
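For anyone who wants to check the pacing math, here is a minimal sketch. The Walkthrough Lick and LSAT figures come from the text above; the Wonderlic's 50 questions in 12 minutes is an assumption about its standard form, which is where the 14.4-second figure comes from. The function name is made up for illustration.

```python
def seconds_per_question(minutes, questions):
    """Average time allowed per question, in seconds."""
    return minutes * 60 / questions

# (minutes allowed, number of questions)
tests = {
    "Walkthrough Lick": (30, 20),  # from the proposal above
    "LSAT section": (35, 25),      # 35 minutes, about 25 questions
    "Wonderlic": (12, 50),         # assumption: standard 50-question, 12-minute form
}

for name, (minutes, questions) in tests.items():
    pace = seconds_per_question(minutes, questions)
    print(f"{name}: {pace:.1f} seconds per question")
```

By this arithmetic, the Walkthrough Lick allows roughly six times as long per question as the Wonderlic.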
Tests with extreme speed requirements are meant to assess the capacity to process information and make quick calculations or decisions. Unfortunately, they encourage rushed judgments, which are completely contradictory to what a) modern education prescribes and values and b) most employers want. You may think you want employees who can make decisions in 14.4 seconds, but that skill does not prove useful very often. What you want are employees who can make thorough, reasoned decisions under manageable deadlines.
As for football players, they don’t have 14.4 seconds to do much of anything. The quarterback has about that much time to read the defense before the snap, but "reading" is the crux of our problem.
High-speed tests are often simply reading speed tests: if you can decode the information quickly, you can get the right answer. This becomes a major problem for anyone with a reading-based learning disability. People who aren’t familiar with educational psychology or law may think that "reading-based learning disability" translates as "dumb kid whose mom has a good lawyer" or "dumb kid who needs extra breaks to stay on the football team." While there are always cases of system abuse, reading-based learning disabilities are only diagnosed when the reading impairment can be isolated from, and demonstrated to be distinct from, the individual’s overall cognitive ability. The individual can answer the questions when the test is administered orally, for example.
Good special education programs teach learning-disabled students strategies to help them fully comprehend what they are reading. "Rushing like a madman because you only have 14.4 seconds" is not one of those strategies. Students with 504 Plans or IEPs almost always get a flexibly-timed accommodation on standardized tests, but the tests are also designed to minimize the need for additional time by not being intensely timed in the first place. The only test I have encountered in the last few decades that was timed as rigorously as the Wonderlic is the Jeopardy! contestant test.
The NFL deserves a test that is valid and reflects contemporary testing principles. Draftees, particularly those with learning disabilities, deserve a test that is fair. If you think Michael Oher and Morris Claiborne are the only players this affects, you are not familiar with education at any level, from kindergarten through college. But then, you probably did not think that.
Sample Question: You are the left defensive end, and you have lined up on the outside shoulder of the tight end. At the snap, the tight end blocks the defensive tackle to your right, leaving you unblocked. The running back moves to his right, away from you, and the quarterback extends his arm with the ball and prepares to hand off. Which of the following best represents your assignment on this play?
a) Charge into the backfield and try to blow up the play as soon as the ball is handed off.
b) Pursue at full speed along the line of scrimmage so you can bring the running back down from behind.
c) Slow down, flatten out, and prepare for a bootleg or reverse to your side.
d) Pursue the quarterback in an attempt to sack him after a play-action fake.
e) Race downfield in case you are needed as a last line of defense.
The Wonderlic is little more than a "gotcha," and an excuse for lazy jokes by football writers who don’t know much about football. Do we really know if an 18 is that much better than a 15? If a 23 for a wide receiver is better or worse than a 27 for a quarterback? Do we care? We only know the lowest scores, and we only hear about them through unofficial means. It’s seedy, and it runs counter to the philosophy of assessment, which is supposed to be diagnostic or instructive, not punitive. That’s not Wonderlic’s problem, that’s ours as a sports-entertainment industry, but better test design can help take some of the sting out of leaked scores.
The Walkthrough Lick will be administered using the same strict privacy protocols applied to any high-stakes assessment. But what would happen if someone leaks a score? First of all, anyone who publishes the score should be required to post his SAT score in the same article, but since that is impossible to enforce, we can only restate that the Walkthrough Lick is "designed to assess the draftee’s ability to process football-related information, understand coach’s instructions, and make the types of decisions NFL players must make on and off the field." It does not test intelligence, or "general mental ability," which is a euphemism for intelligence, which is a loaded term. So the guy with the poor score is not dumb, but he did have trouble with questions about recognizing coverages or reacting properly to sticky situations famous people find themselves in, or handling his money. Instead of attacking his intelligence, anyone who finds the need to write about a leaked Walkthrough Lick score will be forced to question a specific, football-related skill set, one not that far removed from his forty time.
Sample Question: A long-time friend asks for $100,000 to invest in his start-up company and says that he can guarantee you a 20 percent annual return on your investment. You should be suspicious of this claim because:
a) Friends become immediately untrustworthy as soon as you become wealthy.
b) Start-up companies can never guarantee a high-percentage return on an investment.
c) Twenty percent is too low a return rate on a $100,000 investment.
d) One-hundred thousand dollars is too much money to invest in any one venture.
e) Large-scale investments violate the terms of the NFL collective bargaining agreement.
These sample questions may seem a little easy, but they are just samples. Real Walkthrough Lick questions will be developed by a team of experts in pedagogy, cognitive development, football strategy, and professional ethics. They will be vetted by experienced educators, draft analysts, and test developers. They will then be piloted by a testing group similar to the test’s target audience, and any questions that are too easy, too difficult, or biased in any way will be removed or reconfigured. Advanced statistical analysis will ensure that each question’s difficulty level is accurately gauged and that each version of the test has a proper array of easy, moderate, and challenging questions.
Translation: I will make interns write the questions for about five bucks each. My wife and I will look them over. Then I will have one of my high school coach buddies give the test to his players. I will keep a "percent correct" matrix for each question on a spreadsheet somewhere, maybe. Rest assured that this is how many, many standardized tests are written, except for those which are cut and pasted out of textbooks. For legal protection, let me state that I have never worked for Wonderlic or its subcontractors, and I apologize to anyone who has ever taken a test I helped construct or score.
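The "percent correct" spreadsheet is simple enough to sketch. This is a rough illustration, not how any real testing company does it: the pilot data is invented, and the cutoffs for "too easy" and "too hard" are hypothetical numbers picked for the example.

```python
def percent_correct(responses):
    """responses: one row per test taker, with 1 (correct) or 0 (wrong) for each question."""
    n_takers = len(responses)
    n_questions = len(responses[0])
    return [sum(row[q] for row in responses) / n_takers for q in range(n_questions)]

def flag_items(p_values, too_easy=0.9, too_hard=0.25):
    """Label each question by its share of correct answers (cutoffs are hypothetical)."""
    flags = []
    for p in p_values:
        if p > too_easy:
            flags.append("too easy")
        elif p < too_hard:
            flags.append("too hard")
        else:
            flags.append("keep")
    return flags

# Invented pilot results: five high school players, four questions.
pilot = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]

p_values = percent_correct(pilot)
for number, (p, flag) in enumerate(zip(p_values, flag_items(p_values)), start=1):
    print(f"Question {number}: {p:.0%} correct, {flag}")
```

A question everyone gets right (or nearly everyone misses) tells you little about the test takers, which is why those are the ones that get removed or reconfigured.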
Essay Question: You are the offensive coordinator facing an opponent with average players at every position but two. Their right defensive end is an All-Pro pass rusher, and their free safety is an undrafted rookie who, due to injuries, will be starting his first game. Design a play using a base personnel grouping (either 2RB-1TE-2WR or 1RB-2TE-2WR) that you would like to use to generate a big play against this defense. You may diagram the play in the space provided, but you must also explain the features of the play, either as an essay or a series of bullet points.
The essay may seem superfluous. After all, teams will interview the player, and they can send him to the white board or make him explain film if they want. But an essay adds a scoring component. You see, multiple choice tests can be scored by machine, and test purchasers know that does not cost much. But an essay? That must be scored by a "football professional," so I can claim I am hiring ex-college players and coaches for $25 per hour and budget as if only ten tests can be scored per hour. Then, I will grab some teachers who played high school football 25 years ago off Facebook for $12 an hour and "professional development hours" (don’t ask), and when it turns out that they can crank out about 90 tests per hour (read the question again: any max-protect play-action bomb should get a good score, right?), the profit margin ... well, I am licking my chops here. Why, if I didn’t know better, I might think that testing companies add "open-ended" responses simply so they can add sweet, sweet pork fat to their budgets.
Sorry. This test is for the teams and players, not the test developer. Wink!
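For the curious, the essay-scoring math works out roughly as follows. This is a back-of-the-envelope sketch using only the wages and scoring rates quoted above; it ignores the "professional development hours" and every other real cost, and the function name is made up.

```python
def cost_per_test(hourly_wage, tests_per_hour):
    """Scoring cost for a single essay, in dollars."""
    return hourly_wage / tests_per_hour

budgeted = cost_per_test(25, 10)  # what the budget claims: $25 per hour, 10 essays per hour
actual = cost_per_test(12, 90)    # what actually happens: $12 per hour, 90 essays per hour

margin = (budgeted - actual) / budgeted
print(f"Budgeted: ${budgeted:.2f} per essay")
print(f"Actual:   ${actual:.2f} per essay")
print(f"Share of the scoring budget kept: {margin:.0%}")
```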
Of course, the Walkthrough Lick will just be the first in a vanguard of sports testing products. The Walkthrough Lick Decision Maker’s Exam will specifically test quarterbacks, middle linebackers, and other players who have to worry about more than just their own role. It can even be given to coaches! The Walkthrough Lick Kickers and Punters Exam will be 90 points for the name and 10 points for the ability to recognize DeSean Jackson, but it will cost the same amount.
Next, we will branch out to basketball and baseball pre-draft exams. Then, something for hockey players when they reach the age at which they are tracked into those juniors programs. The neonatal features will cost extra. The Walkthrough Test of Sports Blogging Aptitude will allow bloggers to display a seal on their sites, saying they have passed. And if they fail, they can write long rants about how stupid and unjust the test is, right after their entries about which athletes they think are stupid.
The NFL can have the Walkthrough Lick for a song: $50,000 in development expenses to my non-profit development corporation (hehehe), then $5,000 per year to my extremely for-profit administration company. The real money, of course, will come from the prep books: the $20 Barnes and Noble edition, and the much-more-expensive DVD edition for agents and performance institutes (contact me directly for pricing information).
The Walkthrough Lick: pinpoint accuracy for that three-month period between college stardom and NFL employment when it doesn’t even matter much, anyway.
No 1930s two-way players who led the league with 439 rushing yards. Hooray! My gut tells me that teams which began operation in the early 1960s will have the most interesting lists. Fifty years of history provide plenty of context without having to sort through guys who played in the single wing.
1. Chuck Foreman
If you want to be historically underrated, have your peak years between 1973 and 1977, the NFL’s mini-Ice Age for offensive statistics. Play on a team that lost Super Bowls, because you will be perceived as having some deficiency which kept you from winning those Super Bowls. (If you never come close to the Super Bowl, you have a much better chance of being remembered as a lovable hard-luck case.) Also, have some dud Super Bowl games (12 carries for 18 yards, say) so all the highlight montages show defenders beating you senseless.
Foreman scored the trifecta of underrated-ness. He was one of the niftiest runners in league history, and he was incredibly productive in an era of grinding football attrition. He had some outstanding playoff performances for some great teams. For all of that, he probably will not be able to hold on to this No. 1 spot much longer.
2. Adrian Peterson
An excellent runner whose DYAR numbers usually hover around 200 for the usual reasons: he gets fed to the line a lot, our stats are not that impressed by one-yard touchdowns, and we cannot accurately gauge the effect terrible quarterbacks have on their running backs. If you are tempted to rank Peterson above Foreman right now, as opposed to after a season or two, please take Foreman’s 1975-77 rushing and receiving numbers, multiply them by 1.14 (16 games divided by the 14 played back then) to project to a 16-game season, and compare them to Peterson’s. You raise with Fran Tarkenton, Mick Tingelhoff, and Ron Yary, and I will call with the Dead Ball-like environment of the mid-1970s. It would still be very close, but Foreman’s receiving is the trump card, at least until Peterson has another outstanding season or two.
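The 1.14 multiplier is just the ratio of a 16-game schedule to the 14-game schedule of the mid-1970s. A minimal sketch of the projection follows; the 1,000-yard input is a placeholder for illustration, not Foreman's actual total, and the function name is made up.

```python
def project_to_16_games(stat, games_in_season=14):
    """Scale a season total from a shorter schedule up to 16 games."""
    return stat * 16 / games_in_season

factor = 16 / 14
print(f"Multiplier: {factor:.2f}")

placeholder_yards = 1000  # hypothetical 14-game total, not a real stat line
projected = project_to_16_games(placeholder_yards)
print(f"{placeholder_yards} yards in 14 games projects to about {projected:.0f} in 16")
```

Plug in any 14-game rushing or receiving total to see the 16-game equivalent.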
3. Robert Smith
A historical square peg. Smith was one of the NFL’s fastest players, and he was a high-IQ guy who didn’t fit the football culture, first at Ohio State and then (to a less drastic degree) in the pros. He got stuck behind Terry Allen, then spent two years getting injured precisely at midseason, before stringing together four relatively healthy, increasingly productive seasons.
Smith’s career arc is actually an arrow that points upward and then stops: he abruptly retired after a 1,521-yard season in 2000. DVOA and DYAR rank him as the fifth-best runner in the league in 1999 and 2000. The Vikings kept Leroy Hoard around as a short-yardage runner, which kept Smith’s touchdown totals low but kept him fresh (and may have helped his DVOA by limiting his non-nourishing carries a bit). Low touchdown totals and a lack of a decline phase make Smith’s raw numbers less impressive than they could have been. He was a heck of a player who happened to prefer science and learning to getting pummeled by linebackers. Who can blame him?
4. Bill Brown
Brown and Dave Osborn shared the Vikings backfield through most of the late 1960s and early 1970s. Either could lead the team in rushing in a given season, but Brown, the nominal fullback (the roles were almost interchangeable), would add 30-40 catches per year. He averaged 14.6 yards per catch and added nine receiving touchdowns in his best season.
Brown also hummed along at 3.3-to-3.5 yards per carry for many years. Norm Van Brocklin was a stubborn coach, and no one was going to tell him not to give 251 carries to a running back averaging 3.3 yards per rush, but Bud Grant did the same thing when he took over. This wasn’t 1936, remember: good backs averaged over four yards per carry in the 1960s. I have no idea what to make of this, but Brown was clearly doing something right.
5. Ted Brown
Brown and Darrin Nelson shared the Vikings backfield in the early 1980s in much the same way that Bill Brown and Osborn shared it in the 1960s. Brown was the "power" back who also caught a lot of passes, Nelson the speedster with the higher per-carry average most seasons. This Brown had a much shorter career than the other one; he was very good from 1980 through 1982, then tailed off into a committee role for several years.
Nelson and Osborn both deserve honorable mention. Both had long Vikings careers. Nelson got thrown into the Herschel Walker trade, then returned a few years later as a return man and third-down back. Osborn backed up Foreman at the end of his career and had the thrill of carrying eight times for minus-one yards against the Steelers in the Super Bowl.
Terry Allen deserves honorable mention for his 1992 season.
I have no idea what to do with Herschel. Maybe we can make a special "other" section for him, Jim Thorpe, and a few others. Who would belong on such a list?
1. C. By the way, homemade tests have an incredible tendency to have a preponderance of "C" answers. When making a commercial test, it is important to strive for a roughly 20 percent split across five items or a 25 percent split across four, if not on each test version, then throughout the question bank. The College Board does a fine job of this. Other companies appear to fake it.
2. D. This one may sound silly, but there are "judgment tests" in most states’ teacher exams, and they really do include "should you beat Johnny?" questions, with the correct answer possibly varying on a state-by-state basis.
3. C. If you were fast enough to blow up the play, you would be at right end, buddy.
4. B. "A" is also technically correct but not the best possible answer.
5. Answers may vary.
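The answer-key balance mentioned in answer 1 is easy to check. Here is a rough sketch that tallies the share of each letter in a key against the even 20 percent split for five options. It uses the four multiple-choice answers listed above as its input; the function name is made up, and four questions is far too small a sample to prove anything.

```python
from collections import Counter

def key_distribution(answer_key, options="ABCDE"):
    """Share of the answer key taken up by each option letter."""
    counts = Counter(answer_key)
    total = len(answer_key)
    return {option: counts.get(option, 0) / total for option in options}

walkthrough_key = ["C", "D", "C", "B"]  # the multiple-choice answers listed above
shares = key_distribution(walkthrough_key)
target = 1 / 5  # even split across five options

for option, share in shares.items():
    print(f"{option}: {share:.0%} (target about {target:.0%})")
```

Run across a whole question bank rather than a four-item sample, the same tally would show whether "C" is carrying more than its share.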