Welcome to Regression Alert, a weekly column devoted to finding outlier statistics and unsustainable production before the rest of your league catches on. To read more background about what regression is, how it works, why it works, and how we're going to exploit it for fun and profit, please check out this introduction.
When writing a column on regression to the mean, there is an obvious place to start. Yards per carry is one of the most beloved statistics for judging running backs. Jamaal Charles has never averaged below 5 yards per carry in a season where he's had at least 20 carries*, therefore Jamaal Charles is a star. Trent Richardson had 1300 yards from scrimmage and 12 touchdowns as a rookie, ranking as a top-10 fantasy back, but his 3.6 yards per carry was an early warning sign that he would eventually be regarded as a colossal bust.
*(Technically, Charles averaged 4.97 yards per carry in 2013, but what's a few hundredths of a yard among friends?)
I've written more about Trent Richardson before, back in 2014 when another young rookie had just had a high-volume, low-ypc season that had everyone drawing parallels and claiming he was destined to disappoint. I wrote that, based on history, maybe we shouldn't be writing off this Le'Veon Bell fellow quite so quickly.
Indeed, the list of high-volume, low-ypc rookie running backs was basically Trent Richardson and a who's who of Hall of Famers or almost Hall of Famers. In addition to Richardson (3.56 ypc) and Bell (3.52 ypc), there's LaDainian Tomlinson (3.65 ypc), Ricky Williams (3.49 ypc), Walter Payton (3.46 ypc), Emmitt Smith (3.89 ypc), Matt Forte (3.92 ypc), and Marshawn Lynch (3.98 ypc).
Even the guys on the high-volume, low-ypc list who didn't go on to be All-Pros typically had several quality fantasy years in them. Karim Abdul-Jabbar, Travis Henry, Errict Rhett, and Joe Cribbs all followed up their “inefficient” rookie season with a top-12 fantasy campaign as a sophomore, Sammie Smith improved across the board and finished as RB18, and Jahvid Best looked (and produced) like a star before injuries derailed his career.
Since I wrote that article in 2014, Melvin Gordon has also found himself on the “wrong” side of the ledger with an awful rookie ypc of 3.48. Fearing the shade of Trent Richardson, many owners sold low on the “inefficient” Gordon after a “disappointing” rookie season, only to see him put up a top-10 fantasy finish in just 13 games.
Indeed, other than Richardson himself, the only running back who had a high-volume, low “efficiency” rookie season and followed it up with a disappointing sophomore campaign was James Jackson. He also happens to be the only back in the sample to average below 3 yards per carry as a rookie (2.84), and his team thought so little of him that they drafted William Green in the first round to replace him.
What is going on here? Why is having a terrible rookie yards per carry average such a positive sign for a player's career? The truth is that a poor yards per carry average isn't a positive sign. It just isn't a negative one, either. I'm providing a list of high-workload rookies with a low yards per carry, and the high-workload part is the real key.
Backs get a high workload because the coaching staff thinks they're good and wants to give them the ball. In the long run, backs who coaching staffs think are good and want to give the ball... tend to be pretty good. The low ypc, in the meantime, is just a meaningless fluke.
What is Yards Per Carry, Anyway?
To understand why yards per carry is a fluke, you have to understand something very important about yards per carry. It's not measuring how good a running back is. Using the terminology from our introductory column, where “X” represents the stable factors intrinsic to a running back himself, or "skill", and “Y” represents random environmental noise, or "luck"... yards per carry is closer to 0% “X” and 100% “Y” than nearly anyone would guess.
Statisticians have a concept called “face validity”. Most of the rest of us better know it as “the smell test”. Let's say I invent a statistic that I claim measures how good running backs are. The first thing I should do is look at a list of running backs under my new statistic and see if my statistic has face validity, that is, whether it passes the smell test.
If I ranked all 69 running backs who have 500 carries since 2008, and my #1 back was Jamaal Charles and my #69 back was Trent Richardson, that would pass the smell test. But if my top-10 also included C.J. Spiller, Justin Forsett, Felix Jones, and Pierre Thomas, while Le'Veon Bell ranked 19th, Marshawn Lynch ranked 32nd, Frank Gore ranked 35th, and Matt Forte ranked 40th... that doesn't really pass the smell test anymore. But that's what the list of running backs ranked by ypc gives us.
Another way of looking at the problem: let's say I take a statistic that we can hopefully agree is completely uncorrelated with player quality. Say, hair length. If I ranked the 50 best running backs in history by the length of their hair, I'd expect 25% of the best running backs in history to fall in the top quartile of all backs for hair length, 25% to fall in the next quartile, 25% to fall in the third quartile, and 25% to fall in the bottom quartile, (with perhaps some variation depending on whether you count Edgerrin James before he shaved his dreadlocks or after).
Basically, since hair length is completely unrelated to player quality, I'd expect quality players to be distributed randomly along the hair length spectrum.
Where do quality players fall along the yards per carry spectrum? Fellow Footballguy Chase Stuart has calculated era-adjusted yards per carry for each of the top 200 running backs in history by total carries. Since 1990, 17 running backs have made the Hall of Fame. Here's the percentage of those 17 backs who rank in each decile on that list (each back is worth about 6%, so the column sums to slightly more than 100% because of rounding):
Top 10%: 12%
2nd 10%: 12%
3rd 10%: 24%
4th 10%: 18%
5th 10%: 12%
6th 10%: 6%
7th 10%: 6%
8th 10%: 12%
9th 10%: 0%
Bottom 10%: 0%
To be fair, part of the reason there aren't more backs from the top deciles is that some of the guys in that range are already enshrined. Jim Brown, Gale Sayers, Lenny Moore, and O.J. Simpson all rank in the top 20 all-time in era-adjusted ypc. But so do Robert Smith, Greg Pruitt, Wendell Tyler, and James Brooks.
There's clearly not zero correlation between yards per carry and player quality, (or “X value”). But, just as clearly, what little correlation there is is very weak.
Meanwhile, there are also mountains of evidence that yards per carry is almost entirely due to extrinsic factors (or "Y value") instead of intrinsic factors. Danny Tuccitto has calculated how long it has historically taken various statistics to “stabilize”, meaning to reach a point where they represent 50% player skill and 50% external factors. (Using my framing, 50% X and 50% Y.)
For instance, for Yards per Attempt, (arguably the single best "simple stat" in all of football), it takes about 396 pass attempts before a player's average represents 50% skill, 50% luck. That's a little bit less than a full season in an offense.
For yards per carry to stabilize, a back would need about 1978 carries, (in Danny's words, "a vomit-inducing" 1978 carries). For context, that's about how many carries Marshawn Lynch, Chris Johnson, or LeSean McCoy have in their careers. But yards per carry hasn't stabilized for those three backs yet because we actually need 1978 carries on the same team and in the same offense. Essentially, the answer to the question of when yards per carry stabilizes is “never”. A back's yards per carry is always more luck than skill.
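To put rough numbers on "more luck than skill", here is a minimal sketch. It assumes the common shrinkage rule of thumb in which the skill share of an average after n attempts is n / (n + stabilization point); that formula is my assumption for illustration, not necessarily the method Danny used. The 396 and 1978 figures come from the text above, while `skill_share`, `regressed_average`, and the 4.1 league average are hypothetical.

```python
# Stabilization points quoted in the column: the number of attempts at which
# an average is roughly 50% skill and 50% luck.
YPA_STABILIZATION = 396    # pass attempts, yards per attempt
YPC_STABILIZATION = 1978   # carries, yards per carry

# Hypothetical league-average yards per carry, for illustration only.
ASSUMED_LEAGUE_YPC = 4.1


def skill_share(attempts, stabilization_point):
    """Assumed shrinkage model: fraction of an observed average credited to skill."""
    return attempts / (attempts + stabilization_point)


def regressed_average(observed, league_average, attempts, stabilization_point):
    """Pull an observed average toward the league average by the luck share."""
    share = skill_share(attempts, stabilization_point)
    return league_average + share * (observed - league_average)


for carries in (50, 250, 1000, 1978):
    share = skill_share(carries, YPC_STABILIZATION)
    print(f"{carries:>5} carries: {share:.0%} skill, {1 - share:.0%} luck")

# A back averaging 5.0 ypc over 250 carries, under the assumptions above.
print(round(regressed_average(5.0, ASSUMED_LEAGUE_YPC, 250, YPC_STABILIZATION), 2))
```

Under that assumed model, a 250-carry season is only about 11% skill, which is why a single year of yards per carry says so little about the back himself.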
Why is this? A big part of it is that yards per carry is very sensitive to outliers. You've all probably seen someone say “take away his long run this game and he was below-average”. But this isn't just true about single games. Last year, there were 16 running backs who were above the league average in yards per carry. For seven of them, removing just one or two carries would drop them below the league average again.
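The "take away his long run" effect is easy to see with a toy example. The carry log below is made up for illustration, not any real back's game, and the 4.1 league average is likewise an assumed figure.

```python
# Hypothetical league-average yards per carry, for illustration only.
ASSUMED_LEAGUE_YPC = 4.1


def yards_per_carry(carries):
    """Average yards gained per carry for a list of individual runs."""
    return sum(carries) / len(carries)


def without_longest(carries, n=1):
    """Return the carry log with the n longest runs removed."""
    if n <= 0:
        return list(carries)
    return sorted(carries)[:-n]


# A made-up 20-carry log: mostly short gains plus one 62-yard breakaway.
carry_log = [3, 2, 4, 1, 5, 2, 3, 0, 4, 62, 2, 3, 1, 4, 2, 3, 5, -1, 3, 2]

full = yards_per_carry(carry_log)
trimmed = yards_per_carry(without_longest(carry_log, n=1))

print(f"All carries:         {full:.2f} ypc")
print(f"Without longest run: {trimmed:.2f} ypc")
print(f"Above assumed league average before: {full > ASSUMED_LEAGUE_YPC}, "
      f"after: {trimmed > ASSUMED_LEAGUE_YPC}")
```

In this made-up log, one run accounts for more than half the yardage: the back goes from 5.50 yards per carry to about 2.53 once the 62-yarder is removed.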
What does it mean to say that yards per carry is always more luck than skill? Well, for one thing, the correlation between yards per carry in one year and the next is extremely low. Not only that, the correlation in yards per carry between one 8-game sample and another 8-game sample in the same season is extremely low.
If a running back averages 5.00 yards per carry in one 8-game sample, based on regression we'd expect him to average 4.37 in the other. If a running back averages 3.50 yards per carry in one 8-game sample, we'd expect him to average 3.93 in the other. Thanks to the magic of regression to the mean, a chasmic 1.5-yard-per-carry difference shrinks to a barely noticeable 0.44-yard-per-carry difference.
So like I said at the top, any discussion of regression to the mean would be remiss not to lead off with yards per carry. This is the quintessential regression stat. It doesn't really measure how good a player is, it's always more a product of luck than skill, and it fluctuates wildly and randomly between samples.
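Those two data points are enough to back out the regression line behind them. This is a sketch that assumes the relationship between the two 8-game samples is linear and that both quoted projections come from the same line. The function names are mine.

```python
def implied_regression(first_a, second_a, first_b, second_b):
    """Recover the slope and mean of a linear regression-to-the-mean rule
    from two (observed first sample, expected second sample) pairs."""
    slope = (second_a - second_b) / (first_a - first_b)
    # The mean is the fixed point of the line: a back at the mean is
    # expected to stay at the mean.
    mean = (second_a - slope * first_a) / (1 - slope)
    return slope, mean


def expected_other_sample(first_sample_ypc, slope, mean):
    """Expected ypc in the other sample: keep `slope` of the gap from the mean."""
    return mean + slope * (first_sample_ypc - mean)


# The two pairs quoted in the text: 5.00 -> 4.37 and 3.50 -> 3.93.
slope, mean = implied_regression(5.00, 4.37, 3.50, 3.93)

print(f"Share of the gap that carries over: {slope:.1%}")
print(f"Implied mean ypc:                   {mean:.2f}")
print(f"Check, 5.00 -> {expected_other_sample(5.00, slope, mean):.2f}")
print(f"Check, 3.50 -> {expected_other_sample(3.50, slope, mean):.2f}")
```

Under that linear assumption, only about 29% of a back's gap from the mean (roughly 4.11 yards per carry here) carries over from one 8-game sample to the other, and the remaining 71% or so washes out.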
Volume, on the other hand, is incredibly sticky. Backs who get a lot of touches with a low yards per carry average are likely, going forward, to get a lot of touches with a higher yards per carry average. On the other hand, backs who get a few touches with a high average are likely, going forward, to get a few touches with a lower average.
Predicting Regression
And this presents us with our first opportunity of the season to profit off of regression. Consider the following two groups:
Group A: Kareem Hunt, Carlos Hyde, Derrick Henry, Dalvin Cook, Chris Carson, Rob Kelley, and C.J. Anderson
Group B: Leonard Fournette, LeSean McCoy, Ezekiel Elliott, Le'Veon Bell, Jonathan Stewart, Ty Montgomery, Isaiah Crowell, and Melvin Gordon
Group A consists of every running back in the league with 20 carries and 4.4 yards per carry or better. Group B is every running back with at least 25 carries and a ypc average of 3.5 or worse, (minus Mike Gillislee, because short-yardage specialists and role players are potential regression-busters).
I'm not cherry-picking my groups here. Other than Gillislee, this is every high-YPC back vs. every low-YPC back.
Per game, Group A averages 14.4 carries for 81.8 rushing yards, a remarkable 5.70 yard per carry average. Group B averages 16.3 carries for 51.3 rushing yards, a pathetic 3.16 yard per carry average.
Despite the massive handicap of Group A averaging 60% more rushing yards per game and 80% more rushing yards per carry to this point, I predict that Group B will rush for more yards per game over the next four weeks. We'll revisit after week 6 and see how I did.
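A quick sketch of why this bet is less crazy than it sounds. It assumes each group's carries per game stay exactly where they are, a simplification of the "volume is sticky" point, and asks how small Group A's per-carry edge has to get before Group B's extra carries win out. Note that ypc recomputed from the rounded per-game figures lands slightly off the 5.70 and 3.16 quoted above, presumably because of rounding or how the group averages were taken.

```python
# Per-game figures quoted in the column.
GROUP_A = {"carries": 14.4, "yards": 81.8}
GROUP_B = {"carries": 16.3, "yards": 51.3}


def ypc(group):
    """Yards per carry implied by a group's per-game carries and yards."""
    return group["yards"] / group["carries"]


def break_even_ypc_ratio(carries_a, carries_b):
    """Largest A-to-B ypc ratio at which B still matches A in yards per game,
    assuming (hypothetically) that both groups keep their current carries."""
    return carries_b / carries_a


current_ratio = ypc(GROUP_A) / ypc(GROUP_B)
needed_ratio = break_even_ypc_ratio(GROUP_A["carries"], GROUP_B["carries"])

print(f"Group A ypc: {ypc(GROUP_A):.2f}, Group B ypc: {ypc(GROUP_B):.2f}")
print(f"Current ypc edge for Group A:  {current_ratio - 1:.0%}")
print(f"Edge at which Group B pulls even: {needed_ratio - 1:.0%}")
```

If workloads hold, Group B comes out ahead whenever Group A's per-carry edge falls from about 80% to under roughly 13%, so the prediction amounts to a bet that the gap will regress at least that far.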