Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I break down a topic related to regression to the mean. Some weeks, I'll explain what it is, how it works, why you hear so much about it, and how you can harness its power for yourself. In other weeks, I'll give practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Justin Jefferson is one of the top performers in my sample, then Justin Jefferson goes into Group A, and may the fantasy gods show mercy on my predictions.
And then because predictions are meaningless without accountability, I track and report my results. Here's last year's season-ending recap, which covered the outcome of every prediction made in our seven-year history, giving our top-line record (41-13, a 76% hit rate) and lessons learned along the way.
Our First Prediction of the Year
Last week, I laid out the three key insights when predicting regression.
Principle #1: Everything regresses to the mean.
Principle #2: Not everything regresses at the same rate.
Principle #3: Not everything has the same mean.
I also provided a hypothetical example of how to leverage these insights, breaking receiver production into its component parts and noticing how certain components would regress more or less strongly than others, changing the overall level of production at different rates.
That's all well and good with hypothetical receivers; does it work with real ones, too? Why yes, it does.
This column's origins date back to 2015 when I wrote about how for receivers, touchdowns tend to follow yards but yards don't tend to follow touchdowns, then provided a list of the receivers who were averaging the most and fewest yards for every touchdown scored.
Within just two weeks, that list had completely flipped on its head. The "high-touchdown" receivers suddenly couldn't reach the end zone and the "low-touchdown" receivers couldn't stop scoring. I've revisited this prediction thirteen more times for Regression Alert, and for twelve of those predictions, the "low-touchdown" group immediately rallied and outscored the "high-touchdown" group over the next month.
On average, the high-touchdown receivers (our "Group A") were outscoring the low-touchdown receivers (our "Group B") by 15.5% at the time of the prediction. On average, Group B outscored Group A by 19.9% over the next month, a total swing of 35.4 percentage points.
Stochastic
Why do the low-touchdown receivers do so well here? Let's start with some new vocabulary.
sto·chas·tic
adjective
randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.
Touchdowns are stochastic. Over his career, Cam Newton rushed for 77 touchdowns in 155 games, an average of 0.5 touchdowns per game. We could say that's his "true production level", and over a sufficiently long timeline, we'd probably expect him to conform to that, averaging 0.5 touchdowns per game.
Despite that being his true production level, though, guess how many times Cam Newton rushed for half a touchdown in a game? As far as I can tell (and I have researched this topic extensively), it has never happened. Instead, he either scored zero touchdowns... or he scored one touchdown. (Sometimes, he scored two touchdowns, and once, he even rushed for three touchdowns.) Because touchdowns only come in whole numbers and arrive unpredictably, we can analyze Cam Newton's rushing touchdowns statistically, but we cannot predict them precisely.
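To make "stochastic" concrete, here is a minimal simulation sketch in Python. It assumes, purely for illustration, that touchdowns per game follow a Poisson distribution with a mean of 0.5. That distribution and the helper name `simulate_touchdowns` are my assumptions, not something fitted to Newton's actual game logs.

```python
import math
import random
from collections import Counter


def simulate_touchdowns(rate_per_game, games, seed=0):
    """Draw per-game touchdown counts from a Poisson distribution (an assumption)."""
    rng = random.Random(seed)
    threshold = math.exp(-rate_per_game)
    counts = []
    for _ in range(games):
        # Knuth's method: keep multiplying uniform draws until the
        # running product falls to e^(-rate) or below.
        touchdowns, product = 0, rng.random()
        while product > threshold:
            touchdowns += 1
            product *= rng.random()
        counts.append(touchdowns)
    return counts


season = simulate_touchdowns(rate_per_game=0.5, games=10_000)
print(Counter(season))            # every game lands on a whole number: 0, 1, 2, ...
print(sum(season) / len(season))  # the long-run average settles near 0.5
```

No single simulated game ever produces 0.5 touchdowns, yet the average over thousands of games sits right around it.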
Yards don't behave quite the same way. Over his career, Cam Newton averaged 38.6 rushing yards per game. But it's not like every week he's either getting you 0 yards or else he's getting you 75 yards. Instead, more games than not, he's getting you somewhere between 20 and 60 yards. His yardage total is much more consistent from game to game than his touchdown total.
Using Standard Deviations
One way to measure consistency is something called standard deviation, which measures how much something varies around the average. The standard deviation of Newton's rushing yardage is 24.5 yards. The standard deviation of Newton's rushing touchdowns is 0.65 touchdowns.
Now, these numbers are not directly comparable. Standard deviations for large values are naturally bigger than standard deviations for small values. (Consider: if you switched to "feet rushing per game" rather than "yards rushing per game", the standard deviation would triple despite the underlying game-to-game variation remaining unchanged. The standard deviation of "inches rushing per game" would be twelve times higher, still!)
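The unit problem is easy to check numerically. The sketch below uses a made-up five-game rushing log (the numbers are hypothetical, not Newton's) and Python's standard `statistics` module.

```python
import statistics

# Hypothetical rushing yards over five games (illustrative only).
yards = [12, 35, 41, 58, 47]
feet = [y * 3 for y in yards]      # 3 feet per yard
inches = [f * 12 for f in feet]    # 12 inches per foot

sd_yards = statistics.pstdev(yards)
sd_feet = statistics.pstdev(feet)
sd_inches = statistics.pstdev(inches)

print(sd_feet / sd_yards)    # about 3: same games, bigger number
print(sd_inches / sd_feet)   # about 12 on top of that
```

The games themselves never changed; only the ruler did.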
But if you divide a player's standard deviation by that player's average, you get something called the coefficient of variation, or CV. CV is a way to compare how volatile different statistics are. The CV of Newton's yards is 64%, meaning it tends to vary by about 64% of his overall average. The CV of Newton's touchdowns is 130%. Touchdowns are much more random from week to week than yards are; in Newton's case, about twice as random, according to CV. (For those curious, the CV of Newton's rush attempts was 42%; "usage" stats like attempts tend to be more stable from week to week even than yards.)
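As a quick check on the arithmetic, here is a small Python sketch that computes CV from the summary figures quoted above. The function name `coefficient_of_variation` is just an illustrative choice. Because the inputs are already rounded, the yardage result lands within a point of the 64% cited rather than matching it exactly.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a fraction of the mean."""
    return std_dev / mean


# Summary figures quoted in the text for Cam Newton's rushing.
yards_cv = coefficient_of_variation(std_dev=24.5, mean=38.6)
touchdowns_cv = coefficient_of_variation(std_dev=0.65, mean=0.5)

print(f"{yards_cv:.1%}")         # 63.5%
print(f"{touchdowns_cv:.0%}")    # 130%
print(touchdowns_cv / yards_cv)  # a little over 2
```

Unlike the raw standard deviation, this ratio has no units, so it would come out the same whether the rushing totals were logged in yards, feet, or inches.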
Not only are they more unstable, but touchdowns are also much more valuable than yards. In most scoring systems, one extra touchdown is worth the equivalent of 60 extra yards. If Newton rushed for "too many" touchdowns early in the year, it could dramatically inflate his fantasy production to date. If he rushed for "too few", it could leave him far lower than we'd otherwise expect.
Touchdowns: How Many Is Too Many or Too Few?
Which raises an important question: how do we know how many touchdowns is "too many" or "too few"? It's easy to say that Newton's "true production level" was 0.5 touchdowns per game with the benefit of hindsight. What about when he had 17 touchdowns in his first 20 games, an average of 0.85? How were we to know that that wasn't his "true production level"?
Enter yard-to-touchdown ratios. Some players are really, really good at getting yards but not quite as good at scoring touchdowns. For years, Julio Jones was the most famous example of this; for his career, he averaged 210 receiving yards for every touchdown he scored. This is a very high average, but there are other wide receivers in this general range: Andre Johnson averaged 203 yards for every touchdown, Henry Ellard averaged 212, etc.
Other players are really, really good at getting touchdowns but typically aren't commensurately good at getting yards. For his career, Davante Adams scores a touchdown for every 114 yards he gains receiving. Again, this is a very low average but not historically implausible; Dez Bryant averaged 102 yards for every touchdown, while Randy Moss was all the way down at 98 yards per touchdown.
Importantly, yard-to-touchdown ratio is not a measure of player quality. Over 2016 and 2017, Davante Adams averaged 940 yards and 11 touchdowns. Over 2021 and 2022, Davante Adams averaged 1,535 yards and 12.5 touchdowns. It should go without saying that Adams played much, much better in the latter two years than he did in the former despite averaging a "worse" yard-to-touchdown ratio. All else being equal, a guy who puts up 1,500 yards and 10 touchdowns is better than a guy who puts up 1,000 yards and 10 touchdowns.
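The Adams comparison works out as follows. This is a small Python sketch using only the per-season averages quoted above; the function names are illustrative, and the point values use standard scoring (1 point per 10 yards, 6 points per touchdown).

```python
def yards_per_touchdown(yards, touchdowns):
    return yards / touchdowns


def standard_points(yards, touchdowns):
    # Standard scoring: 1 point per 10 yards, 6 points per touchdown.
    return yards / 10 + 6 * touchdowns


# Average season over 2016-2017 versus average season over 2021-2022.
early_rate = yards_per_touchdown(940, 11)
late_rate = yards_per_touchdown(1535, 12.5)
print(round(early_rate, 1))  # 85.5
print(round(late_rate, 1))   # 122.8

print(standard_points(940, 11))     # 160.0
print(standard_points(1535, 12.5))  # 228.5
```

The later seasons carry the "worse" ratio and the far better fantasy output, which is the whole point: the ratio describes how the production was built, not how good it was.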
If you asked who was the best receiver in the NFL at various points over the last decade, you might plausibly have heard Jones (216 yards per touchdown), Justin Jefferson (187 yards per touchdown), Michael Thomas (180 yards per touchdown), DeAndre Hopkins (161 yards per touchdown), Antonio Brown (148 yards per touchdown), Stefon Diggs (147 yards per touchdown), Calvin Johnson (139 yards per touchdown), Cooper Kupp (138 yards per touchdown), Tyreek Hill (134 yards per touchdown), Odell Beckham (133 yards per touchdown), Ja'Marr Chase (127 yards per touchdown), or Adams (114 yards per touchdown). (Similarly, I could easily find mediocre or even bad receivers who span the whole yard-to-touchdown spectrum; Devin Funchess averaged 108 yards per touchdown, but he's no Davante Adams.)
Over the long term, receivers tend to average between 100 and 200 yards per touchdown with a median around 140, and the majority of the league clustered between 120 and 180. Any rate that falls in that range is plausibly sustainable and perhaps a true representation of a player's relative skill at scoring touchdowns. (It's also plausibly not; Stefon Diggs averaged 110 yards per touchdown from 2017 to 2018 and 190 yards per touchdown from 2019 to 2020. Both samples were fairly large-- at least 29 games in each-- and neither was representative of his "true" career rate of 147 yards per touchdown.)
But while any rate within the "sustainable band" may or may not be representative, any rate outside that band is definitely going to regress. And that's good for us because over small samples the stochastic nature of touchdowns means we see a lot of rates falling outside of the sustainable band.
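Here is one way to sketch the "sustainable band" idea in Python. The band edges come from the long-term range described above (100 to 200 yards per touchdown). The function name, the labels, and the sample stat lines are all hypothetical.

```python
SUSTAINABLE_BAND = (100, 200)  # yards per touchdown, per the long-term range above


def classify_rate(yards, touchdowns, band=SUSTAINABLE_BAND):
    """Label a yards-per-touchdown rate relative to the sustainable band."""
    if touchdowns == 0:
        # No touchdowns means an unbounded yards-per-touchdown rate.
        return "too few touchdowns"
    rate = yards / touchdowns
    low, high = band
    if rate < low:
        return "too many touchdowns"
    if rate > high:
        return "too few touchdowns"
    return "plausibly sustainable"


# Hypothetical three-game stat lines.
print(classify_rate(222, 3))  # too many touchdowns (74 yards per touchdown)
print(classify_rate(436, 2))  # too few touchdowns (218 yards per touchdown)
print(classify_rate(420, 3))  # plausibly sustainable (140 yards per touchdown)
```

A "plausibly sustainable" label is not a promise that the rate will hold. It only means the rate, on its own, gives us no reason to bet against it.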
So let's once again pit the receivers with a lot of yards but very few touchdowns against the receivers with a lot of touchdowns but very few yards and see what happens. There are currently 38 receivers who have played 3 games and scored at least 20 points (using standard scoring-- 1 point per 10 yards, 6 points per touchdown-- to exaggerate the yardage vs. touchdowns split).
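For anyone who wants to reproduce the filter, here is a minimal Python sketch of standard scoring and the eligibility cutoff. The function names are illustrative, and I'm reading "played 3 games" as "at least 3", which is the same thing three weeks into the season.

```python
def standard_points(yards, touchdowns):
    """Standard scoring: 1 point per 10 yards, 6 points per touchdown."""
    return yards / 10 + 6 * touchdowns


def is_eligible(games, yards, touchdowns, min_games=3, min_points=20):
    """Apply the sample cutoff: enough games and enough points scored."""
    return games >= min_games and standard_points(yards, touchdowns) >= min_points


# One touchdown is worth exactly as much as 60 yards.
print(standard_points(0, 1) == standard_points(60, 0))  # True

# Hypothetical stat lines on either side of the 20-point cutoff.
print(is_eligible(games=3, yards=150, touchdowns=1))  # True  (21 points)
print(is_eligible(games=3, yards=130, touchdowns=1))  # False (19 points)
```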
Group A Receivers
Sixteen of those receivers are averaging fewer than 100 yards per touchdown. These sixteen receivers are: Jauan Jennings, Justin Jefferson, Malik Nabers, Rashid Shaheed, Chris Godwin, Khalil Shakir, Stefon Diggs, Xavier Worthy, Drake London, Marvin Harrison Jr., Mike Evans, Amari Cooper, Allen Lazard, Quentin Johnston, Jalen Nailor, and Andrei Iosivas. Let's go ahead and remove every one of those receivers who isn't averaging at least 40 yards per game. That cuts Iosivas, Worthy, Nailor, and Cooper from the list; together they're averaging 32 receiving yards per game, and I don't think it's all that impressive to suggest that they won't keep scoring a touchdown a week going forward. The remaining twelve receivers will be our Group A.
(I know I've said that I don't get to pick my samples, but the code is more what you'd call "guidelines" than actual rules. I'm allowed to put my thumb on the scale if and only if it makes the prediction more likely to fail and therefore more impressive if it succeeds.)
Group B Receivers
On the other end, ten receivers are averaging around 180 yards per touchdown or more: Chris Olave, Brian Thomas Jr., Tyreek Hill, Jayden Reed, Amon-Ra St. Brown, Jameson Williams, Davante Adams, CeeDee Lamb, DeVonta Smith, and Nico Collins. This is our Group B.
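The two selection rules can be sketched in Python like so. The `Receiver` class, the `split_groups` helper, and the five-player pool are all hypothetical. I'm also treating "around 180" as a hard cutoff of 180 and counting a receiver with zero touchdowns as having an infinite rate, both of which are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    name: str
    games: int
    yards: int
    touchdowns: int

    @property
    def yards_per_game(self):
        return self.yards / self.games

    @property
    def yards_per_touchdown(self):
        # Assumption: no touchdowns counts as an infinite rate.
        return self.yards / self.touchdowns if self.touchdowns else float("inf")


def split_groups(receivers):
    """Group A: under 100 yards per TD and at least 40 yards per game.
    Group B: 180 or more yards per TD."""
    group_a = [r for r in receivers
               if r.yards_per_touchdown < 100 and r.yards_per_game >= 40]
    group_b = [r for r in receivers if r.yards_per_touchdown >= 180]
    return group_a, group_b


# Hypothetical three-game stat lines.
pool = [
    Receiver("Receiver 1", 3, 210, 3),  # 70 yards per TD, 70 per game: Group A
    Receiver("Receiver 2", 3, 96, 3),   # 32 yards per TD, 32 per game: cut
    Receiver("Receiver 3", 3, 270, 1),  # 270 yards per TD: Group B
    Receiver("Receiver 4", 3, 240, 0),  # no touchdowns yet: Group B
    Receiver("Receiver 5", 3, 280, 2),  # 140 yards per TD: neither group
]

group_a, group_b = split_groups(pool)
print([r.name for r in group_a])  # ['Receiver 1']
print([r.name for r in group_b])  # ['Receiver 3', 'Receiver 4']
```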
After three weeks, receivers in Group A are averaging 11.67 points per game while receivers in Group B average 9.97, a 17% edge that's roughly average for this prediction. Group B, however, averages 13% more yards per game than Group A. (This advantage is usually bigger, but then, we don't usually cut all the lowest-yardage receivers from Group A first. Perhaps that was a mistake.)
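The 17% figure is just the ratio of the two scoring averages. A quick Python check, using only the numbers quoted above (the variable names are mine):

```python
group_a_ppg = 11.67  # Group A points per game
group_b_ppg = 9.97   # Group B points per game

edge = group_a_ppg / group_b_ppg - 1
print(f"{edge:.0%}")  # 17%
```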
The Prediction
Regardless, Group A is scoring one touchdown for every 74 yards, while Group B is scoring once for every 218 yards. I predict that over the next four weeks, both of those values will meet somewhere closer to the middle, and as a result, Group B receivers will outscore Group A.
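To see why converging touchdown rates should flip the result, here is a rough Python sketch. The 74 and 218 yard-per-touchdown rates and the 13% yardage gap come from above. The yardage totals themselves are hypothetical placeholders, and having both groups land exactly on the long-term median of 140 is an assumption made only to illustrate the direction of the effect, not a forecast.

```python
def points_per_game(yards_per_game, yards_per_touchdown):
    """Standard scoring per game, given yardage and a touchdown rate."""
    touchdowns_per_game = yards_per_game / yards_per_touchdown
    return yards_per_game / 10 + 6 * touchdowns_per_game


# Hypothetical yardage: Group B gains 13% more yards per game than Group A.
A_YARDS = 60.0
B_YARDS = A_YARDS * 1.13

# Touchdown rates to date.
now_a = points_per_game(A_YARDS, 74)
now_b = points_per_game(B_YARDS, 218)

# Assumption: both groups regress all the way to 140 yards per touchdown.
later_a = points_per_game(A_YARDS, 140)
later_b = points_per_game(B_YARDS, 140)

print(now_a > now_b)      # True: Group A leads on touchdown luck
print(later_b > later_a)  # True: with equal rates, the yardage edge wins
```

The rates don't need to converge completely for this to work. Once they get close enough, Group B's extra yardage decides it.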