Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A and may the fantasy gods show mercy on my predictions.
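For readers who like to see the mechanics, here is a minimal Python sketch of that selection rule. It assumes each player boils down to a single metric value and a points-per-game figure. The `Player` class, the `split_groups` and `make_prediction` helpers, the group size, and the sample players are all hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    metric: float           # the stat we sort on, e.g. touchdown rate
    points_per_game: float  # fantasy production to date


def split_groups(players, group_size):
    """Rank by the chosen metric; return (Group A, Group B).

    Group A gets the highest metric values, Group B the lowest. Membership
    is decided purely by the sort, so there is no hand-picking.
    """
    if group_size < 1 or 2 * group_size > len(players):
        raise ValueError("group_size must be positive and groups can't overlap")
    ranked = sorted(players, key=lambda p: p.metric, reverse=True)
    return ranked[:group_size], ranked[-group_size:]


def average_ppg(group):
    return sum(p.points_per_game for p in group) / len(group)


def make_prediction(players, group_size):
    """Return the prediction, or None if the setup doesn't qualify."""
    group_a, group_b = split_groups(players, group_size)
    # Only make the call if Group A has actually outscored Group B so far.
    if average_ppg(group_a) <= average_ppg(group_b):
        return None
    return "Group B will outscore Group A going forward"


# Hypothetical players, for illustration only.
players = [
    Player("Player 1", metric=0.12, points_per_game=19.0),
    Player("Player 2", metric=0.10, points_per_game=17.5),
    Player("Player 3", metric=0.03, points_per_game=14.0),
    Player("Player 4", metric=0.02, points_per_game=13.0),
]
print(make_prediction(players, group_size=2))
```

The check inside `make_prediction` mirrors the verification step: if the high-metric group hasn't actually outscored the low-metric group, there's no prediction to make.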
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017. Over four seasons, I have made 22 or 23 specific predictions (depending on how you count them), and 18 of them have proven correct, a hit rate of 78-82%.
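The hit-rate range is simply the same 18 correct calls divided by each of the two possible totals. A quick Python check (`hit_rate` is a hypothetical helper name):

```python
def hit_rate(correct, total):
    """Share of predictions that proved correct, as a percentage."""
    return 100 * correct / total


# The total is 22 or 23 depending on how the predictions are counted.
for total in (22, 23):
    print(f"18 of {total}: {hit_rate(18, total):.1f}%")
```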
What Is Regression To The Mean
For our first article of the year, I think it's important to nail down exactly what regression to the mean is and why it is so powerful. I'd like to illustrate it with an example from basketball.
The free throw attempt might be the purest act in all of sports. There's no defense. There's no weather. The distance and angle never change. It is exactly the same every time: one player, one ball, one hoop, one shot.
For his career, Steph Curry shoots 90% on free throws, but on a game-to-game level, there's a little bit of variance. Imagine that in Week 1 of the 2021-2022 season Curry makes 3 of 6 free throws. (This would be a wildly uncharacteristic game, but it's not impossible; Curry once shot 1-of-4 and has twice gone 4-of-7.)
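Just how uncharacteristic would 3-of-6 be? One rough way to put a number on it is a binomial model, which assumes every free throw is an independent shot with the same fixed 90% chance of going in. That's a simplification, and `prob_at_most` is a hypothetical helper, but it gives a sense of scale:

```python
from math import comb


def prob_at_most(makes, attempts, rate):
    """P(at most `makes` successes in `attempts` tries) under a binomial model."""
    return sum(
        comb(attempts, k) * rate**k * (1 - rate) ** (attempts - k)
        for k in range(makes + 1)
    )


# Assumes each free throw is an independent 90% shot.
p = prob_at_most(3, 6, 0.90)
print(f"{p:.1%}")
```

Under those assumptions, a 90% shooter makes three or fewer of six attempts in roughly 1.6% of such games, or about one time in sixty. Rare, but not impossible.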
Nobody in their right mind would look at this game and conclude that Curry was suddenly a 50% free throw shooter, right? Instead, we'd think this game was an outlier and expect him to go back to hitting 90% the rest of the way. That's because 90% is Curry's long-term mean (or average), and we expect him to regress (or return) to it.
Just like Steph Curry has an innate average free throw percentage, so does every player have an innate average talent level. And just as Curry's game-by-game results can deviate from that average, so can every player's results deviate from their own true mean. And just like we'd expect Curry to return to his average, we should expect all players to return to theirs, as well.
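That model (a true mean plus game-to-game noise) is easy to simulate. The Python sketch below invents a league of players whose weekly fantasy scores are their true mean plus random noise, picks the top Week 1 scorers, and checks what those same players do afterward. Every number in it (league size, the 10-point average, the spread of talent and noise, the random seed) is a made-up assumption, not real data.

```python
import random
from statistics import mean


def simulate(num_players=1000, top_n=50, future_weeks=10, seed=2021):
    """Select the top Week 1 scorers and compare their later average.

    Each weekly score is the player's true mean plus random noise. All
    parameters here are made-up illustrations, not real data.
    """
    rng = random.Random(seed)
    true_means = [rng.gauss(10, 3) for _ in range(num_players)]
    week1 = [m + rng.gauss(0, 6) for m in true_means]
    later = [
        mean(m + rng.gauss(0, 6) for _ in range(future_weeks))
        for m in true_means
    ]
    # Select purely on Week 1 results, the way a hot start gets noticed.
    ranked = sorted(range(num_players), key=lambda i: week1[i], reverse=True)
    top = ranked[:top_n]
    return (
        mean(week1[i] for i in top),  # the hot starters' Week 1 average
        mean(later[i] for i in top),  # the same players' later average
        mean(later),                  # everyone's later average
    )


hot_start, afterward, league = simulate()
print(f"Week 1: {hot_start:.1f}  afterward: {afterward:.1f}  league: {league:.1f}")
```

The hot starters should fall well short of their Week 1 average going forward, yet still finish above the league average: part of their big week was real talent, and part was noise that doesn't carry over.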
That's regression to the mean in a nutshell. It's a concept we all intuitively understand, even if we don't talk about it in so many words.
And if we're going to take advantage of regression to the mean, there are four guiding principles we need to keep in mind.
Principle #1: Everyone regresses to the mean.
Principle #2: Everyone's mean is different.
The six leading receivers through one week are Tyreek Hill (197 yards), Deebo Samuel (189 yards), Amari Cooper (139 yards), Brandin Cooks (132 yards), Antonio Brown (121 yards), and Sterling Shepard (113 yards). All six of these men will average fewer yards per game going forward. I know this because I know no one's "true mean" is 115+ yards per game, let alone 150+ yards per game. Wes Chandler holds the single-season record with 129 yards per game in the strike-shortened 1982 season. Only three players have ever averaged 115 receiving yards over a full 16-game season.
But just because all six players will regress doesn't mean all six players will regress the same amount. Antonio Brown averages 84.8 yards per game for his career, the fourth-highest mark in history. He's already topped 100 yards per game three times in his career. It's not entirely implausible that he could add a fourth in 2021.
Sterling Shepard, on the other hand, averages 55.0 yards per game for his career. He's played five seasons and hasn't topped 70 yards per game in any of them. It's entirely possible that he's making a huge leap in Year 6, but unlikely that that leap is all the way to 100 yards per game.
Just because all six players are guaranteed to regress doesn't mean our expectations of all six players should be the same going forward.
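One simple way to regress toward each player's own mean is to blend the tiny recent sample with his career average. The sketch below uses the Week 1 and career yardage figures above; the 10% weight on one week of data is a hypothetical choice for illustration, not a fitted value, and `blended_projection` is a hypothetical helper.

```python
def blended_projection(recent, career, recent_weight):
    """Weighted blend of a small recent sample and a long-run average.

    `recent_weight` is a hypothetical credibility weight for the recent
    sample; one week of data deserves very little.
    """
    return recent_weight * recent + (1 - recent_weight) * career


# Week 1 yards and career yards per game, from the figures above.
receivers = {"Antonio Brown": (121, 84.8), "Sterling Shepard": (113, 55.0)}
for name, (week1, career) in receivers.items():
    proj = blended_projection(week1, career, recent_weight=0.1)
    print(f"{name}: {week1} -> {proj:.1f} yards per game")
```

Both projections land below the Week 1 totals, but Brown's settles around 88 yards per game while Shepard's settles around 61, because each is pulled toward a different anchor.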
Principle #3: Regression by itself doesn't change player order.
Let's say I have two mystery running backs. Player A is averaging 20 points per game (or ppg), and Player B is averaging 18 ppg. I tell you that you can have your pick between them. Who do you choose?
Player A is certainly the bigger outlier. He's almost certainly going to regress more than Player B. But Player B is going to regress, as well, and unless we know something else about them we have to assume that Player A will still be ahead afterward. Maybe they average 13 and 12 ppg going forward, but you still want the player who is scoring more today. Keep this in mind the next time you see someone merely point to a player's high fantasy point total and cry "regression".
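Here's why the order survives. If regression pulls every player toward the same mean by the same proportion, the mapping from observed to expected production is strictly increasing, so whoever is ahead now stays ahead. In this Python sketch, the 6 ppg population mean and the 50% share of deviation that persists are hypothetical values, chosen only so the output matches the 13 and 12 ppg figures above.

```python
def regress(observed, population_mean, keep):
    """Pull an observed average toward the population mean.

    `keep` is the share of the deviation that is expected to persist
    (0 means pure noise, 1 means pure signal).
    """
    return population_mean + keep * (observed - population_mean)


# Hypothetical assumptions: a 6 ppg baseline, half of any deviation persists.
player_a = regress(20, population_mean=6, keep=0.5)
player_b = regress(18, population_mean=6, keep=0.5)

print(player_a, player_b)            # 13.0 12.0
print(20 - player_a, 18 - player_b)  # 7.0 6.0
```

Player A gives back 7 points to Player B's 6, so he regresses more, yet he still projects ahead.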
Principle #4: Regression operates on multiple dimensions.
Our third principle tells us we can't just look at a statistic we care about (in this case, fantasy points) and apply the concept of regression directly. All that does is tell us that good players are likely to remain good (if slightly less so) and bad players are likely to remain bad (if also slightly less so).
But players are going to regress in several ways all at the same time. Jameis Winston threw a touchdown on 25% of his passes last week; his career average is just 5%. That touchdown rate is going to come down substantially. On the other hand, Winston only threw for 148 yards last week; in his five seasons as a starter, he's never averaged fewer than 250 yards per game passing. That yardage total is going to rise substantially, too.
Some dimensions are more stable than others. Rush attempts are much more predictable from week to week than yards-per-carry. Yardage totals vary a lot less than touchdown totals. By focusing on the secondary elements of a player's production that are most likely to regress, we can find ways to change the order of the list we actually care about.
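To see how this can reshuffle a list, regress each component of production separately, with heavier regression on the less stable one. The Python sketch below assumes standard receiver scoring (0.1 points per yard, 6 per touchdown) plus made-up per-game baselines and stability weights. Those constants are hypothetical, and using a single league-wide baseline ignores Principle #2 for simplicity.

```python
def regress(observed, baseline, keep):
    """Keep `keep` of the deviation from the baseline; regress the rest."""
    return baseline + keep * (observed - baseline)


# Hypothetical per-game baselines and stability weights for receivers.
YARDS_BASELINE, YARDS_KEEP = 60.0, 0.7  # yardage: fairly stable
TDS_BASELINE, TDS_KEEP = 0.5, 0.2       # touchdowns: mostly noise


def fantasy_points(yards, tds):
    return 0.1 * yards + 6 * tds


def projected_points(yards, tds):
    """Regress each component on its own, then recombine into points."""
    return fantasy_points(
        regress(yards, YARDS_BASELINE, YARDS_KEEP),
        regress(tds, TDS_BASELINE, TDS_KEEP),
    )


touchdown_guy = (60, 2)  # 60 yards, 2 TDs
yardage_guy = (120, 1)   # 120 yards, 1 TD

for label, (yards, tds) in [("touchdown", touchdown_guy), ("yardage", yardage_guy)]:
    print(label, fantasy_points(yards, tds), round(projected_points(yards, tds), 1))
```

Both players have scored 18 points, but once touchdowns are regressed harder than yardage, the yardage-heavy receiver projects for about 13.8 points against roughly 10.8 for the touchdown-heavy one. The order of the list we care about has flipped.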
So, for instance, if we want to find players who will score fewer fantasy points, we might look at players who are scoring a lot of touchdowns right now. And if we want to find players who will score more fantasy points, maybe we look at players who have lots of targets but a low yards-per-target average.
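Both screens are easy to express as sorts. In this Python sketch, the `Receiver` fields, the 15-target minimum (meant to keep tiny samples from topping the list), and the sample receivers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    name: str
    targets: int
    yards: int
    tds: int


def sell_candidates(receivers, min_targets=15):
    """Receivers leaning hardest on touchdowns: most TDs per target first."""
    eligible = [r for r in receivers if r.targets >= min_targets]
    return sorted(eligible, key=lambda r: r.tds / r.targets, reverse=True)


def buy_candidates(receivers, min_targets=15):
    """High-volume receivers with the lowest yards per target first."""
    eligible = [r for r in receivers if r.targets >= min_targets]
    return sorted(eligible, key=lambda r: r.yards / r.targets)


# Hypothetical receivers, for illustration only.
receivers = [
    Receiver("WR A", targets=20, yards=300, tds=4),
    Receiver("WR B", targets=25, yards=150, tds=0),
    Receiver("WR C", targets=5, yards=20, tds=2),  # below the volume floor
    Receiver("WR D", targets=18, yards=200, tds=1),
]
print([r.name for r in sell_candidates(receivers)])
print([r.name for r in buy_candidates(receivers)])
```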
By combining these principles, we can get one step ahead of our leaguemates. We can buy and sell tomorrow's production at today's prices and consistently reap a profit. All by simply understanding regression, how it works, and how we can put it to work for us.
Right now all of this is in the abstract. Starting next week I'll show you how to put it into practice. I'll show you how a simple list of players sorted from high to low can, with a little bit of discretion, become one of the most powerful buy-low, sell-high tools you'll ever find.