Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions.
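For readers who like to see the mechanics, here is a rough sketch of that setup in Python. Everything in it is hypothetical: the player names, the numbers, the group size, and the helper names `split_groups` and `average` are mine, made up purely for illustration. It is not the actual spreadsheet behind the column.

```python
# Hypothetical sketch of the Group A / Group B setup described above.
# All player names and numbers are invented for illustration.

def split_groups(players, metric, group_size):
    """Rank players by `metric` (highest first); return (top, bottom) groups."""
    ranked = sorted(players, key=lambda p: p[metric], reverse=True)
    return ranked[:group_size], ranked[-group_size:]

def average(players, field):
    """Mean of `field` across a list of player records."""
    return sum(p[field] for p in players) / len(players)

players = [
    {"name": "WR1", "yards_per_target": 12.5, "ppg": 16.0},
    {"name": "WR2", "yards_per_target": 11.0, "ppg": 14.5},
    {"name": "WR3", "yards_per_target": 9.0, "ppg": 12.0},
    {"name": "WR4", "yards_per_target": 8.0, "ppg": 11.0},
    {"name": "WR5", "yards_per_target": 6.5, "ppg": 9.5},
    {"name": "WR6", "yards_per_target": 5.0, "ppg": 8.0},
]

# The metric is the only choice; the ranking decides who lands in which group.
group_a, group_b = split_groups(players, "yards_per_target", group_size=2)

# Verify that Group A has outscored Group B so far.
group_a_leads = average(group_a, "ppg") > average(group_b, "ppg")

print([p["name"] for p in group_a], [p["name"] for p in group_b], group_a_leads)
```

The point of the design is that the only free choice is the `metric` argument. Once that is set, the sort determines the groups, outliers and all.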
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of all my predictions from last year and how they fared.
What Is Regression To The Mean
For our first article of 2018, I think it's important to nail down exactly what regression to the mean is and why it is so powerful. I'd like to illustrate it with an example from basketball.
The free throw attempt might be the purest act in all of sports. There's no defense. There's no weather. The distance and angle never change. It is exactly the same every time: one player, one ball, one hoop, one shot.
For his career, Steph Curry shoots 90% on free throws, but on a game-to-game level, there's a little bit of variance. Imagine that in Week 1 of the 2018-2019 season Curry makes 3 of 6 free throws. (This would be a wildly uncharacteristic game, but it's not impossible; Curry once shot 1-of-4 and has twice gone 4-of-7.)
Nobody in their right mind would look at this game and conclude that Curry was suddenly a 50% free throw shooter, right? Instead, we'd think this game was an outlier and expect him to go back to hitting 90% the rest of the way. That's because 90% is Curry's long-term mean (or average), and we expect him to regress (or return) to it.
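If you want to put a rough number on how unusual that game would be, a quick binomial calculation does the job. This is a sketch under a simplifying assumption that is mine, not a claim about real shooting: every attempt is independent and goes in with a fixed 90% probability. The helper name `prob_at_most` is hypothetical.

```python
from math import comb

def prob_at_most(makes, attempts, p):
    """Probability of making `makes` or fewer shots out of `attempts`,
    assuming independent attempts that each succeed with probability p."""
    return sum(
        comb(attempts, k) * p**k * (1 - p) ** (attempts - k)
        for k in range(makes + 1)
    )

# Assumption: a true 90% shooter taking 6 independent free throws.
p_bad_game = prob_at_most(3, 6, 0.90)
print(f"{p_bad_game:.1%}")
```

Under those assumptions, a true 90% shooter goes 3-of-6 or worse in roughly 1.6% of his six-attempt games. That is rare, but over a long enough career it will happen, and it tells us nothing new about the shooter.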
Just like Steph Curry has an innate average free throw percentage, so does every player have an innate average talent level. And just as Curry's game-by-game results can deviate from that average, so can every player's results deviate from their own true mean. And just like we'd expect Curry to return to his average, we should expect all players to return to theirs, as well.
That's regression to the mean in a nutshell. It's a concept we all intuitively understand, even if we don't talk about it in so many words.
And if we're going to take advantage of regression to the mean, there are four guiding principles we need to keep in mind.
Principle #1: Everyone regresses to the mean.
Principle #2: Everyone's mean is different.
The four leading receivers through one week are Jared Cook (180 yards), Michael Thomas (180 yards), Julio Jones (169 yards), and Tyreek Hill (169 yards). All four of these men will average fewer yards per game going forward. I know this because I know no one's "true mean" is 150+ yards per game. Wes Chandler holds the single-season record with 129 yards per game in the strike-shortened 1982 season. Calvin Johnson's 122.8 yards per game is the best anyone has done over a full 16-game schedule.
But just because all four players will regress doesn't mean all four players will regress the same amount. Julio Jones averages 96.1 yards per game for his career, which is the highest mark in league history. Jones already owns two of the top-20 single-season averages. It's very plausible that he averages 100 yards per game this season.
Jared Cook, on the other hand, is in his 10th season and has never averaged more than 50 yards per game. His career average is 35.4. This week was just the 7th time in his career he's topped 100 yards. It is unthinkable that Cook will average 100 yards per game this season.
Just because all four players are guaranteed to regress doesn't mean our expectations of all four players should be the same going forward.
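One hedged way to picture "everyone's mean is different" is to blend each player's career average with his one big week, treating the career average as if it were worth some number of prior games. The 16-game weight below is an arbitrary assumption of mine, and `blended_estimate` is a hypothetical helper, not a projection system. The career averages and Week 1 totals are the ones quoted above.

```python
def blended_estimate(prior_mean, prior_games, observed_total, observed_games):
    """Weighted average of a prior per-game mean and newly observed games."""
    return (prior_mean * prior_games + observed_total) / (prior_games + observed_games)

PRIOR_GAMES = 16  # assumption: career average counts as one season of evidence

# Career yards per game and Week 1 yardage, as quoted in the text.
jones = blended_estimate(96.1, PRIOR_GAMES, 169, 1)
cook = blended_estimate(35.4, PRIOR_GAMES, 180, 1)

print(round(jones, 1), round(cook, 1))
```

With that weighting, Jones lands at about 100 yards per game and Cook at about 44. Both come down hard from their Week 1 totals, but they come down toward very different places.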
Principle #3: Regression by itself doesn't change player order.
Let's say I have two mystery running backs. Player A is averaging 20 points per game (ppg), and Player B is averaging 18 ppg. I tell you that you can have your pick between them. Who do you choose?
Player A is certainly the bigger outlier. He's almost certainly going to regress more than Player B. But Player B is going to regress, as well, and unless we know something else about them we have to assume that Player A will still be ahead afterward. Maybe they average 13 and 12 ppg going forward, but you still want the player who is scoring more today. Keep this in mind the next time you see someone merely point to a player's high fantasy point total and cry "regression".
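Here is a minimal sketch of why that is. If both players get pulled toward the same baseline by the same fraction, the gap shrinks but the order cannot flip. The 6 ppg baseline and the 0.5 retention factor are assumptions I picked only because they reproduce the 13 and 12 ppg figures above; `regress` is a hypothetical helper.

```python
def regress(observed, mean, retention):
    """Pull `observed` toward `mean`, keeping `retention` of the gap."""
    return mean + retention * (observed - mean)

BASELINE_PPG = 6.0  # assumption: the common mean both players regress toward
RETENTION = 0.5     # assumption: half of the gap to the baseline persists

player_a = regress(20.0, BASELINE_PPG, RETENTION)
player_b = regress(18.0, BASELINE_PPG, RETENTION)

print(player_a, player_b)  # 13.0 12.0
```

Player A gives back 7 points and Player B gives back 6, so A does regress more. He is still ahead afterward.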
Principle #4: Regression operates on multiple dimensions.
Our third principle tells us we can't just look at a statistic we care about (in this case, fantasy points) and apply the concept of regression directly. All that does is tell us that good players are likely to remain good (if slightly less so) and bad players are likely to remain bad (if also slightly less so).
But players are going to regress in several ways all at the same time. If a quarterback has an abnormally high number of pass attempts but an abnormally low yards-per-attempt average, we should expect his number of attempts to come down... but we should also expect his average per attempt to come up.
Some dimensions are more stable than others. Rush attempts are much more predictable from week to week than yards per carry. Yardage totals vary a lot less than touchdown totals. By focusing on the secondary elements of a player's production that are most likely to regress, we can find ways to change the order of the list we actually care about.
So, for instance, if we want to find players who will score fewer fantasy points, we might look at players who are scoring a lot of touchdowns right now. And if we want to find players who will score more fantasy points, maybe we look at players who have lots of targets but a low yards-per-target average.
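To see how regressing the pieces separately can reorder the list, consider a toy example with two receivers. Every number here is invented: the league averages, the retention factors (high for targets because volume is the stabler dimension, low for yards per target), and both stat lines. Treat it as a sketch of the idea, not a projection model.

```python
def regress(observed, mean, retention):
    """Pull `observed` toward `mean`, keeping `retention` of the gap."""
    return mean + retention * (observed - mean)

# Assumptions: league-average values and how much of each gap persists.
LEAGUE_TARGETS, TARGET_RETENTION = 7.0, 0.8  # volume: relatively stable
LEAGUE_YPT, YPT_RETENTION = 8.0, 0.3         # efficiency: relatively noisy

def projected_yards(targets, yards_per_target):
    """Regress volume and efficiency separately, then multiply them back."""
    proj_targets = regress(targets, LEAGUE_TARGETS, TARGET_RETENTION)
    proj_ypt = regress(yards_per_target, LEAGUE_YPT, YPT_RETENTION)
    return proj_targets * proj_ypt

# Receiver X: few targets, gaudy efficiency. Receiver Y: many targets, poor efficiency.
x_now = 6 * 12.0   # 72 yards per game so far
y_now = 10 * 6.0   # 60 yards per game so far

x_proj = projected_yards(6, 12.0)
y_proj = projected_yards(10, 6.0)

print(round(x_now, 1), round(y_now, 1), round(x_proj, 1), round(y_proj, 1))
```

Receiver X leads today, but once his efficiency regresses most of the way and Receiver Y keeps most of his volume, Y projects for more yards. That flip is something regressing the yardage total directly could never produce.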
By combining these principles, we can get one step ahead of our leaguemates. We can buy and sell tomorrow's production at today's prices and consistently reap a profit. All by simply understanding regression, how it works, and how we can put it to work for us.