Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
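Mechanically, that split is simple: rank everyone by the metric, take the top slice as Group A and the bottom slice as Group B. Here's a rough Python sketch with hypothetical receivers and made-up numbers (none of these figures come from real data):

```python
# Sketch of the Group A / Group B split described above.
# Group A holds the high outliers on the chosen metric; Group B
# holds the low outliers. All names and numbers are invented.

def split_groups(metric_by_player, group_size):
    """Return (group_a, group_b): top and bottom players by metric."""
    ranked = sorted(metric_by_player, key=metric_by_player.get, reverse=True)
    return ranked[:group_size], ranked[-group_size:]

yards_per_target = {  # hypothetical early-season averages
    "Receiver A": 12.1, "Receiver B": 10.8, "Receiver C": 9.5,
    "Receiver D": 6.9, "Receiver E": 6.1, "Receiver F": 5.2,
}
group_a, group_b = split_groups(yards_per_target, 2)
# group_a -> ["Receiver A", "Receiver B"]
# group_b -> ["Receiver E", "Receiver F"]
```

The prediction is then just that `group_b` collectively out-produces `group_a` over the coming weeks.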
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A, and may the fantasy gods show mercy on my predictions.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of all my predictions from last year and how they fared.
THE SCORECARD
In Week 2, I laid out our guiding principles for Regression Alert. No specific prediction was made.
In Week 3, I discussed why yards per carry is the least useful statistic and predicted that the rushers with the lowest yard-per-carry average to that point would outrush the rushers with the highest yard-per-carry average going forward.
In Week 4, I explained why touchdowns follow yards (but yards don't follow back) and predicted that the players with the fewest touchdowns per yard gained would outscore the players with the most touchdowns per yard gained going forward.
In Week 5, I talked about how preseason expectations still held as much predictive power as performance through four weeks. No specific prediction was made.
In Week 6, I looked at how much yards per target is influenced by a receiver's role, how some receivers' per-target averages deviated from what we'd expect according to their role, and predicted that the receivers with the fewest yards per target would gain more receiving yards than the receivers with the most yards per target going forward.
| Statistic For Regression | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
| --- | --- | --- | --- |
| Yards per Carry | Group A had 24% more rushing yards per game | Group B has 4% more rushing yards per game | SUCCESS! |
| Yards:Touchdown Ratio | Group A had 28% more fantasy points per game | Group B has 18% more fantasy points per game | 1 |
| Yards per Target | Group A had 16% more receiving yards per game | Group A has 4% more receiving yards per game | 3 |
Last week, Group A pulled ahead of Group B on the strength of a monster performance by Isaiah Crowell. This week, Group B retook the lead thanks to a huge day from Todd Gurley, cementing regression's first win of the season. When betting on regression, longer timelines are always an asset... but more on that in just a bit.
For the first time since we made our yards-to-touchdown prediction, the "high-touchdown" group actually reached the end zone more than the "low-touchdown" group. Group B had still built up enough of a lead in the first two weeks that they enter the final week of the prediction with a comfortable advantage.
Our yards per target groups have already started to regress, with the "high YPT" Group A seeing their average fall by more than three yards. Watch out over the next three weeks as Group B tries to finish closing the gap.
Time is On Your Side
Why do I make predictions in this column about what will happen over the next four weeks? Because five is too many and three is not enough.
If I could, I'd make every prediction for the rest of the season. But the key to this column is accountability: I know that regression to the mean works, and I want to show it in action. That means each prediction needs a stated end date so it can be graded and scored.
But given that each prediction needs a stated end, when should that end be? And here we get into the power of time. The more weeks a prediction covers, the more likely it is that random noise will wash out and regression will dominate the results.
Think of it this way: imagine that in any given week there's a 60% chance a player will regress to the mean and a 40% chance he'll maintain his unsustainable level of play. In any given week, 40% of my predictions will be wrong!
On the other hand, the odds of a player defying regression in two consecutive weeks are only 16%. The odds of maintaining an unsustainable level of play for three straight weeks are just over 6%, and the odds of doing it for four straight weeks are under 3%.
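Under that toy model (a 40% chance of sustaining the outlier performance each week, independent from week to week), those figures are just powers of 0.4:

```python
# Toy model from the text: each week a player has a 40% chance of
# sustaining his unsustainable level of play, independently of
# every other week. The chance of sustaining it for n straight
# weeks is then 0.4 ** n.
p_sustain = 0.40

for weeks in range(1, 5):
    odds = p_sustain ** weeks
    print(f"{weeks} straight week(s): {odds:.1%}")
# 1 straight week(s): 40.0%
# 2 straight week(s): 16.0%
# 3 straight week(s): 6.4%
# 4 straight week(s): 2.6%
```

The independence assumption is obviously a simplification, but it shows why each added week compounds in regression's favor.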
Every extra week a prediction runs increases the chances that the prediction pays off. (And this is also why I make my comparison groups as large as feasible: the more players involved, the more likely regression is to show up.)
Here's what this effect looks like in graphical form. After week 2 last year, I identified a group of high yard-per-carry running backs (Group A) and a group of low yard-per-carry running backs (Group B), and predicted Group B would outrush Group A going forward.
Here's how that prediction played out over the course of the season, comparing Group B's rushing total to Group A's over each 1-week, 2-week, 3-week, 4-week, and so on through 14-week span. I've applied a heat map; BLUE means that Group B averaged more yards per game, RED means Group A averaged more yards per game. (Grey means the comparison isn't available; we can't find a 6-week average after only three weeks, right?)
The first thing you might notice is that there are a lot of individual weeks where Group A outperformed Group B (it happened in 4 out of 14 weeks). There are two two-week spans where Group A outrushed Group B (weeks 3-4 and weeks 15-16). But by the time you get to three-week averages, Group B outperforms over every stretch. By the time you get to five-week averages, Group A never comes within 5 yards per game of Group B. By the time you get to eight-week averages, Group A never comes within 10 yards per game of Group B.
(This variance works both ways, actually. There were two one-week and two two-week stretches where Group B outrushed Group A by at least 40 yards per game. By the time you get down to nine-game stretches, Group B never outrushed Group A by fewer than 17 yards per game or more than 27 yards per game.)
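The computation behind a heat map like that is straightforward: for every span length, average each group's weekly yardage over every consecutive window of that length and compare the two. A rough sketch, using invented weekly totals rather than the real data:

```python
# Sketch of the span-by-span comparison behind the heat map.
# The weekly yardage figures below are made up for illustration.

def span_averages(weekly_yards, k):
    """Average yards per game over each consecutive k-week window."""
    return [sum(weekly_yards[i:i + k]) / k
            for i in range(len(weekly_yards) - k + 1)]

group_a = [130, 95, 110, 80, 90, 85]    # hypothetical weekly totals
group_b = [90, 120, 100, 115, 105, 110]

for k in (1, 3, 6):
    a_avgs = span_averages(group_a, k)
    b_avgs = span_averages(group_b, k)
    wins_b = sum(b > a for a, b in zip(a_avgs, b_avgs))
    print(f"{k}-week spans: Group B ahead in {wins_b} of {len(a_avgs)}")
```

Even in this tiny invented sample, the longer windows smooth out Group A's big single weeks, which is exactly the effect the heat map shows.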
In the interest of being accountable, I want my predictions to be as short as possible to minimize the amount of time before they come due. In the interest of being right, I want my predictions to be as long as possible to maximize the chances that regression has enough time to kick in. (Indeed, my biggest failed prediction last year would have been successful if tracked over a 12-week sample instead of a 4-week sample.)
It's hardly an exact science, but I find that four weeks tends to strike a good balance between those two competing goals.