Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions.
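For readers who like to see the mechanics, the grouping procedure described above can be sketched in a few lines of Python. This is only a rough illustration, not the actual process behind the column: the player labels, the numbers, the group size, and the helper names (`split_groups`, `percent_lead`) are all made up for the example.

```python
from statistics import mean


def split_groups(players, metric, group_size):
    """Rank players by `metric`; the top go to Group A, the bottom to Group B.

    No hand-picking: membership depends only on where a player ranks.
    """
    if group_size < 1 or 2 * group_size > len(players):
        raise ValueError("group_size must be >= 1 and leave the groups disjoint")
    ranked = sorted(players, key=lambda p: p[metric], reverse=True)
    return ranked[:group_size], ranked[-group_size:]


def percent_lead(leader, trailer):
    """How far `leader` is ahead of `trailer`, as a percentage of `trailer`."""
    return 100 * (leader - trailer) / trailer


# Made-up sample data (NOT real players or real stats).
players = [
    {"name": "Receiver 1", "yards_per_target": 11.2, "yards_per_game": 90.0},
    {"name": "Receiver 2", "yards_per_target": 10.1, "yards_per_game": 80.0},
    {"name": "Receiver 3", "yards_per_target": 8.0, "yards_per_game": 70.0},
    {"name": "Receiver 4", "yards_per_target": 6.5, "yards_per_game": 60.0},
    {"name": "Receiver 5", "yards_per_target": 5.9, "yards_per_game": 40.0},
]

group_a, group_b = split_groups(players, "yards_per_target", group_size=2)
a_yards = mean(p["yards_per_game"] for p in group_a)
b_yards = mean(p["yards_per_game"] for p in group_b)

# Step 1: verify that Group A has outproduced Group B so far.
assert a_yards > b_yards
print(f"Group A had {percent_lead(a_yards, b_yards):.0f}% more receiving yards per game")

# Step 2: the prediction is simply that Group B outgains Group A from here on.
# Checking it means re-running the same comparison on later weeks' data.
```

The one design choice worth noting is that `split_groups` takes only a metric and a group size, so there is no room to quietly drop an inconvenient player from either group.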
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of all my predictions from last year and how they fared.
THE SCORECARD
In Week 2, I laid out our guiding principles for Regression Alert. No specific prediction was made.
In Week 3, I discussed why yards per carry is the least useful statistic and predicted that the rushers with the lowest yards-per-carry average to that point would outrush the rushers with the highest yards-per-carry average going forward.
In Week 4, I explained why touchdowns follow yards (but yards don't follow back) and predicted that the players with the fewest touchdowns per yard gained would outscore the players with the most touchdowns per yard gained going forward.
In Week 5, I talked about how preseason expectations still held as much predictive power as performance through four weeks. No specific prediction was made.
In Week 6, I looked at how much yards per target is influenced by a receiver's role, how some receivers' per-target averages deviated from what we'd expect according to their role, and predicted that the receivers with the fewest yards per target would gain more receiving yards than the receivers with the most yards per target going forward.
In Week 7, I demonstrated how randomness could reign over smaller samples, but regression dominates over larger ones. No specific prediction was made.
In Week 8, I discussed how even something like average career length could be largely determined by regression-prone fluctuations in incoming talent. No specific prediction was made.
In Week 9, I looked at running backs scoring touchdowns at an unsustainable rate and posited that even Todd Gurley must return to earth.
| Statistic For Regression | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
|---|---|---|---|
| Yards per Carry | Group A had 24% more rushing yards per game | Group B has 4% more rushing yards per game | SUCCESS! |
| Yards:Touchdown Ratio | Group A had 28% more fantasy points per game | Group B has 23% more fantasy points per game | SUCCESS! |
| Yards per Target | Group A had 16% more receiving yards per game | Group A has 13% more receiving yards per game | Failure |
| Yards:Touchdown Ratio | Group A had 26% more fantasy points per game | Group A has 112% more fantasy points per game | 3 |
Mistakes were made!
After three weeks of hovering at rough parity, Group A had its best performance in Week 9 and Group B had its worst, moving the post-prediction disparity almost all the way back to where it was pre-prediction. This is a bad loss for regression to the mean.
If we were to perform a postmortem, we could doubtless come up with many explanations for why. Injuries hit Group B (Diggs, Hilton, Robinson, Enunwa) much harder than Group A (Kupp). Demaryius Thomas was phased out and then traded in Denver. In the end, Group A's yards per target dropped, but not as much as expected. Group B's yards per target increased, but not as much as expected. And most damagingly, Group B's target advantage cratered; they averaged about three more targets per game than Group A at the time of prediction, but only one more target per game since.
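To see why the shrinking target advantage hurt so much, it helps to remember that receiving yards per game is just targets per game multiplied by yards per target. Here is a quick sketch of that arithmetic. The only figures taken from above are the target gaps (about three more per game for Group B before, about one more since); every other number is invented to make the math easy to follow and is not either group's real stat line.

```python
def yards_per_game(targets_per_game, yards_per_target):
    """Receiving yards per game = volume (targets) x efficiency (yards per target)."""
    return targets_per_game * yards_per_target


# Hypothetical numbers throughout; only the size of the target gaps
# (3 before, 1 since) comes from the column.

# At prediction time: Group A is more efficient, Group B sees 3 more targets.
a_before = yards_per_game(6.0, 10.0)
b_before = yards_per_game(9.0, 6.0)

# Scenario 1: efficiency partially converges and the 3-target edge holds.
a_edge_holds = yards_per_game(6.0, 9.0)
b_edge_holds = yards_per_game(9.0, 7.0)

# Scenario 2: the same partial convergence, but the edge shrinks to 1 target.
a_edge_shrinks = yards_per_game(7.0, 9.0)
b_edge_shrinks = yards_per_game(8.0, 7.0)

for label, a, b in [
    ("Before prediction", a_before, b_before),
    ("Edge holds", a_edge_holds, b_edge_holds),
    ("Edge shrinks", a_edge_shrinks, b_edge_shrinks),
]:
    leader = "Group A" if a > b else "Group B"
    print(f"{label}: A = {a:.0f}, B = {b:.0f} yards per game ({leader} leads)")
```

With these made-up inputs, the same modest move in yards per target flips the result in Group B's favor when the volume edge holds and leaves Group A ahead when it doesn't, which is the shape of what happened here.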
If I were to take away one lesson from the bad beat, it would be this: going forward, I should adjust yards per target not just for yards per reception, but also for quarterback. We know that a quarterback's yards per attempt is one of the most stable statistics around, and it probably should have been a tip-off that the two biggest underperformers both played with Joe Flacco, whereas two of the biggest overperformers played with Jared Goff. (On the other hand, another two of the biggest overperformers played with Derek Carr, and those two regressed as anticipated, so maybe this should be filed under "more study needed".)
As for last week's prediction that even Todd Gurley regresses... that top-line number looks really scary, but a lot of that is attributable to the fact that the top two backs in Group B (Saquon Barkley and Joe Mixon) were on bye... and so were the two worst backs in Group A (David Johnson and Marlon Mack). The other issue is that Gurley did indeed regress (turning in his worst game of the season and, in fact, the worst game of anyone in Group A), but the rest of Group A went off, averaging 130 yards and 2 touchdowns between them.
Group B faces a massive hole to climb out of and I'm not feeling great about their chances, but there's a reason we give them four weeks.
What is the purpose of Regression Alert?
Don't let the lede scare you; we have not yet reached the point in the season where Regression Alert becomes self-aware and begins to question its own existence. (That's usually sometime around Week 14.)
I get some variant of this question quite often, though. What is the purpose of Regression Alert? What lessons am I supposed to take away from it? Should I be selling the players in Group A? Buying the players in Group B? Both? Neither?
It's a tough question, and unfortunately one without a great answer. Theoretically, yes, you should be happy to sell high on the players in Group A and to buy low on the players in Group B. But "sell high" and "buy low" depend very much on what you can get. And the best players in Group A will typically remain very productive (albeit not quite as productive), while the worst players in Group B will often remain unproductive (albeit slightly less so). Beyond this, trades are typically cut off at some point in the season and are just hard to pull off in redraft leagues, and in dynasty leagues you're often going to have bigger considerations than how a player will perform over the next four weeks. (Like: how the player will perform over the next four years.)
In DFS there's a lot of value in knowing what kinds of performances are sustainable and what kinds you should be betting against. But DFS prices are pretty sharp and the people who set them know about regression, so the edge to be gained there isn't massive.
So why write a column all about regression to the mean when it's hard to find a lot of opportunities to put it into practice to improve our teams? Partly because regression to the mean is really, really important. It's probably the most powerful force in the fantasy universe, and understanding it will improve the choices you make when drafting, when setting lineups, when trading, and so on.
But mostly it's because regression to the mean is really, really misunderstood. Regression is very powerful when used as the midpoint of analysis ("player X is probably going to see fewer touchdowns going forward, so he might fall behind other running backs who are producing similar fantasy point totals with more sustainable touchdown rates"). It's also completely useless when used as the endpoint of analysis ("player X has scored a lot of points so far and will probably score fewer going forward"). That second statement is true, common, and useless. (Why? Because the other players around him will probably score fewer points, too, so it doesn't change the relative ordering at all. The best remain the best and the worst remain the worst.)
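The "midpoint" version of the argument can be made concrete with a toy calculation. What follows is a sketch under loudly stated assumptions, not a projection model: the two backs are hypothetical, the scoring (1 point per 10 yards, 6 per touchdown) is assumed, and the league touchdown rate and the fraction of touchdown luck that persists (`KEEP`) are invented constants. Yardage is held fixed purely to keep the example short.

```python
YARDS_PER_POINT = 10            # assumed scoring: 1 point per 10 yards
TD_POINTS = 6.0                 # assumed scoring: 6 points per touchdown
LEAGUE_TDS_PER_YARD = 1 / 125   # invented league-wide touchdown rate
KEEP = 0.5                      # invented: share of TD over/underperformance that persists


def current_points(yards, tds):
    """Fantasy points per game so far."""
    return yards / YARDS_PER_POINT + tds * TD_POINTS


def projected_points(yards, tds):
    """Hold yardage fixed, but pull touchdowns toward what the yardage implies."""
    expected_tds = yards * LEAGUE_TDS_PER_YARD
    regressed_tds = expected_tds + KEEP * (tds - expected_tds)
    return yards / YARDS_PER_POINT + regressed_tds * TD_POINTS


# Two hypothetical backs with identical scoring so far (yards, TDs per game).
backs = {
    "Back X": (80.0, 1.5),    # fewer yards, lots of touchdowns
    "Back Y": (110.0, 1.0),   # more yards, fewer touchdowns
}

for name, (yards, tds) in backs.items():
    now = current_points(yards, tds)
    later = projected_points(yards, tds)
    print(f"{name}: {now:.1f} points per game so far, {later:.1f} projected")
```

Both backs start level, but the one leaning on touchdowns projects lower once the touchdown rate is pulled toward what his yardage supports. That change in relative ordering is the useful part; "he'll probably score less" on its own is not.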
Regression Alert is my attempt to demystify the most powerful force in fantasy. Yes, hopefully you find some actionable insights in the process. But even if you don't, understanding the nature and features of randomness makes us better, smarter, more forward-looking fantasy owners, and that pays dividends in all sorts of subtle ways.