For those who are new to the feature, here's the deal: every week, I break down a topic related to regression to the mean. Some weeks, I'll explain what it is, how it works, why you hear so much about it, and how you can harness its power for yourself. Other weeks, I'll give practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Justin Jefferson is one of the top performers in my sample, then Justin Jefferson goes into Group A, and may the fantasy gods show mercy on my predictions.
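The selection procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the column's actual tooling: the `Player` record, the `split_groups` and `average_ppg` helpers, the group size `n`, and the sample players are all hypothetical. "Top" is treated here as the highest value of the metric, and ties are broken arbitrarily by sort order.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    metric: float           # the statistic being ranked (here, yards per carry)
    points_per_game: float  # fantasy output to date


def split_groups(players, n):
    """Rank players by metric; the top n form Group A, the bottom n form Group B."""
    if 2 * n > len(players):
        raise ValueError("groups would overlap")
    ranked = sorted(players, key=lambda p: p.metric, reverse=True)
    return ranked[:n], ranked[-n:]


def average_ppg(group):
    """Mean fantasy points per game across a group."""
    return sum(p.points_per_game for p in group) / len(group)


# Hypothetical players, not real data
players = [
    Player("Back 1", 6.1, 14.0),
    Player("Back 2", 5.8, 12.5),
    Player("Back 3", 4.3, 10.0),
    Player("Back 4", 3.6, 9.5),
    Player("Back 5", 3.4, 11.0),
]

group_a, group_b = split_groups(players, n=2)

# The precondition: Group A must have outscored Group B to date.
# The prediction is then that Group B outscores Group A going forward.
print("Group A leads to date:", average_ppg(group_a) > average_ppg(group_b))
```

Note that the metric is the only input the selector controls; once it is chosen, group membership falls out of the sort.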
And then because predictions are meaningless without accountability, I track and report my results. Here's last year's season-ending recap, which covered the outcome of every prediction made in our seven-year history, giving our top-line record (41-13, a 76% hit rate) and lessons learned along the way.
Our Year to Date
Sometimes, I use this column to explain the concept of regression to the mean. In Week 2, I discussed what it is and what this column's primary goals would be. In Week 3, I explained how we could use regression to predict changes in future performance-- who would improve, who would decline-- without knowing anything about the players themselves.
Sometimes, I point out broad trends. In Week 5, I shared twelve years' worth of data demonstrating that preseason ADP held as much predictive power as performance to date through the first four weeks of the season.
Other times, I use this column to make specific predictions. In Week 4, I explained that touchdowns tend to follow yards and predicted that the players with the highest yard-to-touchdown ratios would begin outscoring the players with the lowest. In Week 6, I explained that yards per carry was a step away from a random number generator and predicted the players with the lowest averages would outrush those with the highest going forward.
The Scorecard
Statistic Being Tracked | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
---|---|---|---|
Yard-to-TD Ratio | Group A averaged 17% more PPG | Group B averages 19% more PPG | 1 |
Yards per carry | Group A averaged 22% more yards per game | Group B averages 60% more yards per game | 3 |
Our "high-yardage" receivers continue to average more yards per game than our "high-touchdown" receivers-- 69.2 to 63.5. Because touchdowns follow yards, they continue to average more touchdowns per game, too-- 0.57 to 0.45. Since fantasy points are built from yards and touchdowns, it's no surprise they continue to lead in fantasy points per game, too.
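To see how the yardage and touchdown gaps combine, here is a rough conversion under an assumed non-PPR scoring system of 0.1 points per receiving yard and 6 points per touchdown. Receptions and all other scoring categories are deliberately ignored, so these figures will not reproduce the percentage in the scorecard. The `fantasy_points` helper is hypothetical.

```python
def fantasy_points(yards, touchdowns, pts_per_yard=0.1, pts_per_td=6.0):
    """Points from yards and touchdowns only (assumed non-PPR scoring)."""
    return yards * pts_per_yard + touchdowns * pts_per_td


# Per-game averages quoted in the column
high_yardage = fantasy_points(69.2, 0.57)
high_touchdown = fantasy_points(63.5, 0.45)

print(f"High-yardage group:   {high_yardage:.2f} points per game")
print(f"High-touchdown group: {high_touchdown:.2f} points per game")
print(f"Edge: {high_yardage / high_touchdown - 1:.0%}")
```

Under these assumptions, yards and touchdowns alone put the high-yardage group roughly 14% ahead (about 10.34 to 9.05 points per game).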
Heading into last week, our Group A "high yard per carry" backs were collectively averaging 5.72 yards per carry. Last week that fell to 4.38, right in line with league average. Our Group B "low ypc" backs were averaging 3.47 yards per carry. Last week that rose to... 6.11-- a higher average than our "high ypc" cohort had to start!
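A group's collective yards per carry is presumably pooled: total rushing yards divided by total carries, so backs with more carries count for more. That reading is an assumption, and the (yards, carries) lines below are hypothetical; the sketch only shows how a pooled rate can differ from a simple average of each back's individual rate.

```python
def pooled_ypc(backs):
    """Total yards / total carries across a group (carry-weighted)."""
    total_yards = sum(yards for yards, carries in backs)
    total_carries = sum(carries for yards, carries in backs)
    return total_yards / total_carries


def mean_of_ypc(backs):
    """Unweighted mean of each back's individual yards per carry."""
    return sum(yards / carries for yards, carries in backs) / len(backs)


# Hypothetical (yards, carries) lines for one week
week = [(120, 20), (30, 5), (60, 15)]

print(f"pooled:        {pooled_ypc(week):.2f}")   # 210 / 40 = 5.25
print(f"mean of rates: {mean_of_ypc(week):.2f}")  # (6.0 + 6.0 + 4.0) / 3
```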
This wasn't the result of a single player carrying the group. Four Group B backs played (Kyren Williams was on bye), all four of them received at least 14 carries, and the lowest average any of them posted was D'Andre Swift's 5.35. Two-thirds of our "high ypc" backs had a worse average than the worst "low ypc" back.
This was, of course, a fluke-- albeit an illustrative one. I argued last week that yards per carry is largely random over small samples. Every back essentially flipped a coin, and it just so happened they all flipped heads. Maybe next week, they'll all flip tails and average 3.5 yards per carry again. Maybe half will flip heads, half will flip tails, and they'll average something near league average. Who knows?
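Taking the coin-flip analogy literally (an assumption for illustration, since real carries are not 50/50 events), the chance that four independent backs all land on heads in the same week is 0.5^4, or 1 in 16. A quick simulation with an arbitrary seed agrees; the `all_heads_rate` helper is hypothetical.

```python
import random


def all_heads_rate(n_backs=4, trials=100_000, seed=1):
    """Fraction of simulated weeks in which every back flips heads."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if all(rng.random() < 0.5 for _ in range(n_backs)):
            hits += 1
    return hits / trials


exact = 0.5 ** 4
print(f"exact:     {exact:.4f}")  # 0.0625
print(f"simulated: {all_heads_rate():.4f}")
```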
I do know that volume advantages are much more stable, and whatever Group B's ypc winds up at over the next month, I bet its volume advantage will prove decisive.
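The logic behind that bet is that rushing yards are simply carries times yards per carry, so a group's yardage edge depends on whether its volume edge outweighs the other group's efficiency edge. The numbers below are hypothetical, chosen only to show the break-even arithmetic.

```python
def rushing_yards(carries_per_game, ypc):
    """Yards per game = volume x efficiency."""
    return carries_per_game * ypc


# Hypothetical groups: B gets more work, A is more efficient
a_carries, a_ypc = 12.0, 4.8
b_carries, b_ypc = 17.0, 4.0

print(f"Group A: {rushing_yards(a_carries, a_ypc):.1f} yards per game")  # 57.6
print(f"Group B: {rushing_yards(b_carries, b_ypc):.1f} yards per game")  # 68.0

# The ypc Group A would need just to match Group B's yardage
breakeven_ypc = rushing_yards(b_carries, b_ypc) / a_carries
print(f"Group A break-even ypc: {breakeven_ypc:.2f}")  # 5.67
```

With these made-up numbers, Group A would need about 5.67 yards per carry just to break even, well above the roughly 4.4 the column describes as league average.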
Regression and Large Samples
The performance of our "low ypc" backs above highlights one of the key facts of regression to the mean: outlier performances are far more likely over small samples. This informs the selection of our groups-- because of the small samples provided by an NFL season, the most extreme values in any given statistic are most likely chance-driven outliers.
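One way to see why small samples breed outliers: give a group of hypothetical players identical true ability, modeled here as a fair coin, and compare the spread of their observed rates after a few trials versus many. The player count, trial counts, and seed are arbitrary assumptions, and `observed_rates` is a hypothetical helper.

```python
import random


def observed_rates(n_players, n_trials, p=0.5, seed=7):
    """Each player's observed success rate; every player shares the same true rate p."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_players):
        successes = sum(rng.random() < p for _ in range(n_trials))
        rates.append(successes / n_trials)
    return rates


for n_trials in (10, 100, 10_000):
    rates = observed_rates(n_players=32, n_trials=n_trials)
    print(f"{n_trials:>6} trials: lowest {min(rates):.3f}, highest {max(rates):.3f}")
```

Every player in the simulation is identical, so any gap between the best and worst observed rate is pure chance, and that gap shrinks as the number of trials grows.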
This also informs the nature of our predictions. If I flip a coin that's weighted to land on heads 60% of the time, there's still a 40% chance it lands on tails. Given those odds, landing on tails wouldn't be very surprising at all. But if I flipped the same coin a million times, the odds of seeing tails come up more often than heads dwindle to nothing.
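The weighted-coin example can be checked exactly with the binomial distribution. The sketch below (the `prob_tails_majority` helper is hypothetical) sums the probability that tails outnumbers heads in n flips of a coin with a 60% chance of heads. It works in log space via `math.lgamma` so that large n does not overflow, and ties count as not a tails majority.

```python
import math


def prob_tails_majority(n, p_heads=0.6):
    """P(tails > heads) in n independent flips, via an exact binomial sum."""
    log_t = math.log(1.0 - p_heads)
    log_h = math.log(p_heads)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    # k = number of tails; tails > heads means k > n - k, i.e. k > n / 2
    for k in range(n // 2 + 1, n + 1):
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_t + (n - k) * log_h)
        total += math.exp(log_term)
    return total


for n in (1, 11, 101, 1001, 1_000_000):
    # At a million flips every term underflows to 0.0; the true value is
    # astronomically small but not literally zero.
    print(f"{n:>9} flips: {prob_tails_majority(n):.3g}")
```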