For those who are new to the feature, here's the deal: every week, I break down a topic related to regression to the mean. Some weeks, I'll explain what it is, how it works, why you hear so much about it, and how you can harness its power for yourself. In other weeks, I'll give practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Justin Jefferson is one of the top performers in my sample, then Justin Jefferson goes into Group A, and may the fantasy gods show mercy on my predictions.
And then because predictions are meaningless without accountability, I track and report my results. Here's last year's season-ending recap, which covered the outcome of every prediction made in our seven-year history, giving our top-line record (41-13, a 76% hit rate) and lessons learned along the way.
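For readers who like to see the mechanics, the group-selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`split_groups`, `average`), the field names, and every number in the sample data are made up for the example and are not the column's actual data or tooling.

```python
from statistics import mean


def split_groups(players, metric, group_size, reverse=True):
    """Rank players by `metric` and return (group_a, group_b).

    Group A is the top `group_size` players, Group B the bottom `group_size`.
    No hand-picking: whoever ranks there goes in.
    """
    if 2 * group_size > len(players):
        raise ValueError("groups would overlap")
    ranked = sorted(players, key=lambda p: p[metric], reverse=reverse)
    return ranked[:group_size], ranked[-group_size:]


def average(group, key):
    """Average a numeric field across a group of players."""
    return mean(p[key] for p in group)


# Hypothetical players and numbers, invented purely for illustration.
players = [
    {"name": "Player 1", "ypc": 6.2, "ppg_before": 18.0, "ppg_after": 11.0},
    {"name": "Player 2", "ypc": 5.8, "ppg_before": 16.0, "ppg_after": 13.0},
    {"name": "Player 3", "ypc": 4.5, "ppg_before": 13.0, "ppg_after": 12.5},
    {"name": "Player 4", "ypc": 3.6, "ppg_before": 12.0, "ppg_after": 14.0},
    {"name": "Player 5", "ypc": 3.3, "ppg_before": 11.0, "ppg_after": 13.5},
]

group_a, group_b = split_groups(players, "ypc", group_size=2)

# Step 1: verify Group A led at the time of the prediction.
a_led_before = average(group_a, "ppg_before") > average(group_b, "ppg_before")

# Step 2: later, check whether Group B led after the prediction.
b_led_after = average(group_b, "ppg_after") > average(group_a, "ppg_after")

print(a_led_before, b_led_after)
```

The `reverse` flag is there because "top" depends on the metric; for some statistics the lowest values are the ones expected to fall back.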
Our Year to Date
Sometimes, I use this column to explain the concept of regression to the mean. In Week 2, I discussed what it is and what this column's primary goals would be. In Week 3, I explained how we could use regression to predict changes in future performance-- who would improve, who would decline-- without knowing anything about the players themselves. In Week 7, I explained why large samples are our biggest asset when attempting to benefit from regression.
Sometimes, I point out broad trends. In Week 5, I shared twelve years' worth of data demonstrating that preseason ADP held as much predictive power as performance to date through the first four weeks of the season.
Other times, I use this column to make specific predictions. In Week 4, I explained that touchdowns tend to follow yards and predicted that the players with the highest yard-to-touchdown ratios would begin outscoring the players with the lowest. In Week 6, I explained that yards per carry was a step away from a random number generator and predicted the players with the lowest averages would outrush those with the highest going forward.
The Scorecard
Statistic Being Tracked | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
---|---|---|---|
Yard-to-TD Ratio | Group A averaged 17% more PPG | Group B averages 10% more PPG | None (Win!) |
Yards per carry | Group A averaged 22% more yards per game | Group B averages 27% more yards per game | 2 |
We close out our "yard-to-touchdown ratio" prediction with one of the weirder weeks I can remember. Seven Group B receivers played last week, and only three gained more than 10 receiving yards. Two of them finished the day with negative yards-- DeVonta Smith with -2 and Jameson Williams with -4. As a result, our "high-yardage" receivers saw their yardage advantage almost completely disappear; they averaged 60.6 yards per game since our prediction compared to 60.2 for the "low-yardage" receivers.
Ironically, they were saved by their touchdown production-- 0.5 touchdowns per game vs. 0.43 for our "touchdown-heavy" group. As a result, they managed to flip the script and outscore Group A since our prediction.
Last week, our "low-ypc" backs finished with 6.1 ypc while our "high-ypc" backs had just 4.4, and I noted that since ypc was functionally random, that was as likely as not to flip back in Week 7. It did; our "high-ypc" backs averaged 5.5 (driven by an absurd 32 carries for 345 yards from Derrick Henry and Saquon Barkley) vs. 4.0 for our "low-ypc" backs. Order was restored to the universe.
Of course, our Group B backs got 17.5 carries vs. 13.1 for Group A, so they still rushed for about the same amount (69.3 yards per game vs. 71.4). That was always the prediction; given Group B's workload advantage, Group A would need to maintain an unsustainable ypc to keep pace. Over a single week, they were up to the challenge. Over an entire month? We'll see.
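The workload-versus-efficiency arithmetic above is easy to check by hand. Here is a rough sketch using the rounded per-game figures quoted in this section; because the inputs are rounded, the products land close to, but not exactly on, the 69.3 and 71.4 yards per game cited. The helper names (`rushing_yards`, `break_even_ypc`) are made up for the example.

```python
def rushing_yards(carries, ypc):
    """Rushing yards are simply carries times yards per carry."""
    return carries * ypc


def break_even_ypc(own_carries, rival_carries, rival_ypc):
    """Yards per carry needed to match a rival's yardage on fewer carries."""
    return rival_carries * rival_ypc / own_carries


# Rounded per-game figures quoted in the text above.
GROUP_A_CARRIES, GROUP_A_YPC = 13.1, 5.5  # "high-ypc" backs
GROUP_B_CARRIES, GROUP_B_YPC = 17.5, 4.0  # "low-ypc" backs

group_a_yards = rushing_yards(GROUP_A_CARRIES, GROUP_A_YPC)
group_b_yards = rushing_yards(GROUP_B_CARRIES, GROUP_B_YPC)

# What Group A must average just to keep pace with Group B's volume.
needed = break_even_ypc(GROUP_A_CARRIES, GROUP_B_CARRIES, GROUP_B_YPC)

print(f"Group A: {group_a_yards:.1f} yards, Group B: {group_b_yards:.1f} yards")
print(f"Group A needs about {needed:.2f} ypc to match Group B")
```

On these inputs, Group A needs to stay well above 5 yards per carry just to break even, which is the bar the prediction is betting they can't clear for a full month.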
Unlikely Things are the Most Likely Thing
Every year I devote a few of my predictions to some of the most notoriously unreliable statistics in football-- yards per carry, touchdown rate, and the like. I discuss why they're so unreliable, share some research to that effect, and then demonstrate how we can exploit that unreliability to make a profit.
For my other predictions, I like to leave myself open to the universe. For instance, in 2021, an announcer during a Titans game mentioned Tennessee had scored twice as many rushing touchdowns as passing touchdowns, which sounded interesting. I checked how unusual that was historically and predicted regression. In 2022, I saw a tweet that the season featured the lowest average margin of victory since 1932. I drilled into the data and found that it wasn't being driven by outliers, so it was likely to regress a little, but not much. In 2023, the talk of the season was the decline in passing yards. I predicted that based on typical in-season trends, not only would they stay low, they'd actually get even lower.
Each of these things was unlikely. It was unlikely that the 2021 Titans would become the first team since the 2009 Dolphins to rush for double-digit touchdowns and pass for half as many as they rushed for. It was unlikely that 2023 would see a decade-long positive passing trend halted and reversed. It was staggeringly unlikely that 2022 would feature the closest NFL games since before World War II. If you were placing bets before the season, you likely could have gotten great odds on any of those.
But I know heading into the season that by the time midseason rolls around, something unlikely will be happening, even if I have no idea what it might be. Because "no unlikely outcomes" is far and away the most unlikely outcome of all.
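To put a rough number on that intuition, here is a small sketch. The inputs are purely hypothetical (50 possible oddities, each with a 5% chance of showing up), and it assumes the oddities are independent, which real football statistics are not. Treat it as an illustration of the logic, not an estimate.

```python
def prob_at_least_one(p, n):
    """Chance that at least one of n independent events occurs,
    when each has probability p. The complement is that none occur."""
    return 1 - (1 - p) ** n


# Hypothetical inputs: 50 independent long shots at 5% apiece.
chance = prob_at_least_one(p=0.05, n=50)
print(f"Chance at least one long shot hits: {chance:.0%}")
```

Even with every individual outcome a long shot, a season with none of them happening is the rare case.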
Enter the 2024 Dallas Cowboys.
Dallas has played three road games and three home games so far this season. On the road, they're 3-0, winning by a combined total of 24 points. At home, they're 0-3, losing by a whopping 66 combined points. This is very unusual; teams typically play better at home than on the road.
(If we extend back to the last two games of 2023, Dallas has lost its last four home games by 82 points and won its last four road games by 52 points, a staggering 134-point swing.)
Timo Riske of Pro Football Focus studied teams with extreme home/road splits to see what happened to them going forward. I didn't need to read the results to have a fairly good guess about what they would show.
Bold prediction: they regressed to historical norms. https://t.co/LLjQkDO9gx
— Adam Harstad (@AdamHarstad) October 17, 2024
Why is it that in 2022 and 2023 I thought the close games and low passing totals were a meaningful trend, but in 2021 and 2024 I thought the Titans' run/pass splits and the Cowboys' home/road splits were likely just statistical noise? Let's walk through a few things I look for when evaluating sustainability.