Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
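For readers who think better in code, here is a minimal sketch of that Group A / Group B procedure. The player names, metric values, and point totals are made-up placeholders (not real data), and `split_groups` and `points_per_game` are hypothetical helper names used only for illustration.

```python
from statistics import mean

# Hypothetical rows: (player, metric value such as touchdown rate, fantasy points per game).
# These numbers are invented for illustration only.
players = [
    ("RB 1", 0.080, 16.1),
    ("RB 2", 0.072, 14.8),
    ("RB 3", 0.065, 15.3),
    ("RB 4", 0.041, 12.0),
    ("RB 5", 0.038, 11.2),
    ("RB 6", 0.020, 12.9),
    ("RB 7", 0.015, 13.4),
    ("RB 8", 0.010, 12.2),
]


def split_groups(rows, group_size):
    """Rank by the metric; the top rows become Group A, the bottom rows Group B."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return ranked[:group_size], ranked[-group_size:]


def points_per_game(group):
    """Average fantasy points per game across a group."""
    return mean(row[2] for row in group)


group_a, group_b = split_groups(players, group_size=3)

# Step 1: verify that Group A has outscored Group B so far.
lead = points_per_game(group_a) / points_per_game(group_b) - 1
print(f"Group A leads Group B by {lead:.1%} to date")

# Step 2 (not shown): keep both groups fixed and rerun the same
# comparison on the weeks that follow the prediction.
```

The key design choice is that the groups are set entirely by the ranking, so nobody gets hand-picked into or out of a sample.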
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A and may the fantasy gods show mercy on my predictions.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017. Over four seasons, I have made 30 specific predictions and 24 of them have proven correct, a hit rate of 80%.
The Scorecard
In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I talked about yard-to-touchdown ratios and why they were the most powerful regression target in football that absolutely no one talks about, then predicted that touchdowns were going to follow yards going forward (but the yards wouldn't follow back).
In Week 5, we looked at ten years' worth of data to see whether early-season results better predicted rest-of-year performance than preseason ADP, and we found that, while the exact details fluctuated from year to year, overall they did not. No specific prediction was made.
In Week 6, I taught a quick trick to tell how well a new statistic actually measures what you think it measures. No specific prediction was made.
In Week 7, I went over the process of finding a good statistic for regression and used team rushing vs. passing touchdowns as an example.
In Week 8, I talked about how interceptions were an unstable statistic for quarterbacks, but also for defenses.
In Week 9, we took a look at Ja'Marr Chase's season so far. He was outperforming his opportunities, which is not sustainable in the long term, but I offered a reminder that everyone regresses to a different mean, and the "true performance level" that Chase will trend towards over a long timeline is likely a lot higher than for most other receivers. No specific prediction was made.
In Week 10, I talked about how schedule luck in fantasy football was entirely driven by chance and, as such, should be completely random from one sample to the next. Then I checked Footballguys' staff leagues and predicted that the teams with the worst schedule luck would outperform the teams with the best schedule luck once that random element was removed from their favor.
In Week 11, I walked through how to tell the difference between regression to the mean and gambler's fallacy (which is essentially a belief in regression past the mean). No specific prediction was made.
Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining |
---|---|---|---|
Yards per Carry | Group A had 10% more rushing yards per game | Group B has 4% more rushing yards per game | None (Win!) |
Yards per Touchdown | Group A scored 9% more fantasy points per game | Group B scored 13% more fantasy points per game | None (Win!) |
Passing vs. Rushing TDs | Group A scored 42% more RUSHING TDs | Group A is scoring 33% more PASSING TDs | None (Win!) |
Defensive Interceptions | Group A had 33% more interceptions | Group B had 24% more interceptions | None (Win!) |
Schedule Luck | Group A had a 3.7% better win% | Group B has a 38.1% better win% | 2 |
Group A had a monstrously good week intercepting the football, but it was too little too late. The Patriots and Texans both had a whopping four interceptions, the Colts had three, and the Buccaneers and Cowboys combined to chip in three more for good measure. Altogether, Group A managed 1.56 interceptions per game, well above even the lofty 1.29 interceptions-per-game average they had at the start of the prediction.
But one week does not a prediction make; Group B outperformed its prior per-game interception rate in all four weeks, and Group A underperformed its prior interception rate in three out of the four. Even with the interception explosion, Group A's per-game interception rate was 15% lower than its previous average, while Group B's was 50% higher. As a result, Group B notched a comfortable victory.
(As an aside: I had been tempted to start with a much stronger initial interception prediction that would have given Group A a 93% advantage, by far the biggest I'd ever granted in Regression Alert history. I backed out because I feared that 93% was too large a gap even for something as powerful as regression to the mean to overcome. Because of the Week 11 explosion, Group A would have finished that prediction with a 5% edge still. It would have been a dramatic reversal, but still would have granted us our first loss of the season. Sometimes discretion is the better part of valor.)
As for the schedule luck prediction, our "good but not lucky" squads are 10-4 over the last two weeks, while our "lucky but not good" teams are 4-8. Interestingly, our "unlucky" teams have benefited a little from schedule luck while our "lucky" teams have been hurt by it, which isn't a very surprising outcome because schedule luck is completely random.
(Our unlucky teams becoming lucky and our lucky teams becoming unlucky wasn't the likely outcome, either, though. Each group had a 50/50 shot at positive or negative luck, so there was about a 1-in-4 chance that we'd get a total luck swap, but also an equal 1-in-4 chance that the lucky teams would stay lucky and the unlucky teams would stay unlucky.)
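The 1-in-4 figures can be checked by listing every combination. This is a small sketch that assumes, as stated above, that each group's future luck is an independent 50/50 coin flip; the variable names are just for illustration.

```python
from itertools import product

# Each outcome is (luck of the formerly unlucky group, luck of the formerly lucky group).
# Assumption from the text: each group independently has a 50/50 shot at either result.
outcomes = list(product(["lucky", "unlucky"], repeat=2))

total_swap = [o for o in outcomes if o == ("lucky", "unlucky")]
no_change = [o for o in outcomes if o == ("unlucky", "lucky")]

p_swap = len(total_swap) / len(outcomes)
p_stay = len(no_change) / len(outcomes)

print(f"Total luck swap: {p_swap:.0%}")
print(f"Both groups keep their old luck: {p_stay:.0%}")
```

The remaining two outcomes (both groups lucky, or both unlucky) make up the other half of the possibilities.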
The Arrow of Time and Regression to the Mean
Most of the time when we talk about regression to the mean in this space, we're talking about it from the beginning of the process. We take a starting state with large gaps between players or teams and predict an ending state where those gaps are much smaller.
But we can just as easily perform the process in reverse. We can take an ending state with small gaps and from it predict a prior starting state where those gaps were much bigger. (Technically, "predictions" about the past are referred to as retrodictions or postdictions.)
Retrodictions aren't as directly actionable. Unless you have a time machine, you can't exactly take advantage of this new knowledge. But they're a great way to test your understanding of a subject without having to wait for new evidence to come in. And having a good understanding of a topic ensures you'll be able to make better decisions going forward.
One of my great hobby-horses in fantasy football is the value of consistency. Or, more accurately, the lack of value. Fantasy GMs often prefer players they perceive as more consistent, saying they'd rather have a guy who scores 10 points every week for four weeks than a guy who scores 5 points in three weeks and 25 in the fourth. Both players totaled 40 points over the span, but the first was more consistent; he didn't leave you hanging in three of those weeks.
But setting aside whether consistency is predictable (it's mostly not), there's the question of whether it matters. How useful are those extra five points in those three weeks, really?
If you look at the current standings in your league, it might seem like those five points are extremely valuable. Picking one of my leagues at random, four out of twelve teams are fighting for a playoff spot and clustered between 166 and 171 points per game. With margins that small, an extra five points in any given week dramatically increase your chances of winning a given matchup.
That league is not an aberration. I challenge you to check the per-game averages in any of your own leagues. Typically there will be a couple of teams at the top that have clearly separated, a couple of teams at the bottom that have clearly separated, and then a big muddled middle with most of the league scoring within a handful of points per game of each other. And where one finishes in that muddled middle is usually the difference between making the playoffs and missing them. So it's no wonder that people believe consistency is so important.
But this is the invisibility of regression to the mean messing with our brains. Just because the end result is close doesn't mean the small-sample inputs that led to that result were particularly close. Indeed, if regression to the mean is causing teams to score similarly over the course of a full season, we could retrodict that the margins would be much larger over the span of a single game. In other words, most teams are close, but most games are not.
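To see why close season averages imply lopsided individual games, here is a rough simulation sketch. It assumes two teams with the same true scoring level and independent, normally distributed weekly scores. The mean of 168, the standard deviation of 30, and the 11-week season are illustrative assumptions, not measurements from any real league.

```python
import math
import random

TRUE_MEAN = 168.0  # assumed true scoring level, identical for both teams
SIGMA = 30.0       # assumed week-to-week standard deviation (illustrative)
WEEKS = 11         # assumed number of games played so far
SEASONS = 5000     # number of simulated seasons

rng = random.Random(0)  # fixed seed so the sketch is repeatable
weekly_gaps = []
season_gaps = []

for _ in range(SEASONS):
    team_a = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(WEEKS)]
    team_b = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(WEEKS)]
    # Gap in each individual week.
    weekly_gaps.extend(abs(a - b) for a, b in zip(team_a, team_b))
    # Gap between the two teams' per-game averages over the whole season.
    season_gaps.append(abs(sum(team_a) - sum(team_b)) / WEEKS)

sim_weekly = sum(weekly_gaps) / len(weekly_gaps)
sim_season = sum(season_gaps) / len(season_gaps)

# Closed form for two independent normal scores: E|X - Y| = 2 * sigma / sqrt(pi).
# Averaging over n weeks shrinks the expected gap by a factor of sqrt(n).
expected_weekly = 2 * SIGMA / math.sqrt(math.pi)
expected_season = expected_weekly / math.sqrt(WEEKS)

print(f"Average single-week gap:      {sim_weekly:.1f} (theory {expected_weekly:.1f})")
print(f"Average gap in per-game avgs: {sim_season:.1f} (theory {expected_season:.1f})")
```

Under these assumptions the typical single-week gap is about sqrt(11), or roughly 3.3 times, the typical gap in per-game averages, even though the two teams are exactly equal in true quality.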
Is that the case? Let's take the clearest example I can find. In one of my leagues, one team has scored 1258.81 points and another has scored 1251.86. This is the closest-matched pair of teams I could find. But if you look at the weekly margin between the two teams, the gap in each individual game was 39.34, 16.53, 6.91, 5.38, 60.54, 24.21, 63.69, 34.23, 56.57, 59.85, and 32.84.
The average weekly margin between the two teams was 36.37 points! (That's not a result of a skewed distribution, either; the median margin was 34.23, very close to the mean.) The gap was twice as likely to be greater than 50 points (four times) as it was to be less than 15 points (twice). How useful would an extra 5 points in 75% of weeks (at the cost of 15 points in the other 25%) be in that series? Not useful at all. Assuming we're talking about exactly 5 points, it wouldn't actually flip a single outcome!
Think that's an aberration? The next-closest team pair I could find in my leagues scored 1848.00 and 1847.85 points. Weekly margins: 5.4, 29.45, 61.25, 40.3, 58.7, 15.4, 28.7, 58.4, 39.1, 22.9, 90.65. Average weekly difference: 40.93 points (median: 39.1), with as many games decided by fewer than 15 points as were decided by more than 90 points (1 each)!
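The summary figures for both pairs can be reproduced directly from the weekly margins listed above. This is a quick sketch; `summarize` is a hypothetical helper name, and the margins are copied from the text.

```python
from statistics import mean, median

# Weekly margins between each closely matched pair of teams, as quoted above.
pair_one = [39.34, 16.53, 6.91, 5.38, 60.54, 24.21, 63.69, 34.23, 56.57, 59.85, 32.84]
pair_two = [5.4, 29.45, 61.25, 40.3, 58.7, 15.4, 28.7, 58.4, 39.1, 22.9, 90.65]


def summarize(margins):
    """Return the headline numbers for a list of weekly margins."""
    return {
        "mean": round(mean(margins), 2),
        "median": median(margins),
        "under_15": sum(m < 15 for m in margins),
        "over_50": sum(m > 50 for m in margins),
        "over_90": sum(m > 90 for m in margins),
        # Weeks where exactly 5 extra points could have closed the gap.
        "within_5": sum(m <= 5 for m in margins),
    }


for label, margins in (("Pair one", pair_one), ("Pair two", pair_two)):
    print(label, summarize(margins))
```

In both series `within_5` comes out to zero, which is the point: a steady 5-point bump never closes any of these weekly gaps.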
Are my leagues unusual in this regard for some reason? I'm always worried about the effects of selection bias on interesting findings, so I ran an experiment on Twitter. This last week, Jonathan Taylor had one of the best single-game performances of any player in fantasy history. So I started a poll to see whether Taylor (who put up 50+ fantasy points in most formats) even impacted the results of his game.
Hey, can y’all help me out with something?
Decide which league is your “main” league and check out the team with Jonathan Taylor. Did it win or lose? Now, if you replaced JT’s score with a zero, would it have won or lost?
(Also pls retweet if comfortable to maximize reach.)
— Adam Harstad (@AdamHarstad) November 23, 2021
A less-scientific follow-up: if you voted “JT won / would have lost”, how much would the Taylor team have lost by?
— Adam Harstad (@AdamHarstad) November 23, 2021
It turns out that one of the best performances in fantasy football history didn't even matter in 51.1% of his matchups. Either the team lost with him, or it would have won anyway without him. And of the times where Taylor did make a difference, about 20% of the time even a handful of points would have been enough to secure a victory; everything beyond that was just padding.
If we want to multiply out probabilities and compare to the theoretical "consistent vs. boom/bust" debate from earlier... there was about a 10% chance that a consistent performer would have secured a win while a "bust" performance wouldn't have sufficed. But there was also about a 30% chance that a "boom" performance would have secured a win while the consistent performer would have fallen short. Boom weeks happen only a third as often as bust weeks (one week in four versus three), yet they were three times likelier to swing the result when they did happen. I didn't know how the results would turn out when I asked the question, but I love how neatly they illustrate the fact that points scored matters a lot but consistency barely matters at all.
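Here is one way to multiply those rough figures out. It assumes the boom/bust player from the earlier example has his big game in one week out of four and his small games in the other three. The variable names are illustrative, and the 10% and 30% inputs are the approximate poll-based estimates quoted above.

```python
# Share of weeks of each type for the hypothetical 5/5/5/25 player.
boom_share = 0.25
bust_share = 0.75

# Rough estimates from the poll discussion above (approximate, not precise).
p_consistent_wins_when_bust_loses = 0.10  # a steady 10 wins, a 5-point bust week doesn't
p_boom_wins_when_consistent_loses = 0.30  # a 25-point boom wins, a steady 10 doesn't

# Chance per week that each style of player is the one who flips the matchup.
consistent_edge = bust_share * p_consistent_wins_when_bust_loses
boom_edge = boom_share * p_boom_wins_when_consistent_loses

print(f"Weeks where consistency flips a loss to a win: {consistent_edge:.1%}")
print(f"Weeks where the boom flips a loss to a win:    {boom_edge:.1%}")
```

With these inputs both products come out to about 7.5% of weeks, so the rare big weeks buy roughly as many wins as the frequent small edges do.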
Again, this is what we'd expect given what we know about regression to the mean: playoff races are tight, individual games are blowouts. But this is why I make predictions in the first place: if our model predicts something and that thing winds up being true, it makes us more confident in the quality of our underlying model. This is true when the predictions are about the past just as much as the future.
A Bonus Prediction for the Road
You may have noticed already, but I like to mix theory and practice in roughly equal measure in this column. About 50% of the time I want to talk about what regression is and how it works, while the other 50% is devoted to specific instances of it in action. But I also like to have at least two active predictions at all times, so this week I'll give you a little bit of both.
I wrote earlier in the season about how yard-to-touchdown ratios are one of my favorite regression targets. You're welcome to go back if you want to refresh yourself on the underlying theory. But in the interest of getting another projection on the board, let's run it back.
Right now there are six receivers who have 500 or fewer yards but 4 or more touchdowns. They are Corey Davis, Antonio Brown, Elijah Moore, Randall Cobb, DeAndre Hopkins, and Marquez Callaway. Collectively, they average 8.58 points per game in standard scoring. This is our Group A.
At the same time, there are nine receivers who have 500 or more yards but 3 or fewer touchdowns. One of them (A.J. Green) is right at the 500 yard / 3 touchdown mark, which puts him in the sustainable 180 yards per touchdown range. The other eight are Chase Claypool, Jakobi Meyers, Keenan Allen, Brandin Cooks, Courtland Sutton, Tyler Lockett, Kendrick Bourne, and A.J. Brown. Collectively, they average 7.84 points per game. This is our Group B.
Group A leads Group B in points per game by about 9.4% (8.58 vs. 7.84). But Group B leads Group A in yards per game by 28.4%, and since touchdowns follow yards, Group B should handily outscore Group A over the next four weeks.