Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
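If it helps to see the mechanics, here's a minimal sketch of that weekly process in Python. The player data, the metric, and the group sizes are all invented for illustration; the column itself works from actual league-wide stats.

```python
# Hypothetical sketch of the weekly group-building process: rank every player
# by the chosen metric, put the top outliers in Group A and the bottom
# outliers in Group B. All data here is made up for illustration.

def split_groups(players, metric, group_size=10):
    """Rank players by a metric and return (Group A, Group B)."""
    ranked = sorted(players, key=lambda p: p[metric], reverse=True)
    group_a = ranked[:group_size]       # high outliers in the metric
    group_b = ranked[-group_size:]      # low outliers in the metric
    return group_a, group_b

players = [
    {"name": "RB1", "td_rate": 0.09, "fantasy_ppg": 18.2},
    {"name": "RB2", "td_rate": 0.05, "fantasy_ppg": 15.4},
    {"name": "RB3", "td_rate": 0.02, "fantasy_ppg": 14.1},
    # ...the rest of the league...
]
group_a, group_b = split_groups(players, "td_rate", group_size=1)
# Step 1: verify Group A has outscored Group B so far.
# Step 2: predict Group B outscores Group A going forward.
```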
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A and may the fantasy gods show mercy on my predictions.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017. Over four seasons, I have made 30 specific predictions and 24 of them have proven correct, a hit rate of 80%.
The Scorecard
In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I talked about yard-to-touchdown ratios and why they were the most powerful regression target in football that absolutely no one talks about, then predicted that touchdowns were going to follow yards going forward (but the yards wouldn't follow back).
In Week 5, we looked at ten years' worth of data to see whether early-season results predicted rest-of-year performance better than preseason ADP, and we found that, while the exact details fluctuated from year to year, overall they did not. No specific prediction was made.
In Week 6, I taught a quick trick to tell how well a new statistic actually measures what you think it measures. No specific prediction was made.
In Week 7, I went over the process of finding a good statistic for regression and used team rushing vs. passing touchdowns as an example.
In Week 8, I talked about how interceptions were an unstable statistic not just for quarterbacks, but for defenses as well.
In Week 9, we took a look at Ja'Marr Chase's season so far. He was outperforming his opportunities, which is not sustainable in the long term, but I offered a reminder that everyone regresses to a different mean, and the "true performance level" that Chase will trend towards over a long timeline is likely a lot higher than for most other receivers. No specific prediction was made.
In Week 10, I talked about how schedule luck in fantasy football was entirely driven by chance and, as such, should be completely random from one sample to the next. Then I checked Footballguys' staff leagues and predicted that the teams with the worst schedule luck would outperform the teams with the best schedule luck once that random element was removed from their favor.
In Week 11, I walked through how to tell the difference between regression to the mean and gambler's fallacy (which is essentially a belief in regression past the mean). No specific prediction was made.
In Week 12, I showed how to use the concept of regression to the mean to make predictions about the past and explained why the average fantasy teams were close but the average fantasy games were not. As a bonus, I threw in another quick prediction on touchdown over- and underachievers (based on yardage gained).
| Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining |
| --- | --- | --- | --- |
| Yards per Carry | Group A had 10% more rushing yards per game | Group B has 4% more rushing yards per game | None (Win!) |
| Yards per Touchdown | Group A scored 9% more fantasy points per game | Group B scored 13% more fantasy points per game | None (Win!) |
| Passing vs. Rushing TDs | Group A scored 42% more RUSHING TDs | Group A is scoring 33% more PASSING TDs | None (Win!) |
| Defensive Interceptions | Group A had 33% more interceptions | Group B had 24% more interceptions | None (Win!) |
| Schedule Luck | Group A had a 3.7% better win% | Group B has a 27.0% better win% | 1 |
| Yards per Touchdown | Group A scored 10% more fantasy points per game | Group B has 23% more fantasy points per game | 3 |
In this space last week, I talked about how Group A had positive schedule luck at the time of the prediction but negative schedule luck through two weeks. Well, they were back to being lucky in Week 12, and now their overall schedule luck for the sample is completely neutral (they've won an average of 0.01 fewer games than expected based on their all-play record, as close to perfectly neutral luck as you'll find). Group B, on the other hand, has still been a little bit lucky (the seven teams combined have nearly a win more than you'd expect based on performance over the last three weeks).
But all of this talk of luck coming and going illustrates the point. Luck is totally random. It appears and disappears without any rhyme or reason. It's purely driven by chance. But underlying team performance is not random. Group A had an all-play winning percentage of 40.3% at the time of the prediction. They have an all-play win% of 43.3% in the three weeks since. Group B had an all-play win% of 60.4% before the prediction and 60.1% since. While luck can't be counted on going forward, underlying team quality certainly can.
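For anyone curious how "all-play" records and schedule luck are measured, here's a rough sketch in Python. The function names and the weekly_scores structure are my own illustration of the concept, not Footballguys' actual tooling.

```python
# Rough illustration of all-play win% and schedule luck. Assumes
# weekly_scores maps each team name to a list of its weekly point totals.

def all_play_win_pct(weekly_scores, team):
    """Share of matchups the team would have won had it played every
    other team every single week (ties count as half a win)."""
    wins, games = 0.0, 0
    for week, score in enumerate(weekly_scores[team]):
        for opponent, opp_scores in weekly_scores.items():
            if opponent == team:
                continue
            games += 1
            if score > opp_scores[week]:
                wins += 1
            elif score == opp_scores[week]:
                wins += 0.5
    return wins / games

def schedule_luck(actual_wins, weekly_scores, team):
    """Actual wins minus the wins you'd expect from the all-play record.
    Positive = lucky schedule, negative = unlucky schedule."""
    weeks_played = len(weekly_scores[team])
    expected_wins = all_play_win_pct(weekly_scores, team) * weeks_played
    return actual_wins - expected_wins
```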
As for our quick yard-to-touchdown prediction, there's not much to report yet. Half of Group A was out with an injury or a bye. Most of Group B played (with the exception of arguably their best receiver, A.J. Brown, who will miss most of the prediction after landing on injured reserve). Group B lived up to its reputation as a collection of yardage monsters: four of its seven players topped 80 yards, Kendrick Bourne chipped in two touchdowns (after scoring three all season prior), and Group B staked an early lead.
These Predictions Are All Coming True (And That Might Be Bad?)
So far, our first four predictions have come true, and our fifth is coasting in for a victory. (All we need is a single loss among Group A and a single win among Group B.) This is already the first time we've opened the season with four straight wins, and a fifth victory will mark the first time we've won five consecutive predictions at any point in a single season. Does this mean I'm getting better at predicting regression? Or does it mean I'm getting worse?
It may seem counterintuitive to suggest that too high of a winning percentage is a bad thing, but consider: if a madman hijacked a bus and told me he'd blow it up unless I kept my predictive accuracy above 90%, I wouldn't need Keanu Reeves or Sandra Bullock to help me keep those passengers alive. Over his last two games, Deebo Samuel is averaging 72.5 rushing yards per game and 13.5 receiving yards per game. I predict that over his next four games he'll average more receiving yards. Bonus prediction: he'll probably average fewer rushing yards, too. Those passengers can sleep easy. (Except for the fact that they'll apparently be stuck on a bus for a month while we wait for the prediction to resolve.)
Some regression predictions are so obvious as to be trivial. If a receiver has a 15-catch day or a 250-yard day, it goes without saying that he'll average fewer catches and yards going forward. No receiver in history has ever averaged 15 catches or 250 yards over any stretch of time. But any prediction of that nature isn't even worth making. If there's no chance of being wrong, you aren't saying anything worth saying.
In the past, I've used trading in dynasty leagues to illustrate why a 100% success rate is actually quite inefficient. Imagine that there are 100 potential trades out there and you have varying levels of certainty about how likely you are to win each one. There's one trade that you're 100% certain you'll win, one trade that you're 99% certain you'll win, and so on until you get down to the worst possible trade, which you estimate you have just a 1% chance of winning.
If you wanted to have a 100% success rate while trading, you could just take that 100% trade and call it a day. But is this best for your team? After all, you could also make the ten trades in the 90-99% range. You'd drop your overall success rate to 95%, but those ten trades are still exceedingly valuable and will make your roster a lot better in expectation.
Similarly, you can then add all the trades in the 80-89% range and all the trades in the 70-79% range. Suddenly you're "only" winning 85% of the trades you make, but again, the expected return on those 70-79% trades is extremely high and you're making your team a lot better by making them.
Eventually, you hit a point where it makes sense to stop trading. If your evaluation is correct, the trades in the 50-59% range are positive expected value for you still, but just barely. If you're not certain about your evaluation, maybe you want to limit yourself to trades that are 60% likely to come out in your favor or better, which would put your overall success rate around 80%.
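If you want to see those numbers fall out of the thought experiment, here's a quick sketch. The 100 trades and their win probabilities are the hypothetical ones described above, nothing more.

```python
# The 100 hypothetical trades above, with win probabilities of 1% to 100%.
# For any cutoff, your expected hit rate is just the average probability of
# the trades you accept, while expected wins keep growing as you accept more.

trades = list(range(1, 101))  # win probabilities in percent: 1..100

for cutoff in (100, 90, 70, 60):
    accepted = [p for p in trades if p >= cutoff]
    hit_rate = sum(accepted) / len(accepted)   # average probability, in %
    expected_wins = sum(accepted) / 100        # expected number of trades won
    print(f"cutoff {cutoff}%: accepted {len(accepted)}, "
          f"hit rate {hit_rate:.0f}%, expected wins {expected_wins:.2f}")

# Output:
# cutoff 100%: accepted 1, hit rate 100%, expected wins 1.00
# cutoff 70%: accepted 31, hit rate 85%, expected wins 26.35
# cutoff 60%: accepted 41, hit rate 80%, expected wins 32.80
```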
Of course, things are not this simple in real life. It's impossible to say with any certainty that you have a 60% chance of winning a given trade. And at the same time, trading is not a solitary exercise, and no leaguemate would send you an offer that is 100% likely to go in your favor. But the overall concept stands.
If you're winning 100% of the trades you make in dynasty, you're not trading enough. Even if you're winning 90% or 80%, that indicates there are further profits to be made by being more aggressive. I don't know exactly what the ideal win rate would be, but something in the ballpark of 60-70% will eventually leave you with a stacked roster and few worries that you're leaving value on the table.
Making predictions isn't a two-sided affair like trading in dynasty, so my ideal accuracy level is higher than that. Over the life of this column, I've hit on about 80% of predictions, which feels like a pretty good target. It's high enough that I know my process is good, but low enough that I feel like I'm still trying to say something interesting. I don't tend to think of predictions in these exact terms (I don't sit down and say "there's a 60% chance I'm right on this"), but I do try to craft predictions where I have genuine doubt about whether they'll go in my favor or not. And most importantly, when I stop getting some wrong, I take it as a sign to examine whether I'm growing too cautious, whether I'm leaving value on the table, whether I've stopped saying anything worth saying.
Normally, excessive caution isn't the issue. Even on my two "best bet" regression predictions (yards per carry and yards per touchdown), I always believe there's a genuine chance that they won't work out when I make them. I've had yards per carry predictions in the past where the prediction won over a 4-week window but would have lost over, say, an 8-week window. I'm positive that if I keep at this long enough, eventually Group A is going to manage to outrush Group B.
I think the real "problem" here (if we can agree that "all of my predictions are coming true" is a problem in the first place) is that everything regresses to the mean, and my performance on predictions is no exception. If every individual prediction has an 80% chance of being right, then if I make eight predictions, it's extremely likely that I'll get at least one of them wrong. But there's still about a 17% chance that I'll get lucky and they'll all come true anyway.
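For reference, here's the arithmetic behind that figure, using the hypothetical eight-prediction, 80%-accuracy numbers from the paragraph above.

```python
# Chance that eight independent predictions, each 80% likely to be right,
# all come true, plus the complementary chance of at least one miss.
p_all_correct = 0.8 ** 8
print(f"all eight correct: {p_all_correct:.1%}")      # all eight correct: 16.8%
print(f"at least one miss: {1 - p_all_correct:.1%}")  # at least one miss: 83.2%
```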
Looking at the predictions, nothing seems out of line with years past. We needed a miraculous fourth week to salvage our yards per carry prediction. Our interception prediction won, but a stronger version I had been considering would have lost. Maybe I'll try to be a bit more aggressive in the coming weeks, but overall this looks more like a lucky stretch than an overly conservative one. The outcome changes from year to year, but I trust the process. Let my theme song be V.R.E.A.M.: Variance Rules Everything Around Me.
Mostly I just wanted to go on record that the goal in this column has never been 100% accuracy and explain why that's the case. And maybe get you thinking about other areas in fantasy football (or even in your life generally) where the goal is not being right 100% of the time, because being right 100% of the time means you're leaving good opportunities on the table. If you start aligning your fantasy team with the principles of regression, sometimes you're going to make trades or start/sit calls that don't work out in your favor. And that's fine. It's better than fine, it's good. Getting calls wrong sometimes is a sign that you're doing things right.
And I wanted to go on the record with that now when my track record is good and there's no question of ulterior motives, because I know some day in the future variance will conspire against me. Some day I'll be sitting at 3-3, and when that day comes I'll be able to point to this column and say "It's okay, these things happen. This, too, shall regress."