Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A, and may the fantasy gods show mercy on my predictions.
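For the code-inclined, here's a minimal sketch of that grouping process in Python. The players and stats are invented for illustration; the real exercise ranks every player in the league on actual data.

```python
# A minimal sketch of the Group A / Group B construction, using invented
# players and a made-up "touchdown rate" metric for illustration.

def split_groups(players, metric_key, group_size):
    """Rank players on a metric; return (top outliers, bottom outliers)."""
    ranked = sorted(players, key=lambda p: p[metric_key], reverse=True)
    return ranked[:group_size], ranked[-group_size:]

players = [
    {"name": "Player 1", "td_rate": 0.12, "ppg": 18.1},
    {"name": "Player 2", "td_rate": 0.09, "ppg": 15.4},
    {"name": "Player 3", "td_rate": 0.05, "ppg": 14.9},
    {"name": "Player 4", "td_rate": 0.03, "ppg": 13.2},
]

group_a, group_b = split_groups(players, "td_rate", group_size=2)

# Step 1: verify Group A has outscored Group B so far...
assert sum(p["ppg"] for p in group_a) > sum(p["ppg"] for p in group_b)
# Step 2: ...and predict that Group B outscores Group A going forward.
```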
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2019 and their final results, here's the list from 2018, and here's the list from 2017.
THE SCORECARD
In Week 2, I opened with a primer on what regression to the mean was, how it worked, and how we would use it to our advantage. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I talked about how the ability to convert yards into touchdowns was most certainly a skill, but it was a skill that operated within a fairly narrow and clearly-defined range, and any values outside of that range were probably just random noise and therefore due to regress. I predicted that high-yardage, low-touchdown receivers would outscore low-yardage, high-touchdown receivers going forward.
In Week 5, I talked about how historical patterns suggested we had just reached the informational tipping point, the time when performance to this point in the season carried as much predictive power as ADP. In general, I predicted that players whose early performance differed substantially from their ADP would tend to move toward a point between their early performance and their draft position, but no specific prediction was made.
In Week 6, I talked about simple ways to tell whether a statistic was especially likely to regress or not. No specific prediction was made.
In Week 7, I speculated that kickers were people, too, and lamented the fact that I'd never discussed them in this column before. To remedy that, I identified teams that were scoring "too many" field goals relative to touchdowns and "too many" touchdowns relative to field goals and predicted that scoring mix would regress and kickers from the latter teams would outperform kickers from the former going forward.
In Week 8, I noted that more-granular measures of performance tended to be more stable than less-granular measures and predicted that teams with a great point differential would win more games going forward than teams with an identical record, but substantially worse point differential.
In Week 9, I talked about the interesting role regression to the mean plays in dynasty, where the mere fact that a player is likely to regress sends signals that that player is probably quite good and worth rostering long-term, anyway. No specific prediction was made.
In Week 10, I explained why Group B's lead in these predictions tended to get smaller the longer each prediction ran and showed how a small edge over a huge sample could easily be more impressive than a huge edge over a small sample. No specific prediction was made.
In Week 11, I wrote that yards per pass attempt was an example of a statistic that was significantly less prone to regression, and for the first time I bet against it regressing.
In Week 12, I talked about "on pace" stats and how many of the players who wound up setting records were not the players who were "on pace" to do so.
| Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining / result |
| --- | --- | --- | --- |
| Yards per Carry | Group A had 3% more rushing yards per game | Group B has 36% more rushing yards per game | Success! |
| Yard-to-Touchdown Ratio | Group A averaged 2% more fantasy points per game | Group B averages 40% more fantasy points per game | Success! |
| TD-to-FG Ratio | Group A averaged 20% more points per game | Group B averages 36% more points per game | Success! |
| Wins vs. Points | Both groups had an identical win% | Group A has a 4% higher win% | Failure |
| Yards per Attempt | Group B had 14% more yards per game | Group B has 24% more yards per game | 2 |
Usage stats (pass attempts per game, targets per game, carries per game) are the most stable player stats we have from sample to sample. Our quarterbacks have demonstrated that beautifully so far. At the time of the prediction, Group A had averaged 11% more pass attempts per game. Since the prediction, they've averaged... 14% more pass attempts per game. In Week 11 they averaged 40.8 pass attempts and in Week 12 they averaged 41.0. Meanwhile, Group B averaged 35.6 and 35.8 pass attempts per game over those two weeks. Player usage has proven remarkably consistent.
But Group B's yards per pass attempt advantage has likewise remained remarkably consistent. Four out of five Group A quarterbacks are averaging under 7 yards per attempt since the prediction, and the lone exception (Daniel Jones at 7.88) has only played one game so far. Meanwhile, all five Group B quarterbacks are above 7 yards per attempt. As a result, Group B's yardage advantage hasn't just sustained itself, it's grown.
Do Players Get Hot?
It's widely acknowledged that succeeding in the fantasy playoffs is largely about securing players who all "get hot" at the right time. But is "getting hot" a real, predictable phenomenon? Certainly some players outscore other players in any given sample, but any time performance is randomly distributed you'd expect clusters of good games or clusters of bad games to occur by chance alone.
If a player has been putting up better games recently, does that indicate that he's "heating up" and will likely sustain that performance going forward? Or does it just mean he happened to string together a couple of good games, and you'd expect him to be no more likely than usual to do it again? The fantasy community often believes the former; I'll venture that the truth is much closer to the latter.
Indeed, looking at how a player has performed over his last three, four, or five games is almost always a worse predictor of his upcoming performance than looking at how he's performed over his last nine, ten, or eleven games. As I keep saying around here, large samples are more predictable than small samples. Ignoring half or more of a player's games doesn't give you a better idea of how well that player will perform in the near future; it gives you a worse one.
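Don't take my word for it; here's a quick simulation sketch under the simplifying assumption that a player's weekly scores are pure noise around a fixed true average. The specific numbers are arbitrary, but the pattern isn't.

```python
import random

# A quick illustration of why bigger samples predict better, assuming a
# player's weekly scores are just random noise around a fixed true average.
# All numbers here are invented for the demo.
random.seed(13)

TRUE_MEAN, NOISE, TRIALS = 12.0, 6.0, 10_000

err_last4 = err_last10 = 0.0
for _ in range(TRIALS):
    season = [random.gauss(TRUE_MEAN, NOISE) for _ in range(11)]
    next_game = random.gauss(TRUE_MEAN, NOISE)
    pred_4 = sum(season[-4:]) / 4     # the "hot streak" window
    pred_10 = sum(season[-10:]) / 10  # the bigger sample
    err_last4 += abs(pred_4 - next_game)
    err_last10 += abs(pred_10 - next_game)

print(f"average miss using last 4 games:  {err_last4 / TRIALS:.2f}")
print(f"average miss using last 10 games: {err_last10 / TRIALS:.2f}")
# The 10-game window should come out with the smaller average miss.
```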
So let's put this to the test. By my reckoning, there are 20 players who are averaging at least 25% more points per game over their last four games than they are averaging over the season as a whole. (At least, there are 20 such players after you strip out the guys whose performance level was so low that 25% isn't a very impressive difference. I'm looking only at players who are averaging at least 10 points per game over their last four games in PPR scoring.) These 20 players are: Demarcus Robinson, Wayne Gallman, Breshad Perriman, Jakobi Meyers, Tyreek Hill, Duke Johnson Jr., Boston Scott, Kalen Ballage, Willie Snead, Marquez Valdes-Scantling, T.Y. Hilton, Diontae Johnson, Damiere Byrd, Curtis Samuel, J.D. McKissic, Rex Burkhead, Evan Engram, Robert Woods, Marvin Jones, and Kirk Cousins.
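In code, the filter I just described boils down to something like the sketch below. The stat lines are placeholders I invented for illustration, not real players.

```python
# A rough sketch of the "getting hot" filter described above. The stat
# lines are placeholders, not real players.

players = [
    {"name": "Player A", "season_ppg": 10.5, "last4_ppg": 14.2},
    {"name": "Player B", "season_ppg": 16.0, "last4_ppg": 17.5},
    {"name": "Player C", "season_ppg": 6.0, "last4_ppg": 9.0},
]

hot = [
    p for p in players
    if p["last4_ppg"] >= 1.25 * p["season_ppg"]  # 25%+ above season average
    and p["last4_ppg"] >= 10.0                   # PPR floor to skip low scorers
]

for p in hot:
    print(f'{p["name"]}: {p["last4_ppg"] / p["season_ppg"] - 1:.0%} above season average')
```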
These 20 players combine to average 221.3 points per game over the course of the whole season, but that rises to 310.9 points per game over their last four games. Are they getting hot just in time for the playoffs, or is this just randomly-distributed performances getting distributed randomly? In other words, are they more likely to score closer to 311 points per game over the next four weeks, or closer to 221?
I will wager that not only are they likely to score closer to 221, but that their average over the next four weeks will be at least twice as close to their full-season average as to their last-four-games average. To reduce any unnecessary wonkiness, I'll exclude any player who doesn't play at least three games in the next four weeks (so that if, for example, Cousins gets hurt on the first play of the game next week I don't get the benefit of counting his average as 0 points per game).
If every player meets the 3-game minimum, this means the group must combine to average fewer than 251 points per game for me to register a win. Will they? We'll have to check back in four weeks to find out.
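For anyone who wants to double-check that 251 figure, here's the arithmetic in a few lines of Python; the only inputs are the two averages quoted above.

```python
# Checking the math on the 251-point threshold.
season_avg, last4_avg = 221.3, 310.9

# "At least twice as close to the season average" means the distance to
# 221.3 can be at most half the distance to 310.9:
#   x - 221.3 <= (310.9 - x) / 2   =>   x <= (2 * 221.3 + 310.9) / 3
threshold = (2 * season_avg + last4_avg) / 3
print(f"I win if the group combines for under {threshold:.1f} points per game")
# -> I win if the group combines for under 251.2 points per game
```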