Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes, I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
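The grouping process above can be sketched in a few lines of Python. The player names and per-carry numbers below are invented purely for illustration; the point is the mechanical, no-discretion split into groups.

```python
# A minimal sketch of the weekly method: rank every player by the chosen
# metric, then mechanically split the extremes into Group A and Group B.
# All names and numbers here are hypothetical.

players = {
    "Player A": 6.2, "Player B": 5.8, "Player C": 5.5,  # high yards per carry
    "Player D": 3.4, "Player E": 3.2, "Player F": 3.0,  # low yards per carry
}

# Rank everyone by the metric, best to worst.
ranked = sorted(players, key=players.get, reverse=True)
half = len(ranked) // 2
group_a = ranked[:half]   # top performers so far -> predicted to decline
group_b = ranked[half:]   # bottom performers so far -> predicted to improve

print("Group A:", group_a)
print("Group B:", group_b)
```

Note that no judgment call ever enters the split: whoever sits at the top of the ranking goes into Group A, full stop.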
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions. On a case-by-case basis, it's easy to find reasons why any given player is going to buck the trend and sustain production. So I constrain myself and remove my ability to rationalize on a case-by-case basis.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of all my predictions from last year and how they fared. Here's a similar list from 2017.
The Scorecard
In Week 2, I opened with a primer on what regression to the mean was, how it worked, and how we would use it to our advantage. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I explained why touchdowns follow yards (but yards don't follow back), and predicted that the players with the fewest touchdowns per yard gained would outscore the players with the most touchdowns per yard gained going forward.
In Week 5, I talked about how preseason expectations still held as much predictive power as performance through four weeks. No specific prediction was made.
In Week 6, I talked about why quarterbacks tended to regress less than other positions but nevertheless predicted that Patrick Mahomes II would somehow manage to get even better and score ten touchdowns over the next four weeks.
In Week 7, I talked about why watching the game and forming opinions about players makes it harder to trust the cold hard numbers when the time comes to put our chips on the table. (I did not recommend against watching football; football is wonderful and should be enjoyed to its fullest.)
In Week 8, I discussed how yard-to-touchdown ratios can be applied to tight ends, but noted that the players most likely to regress positively were already the top performers at the position. I made a novel prediction to try to overcome this quandary.
| Statistic For Regression | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
| --- | --- | --- | --- |
| Yards per Carry | Group A had 20% more rushing yards per game | Group B has 30% more rushing yards per game | None (Success!) |
| Yard:Touchdown Ratio | Group A had 23% more points per game | Group B has 47% more points per game | None (Success!) |
| Mahomes' Touchdown Pace | Mahomes averaged 2.2 touchdowns per game | Mahomes averages 2.0 touchdowns per game | 1 (Failure?) |
| Yard:Touchdown Ratios | Group B had 76% more points per game | Group B has 84% more points per game | 3 |
Once again, Patrick Mahomes II reminds us that predicting regression for individual players means one injury grinds our prediction to a halt. We knew what we were doing when we made the prediction and we don't regret trying something different, but we'll take the loss. As always, the goal of Regression Alert is never perfection; the goal is for the times we're right to outnumber the times we're wrong by a large enough margin that we can turn a profit.
Our novel tight end prediction had an up-and-down first week. With so many more tight ends in Group A than in Group B the odds of one of them blowing up for a big game are just high enough to keep me nervous. In Week 8, Darren Fells was the complication du jour; he scored two touchdowns and led all tight ends in scoring. But everyone else played their part, with the rest of the Group A tight ends disappointing and the Group B tight ends largely having solid (if not spectacular) days, and the prediction remains reasonably well situated with three weeks to go.
Let's Talk about Yards Per Target
Since I've made my thoughts on yards per carry abundantly clear, you might think I have a similar axe to grind against yards per target. Indeed, I hate the statistic, though for different reasons than yards per carry.
For starters, people think of yards per target as an efficiency stat. Yards, they believe, measure a receiver's production, while targets measure his opportunity, so yards per target is simply a receiver's production per unit of opportunity.
But a target is not the unit of opportunity for receivers. A route is the unit of opportunity. If five receivers run a route on a given play and the quarterback throws to one of them only to have the ball fall incomplete, the player who earned the target probably played the best of those five receivers. He either got the most open, or if nobody got open then he was the guy the quarterback most trusted to make a play despite being covered. When we calculate yards per target, however, an incomplete pass goes down as a negative for the targeted receiver and as nothing at all for the other receivers.
The other major problem with yards per target is that it is very strongly influenced by where the receiver is being targeted. Quarterbacks complete roughly 60% of passes when targeting a receiver 10 yards down the field. Ignoring yards after the catch, 10 yards * 60% completion rate would give us an average of six yards per target. Quarterbacks complete about 45% of passes when targeting a receiver 20 yards down the field. Ignoring yards after the catch again, 20 yards * 45% completion rate would give us an average of nine yards per target.
The deeper down the field quarterbacks are throwing, the more yards per target they average. You see a bit of this variation in yards per attempt (which is just yards per target from the quarterback's perspective), which is slightly biased in favor of deep passers. But ultimately NFL quarterbacks have to throw the ball all over the field to keep the defense honest, so the variation in how deep quarterbacks throw is relatively small.
But wide receivers don't need to run routes all over the field. Instead, you frequently see specialists who are primarily running deep routes, and other specialists who are primarily running short routes, and even if both players are exactly as good the former will average many more yards per target than the latter just because of the structural bias in the statistic.
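The structural bias described above reduces to simple arithmetic: expected yards per target (ignoring yards after the catch) is just target depth times the completion rate at that depth. A toy illustration, using the approximate completion rates cited earlier:

```python
# Toy illustration of the structural bias in yards per target:
# expected Y/T, ignoring yards after the catch, is simply
# target depth times the completion rate at that depth.

def expected_ypt(depth, completion_rate):
    """Expected yards per target at a given target depth, ignoring YAC."""
    return depth * completion_rate

short_specialist = expected_ypt(10, 0.60)  # ~60% completion at 10 yards
deep_specialist = expected_ypt(20, 0.45)   # ~45% completion at 20 yards

print(short_specialist)  # 6.0
print(deep_specialist)   # 9.0
```

Two receivers of identical quality, one running mostly 10-yard routes and one running mostly 20-yard routes, would be separated by three full yards per target before talent enters the picture at all.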
Indeed, if we know a player's average depth of target (or aDoT) we can predict next year's yards per target average better than if we instead knew this year's yards per target average. (Which is a pretty fantastic proof that yards per target does, in fact, regress to the mean.)
To account for this in the past I have calculated expected yards per target based on a player's yards per reception average (which I've used as a proxy for his aDoT). Theoretically, this has allowed me to make regression predictions for yards per target. In practice, there's one last (pretty big) problem, though.
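The adjustment just described can be sketched as a simple regression: fit expected yards per target as a function of yards per reception (the aDoT proxy), then measure each receiver against that expectation. The receiver data below is made up purely for illustration.

```python
# Hypothetical sketch of the adjustment described above: use yards per
# reception as a stand-in for target depth, fit a linear expectation by
# ordinary least squares, and measure each receiver against it.
# All numbers are invented for illustration.

receivers = [
    # (yards per reception, yards per target)
    (11.0, 7.2), (12.5, 7.8), (14.0, 8.1), (16.0, 8.9), (18.5, 9.4),
]

n = len(receivers)
mean_x = sum(x for x, _ in receivers) / n
mean_y = sum(y for _, y in receivers) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in receivers)
         / sum((x - mean_x) ** 2 for x, _ in receivers))
intercept = mean_y - slope * mean_x

def over_expectation(ypr, ypt):
    """Yards per target over what the player's Y/R alone would predict."""
    return ypt - (intercept + slope * ypr)
```

A positive `over_expectation` flags an apparent overperformer; the catch, as the rest of this section explains, is deciding how much of that surplus is noise versus talent.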
First, a refresher on the theory behind how regression to the mean works. Every player has a "true production level", some inherent baseline that we'd expect them to perform at over a long timeline. Over a short timeline, however, random factors will cause large deviations from that baseline. If we can isolate the random deviations, we can bet on players returning to their true baseline and make a profit.
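The theory above is easy to demonstrate with a small simulation. In the sketch below every player shares the same hypothetical true baseline, so any observed gap between the extremes is pure short-sample noise, and it collapses in the next sample:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Every player here has the SAME hypothetical true production level;
# short-sample noise alone creates apparent stars and scrubs.
TRUE_BASELINE = 4.2  # e.g. a league-wide "true" yards per carry

def short_sample():
    # observed performance = true level + random short-sample deviation
    return TRUE_BASELINE + random.gauss(0, 0.8)

before = [short_sample() for _ in range(100)]
after = [short_sample() for _ in range(100)]  # noise re-rolls independently

# Split the extremes by the FIRST sample, exactly as the column does...
order = sorted(range(100), key=lambda i: before[i], reverse=True)
group_a, group_b = order[:20], order[-20:]

avg = lambda idx, xs: sum(xs[i] for i in idx) / len(idx)

# ...and the gap largely vanishes in the second sample.
print(avg(group_a, before), avg(group_b, before))  # far apart
print(avg(group_a, after), avg(group_b, after))    # both near the baseline
```

This is the bet the column makes every week: when the deviation is noise, Group A's edge evaporates on its own.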
Statistics like interception rate and yards per carry work great for this process because the variation in the statistic is almost entirely just random deviations, so they regress pretty strongly. Two years ago, I thought yards per target worked the same way; if you adjusted for target depth most of the variation was just going to be random noise and therefore a strong candidate to regress.
Over the last two years, however, I've learned that target depth is only part of the driver behind differences between yards per target averages. Another very large driver is talent. In other words, the "true production level" at yards per carry is pretty similar for good running backs and for bad running backs. Likewise, the "true yard-to-touchdown ratio" for players is relatively uncorrelated with player quality. You have good players who have high ratios and good players who have low ratios. You have bad players with high ratios and bad players with low ratios.
Because of this, when I compare the best and worst performers in these metrics I inevitably get a mix of good and bad players in Group A and a mix of good and bad players in Group B.
Yards per Target doesn't work like that. Even if I adjust for depth, of the top 50 receivers in the NFL in terms of targets per game, 74% are *still* outperforming expected yards per target. Six receivers are averaging at least 6 targets per game and at least 2 more yards per target than expected: Stefon Diggs, Tyler Lockett, Amari Cooper, Chris Godwin, Michael Thomas, and Davante Adams. Six receivers are averaging at least 6 targets per game and at least 0.4 fewer yards per target than expected: Curtis Samuel, Robby Anderson, Auden Tate, Preston Williams, DeVante Parker, and Mike Williams.
Yeah, that first list is a much better group of receivers. It's not your imagination. Yards per target over expectation strongly correlates with player quality, which means the best players in yards per target over expectation are simply better players than the worst players in yards per target over expectation. So if I make a prediction that pits the best performers against the worst performers I'm predicting guys who probably genuinely do have lower "true performance levels" to outperform guys who genuinely do have higher "true performance levels". Which isn't how regression to the mean works.
I think the three most valuable words a fantasy analyst can say are "I don't know", so that's what I'm going to say here. Yards per target regresses (again, remember, depth of target predicts yards per target better than yards per target predicts itself). But this column is set up to automatically generate a list of underperformers and pit them against a list of overperformers, and I simply don't know how to automatically generate such a list here. The adjustments that I've tried have still left me in a situation where all of the "overperformers" are actually just great players who aren't really overperforming at all.
Based on his yards per reception average, Michael Thomas is currently outperforming "expectations" by 2.56 yards. Last year, he outperformed it by 2.57 yards. Two years ago, he outperformed it by 1.09 yards. As a rookie, he outperformed it by 1.99 yards. Evidence strongly suggests that Michael Thomas isn't outperforming his "true production level" by two yards per target, it suggests his true production level is two yards per target better than average.
Now, I could go through every receiver and estimate what their true production level is by hand and come up with a hand-picked list of guys I think are likely to regress (the answer is Stefon Diggs). But cherry-picking samples runs counter to the ethos here at Regression Alert. It moves this column from "look at how powerful regression to the mean is, it can predict the future with incredible accuracy without any human intervention at all" to "look at how good I am, I can predict the future with incredible accuracy (or more likely not)".
I'm not giving up on the statistic entirely. Maybe someday I'll come up with a satisfactory way to generate a list of overperformers and underperformers that doesn't just give me a list of good players and bad players. At the moment, though, I just wanted to be up-front about how I used to believe something (deviations from expected yards per target were largely noise), and now I know that I was wrong about that.