Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
The Scorecard
Returning readers, you know how this works by now, but for new readers here's the deal. Every week I take a look at a specific statistic that is prone to regression and identify high and low outliers in that statistic, and then I wave my hands in the air and shout “regression!”
But since predictions aren't any fun without someone holding your feet to the fire afterward, I don't stop there. I lump all of the high outliers into Group A. I lump all of the low outliers into Group B. I verify that Group A is outperforming Group B. And then I predict that Group B will outperform Group A over the next four weeks.
I don't get to pick and choose my groups, beyond being free to pick and choose what statistics are especially prone to regression. If I'm tracking yards per target, and Antonio Brown is one of the high outliers in yards per target, then Antonio Brown goes into Group A and may the fantasy gods show mercy on my predictions.
And then, groups chosen and predictions made, I track my progress. That's this.
In Week 2, I outlined what regression was, what it wasn't, and how it worked. No prediction was made.
In Week 3, I listed running backs with exceptionally high and low yards per carry averages and predicted that the low-ypc cohort would outperform the high-ypc cohort over the next four weeks.
In Week 4, I looked at receivers who were overperforming and underperforming in yards per target and predicted that the underperformers would outperform the overperformers over the next four weeks.
In Week 5, I compared the predictive accuracy of in-season results to the predictive accuracy of preseason ADP. Outside of a general prediction that players would tend to regress in the direction of their preseason ADP, no specific prediction was made.
In Week 6, I looked at quarterbacks who were throwing too many or too few touchdowns given the amount of passing yards they were accumulating, then predicted that the underperformers would score more fantasy points than the overperformers going forward.
In Week 7, I looked at receivers who were catching too many or too few touchdowns based on their yardage total, then predicted that the underperformers would score more fantasy points than the overperformers going forward.
In Week 8, I revisited yards per carry, again predicting that the high-carry, low-ypc group would outrush the low-carry, high-ypc group going forward.
In Week 9, I went back to yard to touchdown ratios, predicting that the low-touchdown group would close the gap substantially with the high-touchdown group going forward.
In Week 10, I discussed the pitfalls of predicting regression over 4-week windows. No specific prediction was made.
Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining |
---|---|---|---|
yards per carry | Group A had 60% more rushing yards per game | Group B has 16% more rushing yards per game | None (Win!) |
yards per target | Group A had 16% more receiving yards per game | Group B has 11% more receiving yards per game | None (Win!) |
passing yards per touchdown | Group A had 13% more fantasy points per game | Group A has 17% more fantasy points per game | None (Loss) |
receiving yards per touchdown | Group A had 28% more fantasy points per game | Group B has 1% more fantasy points per game | None (Win!) |
yards per carry | Group A had 25% more fantasy points per game | Group B has 34% more fantasy points per game | 1 |
rushing yards per touchdown | Group A had 21% more fantasy points per game | Group B has 0% more fantasy points per game | 2 |
Another week, another closed prediction. A 1% advantage for Group B here might not seem like much, but remember how big the lead was coming in. Group A averaged 10.02 points per game over the first six weeks. That fell to 8.67 over the last four. Meanwhile, Group B rose from 7.82 points per game over the first six weeks to 8.72 over the last four.
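For anyone who wants to check the scorecard math, here is a minimal sketch of how those percentages fall out of the points-per-game figures quoted above. The helper name `relative_lead` is just an illustrative label, not something from a library.

```python
def relative_lead(leader: float, trailer: float) -> float:
    """Percent by which `leader` exceeds `trailer` (negative if it trails)."""
    return (leader / trailer - 1.0) * 100.0


# Points per game quoted in the paragraph above.
before = relative_lead(10.02, 7.82)  # Group A over Group B, first six weeks
since = relative_lead(8.72, 8.67)    # Group B over Group A, last four weeks

print(f"Before the prediction: Group A led by {before:.0f}%")
print(f"Since the prediction:  Group B led by {since:.0f}%")
```

Rounded to whole percentages, these reproduce the 28% and 1% entries in the receiving yards per touchdown row of the table.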
The most impressive part, for me, is exactly how that gap was closed. Both groups averaged nearly the same number of yards per game in our four-week sample (61.7 ypg for Group A, 62.0 ypg for Group B). But Group B, the group that couldn't reach the end zone to save its life through six weeks, actually scored more touchdowns over the last four weeks than Group A! If this doesn't convince you that touchdowns are largely random and primed for regression, I'm not sure what will.
The other two outstanding predictions both had strong weeks as well. Given the small samples involved (just five players per group, which after byes sometimes means as few as three performances per week), the standings in those predictions tend to be pretty swingy. Remember, Group A in yards per touchdown ratio led last week by 147%, but this week Group B has completely reversed that and now leads by 0.4%. (I rounded in the table above.)
With our regression predictions standing at a strong 3-1 so far, we'll see if they can add another win or two in the coming weeks.
Weighted Coins
Hopefully you'll forgive me for not diving into a prediction for a second consecutive week, but by this point, you've seen all of my favorite tricks. Yard to touchdown ratio, yards per carry, and yards per target... these are the most volatile metrics on the block.
Statistics are often said to be either descriptive or predictive. (Really great statistics can sometimes be both.) Descriptive statistics tell you what happened and why. Predictive statistics tell you what is going to happen and why.
An ideal candidate for regression is any statistic that is strongly descriptive but weakly predictive, and my favorite regression metrics all fall into that bucket. Catching a lot of 60-yard touchdowns tells us a lot about how many points you've scored to date, but very little about how many points you'll score going forward.
My goal with this column isn't just to tell you who is going to regress; it's to equip you with the tools and the understanding of how regression operates so you can tell for yourself what kind of production is sustainable and what kind is not. Give a man a burger and you'll feed him for a day; give a man a Five Guys franchise and you'll feed him until his arteries give out.
So since you've already seen my favorite tricks, I want to hammer a bit more on the conceptual side.
I say it every week: regression operates on longer timescales. The prediction that came due this week is the perfect illustration of this. I predicted touchdown regression after six weeks, and for two more weeks that regression failed to materialize. One week after the prediction was made, Group A still led Group B in points per game by 27%. After two weeks, it had increased that lead to 60%. Group A had nine touchdowns during those two weeks, while Group B had just four.
When you identify something that's bound to regress and it doesn't regress, that's pretty discouraging. It's especially discouraging if you've already invested resources into your belief: if you've bought Robert Woods or Marqise Lee or Rishard Matthews or Demaryius Thomas in anticipation that the tap was about to open and the touchdowns would begin to flow.
But the biggest weapon in the entire arsenal when predicting regression is time.
Seven receivers had at least 300 yards in their team's first seven games without reaching the end zone. 49 total games, 2995 yards, 0 TDs.
They've scored 9 touchdowns (and counting) in the 12 player-games since.
Touchdowns follow yards, y'all. pic.twitter.com/J3CQvPfT0I
Adam Harstad (@AdamHarstad), November 12, 2017
Make that 10 touchdowns (and counting). https://t.co/vb6t8ScBcg
Adam Harstad (@AdamHarstad), November 12, 2017
After taking a beating the last couple weeks, Regression to the Mean has been really pouring it on today.
Adam Harstad (@AdamHarstad), November 13, 2017
Demaryius brings this group up to 11 touchdowns in 13 games, with time to potentially add to that.
These dudes just went from a 0-TD-a-year pace to a 13.5-TD-a-year place. #TouchdownsFollowYards https://t.co/vb6t8ScBcg
Adam Harstad (@AdamHarstad), November 13, 2017
Think of regression like a weighted coin. If there's a coin that comes up heads 60% of the time, there's still a really good chance on any given flip that it will come up tails. If you're gambling using that coin, you should really want to make as many flips as possible.
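To put rough numbers on the coin analogy, here is a small sketch that computes how often a 60% coin actually finishes with more heads than tails as the number of flips grows. It assumes every flip is independent, which real football weeks are not, and the function name and the flip counts are just illustrative choices (odd counts, so there are no ties).

```python
from math import comb


def prob_heads_majority(n_flips: int, p_heads: float = 0.6) -> float:
    """Chance that heads strictly outnumber tails in n_flips independent flips."""
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(n_flips // 2 + 1, n_flips + 1)
    )


# Odd flip counts only, so heads and tails can never tie.
for n in (1, 5, 25, 101):
    print(f"{n:>3} flips: heads come out ahead {prob_heads_majority(n):.1%} of the time")
```

With a single flip the weighted side wins only 60% of the time, and the edge grows steadily as flips are added, which is the whole argument for making as many flips as you can.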
“Making more flips” means diversifying your portfolio wherever possible. Some of the individual players in Group B wound up being major disappointments. Pierre Garcon gained 66 yards in two weeks and then was lost for the season. Danny Amendola doesn't even have an injury to excuse the 67 yards he's gained in three games.
But the two lowest-scoring players in Group B through six weeks were Robert Woods and Marqise Lee, and over the four weeks since they were actually the first- and third-highest-scoring receivers in Group B on a per-game basis! (Adam Thielen came in second.)
This is a big reason why I don't pick and choose who goes into my groups. Had I tried to limit myself just to stars who were primed to regress, I'd have been worse off. Antonio Brown, Julio Jones, T.Y. Hilton, Demaryius Thomas, Kelvin Benjamin, and Keenan Allen combined to average 7.98 points per game. Adam Thielen, Pierre Garcon, Robert Woods, Marqise Lee, Rishard Matthews, and Danny Amendola combined for 9.65.
Again, Garcon and Amendola were the two biggest busts in the sample, and Matthews underperformed as well. But if I'd tried to eliminate potential misses, I'd have also weeded out potential hits. So I select the metric for regression, and whoever it tells me is going to regress is who I bet on. The sample is the sample. (If anything, the unintuitive nature of the results is kind of the point.)
The second way to get more flips of this weighted coin (besides refusing to weed out results that make you uncomfortable) is simply to give it more time. And this is why I'm always harping on how regression operates on longer timescales. Sometimes things go wrong in the short term, as we saw with my passing yardage to touchdown ratio prediction. But just as a weighted coin is going to favor the side it's weighted towards over enough flips, regression is going to be undefeated over a long enough timeline.