Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Cooper Kupp is one of the top performers in my sample, then Cooper Kupp goes into Group A and may the fantasy gods show mercy on my predictions.
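For readers who like to see the procedure spelled out, here is a minimal sketch of the Group A / Group B split in Python. The player names, metric values, scoring figures, and the group size of three are all invented for illustration; they are not the column's real data or its actual cutoffs.

```python
from statistics import mean

# Hypothetical players: (name, metric value, fantasy points per game so far).
# Every number here is invented for illustration only.
players = [
    ("Player 1", 6.8, 14.2),
    ("Player 2", 6.1, 12.9),
    ("Player 3", 5.7, 13.5),
    ("Player 4", 4.4, 11.0),
    ("Player 5", 4.0, 10.1),
    ("Player 6", 3.6, 9.4),
    ("Player 7", 3.3, 10.8),
    ("Player 8", 3.1, 8.7),
]


def split_groups(rows, group_size):
    """Rank by the metric (descending) and take the top and bottom slices."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return ranked[:group_size], ranked[-group_size:]


group_a, group_b = split_groups(players, group_size=3)

avg_a = mean(points for _, _, points in group_a)
avg_b = mean(points for _, _, points in group_b)

# The setup only counts if Group A really is ahead at prediction time.
assert avg_a > avg_b
print(f"Group A: {avg_a:.2f} points per game, Group B: {avg_b:.2f}")
```

The point of the sketch is that the only choice being made is the metric. Once that is fixed, the sort decides who lands in which group.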
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. At the end of last season, I provided a recap of the first half-decade of Regression Alert's predictions. The executive summary is we have a 32-7 lifetime record, which is an 82% success rate.
If you want even more details, here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017.
The Scorecard
In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
STATISTIC FOR REGRESSION | PERFORMANCE BEFORE PREDICTION | PERFORMANCE SINCE PREDICTION | WEEKS REMAINING |
---|---|---|---|
Yards per Carry | Group A had 10% more rushing yards per game | Group B has 16% more rushing yards per game | 3 |
When I made last week's prediction, our "high-YPC" group was averaging 6.41 yards per carry and our "low-YPC" group was averaging 3.81 yards per carry. As a point of comparison, league average among RBs is 4.38. In the first week since the prediction, the "high-YPC" running backs averaged 4.23 yards per carry and the "low-YPC" running backs averaged 4.44.
Is this the result of a lone outlier performance? Quite the opposite: the "high-YPC" group is the one with a lone outlier dragging its average up. Cordarrelle Patterson had 17 rushes for 141 yards for the "high-YPC" cohort, an average of 8.29 yards per carry; no one else in the group topped 4.5. Meanwhile, half of the "low-YPC" backs topped 4.5, with a median value for the group of 4.56 compared to 3.80 for Group A.
Given that Group B backs were higher-volume to begin with, the moment Group A's yards-per-carry advantage disappeared, it took their rushing yardage advantage with it. Group B has outgained Group A by 16% through one week, though there's a lot of time left in the prediction.
PLAYING THE HITS
If you go see Lynyrd Skynyrd live, you know they're playing Sweet Home Alabama and Freebird. The Stones are going to play (I Can't Get No) Satisfaction. KISS is going to play Rock and Roll All Nite and Detroit Rock City, and of course, Ozzy is eventually going to get around to Crazy Train.
Similarly, Regression Alert loves delving into the back catalog for obscure stats and deep cuts from time to time, but we know where our bread is buttered and we aren't shy about serving up the hits, either. Last week we played our old classic "Yards Per Carry is Pseudoscience". This week we have our seminal work "Touchdowns Follow Yards (But Yards Don't Follow Back)". Next week we're going to really drive the crowd nuts with our smash "Revisiting Preseason Expectations". But that's getting ahead of ourselves.
First, let's talk about touchdowns. Actually, before we talk about touchdowns, let's talk about vocabulary.
sto·chas·tic
adjective
randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.
Touchdowns are stochastic. Over his career, Cam Newton rushed for 70 touchdowns in 140 games, an average of 0.5 touchdowns per game. We could say that's his "true production level", and over a sufficiently long timeline, we'd probably expect him to conform to that, averaging 0.5 touchdowns per game.
Despite that being his true production level, though, guess how many times Cam Newton rushed for half a touchdown in a game? As far as I can tell (and I have researched this topic extensively), it has never happened. Instead, he either scores zero touchdowns... or he scores one touchdown. (Sometimes he scores two touchdowns, and once he even rushed for three touchdowns.) Because touchdowns only come in whole numbers, and usually just zero or one per game, we can analyze Cam Newton's rushing touchdowns statistically, but we cannot predict them precisely.
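To make "stochastic" concrete, here is a small simulation sketch. It assumes touchdowns per game follow a Poisson distribution with a rate of 0.5, which is a common modeling choice for rare events and not something established above. The helper `sample_poisson`, the seed, and the 10,000-game sample size are all illustrative.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)  # fixed seed so the run is repeatable
games = [sample_poisson(0.5, rng) for _ in range(10_000)]

average = sum(games) / len(games)
half_touchdown_games = sum(1 for g in games if g == 0.5)

print(f"average touchdowns per game: {average:.2f}")
print(f"games with exactly half a touchdown: {half_touchdown_games}")
```

Over thousands of simulated games the average settles near 0.5, yet no individual game ever produces half a touchdown. Every single result is a whole number.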
Yards don't really behave like that. Over his career, Cam Newton averaged 38.6 rushing yards per game. But it's not like every week he's either getting you 0 yards or else he's getting you 75 yards. Instead, more games than not, he's getting you somewhere between 20 and 60 yards. His yardage total is much more consistent from game to game than his touchdown total.
One way to measure consistency is something called standard deviation, which measures how much something varies around the average. The standard deviation of Newton's rushing yardage is 24.5 yards. The standard deviation of Newton's rushing touchdowns is 0.65 touchdowns.
Now, these numbers are not directly comparable. Standard deviations for large values are naturally bigger than standard deviations for small values. (Consider: if you switched to "feet rushing per game" rather than "yards rushing per game", the standard deviation would triple despite the underlying game-to-game variation remaining unchanged.)
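The feet-versus-yards point is easy to check. The sketch below uses six made-up game totals (not Newton's actual game log) and converts them to feet. The population standard deviation from Python's `statistics.pstdev` is used here as one reasonable choice.

```python
from statistics import pstdev

# Hypothetical rushing yardage for six games (invented for illustration).
yards_per_game = [12, 35, 41, 58, 27, 66]
feet_per_game = [yards * 3 for yards in yards_per_game]

sd_yards = pstdev(yards_per_game)
sd_feet = pstdev(feet_per_game)

print(f"standard deviation in yards: {sd_yards:.1f}")
print(f"standard deviation in feet:  {sd_feet:.1f}")
print(f"ratio: {sd_feet / sd_yards:.1f}")
```

The games themselves have not changed at all. Only the unit has, and the standard deviation scales right along with it.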
But if you divide a player's standard deviation by that player's average, you get something called the coefficient of variation, or CV. CV is a way to compare how volatile different statistics are. The CV of Newton's yards is 64%, meaning it tends to vary by about 64% of his overall average. The CV of Newton's touchdowns is 130%. Touchdowns are much more random from week to week than yards are; in Newton's case, about twice as random according to CV. (For those curious, the CV of Newton's rush attempts is 42%; "usage" stats like attempts tend to be even more stable from week to week than yards.)
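The CV arithmetic is a one-liner, sketched here with the rounded per-game figures quoted above (38.6 yards with a standard deviation of 24.5, and 0.5 touchdowns with a standard deviation of 0.65). The function name `coefficient_of_variation` is just an illustrative choice.

```python
def coefficient_of_variation(std_dev, average):
    """Standard deviation expressed as a fraction of the average."""
    return std_dev / average


# Rounded per-game figures quoted in the article.
cv_yards = coefficient_of_variation(24.5, 38.6)
cv_touchdowns = coefficient_of_variation(0.65, 0.5)

print(f"CV of rushing yards:      {cv_yards:.0%}")
print(f"CV of rushing touchdowns: {cv_touchdowns:.0%}")
print(f"touchdowns vs. yards:     {cv_touchdowns / cv_yards:.1f}x")
```

Because the inputs here are already rounded, the yards figure can come out a point below the 64% quoted above. The touchdown figure and the roughly two-to-one ratio are unaffected.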
Not only are they more unstable, but touchdowns are also much more valuable than yards. In most scoring systems, one extra touchdown is worth the equivalent of 60 extra yards. Which means if Newton caught the high side of variance and scored a few extra touchdowns early in the year, it could dramatically inflate his fantasy production to date. And if he caught the low side of variance and failed to reach the end zone, it could leave him far lower than we'd otherwise expect.
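Here is a rough sketch of how much touchdown luck can move a fantasy total. It assumes a typical scoring system of 1 point per 10 rushing yards and 6 points per rushing touchdown. The four-game stretch and the "hot" and "cold" touchdown counts are hypothetical; only the 38.6 yards and 0.5 touchdowns per game come from the text above.

```python
POINTS_PER_YARD = 0.1       # assumed: 1 point per 10 rushing yards
POINTS_PER_TOUCHDOWN = 6.0  # assumed: 6 points per rushing touchdown


def rushing_points(yards, touchdowns):
    """Fantasy points from rushing under the assumed scoring above."""
    return yards * POINTS_PER_YARD + touchdowns * POINTS_PER_TOUCHDOWN


yards_per_touchdown = POINTS_PER_TOUCHDOWN / POINTS_PER_YARD

# Same yardage over a hypothetical four-game stretch, different touchdown luck.
stretch_yards = 4 * 38.6
expected = rushing_points(stretch_yards, touchdowns=2)  # 0.5 per game
hot = rushing_points(stretch_yards, touchdowns=4)
cold = rushing_points(stretch_yards, touchdowns=0)

print(f"one touchdown is worth {yards_per_touchdown:.0f} yards")
print(f"expected: {expected:.1f}, hot: {hot:.1f}, cold: {cold:.1f}")
```

With identical yardage, the gap between the hot and cold versions of the same player is 24 points, all of it from touchdowns.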