For those who are new to the feature, here's the deal: every week, I break down a topic related to regression to the mean. Some weeks, I'll explain what it is, how it works, why you hear so much about it, and how you can harness its power for yourself. In other weeks, I'll give practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Justin Jefferson is one of the top performers in my sample, then Justin Jefferson goes into Group A, and may the fantasy gods show mercy on my predictions.
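For readers who like seeing the mechanics spelled out, the ranking-and-splitting process can be sketched in a few lines. The player names and metric values below are invented purely for illustration:

```python
# Hypothetical sketch of the column's group-building process.
# Names and metric values are made up for illustration only.
players = {
    "Player A": 9.0, "Player B": 7.5, "Player C": 6.8, "Player D": 5.9,
    "Player E": 3.1, "Player F": 2.7, "Player G": 2.2, "Player H": 1.4,
}  # the metric chosen for the week, e.g. a per-game rate

ranked = sorted(players, key=players.get, reverse=True)
group_a = ranked[: len(ranked) // 2]   # top half: outperformers so far
group_b = ranked[len(ranked) // 2 :]   # bottom half: underperformers so far

# The prediction: going forward, Group B outscores Group A.
print("Group A:", group_a)
print("Group B:", group_b)
```

The key constraint from the text above is that once the metric is chosen, group membership is automatic; nothing in the sketch lets me cherry-pick who lands where.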
And because predictions are meaningless without accountability, I track and report my results. Here's last year's season-ending recap, which covered the outcome of every prediction made in our seven-year history, giving our top-line record (41-13, a 76% hit rate) and lessons learned along the way.
Our Year to Date
Sometimes, I use this column to explain the concept of regression to the mean. In Week 2, I discussed what it is and what this column's primary goals would be. In Week 3, I explained how we could use regression to predict changes in future performance (who would improve, who would decline) without knowing anything about the players themselves. In Week 7, I explained why large samples are our biggest asset when attempting to benefit from regression. In Week 9, I gave a quick trick for evaluating whether unfamiliar statistics are likely stable or unstable.
Sometimes, I point out broad trends. In Week 5, I shared twelve years' worth of data demonstrating that preseason ADP held as much predictive power as performance to date through the first four weeks of the season.
Other times, I use this column to make specific predictions. In Week 4, I explained that touchdowns tend to follow yards and predicted that the players with the highest yard-to-touchdown ratios would begin outscoring the players with the lowest. In Week 6, I explained that yards per carry was a step away from a random number generator and predicted the players with the lowest averages would outrush those with the highest going forward. In Week 8, I broke down how teams with unusual home/road splits usually performed going forward and predicted the Cowboys would be better at home than on the road for the rest of the season.
The Scorecard
| Statistic Being Tracked | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
| --- | --- | --- | --- |
| Yard-to-TD Ratio | Group A averaged 17% more PPG | Group B averages 10% more PPG | None (Win!) |
| Yards per Carry | Group A averaged 22% more yards per game | Group B averages 38% more yards per game | None (Win!) |
| Cowboys Point Differential | Cowboys were 90 points better on the road than at home | Cowboys are 12 points better at home than on the road* | 8 |
As they so often do, our "low-ypc" group (Group B) not only outgained our "high-ypc" group (Group A) but also averaged more yards per carry while doing so; Group A fell from 5.72 ypc before the prediction to 4.75 after, while Group B rose from 3.47 to 4.84. Amusingly, Group B averaged 5.66 ypc in even-numbered weeks and 3.97 ypc in odd-numbered weeks. Yards per carry, as I often say, is just a step away from a random number generator.
Group A saw tremendous performances from Saquon Barkley and Derrick Henry, so you might think that a few terrible performances elsewhere dragged the entire group down, but that's not the case. In fact, you could remove the bottom 50% of the sample (the worst 7 of the 14 backs) and Group B still would have outgained Group A (even though removing that bottom 50% would have increased Group A's lead at the start from 22% to 38%).
As for the Cowboys, they lost their second road game of the year and still haven't played at home. An injury to quarterback Dak Prescott means things could get quite interesting for the prediction down the stretch. We'll have to see how it plays out, though.
(Most) Quarterback Stats Don't Regress As Much
When considering regression, I find it useful to think of production as the result of a combination of skill (factors innate to the player) and luck (factors outside the player's direct control). Statistics that are more luck than skill tend to regress faster and more sharply than statistics that are more skill than luck.
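A toy simulation makes the point concrete. This is a sketch with invented skill and luck distributions, not real player data: production is modeled as true skill plus random noise, we split players on one sample, and then we measure how much of the top group's edge survives into a second sample.

```python
import random

random.seed(42)

# Invented model: a player's production in any sample is true skill plus luck.
N = 1000
skills = [random.gauss(0, 1) for _ in range(N)]

def observe(skill, luck_sd):
    # One sample of production: skill plus random "luck" noise.
    return skill + random.gauss(0, luck_sd)

def edge_kept(luck_sd):
    # Split players on sample-1 production, then measure how much of the
    # top group's edge over the bottom group survives into sample 2.
    sample1 = [observe(s, luck_sd) for s in skills]
    sample2 = [observe(s, luck_sd) for s in skills]
    order = sorted(range(N), key=lambda i: sample1[i], reverse=True)
    top, bottom = order[:100], order[-100:]
    gap1 = sum(sample1[i] for i in top) - sum(sample1[i] for i in bottom)
    gap2 = sum(sample2[i] for i in top) - sum(sample2[i] for i in bottom)
    return gap2 / gap1

# A mostly-skill stat keeps most of its edge; a mostly-luck stat gives it back.
print(f"mostly-skill stat keeps {edge_kept(0.5):.0%} of its edge")
print(f"mostly-luck stat keeps  {edge_kept(3.0):.0%} of its edge")
```

The more of the noise term you pour in, the more of the "edge" turns out to be borrowed luck that gets repaid in the next sample.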
I've linked to research by Danny Tuccitto in which he found that running backs needed 1,978 carries before their ypc average reached a point where it represented more skill than luck. Using the same methodology, Danny has also found that a quarterback's yards per attempt average (or YPA) stabilized in just 396 attempts. For a running back, this represents eight years of 250-carry seasons. For a quarterback, that's less than a season's worth of attempts (only one team, the 2022 Chicago Bears, has finished with fewer than 400 pass attempts since 2010).
As a result, you're not going to see me predicting regression very often for quarterback stats like yards per attempt. (In fact, the last time I did so was in 2020, when I successfully predicted that yards per attempt wouldn't regress.)
But there is one quarterback statistic that I love to badmouth, one that is terrible, horrible, no good, very bad. That statistic is interception rate.
Why Does Interception Rate Regress So Much?
The statistics that regress most strongly are those that result from a relatively higher proportion of luck than skill. Interception rate does have a skill component. Leaguewide, quarterbacks threw an interception on 2.3% of their attempts last year. For his career, Aaron Rodgers has thrown an interception on just 1.4% of his attempts, the best rate in history. Jameis Winston, on the other hand, has thrown one on 3.4% of his attempts. Over 600 pass attempts, that's the difference between 8 interceptions (Rodgers), 14 interceptions (league average), or 20 interceptions (Winston).
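Converting those per-attempt rates into season totals is simple arithmetic; here's the quick calculation using the figures quoted above:

```python
# Interception rates quoted above, projected over a 600-attempt season.
attempts = 600
rates = {"Rodgers": 0.014, "league average": 0.023, "Winston": 0.034}

for name, rate in rates.items():
    print(f"{name}: {round(attempts * rate)} interceptions")  # 8, 14, 20
```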
We know that's a real difference because the samples involved are so big. Rodgers has thrown nearly 8,000 career pass attempts. Winston has attempted nearly 3,000 passes. The league as a whole attempted more than 18,000 passes last year. These are all significantly greater than the 1,681 attempts that Tuccitto calculated were required for interception rate to stabilize.
But that 1,681 attempt threshold is a lot closer to a running back's 1,978 carry requirement than the 396 attempts necessary for yards per attempt. Why is this?
First: Interceptions Are Heavily Influenced By the Situation
Remember how the league average interception rate last year was 2.3%? On plays where a team was trailing by two scores or more (9 or more points), that rose to 2.6%. When playing with a 2-score lead, that fell to 2.0%. These might not seem like huge differences, but over a 600-attempt season, that's an extra 3 or 4 interceptions.
To some extent, this is selection bias. Consider this 2016 game between the Patriots and the Jets. Quarterbacks threw 0 INTs on 23 attempts with the lead vs. 3 INTs on 24 attempts while trailing. But the Patriots won that game 41-3; all of the attempts with a lead came from Tom Brady, while all of the attempts while trailing belonged to either Bryce Petty or Ryan Fitzpatrick. And it's no surprise that Brady threw fewer interceptions.
To the extent that good quarterbacks spend more time with the lead and good quarterbacks throw fewer interceptions, we should expect quarterbacks with the lead to throw fewer interceptions. But even when you control for the quarterback, the effect persists.
Looking just at Tom Brady: he threw 5,771 passes with the lead in his career and was intercepted on just 1.7% of them, while he threw 4,373 passes while trailing and was intercepted on 2.1% of them. Nearly every other quarterback shows a similar pattern.
And this is good. When a team trails, especially when it trails big or trails late, it needs to take bigger risks to get back into the game. Taking bigger risks will lead to more interceptions, but it will also maximize your chances of a comeback. On the other hand, when a team is ahead, it wants to take fewer risks to make a comeback as difficult as possible for the other team. Some of the best quarterbacks see some of the biggest differences in their interception rate while leading vs. trailing simply because the best quarterbacks tend to be really good at calibrating their risk/reward decision-making to the needs of the moment.
But remember, "luck" is "factors outside of the player's control", and quarterbacks don't have a ton of control over whether their team is leading or trailing at any given point. It depends a lot on how good the opponent is, how well the defense is playing, if special teams are holding up their end of the bargain, etc. So we should expect the situations a quarterback plays in to vary significantly from one sample to the next, and that should impact their expected interception rate.