Over the years, I’ve written several times about unintuitive findings and the need to constantly evaluate received wisdom on its own merits. One advantage is that this has helped build a reputation as a contrarian with an interest in crazy ideas, which in turn has helped get me in touch with like-minded individuals.
A great example of that at work came this past August, when a man named Mike Filicicchia from the Analytics Fantasy League on MFL tweeted the following to me:
@AdamHarstad My research shows that at every position, players score worse when they start than when they’re benched for fantasy.
— Mike Filicicchia (@mikefili) August 22, 2015
Now this had my attention. If true, it carried pretty dramatic implications.
What sort of implications?
For a long time in fantasy football, there have been proponents of a concept called “streaming” certain positions. The idea was that instead of having a set starter at the position, an owner would instead cycle through several players on their roster or on the street. Instead of seeking a top player, the owner would seek out lesser players with top matchups.
The appeal of the strategy is obvious. It is trivially easy to demonstrate that players score more points in “good matchups” (i.e., against bad defenses) than they do in “bad matchups” (i.e., against good defenses). In fact, that claim is tautologically true based on how we define good and bad matchups. Good matchups are games against teams that give up a lot of fantasy points. Bad matchups are games against teams that give up very few fantasy points.
Consider Tom Brady, for instance. In many scoring formats, he is the #1 fantasy quarterback of 2015 so far. He has thrown, through 13 weeks, for 4138 yards, 38 touchdowns, and just 6 interceptions.
Now consider the New Orleans defense. It has been atrocious, on pace to set all-time records for futility. So far in 2015, its opponents have combined to pass for 3624 yards, 36 touchdowns, and just 6 interceptions.
If a fantasy owner simply played the quarterback facing New Orleans every week, he would get production nearly equal to that of the #1 overall quarterback. That’s the power of a favorable matchup.
In fact, proponents of streaming will argue, with proper matchup selection, an owner can take a pair of QB2-level players and combine them for QB1-level production. Just take the good games from one quarterback, pair them with the good games from the other quarterback, and streamers gain top-tier production on a shoestring budget.
While it’s easy to demonstrate the power of matchups, proponents of streaming have, to my knowledge, never demonstrated one crucial part of the formula: our ability to predict and benefit from those matchups in advance. That’s the key on which the entire theory rests. Instead, we have a theory that seemed so self-evidently true that we never tested whether it was actually true.
After all, it’s one thing to say with the benefit of hindsight that a team should start quarterbacks against the New Orleans Saints. But if we did not recognize that New Orleans was a defense to pick on until the season was half over, we would have missed out on much of the benefit.
That’s why Mike’s tweet was so interesting to me. If hindsight is 20/20, what is foresight?
Putting it to the test
I still had the set of leagues that I used to calculate real-world positional baselines, so I quickly pulled them up and started checking the claim. It held up; for nearly every position in nearly every league, players averaged fewer points in games they started than they did in games they were benched.
The implications of this were pretty big. Big enough that I wanted to make absolutely sure of the finding before going forward with it. After all, perhaps most of the owners in those leagues simply weren’t good at playing matchups, or weren’t aware that playing matchups was a viable option.
I immediately decided I would focus on quarterback and defense in particular; those are the two positions most affected by matchups, the two positions with the largest pool of potential starters available on waivers, and, as a result, the two positions most likely to be streamed.
With that decided, what I needed were some leagues with verifiably skilled owners with extensive experience. Owners who were demonstrably more willing than the market at large to view streaming matchups as a viable option. Owners equipped with the best analysis and tools in the industry. And I needed the leagues to be highly competitive, with enough at stake to keep everyone trying their best to the bitter end.
In short, I needed FESL.
What is FESL?
FESL stands for Footballguys Expert Staff Leagues, and they are exactly what they sound like; they’re the place where Footballguys staffers compete against each other. FESL is a high-stakes league, perhaps the highest; at stake is glory, immortality, and the eternal knowledge that you pitted your skills against the best of the best and walked away victorious. (There may also be a bit of money involved, but it’s really all about the glory.)
FESL this year featured 42 total participants. Those participants were spread across four leagues, each named after a legendary coach. FESL Walsh featured 12 teams, while FESL Lombardi, FESL Shula, and FESL Belichick were all 10-team leagues.
With 42 participants across four leagues, FESL offered the perfect testing ground for this theory; all staff members were equipped with all of Footballguys’ tools and weekly projections. Further, FESL leagues are notorious for waiting at quarterback and defense, the two most-commonly-streamed positions.
How notorious? Well, consider: according to MFL, Andrew Luck’s ADP in PPR leagues with an August start date was 7th overall. Aaron Rodgers’ was 13th overall. In FESL, Andrew Luck came off the board on average with pick 20. Aaron Rodgers lasted to pick 34.
That’s right, a consensus mid-1st and early-2nd pick wound up going in the late 2nd and late 3rd. Luck went as high as 16th and as low as 25th. Rodgers went as high as 22nd and as low as 48th. This, despite a QB-friendly scoring system, and despite three of the FESL leagues being 10-teamers, which tend to make quarterbacks relatively more valuable.
Defense likewise was clearly devalued over consensus in the initial drafts. For starters, the 42 owners picked only 57 total defenses, meaning just a third of owners exited the draft with two units, (and an average of 18 defenses were available on waivers in each league).
Likewise, each defense was drafted much later than ADP. Seattle, the consensus top defense, had an ADP of 88th overall this last August. They were the top defense selected in each FESL league, but with an ADP of 115. By ADP, there were four defenses selected within the first 120 picks per league. In FESL, there were only three defenses selected in the first 120 picks in all four leagues combined.
And again, remember that this happened despite three of the four FESL leagues featuring just 10 teams. In a 12-team league, pick #120 represents the end of the 10th round. In a 10-team league, it’s the end of the 12th round, meaning each individual owner was drafting even more skill players before finally grabbing a defense.
So overall, we had a large sample of 42 talented, experienced owners who demonstrably devalue the quarterback and defensive positions. These owners were highly motivated and seeking out every edge they could find, and they were equipped with the best tools available to evaluate matchups.
In short, if anyone could successfully stream defenses and quarterbacks, it would be this crew.
Measuring success
I’d selected my leagues to study, and I’d established the positions I would focus on. Next, I decided to limit my study only to the first thirteen weeks of the season to avoid capturing eliminated teams who ceased setting lineups.
Next, I needed some way to measure what “successful” quarterback and defense streaming looked like. I settled on the following: I would take a player’s average across all weeks as his “baseline”. I would then look exclusively at his average in the weeks that his owner started him, and I would subtract from that the player’s overall baseline.
This would give me how much a player outperformed or underperformed expectations in weeks he was started. I could then multiply this by the total number of weeks he was started to figure out the raw total underperformance or overperformance value.
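The measurement described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet actually used for the study; the function name `matchup_gain` and the sample scores are made up for the example.

```python
from statistics import mean


def matchup_gain(all_scores, started_scores):
    """Points gained (positive) or lost (negative) versus the player's baseline.

    all_scores:     the player's score in every week he played (hypothetical input)
    started_scores: his scores in only the weeks his owner started him
    """
    if not started_scores:
        return 0.0  # never started, so nothing gained or lost
    baseline = mean(all_scores)        # season-long per-game average
    started_avg = mean(started_scores)  # average in started weeks only
    # Per-start difference, scaled by the number of starts.
    return (started_avg - baseline) * len(started_scores)


# Made-up player: baseline of 20.0, started in his 10- and 20-point games.
season = [10.0, 20.0, 30.0, 20.0]
starts = [10.0, 20.0]
gain = matchup_gain(season, starts)
print(gain)
```

A negative result means the owner tended to start the player in his below-average weeks; a positive result means the owner caught his above-average weeks.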
Let’s work through an example. In my FESL league, I spent most of the year using a combination of Drew Brees and Blake Bortles at quarterback. Across all of his games this season, Drew Brees averaged 28.382 fantasy points per game in FESL scoring. Blake Bortles averaged 26.675 points per game.
I started Brees nine times, and in those nine games he averaged 27.439 points per game. Based on his season-long averages, I’d expect nine starts to net me 255.44 points. Instead, my nine starts netted me 246.95. As a result, playing matchups with Drew Brees cost me 8.49 points over the full season.
On the other hand, I started Blake Bortles twice. In those two games, he averaged 30.00 points. Based on his season-long per-game average, I’d expect Blake Bortles to have produced 53.35 fantasy points in two games. Instead, I got 60 fantasy points in two games. The net result was an additional 6.65 fantasy points through judicious use of matchups for Bortles, almost enough to offset my inefficiency in starting Brees.
Unfortunately, the story doesn’t end there; as you can see, that only adds up to 11 starts. In the other two starts of the season, I used Joe Flacco and Colin Kaepernick. Flacco averaged 22.685 points per game this year, while Kaepernick averaged 16.483. In the games I started them, however, I received just 12.45 and 5.95 points. In those two games alone, I lost 20.768 fantasy points compared to expectations.
In total, my starting quarterbacks on the season scored 22.6 points fewer than I would have expected just based on their total season-long averages. That’s an average of 1.74 points per game lost due to my own inefficiency at playing matchups.
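For anyone who wants to check the arithmetic, here is a short Python sketch that re-adds the four quarterbacks using only the averages and start counts quoted above. The variable names are mine, and it assumes Flacco and Kaepernick were each started exactly once, as the text implies.

```python
# (player, season-long average, number of starts, average in those starts)
# All figures are the ones quoted in the article.
starts = [
    ("Drew Brees",       28.382, 9, 27.439),
    ("Blake Bortles",    26.675, 2, 30.00),
    ("Joe Flacco",       22.685, 1, 12.45),
    ("Colin Kaepernick", 16.483, 1, 5.95),
]

total = 0.0
for player, baseline, n_starts, started_avg in starts:
    # Actual points in started weeks minus what the baseline predicted.
    diff = (started_avg - baseline) * n_starts
    total += diff
    print(f"{player}: {diff:+.2f}")

weeks = sum(n for _, _, n, _ in starts)  # 13 starts in 13 weeks
per_game = total / weeks
print(f"Total: {total:+.1f} points, {per_game:+.2f} per game")
```

Rounded, the total and per-game figures match the 22.6 points and 1.74 points per game given above.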
Expanding the scope
But that’s just one owner at one position; as I mentioned, there were 42 owners overall. How did the entire sample do using this methodology? The answer is fairly bleak.
In FESL Walsh the league as a whole lost 54.3 points while playing matchups at quarterback, for an average of 0.35 points per game. In FESL Lombardi, the loss was 138.7 points, or 1.07 points per game. In FESL Shula, the loss was 72.81 points, or 0.56 points per game. FESL Belichick came the closest to breaking even; the loss there was a paltry 0.6 points, or 0.005 points per game.
Using a similar methodology for defense, FESL Walsh lost 94.93 points to matchups, an average of 0.61 per game. FESL Lombardi lost 44.43 points, or 0.34 per game. FESL Shula lost 28.43 points, or 0.22 points per game. FESL Belichick lost 135.18 points, or 1.06 points per game.
All four leagues underperformed at both quarterback and defense on the season. Adding together all of those underperformances, the leagues combined to lose 569.41 points, or just over a full point per team per week across the two positions.
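The combined figure can be roughly re-derived from the per-league numbers. The Python sketch below is a sanity check under one assumption of mine: that the per-game rate for the combined total is measured per team per week, i.e. 42 teams times 13 weeks. The inputs are the rounded per-league figures quoted above, so the sum lands a few hundredths away from the reported 569.41.

```python
WEEKS = 13  # the study covers the first thirteen weeks

# Teams per league, as described in the article.
TEAMS = {"Walsh": 12, "Lombardi": 10, "Shula": 10, "Belichick": 10}

# Points lost versus baseline, as quoted in the article (rounded).
QB_LOSS = {"Walsh": 54.3, "Lombardi": 138.7, "Shula": 72.81, "Belichick": 0.6}
DEF_LOSS = {"Walsh": 94.93, "Lombardi": 44.43, "Shula": 28.43, "Belichick": 135.18}

total_loss = sum(QB_LOSS.values()) + sum(DEF_LOSS.values())

# Assumption: one "game" here is one team's lineup in one week.
team_weeks = sum(TEAMS.values()) * WEEKS
loss_per_team_week = total_loss / team_weeks

print(f"Total lost: {total_loss:.2f} points over {team_weeks} team-weeks")
print(f"Lost per team per week (QB + defense): {loss_per_team_week:.2f}")
```

Under that assumption, the loss works out to slightly more than one point per team per week, split between the quarterback and defense slots.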
Dissecting the results
Does this mean an owner can’t get better results by playing matchups? It does not; anecdotally, it’s possible to get production that is better than the sum of its parts.
James Brimacombe in FESL Belichick, for instance, started a combination of Cam Newton, Eli Manning, and Colin Kaepernick, and outperformed season-long averages with all three. In total, his quarterbacks scored 29.77 points more than would be expected based on their per-game averages. (Though it should be noted that his rotation outperformed simply starting Cam Newton every week except his bye by a much smaller 4.3-point margin.)
But Brimacombe was the exception and not the rule. Overall, across all leagues, owners were more likely to sit players in their good games and start them in their bad games.
In other words, the idea that playing matchups leads to better production doesn’t hold up. Starting 2nd-tier quarterbacks and defenses in favorable matchups is a phenomenal theory, but in the real world it runs face-first into the fact that we simply don’t know ahead of time who are the 2nd-tier quarterbacks and defenses, and which are the favorable matchups.
We don’t find that information out until we have the benefit of hindsight. After New Orleans gives up a lot of points to quarterbacks we can note that New Orleans is a defense that gives up a lot of points to quarterbacks, for all the good it does us at that point.
When conducting postmortems on seasons, we can easily identify the combinations of players and matchups that would have enabled us to dominate our league on a budget. But the fact that we didn’t play that perfect combination last year should be our first clue that we’ll be unlikely to do so next year, either.
As such, streaming players even at the most matchup-dependent positions should likely be viewed as an option of last resort. If you are desperately hurting at a position, seeking out matchups can be seen as a way to try to stop the bleeding.
But when it gets right down to it, there’s simply no substitute for getting a good player and playing him every week.