(Originally published on March 20, 2022)
This is part of a series of posts on sabermetrics and the mathematics of baseball. You can find more here.
On May 19, 2021, Corey Kluber of the New York Yankees no-hit the Texas Rangers. This came just a day after Spencer Turnbull and the Tigers no-hit the Seattle Mariners. No hitters are pretty rare, so seeing two on back-to-back days seems unlikely enough, but this was already the sixth no hitter of a young 2021 season. What's more, three separate teams were on the losing end of two no hitters each: Texas, Seattle, and Cleveland. This was an MLB first, and the 2021 season ended with a record 9 no hitters in total.
That got me thinking: how rare should we expect this event to be? More broadly, how likely (or unlikely) is it to see any number of no hitters in a given season, and how many different ways might the winning and losing teams be configured?
I put together a quick heuristic analysis that suggests that the likelihood of this --- exactly six no hitters over the first 638 games of a season, with three teams getting no hit twice each --- is about 0.000000206, or 0.0000206%. To put that infinitesimally small number in perspective, we would expect this to occur about once in every five million stretches of 638 games.
The assumptions and calculations that led to this number are intended to be rather back-of-the-napkin, but they nevertheless lead to some interesting mathematics and follow-up questions!
To estimate the likelihood of seeing any number of no hitters in a season, we will make the basic assumption that each MLB game has some fixed probability of being a no hitter or not. Let's denote this probability by
Let's start by computing the probability that there are zero no hitters in the season: all 2430 MLB games in a season are played without seeing a single no hitter. Using our assumptions, each game has a
What about the probability of exactly one no hitter in a season? There is a
More generally, we can calculate the probability of seeing exactly
Below is a table showing these probabilities, as well as the probabilities for at most
Number of no hitters, N | Full season (G = 2430): Prob(exactly N) | Full season (G = 2430): Prob(at most N) | First 638 games: Prob(exactly N) | First 638 games: Prob(at most N) |
---|---|---|---|---|
0 | 0.0683 | 0.0683 | 0.494 | 0.494 |
1 | 0.183 | 0.252 | 0.348 | 0.843 |
2 | 0.246 | 0.498 | 0.123 | 0.965 |
3 | 0.220 | 0.718 | 0.0287 | 0.994 |
4 | 0.148 | 0.866 | 0.00504 | 0.9992 |
5 | 0.0791 | 0.945 | 0.000706 | 0.99991 |
6 | 0.0353 | 0.980 | 0.0000823 | 0.999991 |
7 | 0.0135 | 0.994 | 8.21e-6 | 0.9999992 |
8 | 0.00452 | 0.998 | 7.15e-7 | 0.99999994 |
9 | 0.00134 | 0.9995 | 5.53e-8 | 0.999999996 |
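The table can be reproduced with a short sketch of the binomial model: each of G games is independently a no hitter with the same probability p, so the chance of exactly N no hitters is C(G, N) p^N (1 - p)^(G - N), and "at most N" is the running sum. The value of p is not stated in this excerpt, so the sketch assumes one: it backs p out of the table's full-season P(0) = 0.0683 by solving (1 - p)^2430 = 0.0683, which gives p ≈ 0.0011. The names `P_NO_HITTER`, `prob_exactly`, and `prob_at_most` are hypothetical.

```python
from math import comb

# Assumption: per-game no-hitter probability backed out of the table's
# full-season P(0) = 0.0683, i.e. the p solving (1 - p)**2430 == 0.0683.
P_NO_HITTER = 1 - 0.0683 ** (1 / 2430)  # roughly 0.0011


def prob_exactly(n: int, games: int, p: float = P_NO_HITTER) -> float:
    """Binomial probability of exactly n no hitters in `games` independent games."""
    return comb(games, n) * p**n * (1 - p) ** (games - n)


def prob_at_most(n: int, games: int, p: float = P_NO_HITTER) -> float:
    """Cumulative probability of n or fewer no hitters."""
    return sum(prob_exactly(k, games, p) for k in range(n + 1))


if __name__ == "__main__":
    for n in range(10):
        print(
            n,
            f"{prob_exactly(n, 2430):.3g}",
            f"{prob_at_most(n, 2430):.6g}",
            f"{prob_exactly(n, 638):.3g}",
            f"{prob_at_most(n, 638):.9g}",
        )
```

With this choice of p the output should agree closely with the table; in particular it gives roughly 8.2e-6 for exactly 7 no hitters in the first 638 games, about a tenth of the exactly-6 value.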
This model suggests that we should expect half of seasons to have 2 or fewer no hitters, while half of seasons have 3 or more. So perhaps it shouldn't be surprising that in the 24 seasons since 1998, there have been exactly 12 seasons each of 2 or fewer no hitters and 3 or more no hitters! (Though note that I am including the shortened 2020 season, which I probably shouldn't.) It also predicts that between 5 and 6 of those 24 seasons should have exactly 3 no hitters, and indeed there were 6 such seasons.
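The season-count comparison is just the per-season probability multiplied by the number of seasons. A minimal sketch, assuming (as the model does) that every season has the same per-game probability and that seasons are independent; the probabilities are read off the full-season column of the table:

```python
SEASONS = 24          # 1998 through 2021, counting the shortened 2020 season
P_AT_MOST_2 = 0.498   # table, full season: Prob(at most 2)
P_EXACTLY_3 = 0.220   # table, full season: Prob(exactly 3)

# Expected number of seasons in each bucket = seasons * per-season probability.
expected_at_most_2 = SEASONS * P_AT_MOST_2        # about 11.95
expected_3_or_more = SEASONS * (1 - P_AT_MOST_2)  # about 12.05
expected_exactly_3 = SEASONS * P_EXACTLY_3        # about 5.28

print(f"2 or fewer: {expected_at_most_2:.2f}, 3 or more: {expected_3_or_more:.2f}, "
      f"exactly 3: {expected_exactly_3:.2f}")
```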
On the other hand, there have been two seasons with 7 no hitters (2012 and 2015), as well as our unicorn 2021 season, with its 9 total no hitters, 6 of which came in the first 638 games. The table shows how unlikely these are: a full season has only a 1.35% chance of exactly 7 no hitters and a 0.134% chance of exactly 9, while the first 638 games have only a 0.00823% chance of exactly 6.
Not only was the number of no hitters in the 2021 season unexpected, but so was the distribution of their losers. Never before had three teams been no hit twice each in a season. To model this, let's assume that each MLB team is equally likely to be on the losing end of each no hitter. To be clear, we're sweeping a few things under the rug, but I'll address them later. Here, the key assumption is that the likelihood of each configuration of losing teams is proportional to the number of ways it could occur.
For example, three teams losing two no hitters each will be the configuration denoted (2,2,2). This can happen in (30 choose 3) = 4060 different ways, since this is the number of three-team combinations out of the 30 MLB teams. Other configurations are calculated similarly. For instance, the configuration (2,1,1,1,1) represents 1 team losing two no hitters and four other teams losing one each; this can happen in 30(29 choose 4) = 712530 ways, since there are 30 choices for the team that loses two no hitters, then (29 choose 4) ways to choose the four teams to lose one no hitter from the 29 remaining teams. All of the possible configurations for 6 no hitters, the number of ways they can occur, and their respective probabilities are shown below.
Configuration | # of ways | Probability |
---|---|---|
(1,1,1,1,1,1) | 593775 | 0.366 |
(2,1,1,1,1) | 712530 | 0.439 |
(2,2,1,1) | 164430 | 0.101 |
(2,2,2) | 4060 | 0.00250 |
(3,2,1) | 24360 | 0.0150 |
(3,1,1,1) | 109620 | 0.0675 |
(3,3) | 435 | 0.000268 |
(4,2) | 870 | 0.000536 |
(4,1,1) | 12180 | 0.00750 |
(5,1) | 870 | 0.000536 |
(6) | 30 | 0.0000185 |
Total | 1623160 | 1 |
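The configuration table can be generated mechanically: list the partitions of k, count the ways to hand each partition's counts to distinct teams, and weight each configuration by that count, as assumed above. For a configuration with m parts, the count is the number of ordered choices of m teams, 30!/(30 - m)!, divided by the factorial of how often each part repeats; for (2,1,1,1,1) that is 30 * 29 * 28 * 27 * 26 / 4! = 712530. A minimal Python sketch, with illustrative (hypothetical) function names:

```python
from collections import Counter
from math import factorial, perm, prod
from typing import Dict, Iterator, Optional, Tuple

TEAMS = 30  # number of MLB teams


def partitions(k: int, max_part: Optional[int] = None) -> Iterator[Tuple[int, ...]]:
    """Yield the partitions of k as non-increasing tuples, e.g. (2, 1, 1, 1, 1)."""
    if max_part is None:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest


def ways(config: Tuple[int, ...], teams: int = TEAMS) -> int:
    """Ways to assign the counts in `config` to distinct teams.

    Choose an ordered list of len(config) teams, then divide out reorderings
    of teams that received the same number of no-hit losses.
    """
    repeats = Counter(config).values()
    return perm(teams, len(config)) // prod(factorial(m) for m in repeats)


def configuration_table(k: int, teams: int = TEAMS) -> Dict[Tuple[int, ...], float]:
    """Probability of each configuration, weighting each by its number of ways."""
    counts = {c: ways(c, teams) for c in partitions(k) if len(c) <= teams}
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


if __name__ == "__main__":
    for config, prob in configuration_table(6).items():
        print(config, ways(config), f"{prob:.3g}")
```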
According to this heuristic, there was only a 0.25% chance that three teams lost two no hitters each! Perhaps even more surprisingly, in a season (or stretch of games) in which six no hitters occur, the most likely configuration of losers is (2,1,1,1,1), at 43.9%. The chance of having six different losers, i.e. configuration (1,1,1,1,1,1), is only 36.6%, so there is about a 63.4% chance of at least one repeat loser among six no hitters.
We can repeat these calculations with other numbers of total no hitters. For instance, if there were
For a choice of
Experts might recognize more general phenomena appearing once again here!
Given
It turns out the answer is
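One count that can be checked directly is the Total row of the configuration table. Each way of assigning losers is a multiset of k teams drawn from 30, so by the stars-and-bars argument there are C(k + 29, k) of them; for k = 6 this is C(35, 6) = 1623160. A small sketch (the function name is hypothetical) that counts these distributions by dynamic programming, without using the formula, and compares:

```python
from math import comb


def count_distributions(k: int, teams: int) -> int:
    """Ways to hand out k no-hit losses among `teams` teams, where only how
    many losses each team gets matters (dynamic programming over teams)."""
    # ways[j] = ways for the teams processed so far to absorb j losses
    ways = [1] + [0] * k
    for _ in range(teams):
        # The next team can take any number of the losses: an in-place prefix sum.
        for j in range(1, k + 1):
            ways[j] += ways[j - 1]
    return ways[k]


if __name__ == "__main__":
    for k in range(1, 10):
        print(k, count_distributions(k, 30), comb(k + 29, k))
```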
We also see partitions popping up! A partition of
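The possible configurations of losing teams for k no hitters are exactly the partitions of k into at most 30 parts; the table for k = 6 lists all 11 partitions of 6. A short sketch (hypothetical function name) counts them with the classic coin-change recurrence, using the fact that partitions into at most 30 parts are in bijection, by conjugation, with partitions whose parts are all at most 30:

```python
def count_partitions(k: int, max_parts: int = 30) -> int:
    """Number of partitions of k into at most `max_parts` parts."""
    ways = [1] + [0] * k
    for part in range(1, max_parts + 1):
        for total in range(part, k + 1):
            ways[total] += ways[total - part]
    return ways[k]


print([count_partitions(k) for k in range(1, 10)])  # [1, 2, 3, 5, 7, 11, 15, 22, 30]
```

For k up to 30 the 30-team cap never binds, so this is just the ordinary partition count p(k).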
So where did that 0.0000206% come from? If we take the probability of 6 no hitters occurring in the first 638 games to be 0.0000823, and the probability that three separate teams were on the losing end of two no hitters each to be 0.00250, then multiplying them together gives the probability of both occurring: 0.000000206. Thus we might expect this to happen about once in every 1/0.000000206 ≈ 4.85 million stretches of 638 games.
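The headline number is a single product, treating the count of no hitters and the configuration of losers as independent, as the model does. A quick check of the arithmetic, using the two table values:

```python
p_six_in_638 = 0.0000823  # table: exactly 6 no hitters in the first 638 games
p_config_222 = 0.00250    # table: configuration (2,2,2), given 6 no hitters

joint = p_six_in_638 * p_config_222  # both events, assumed independent
print(f"joint probability: {joint:.3g}")  # 2.06e-07
print(f"about once every {1 / joint:,.0f} stretches of 638 games")
```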
Looking at the full 2021 season, the probability of exactly 9 no hitters occurring with the losing-team configuration (3,2,2,1,1) is
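The same recipe extends to the full 2021 season. Under the same assumptions, the configuration (3,2,2,1,1) can be assigned in 30 * C(29, 2) * C(27, 2) = 4275180 ways out of C(38, 9) = 163011640 configurations of 9 losers, and that share is multiplied by the table's Prob(exactly 9) = 0.00134. A minimal sketch; `ways` is a hypothetical helper, and the result is only as good as the model's assumptions:

```python
from collections import Counter
from math import comb, factorial, perm, prod

TEAMS = 30


def ways(config, teams=TEAMS):
    """Ways to give the counts in `config` to distinct teams."""
    repeats = Counter(config).values()
    return perm(teams, len(config)) // prod(factorial(m) for m in repeats)


config_2021 = (3, 2, 2, 1, 1)      # losing-team configuration from the text
n_ways = ways(config_2021)         # 30 * C(29, 2) * C(27, 2) = 4275180
n_total = comb(9 + TEAMS - 1, 9)   # stars and bars: C(38, 9) = 163011640
p_config = n_ways / n_total        # about 0.0262
p_nine = 0.00134                   # table: exactly 9 no hitters in a full season

print(f"{p_nine * p_config:.2g}")  # roughly 3.5e-05 under these assumptions
```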
My first question --- and the first one a friend asked me when I mentioned this calculation --- was: is every outcome expected to be rare? Are we asking for too much by requesting a fixed number of no hitters to occur and for those no hitters to have some given configuration of losing (or winning) teams? If every possibility is unlikely, but one has to happen, then we shouldn't be surprised when seemingly rare events occur!
I can say with some confidence that this is not the case, at least for small values of
Other scenarios with small
As
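One way to see that not every outcome is rare is to rank every pair (number of no hitters, configuration of losers) for a full season by its joint probability under the two assumptions above. In this sketch the per-game probability is again an assumption, backed out of the table's full-season P(0) = 0.0683, and the helper names are hypothetical:

```python
from collections import Counter
from math import comb, factorial, perm, prod
from typing import Iterator, List, Optional, Tuple

TEAMS = 30
GAMES = 2430
# Assumption: per-game no-hitter probability backed out of the table's
# full-season P(0) = 0.0683; the post's own value is not shown here.
P = 1 - 0.0683 ** (1 / GAMES)


def partitions(k: int, max_part: Optional[int] = None) -> Iterator[Tuple[int, ...]]:
    """Partitions of k as non-increasing tuples (the possible configurations)."""
    if max_part is None:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest


def ways(config: Tuple[int, ...]) -> int:
    """Ways to give the counts in `config` to distinct teams."""
    repeats = Counter(config).values()
    return perm(TEAMS, len(config)) // prod(factorial(m) for m in repeats)


def joint_probabilities(max_n: int) -> List[Tuple[float, int, Tuple[int, ...]]]:
    """(probability, N, configuration) for all N <= max_n, most likely first."""
    rows = []
    for n in range(max_n + 1):
        p_n = comb(GAMES, n) * P**n * (1 - P) ** (GAMES - n)  # binomial P(N = n)
        total = comb(n + TEAMS - 1, n)  # all configurations of n losers
        for config in partitions(n):
            rows.append((p_n * ways(config) / total, n, config))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    for prob, n, config in joint_probabilities(4)[:5]:
        print(f"N = {n}, losers {config}: {prob:.3f}")
```

Under these assumptions the single most likely outcome is two no hitters against two different teams, with probability of about 0.23, so some outcomes are not rare at all.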
This whole approach hinges on our heuristic assumptions, namely that each MLB game has a fixed probability of being a no hitter and that for any given no hitter, each team is equally likely to be on the winning or losing end. Both of these are flawed!
Let's first consider the effect of the teams playing. Looking at the 2021 MLB season summary, we see that the Seattle Mariners had the fewest hits in the league, while the LA Dodgers' pitching staff allowed the fewest hits. This leads me to believe that a Dodgers-Mariners matchup might be more likely than any other to end in a no hitter! I would also expect a team like the Astros, who led the league in hits, to be harder to no hit, and a team like the Orioles, who gave up the most hits, to be less likely to throw a no hitter. Still, our assumptions made these calculations fairly easy and gave results that seem at least somewhat reasonable; trying to tease out each team's likelihood of being on either end of a no hitter separately sounds like a massive headache.
Even if we could analyze each team separately, I might expect player-level effects as well, namely from the starting pitcher. The best pitchers are usually good at preventing hits, which should translate to more no hitters, right? Perhaps equally interesting are catchers, since they appear in many more games than pitchers. Are there catcher metrics that correlate well with the number of no hitters caught over their career?
Even if we believe our assumptions are valid, we're entirely ignoring how the value of