Who was the better homer hitter: Schmidt (a league-leading 37) or Murphy (who placed second with 44)?
Same type of question, thornier scenario. In 1906, HoMEr Nap Lajoie racked up more value than any other player in the AL (10.0 Wins Above Replacement, or WAR), and in 1907 his 7.6 Wins led the league again. Which season was better?
Math majors (which I was not one of) will recognize where this is headed—standard deviation. Which sounds like Émile Durkheim territory, but, sorry sociology majors, it’s not. Standard deviation is a powerful concept for comparing historical players against one another. You might picture it like this:
- A league of nothing but studs ‘n scrubs = easiest to dominate = high standard deviation
- A league of complete equals = toughest to dominate = low standard deviation
So first off, let’s dispense with the statistical talk and call high-standard-deviation leagues “easy” and low-standard-deviation leagues “tough”—as in easy to dominate or tough to dominate. And let’s refer to the idea of standard deviation as the “toughness” of a league.
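A toy example shows the idea (the numbers here are invented for illustration, not from any real season):

```python
import statistics

# Hypothetical win values for six players in each of two imaginary leagues.
studs_n_scrubs = [9.0, 8.5, 8.0, 1.0, 0.5, 0.0]   # wide talent spread
near_equals    = [5.0, 4.8, 4.6, 4.4, 4.2, 4.0]   # everyone bunched together

# Both leagues have the same average talent (4.5 wins per player)...
print(statistics.mean(studs_n_scrubs), statistics.mean(near_equals))

# ...but very different spreads. A big standard deviation means the stars
# tower over the field (easy to dominate); a small one means nobody can
# stand out very far (tough to dominate).
print(round(statistics.stdev(studs_n_scrubs), 2))  # ≈ 4.4
print(round(statistics.stdev(near_equals), 2))     # ≈ 0.37
```

Same average quality, wildly different toughness: that’s why toughness isn’t the same thing as quality of play.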
Clearly, position players have relatively little control over one another’s overall performance. So, the toughness of a league is part of a player’s context, just like the size of his home park or the length of the season he plays in. And toughness isn’t the same as the quality of play in the league. As with our Lajoie example above, a league can be easy one year and tough the next with the same mix of players and conditions.
For the purpose of the HoME, I want to strip away contextual illusions to put players on an equal footing so I can better compare them. After doing some very basic research, I’m incorporating an adjustment for toughness into my system. [I’ll save the nuts and bolts for the end of this post so that those who like to peek under the hood of my statistical Yugo can tell me what I’m doing wrong, and those who don’t care can get on with life.]
The following graph shows what the adjustment looks like over time and why it’s important. The historical average is 1.00; any league above 1.00 is a tougher league to dominate and anything below is easier. [As they say, click to embiggen.]
You can spot both large, long-term historical trends in the toughness of the leagues and a lot of year-to-year bumpiness. Take the 1920s and 1930s, when the NL is clearly tougher for a long stretch but both leagues bounce around a lot.
So what causes toughness to change annually and over time? Here are a few things that the graph may suggest:
- Length of schedule: Shorter seasons appear to be easier to dominate (see 1918, 1981, 1994, and any season before 1885), perhaps because fluky good or bad seasons are more likely in a smaller number of games (the Small Sample Size effect).
- Contraction: Contraction makes leagues tougher by cutting out the dead wood, and the effect can linger (see the 1890s).
- Expansion: Different expansions can have different effects. In the modern game, though, while they usually lead to an immediately easier league, within a couple of years the circuit starts to toughen up a lot, perhaps because increased competition for scarce talent leads to the opening of alternative markets, flooding the game with more total talent.
- Integration: This one is really interesting. Notice that after Jackie Robinson’s 1947 rookie season, the NL went into a long period of relative easiness that coincided with the introduction of numerous African-American stars into the senior circuit. The AL, much slower to integrate, went into a long period of relative toughness. The quality of play in the NL was surely higher than in the AL, but because the AL mostly drew from the same old talent pool, it got tougher, while in the NL the gap widened (a lot) between Jackie, Willie, and Bad Henry on one hand and the white guys trying to keep their jobs on the other. The situation didn’t equalize until the 1970s.
One factor that’s surprisingly hard to discern from the chart is scoring levels. It’s not immediately clear whether runs/game has an effect on toughness or not.
Does all this numbering around really mean anything for how I’m viewing a HoME candidate, or for how we might look at players generally? The answer is…yes, but sometimes a lot more than others.
Here are two contrasting examples. In 1986 and 1987, Tony Gwynn led the NL in WAR with 6.6 and 8.5, respectively. The NL of 1986 was one of the toughest leagues after integration, with a 1.13 adjustment factor. The 1987 NL was a little tough at 1.04. Those factors adjust his seasons to 7.2 and 8.8, not nearly enough to draw them even, but if the Left-Digit Effect [warning: pay-link but with helpful, free abstract] applies to our perceptions of ballplayers as it does to commodities, we’ve moved the needle a tick or two.
But what about the most extreme example?
The largest one-year jump in toughness occurs between the 1906 and 1907 AL. In 1906 Nap Lajoie led the AL in WAR with exactly ten Wins. In 1907 he led again, at 7.6 this time. When we adjust for the toughness of the leagues, that gap of 2.4 wins nearly disappears with 1906 knocked down to 8.9 WAR and 1907 amped up to 8.4 WAR. My take: our perception that his 1906 season is significantly more impressive than his 1907 is in part an illusion of context due to one of the leagues being way tougher than the other.
In the end, however, we know that players don’t crank out the same season every year. Their own performance varies due to all kinds of reasons (injuries, fatigue, divorce, marriage, fitness or its lack, weather, Bobby Valentine, whatever). But that doesn’t mean that our perceptions of their seasons are always well formed.
Does this change my view on Gwynn or Lajoie a lot? Probably not. It’s when we get down to candidates at the very edges of the HoME, the last thirty players we select, the last two guys at each position, that every last bit or byte of information can make or break a player’s case.
Good thing we’re still a long way off from there.
So how does this toughness adjustment work?
FIGURING THE ADJUSTMENT FACTORS
- For every Major League season, find the standard deviation of Wins Above Average (WAA) per PA for every player with 1 or more PA per scheduled game. For the NA, 1981, and 1994, when teams played different numbers of games, use the median games played.
- Average all these annual STDEVs to find the Major League historical average.
- Divide the Major League historical average STDEV by a given league’s STDEV, and the result is the adjustment factor for that season.
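Here’s a rough Python sketch of those three steps. The function and the toy WAA/PA rates are just my illustration, not my real spreadsheet, and I use the population standard deviation for simplicity (the sample version works the same way):

```python
from statistics import mean, pstdev

def toughness_factors(league_seasons):
    """league_seasons maps (year, league) -> list of WAA/PA rates for every
    player with at least 1 PA per scheduled game in that league-season."""
    # Step 1: standard deviation of WAA/PA within each league-season.
    stdevs = {key: pstdev(rates) for key, rates in league_seasons.items()}
    # Step 2: average all those annual stdevs into a historical average.
    historical_avg = mean(stdevs.values())
    # Step 3: divide the historical average by each league's stdev, so a
    # tightly bunched (tough) league lands above 1.00 and a spread-out
    # (easy) league lands below it.
    return {key: historical_avg / sd for key, sd in stdevs.items()}

# Toy data: one spread-out league-season and one tightly bunched one.
factors = toughness_factors({
    ("1906", "AL"): [0.004, 0.010, -0.002, 0.008, -0.004],  # wide spread
    ("1907", "AL"): [0.003, 0.005, 0.002, 0.004, 0.001],    # tight spread
})
print(factors)  # the tight league's factor comes out above 1.00
```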
APPLYING THE ADJUSTMENT TO INDIVIDUAL PLAYER SEASONS
This example with Nap Lajoie should make it clear:
YEAR   WAA     ADJ    adjWAA   DIFF    WAR    adjWAR
-----------------------------------------------------
1906   7.6  *  0.86  =  6.5    -1.1  + 10.0  =  8.9
1907   5.5  *  1.15  =  6.3    +0.8  +  7.6  =  8.4
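In code, each row is just “scale WAA, take the difference, add it back to WAR.” A minimal sketch (the function name is mine), checked against the 1906 row:

```python
def adjust_war(war, waa, factor):
    """adjWAR = WAR + (WAA * factor - WAA): scale only the above-average
    portion of a player's value, leaving the replacement portion alone."""
    diff = waa * factor - waa
    return round(war + diff, 1)

# Lajoie's 1906: 7.6 WAA in an easy (0.86) league knocks 10.0 WAR down.
print(adjust_war(10.0, 7.6, 0.86))  # 8.9
```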
WAYS AND MEANS AND MEDIANS
- Why use WAA instead of WAR? WAR is WAA (which includes batting, field, running, positional adjustment, etc.) plus the wins a player generates by being better than a freely available replacement. Replacement is a constant that is multiplied by playing time. Therefore, adjusting WAR would mean you are not conserving the relationship between the player and an average player. For example: an average player (2.0 WAR in full-time play) in the 1986 NL would become an above-average player (2.26 WAR) if you multiplied the toughness factor by WAR. By multiplying by WAA, he’s still a 2.0 WAR player, which he should be because all of his value comes from being above replacement. To adjust for schedule length or dole out catching bonuses, both of which are really opportunity adjustments, we can multiply WAR because the value above replacement is a function of playing time.
- Why use WAA/PA instead of just WAA? WAA does have a playing-time aspect to it, but unlike a player’s value above replacement, it is dependent on individual performance. So, shorter-season schedules lead to smaller WAA totals. Including those shorter-season toughnesses would skew the historical toughness average too low. By using a rate stat instead, all leagues can be included.
- But isn’t the relationship between runs and offensive events in the first 20 or so years of baseball really muddy because there were so many errors and crazy base running? I’d like to pretend that question doesn’t exist…. Until we have play-by-play data for those early years, we won’t know exactly what that relationship looks like. But then again, no one else knows this stuff either. Sometimes you go to WAR with the army you have.
- Why use a cut-off of 1 PA per scheduled game? In a graph of the 1930 season that a friend showed me (hat tip, Brett), as PAs declined, variance went crazy. Yet most of a league’s PAs are taken by regulars and semi-regulars. Also, I don’t care how tough it is for the scrubs, only how tough it is for the guys that matter to the HoME. Any cutoff is arbitrary, so as I looked at the graph, I picked this one since it looked right and was easy to figure. How’s that for scientific!
- What is the historical Major League average for league toughness? The average standard deviation of WAA/PA is .0039, rounded. The easiest league comes in at .0055 (1872 NA); the toughest at .00305 (1908 AL).
- Is this for pitchers, too? No, this is only for hitters. I haven’t studied pitchers at all, and I have no reason to believe that toughness for pitchers tracks toughness for hitters, either directly or inversely.
- Would you stand by this work in a peer-reviewed setting? Absolutely not. I’m not a trained statistician, but I used Excel, so at least the calculations are likely to be correct.
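For the skeptical, a couple of quick arithmetic checks on the numbers quoted in the answers above:

```python
# Check 1: with the 1986 NL factor of 1.13, scaling WAR directly turns an
# average full-time player (2.0 WAR, 0.0 WAA) into a falsely above-average
# one, while scaling only WAA leaves him right where he belongs.
war, waa, factor = 2.0, 0.0, 1.13
print(war * factor)                # 2.26 -- the distortion described above
print(war + (waa * factor - waa))  # 2.0 -- relationship to average conserved

# Check 2: the adjustment factors implied by the extreme stdevs
# (historical average .0039 divided by a given league's stdev).
print(round(0.0039 / 0.0055, 2))   # 0.71 -- easiest league (1872 NA)
print(round(0.0039 / 0.00305, 2))  # 1.28 -- toughest league (1908 AL)
```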