Negro Leaguers and Standard Deviation, Part I

A lonnnng time ago now, we presented findings about how standard deviation may color our perceptions of any given MLB season. The rough answer is that for whatever reason, in some years performance is bunched closely together so that the highest WAR total in the land is under 7.0, and in other seasons, it’s practically the wild west, and we see players racking up WAR at every integer between -2 and 12.

I created a seasonal adjustment factor to compensate for this phenomenon, which I use in my home-cooked WAR. As I’ve rolled out a few articles recently about the Negro Leagues, I’ve begun to wonder about the effect standard deviation might have on blackball players.

There are several indicators that suggest player performance varied more widely in the Negro Leagues than in MLB:

  • Leagues came and went with all the bumpiness that accompanies startups and expansions
  • Especially before the late 1930s, teams came and went during the season as well as between seasons
  • Players in some cases jumped from team to team or league to league
  • The best Negro League teams were further apart from the worst than their counterparts in the majors
  • The Negro Leagues had many teenage players and 40+ players
  • Field conditions were likely worse than on MLB diamonds
  • In some seasons, for example, 1938 through 1948, international leagues signed away large numbers of players who then required replacement
  • Negro League teams typically had much shorter rosters than big league teams
  • The Negro Leagues played much shorter schedules, which means some jumpier stats didn’t have time to stabilize as they would in the 154-game slate
  • Negro League talent procurement and development were likely not as systematized and routinized as in white organized baseball

That’s a lot of indicators that variance among players, between leagues, and between seasons might have swung wider than in the majors. Further clouding the picture is the sheer number of leagues we’re talking about. To properly evaluate Negro League players, we’d need to know not only about the Negro Leagues themselves, but also about various Caribbean leagues (winter and summer), the Mexican League, and, for Integration-era players, the minor leagues as well as certain independent leagues.

So I, your trusty servant, decided to look into things, and I pulled out my trusty spreadsheets, opened BBREF and the Negro League Database, and got to work.

The Method

For now, I’ve only worked up hitting stats. To keep this reasonably simple, here’s what I did:

  1. Calculate the RC/27 for each player in the league whom either BBREF or the Negro Leagues Database lists as qualifying for the batting title. I use Bill James’ version from Win Shares because it accounts for the effect of an individual player on his lineup, and because it’s a little easier to deal with than Base Runs.
  2. Find the STDEV of RC/27 among these players.
  3. Divide the MLB-wide STDEV for the corresponding season by the league STDEV found in step 2.
  4. Adjust the quotient from step 3 so that its distance from average (1.0) is cut in half. For example, a quotient of 0.38 becomes 0.69.

The result is a STDEV factor.
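For anyone who wants to follow along at home, here is a rough Python sketch of the steps above. The function names and the sample RC/27 values are made up for illustration. The sketch assumes the sample standard deviation (what a spreadsheet STDEV function returns) and that the quotient is MLB STDEV over league STDEV, which is what the ADJ columns in the tables below imply.

```python
from statistics import stdev

def halved_factor(league_stdev, mlb_stdev):
    """Turn a league STDEV and the matching MLB STDEV into a STDEV factor."""
    quotient = mlb_stdev / league_stdev      # MLB over league (assumed direction)
    return 1.0 + (quotient - 1.0) / 2.0      # cut the distance from 1.0 in half

def stdev_factor(league_rc27, mlb_stdev):
    """Compute the factor straight from a list of qualifiers' RC/27 values."""
    league_stdev = stdev(league_rc27)        # sample STDEV (an assumption)
    return halved_factor(league_stdev, mlb_stdev)

# Hypothetical RC/27 values for six qualifiers; not real data.
sample_rc27 = [2.1, 3.4, 4.0, 5.2, 6.8, 9.5]
print(round(stdev_factor(sample_rc27, mlb_stdev=1.39), 2))

# Rows from the tables in this article, using the published STDEVs directly.
print(round(halved_factor(3.68, 1.39), 2))   # 1905 EAST
print(round(halved_factor(1.35, 1.92), 2))   # 1924 NNL
```

The last two lines reproduce the 0.69 and 1.21 listed in the ADJ columns for 1905 EAST and 1924 NNL.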

A note of caution. Many leagues, including MLB, did not tally some or all of caught stealing, GIDP, intentional walks, HBP, SF, strikeouts, and even walks in various seasons. We’ve skipped any calculation for which walks aren’t available, and we’ve worked around the lack of caught stealing by assuming that hitters are caught stealing 80% as often as they are successful. That works out to about a 55% success rate, approximately the MLB average for most of the time span we’re dealing with. Where too little information exists, we haven’t included the season in our research.
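The caught-stealing workaround is simple arithmetic, sketched here in Python. The function names and the 20-steal season are hypothetical. The 0.8 ratio is the assumption stated above.

```python
def estimate_caught_stealing(stolen_bases, cs_per_sb=0.8):
    """Estimate CS for a league that did not record it (80% of SB, as assumed above)."""
    return cs_per_sb * stolen_bases

def success_rate(stolen_bases, caught_stealing):
    """Share of steal attempts that succeeded."""
    attempts = stolen_bases + caught_stealing
    return stolen_bases / attempts if attempts else 0.0

sb = 20                                   # hypothetical season total
cs = estimate_caught_stealing(sb)         # 16 estimated times caught
print(f"{success_rate(sb, cs):.1%}")      # 20 / (20 + 16) = 55.6%
```

Whatever the steal total, the implied success rate is 1 / 1.8, or a bit under 56%.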

The Negro Leagues

Let’s start with the Negro Leagues themselves. That term refers to a collection of at least 8 different loose affiliations and actual organizations. The Negro Leagues Database does not yet have complete information for all seasons. Nor does it currently have park factors or strength of schedule adjustments. Ideally, those adjustments would be made before doing the STDEV calculation, but we didn’t make them for big leaguers either. We’ll take it in chunks of time so we can fit more information in.

EAST = Independent clubs in the east
NAC = National Association of Colored Professional Clubs of the United States and Cuba
WEST = Independent clubs in the west

YEAR     MLB |     EAST     |     NAC      |    WEST
       STDEV |  STDEV  ADJ  |  STDEV  ADJ  | STDEV  ADJ
=========================================================
1905   1.39  |  3.68  0.69  |              |
1906   1.16  |  7.81  0.57  |              |  3.53  0.66
1907   1.03  |              |  1.96  0.76  |
1908   1.10  |              |  2.30  0.74  |  3.40  0.66
1909   1.10  |              |              |  6.34  0.59
1910   1.24  |  2.96  0.71  |              |  7.61  0.58
1911   1.46  |  2.62  0.78  |              | 14.55  0.55
1912   1.58  |  4.42  0.68  |              |  3.33  0.74
1913   1.31  |  2.76  0.74  |              |  6.84  0.60
1914   1.27  |  3.85  0.67  |              |  2.87  0.72
1915   1.21  |  2.97  0.70  |              |  2.08  0.79
1916   1.26  |  3.09  0.70  |              |  3.03  0.71
1917   1.18  |  3.78  0.66  |              |  7.90  0.57
1918   1.09  |  2.53  0.72  |              |  1.97  0.78
1919   1.41  |  2.47  0.79  |              |  4.80  0.65

We can see already the whopping difference in STDEVs, and the proportionally whopping adjustment that can result from it.

Here’s 1920–1932, a very active time for league formation and for league destruction thanks to the Great Depression.

NNL = first version of Negro National League
ECL = Eastern Colored League
EWL = East West League (only played in 1932, for convenience placed in the ECL column)
EAST = Independent clubs in the east
IND = Independent clubs

YEAR     MLB |      NNL     |    ECL/EWL   |     EAST     |   IND
       STDEV |  STDEV  ADJ  |  STDEV  ADJ  |  STDEV  ADJ  | STDEV  ADJ
=======================================================================
1920   1.90  |  1.89  1.00  |              |  3.07  0.81  |
1921   1.81  |  2.30  0.89  |              |  3.09  0.79  |
1922   1.78  |  2.29  0.89  |              |  4.61  0.69  |
1923   1.89  |  2.13  0.94  |  2.17  0.94  |              |  1.74  1.04
1924   1.92  |  1.35  1.21  |  1.94  0.99  |              | 
1925   1.81  |  2.16  0.92  |  2.34  0.89  |              |
1926   1.59  |  1.98  0.90  |  2.12  0.87  |              |
1927   1.80  |              |              |              |
1928   1.84  |              |  3.04  0.80  |  4.19  0.72  |
1929   1.82  |              |              |              |
1930   2.03  |              |              |  3.55  0.78  |
1931   1.80  |              |              |  3.00  0.80  |
1932   1.74  |              |  2.20  0.90  |              |  4.13  0.71

With more organized leagues bringing a higher level of owner and team into the festivities, the NNL’s and ECL’s STDEVs both dropped rapidly from the independent teams’ of the previous decades. These two leagues and the EWL in 1932 were nearly on par with the majors in terms of STDEV especially compared to the independents and the previous era. But even the Eastern indies in this period moved toward MLB’s level of variance. That said, whiteball moved toward blackball as well. The sudden surge in run scoring in the 1920s increased the variance among MLB hitters’ performance.

Now onto the final phase of the Negro Leagues, the more stable era of 1933–1944.

NNL = second version of Negro National League
NAL = Negro American League
IND = Independent clubs


YEAR     MLB |      NNL     |      NAL     |     IND
       STDEV |  STDEV  ADJ  |  STDEV  ADJ  |  STDEV  ADJ
=========================================================
1933   1.73  |  2.07  0.92  |              |  7.02  0.62
1934   1.74  |  2.77  0.81  |              |  2.48  0.85
1935   1.67  |  2.50  0.83  |              |
1936   1.93  |  2.79  0.85  |              |  4.62  0.71
1937   1.94  |  3.39  0.79  |              |
1938   1.81  |  2.66  0.84  |              |
1939   1.62  |  2.70  0.80  |  2.27  0.86  |
1940   1.56  |  1.96  0.90  |  1.83  0.92  |
1941   1.89  |  2.45  0.89  |  2.17  0.94  |
1942   1.59  |  1.92  0.91  |  1.96  0.90  |
1943   1.30  |  3.30  0.70  |  1.73  0.88  |
1944   1.60  |  2.63  0.81  |  2.71  0.80  |

Generally, the NNL and NAL stayed relatively close to the majors. Mexican League defections and World War II probably increased performance variation overall in 1943 and 1944. The big leagues had whole farm systems full of replacements of decent quality and a huge white population (and light-skinned Latino population) to draw from. The Black American population was far smaller, on the order of a tenth the size of the white population, so reasonable replacements were in some ways more difficult to find.

Latin Leagues

Some of the information that follows includes calculations based on data that won’t be available on the Negro Leagues Database for a little while yet. I happened to have access to it, and it is ultimately all derived from Pedro Cisneros’ Mexican League encyclopedia. The information for the various Cuban leagues is all from the Negro Leagues Database.

CWL = Cuban Winter League (la Liga general de base ball de la República de Cuba)
PV = Cuban Summer League (el Premio de verano)
GP = Grand Winter Championship (el Gran premio invernal)

YEAR     MLB |      CWL     |      PV     |      GP
       STDEV |  STDEV  ADJ  |  STDEV  ADJ |  STDEV  ADJ
=========================================================
1902   1.39  |  1.82  0.91  |             |
1903   1.39  |  1.27  1.03  |             |
1904   1.39  |  1.53  0.89  |  2.53  0.74 |
1905   1.39  |  1.45  0.98  |  2.06  0.84 |
1906   1.16  |  1.45  0.90  |  2.08  0.78 |
1907   1.03  |  1.66  0.81  |  2.67  0.69 |
1908   1.10  |  1.71  0.82  |  1.99  0.78 |
1909   1.10  |  2.63  0.71  |             |
1910   1.24  |  2.19  0.78  |             |
1911   1.46  |  2.29  0.82  |             |
1912   1.58  |  1.97  0.90  |             |
1913   1.31  |  1.70  0.89  |             |
1914   1.27  |  2.11  0.80  |             |
1915   1.21  |  2.67  0.73  |             |
1916   1.26  |  1.88  0.84  |             |
1917   1.18  |              |             |
1918   1.09  |  1.44  0.88  |             |
1919   1.41  |              |             |
1920   1.90  |  1.83  1.02  |             |
1921   1.81  |              |             |
1922   1.78  |  2.53  0.85  |             |
1923   1.89  |  2.00  0.97  |             |  1.76  1.03
1924   1.92  |              |             |
1925   1.81  |              |             |
1926   1.59  |              |             |
1927   1.80  |  2.74  0.83  |             |

The Cuban winter leagues show roughly the same range of standard deviation that the latter-day Negro Leagues did. The early summer league looks similar to, if a little tighter than, the NAC.

MXL = Mexican League

YEAR     MLB |     MXL
       STDEV | STDEV  ADJ
===========================
1937   1.94  |  3.68  0.76
1938   1.81  |  2.28  0.90
1939   1.62  |  1.98  0.91
1940   1.56  |  2.20  0.85
1941   1.89  |  2.24  0.92
1942   1.59  |  2.29  0.85
1943   1.30  |  1.66  0.89
1944   1.60  |  1.74  0.96
1945   1.32  |  1.97  0.83
1946   1.55  |  1.74  0.95
1947   1.46  |  1.23  1.09
1948   1.59  |  1.85  0.93
1949   1.50  |  1.65  0.95
1950   1.43  |  1.67  0.93
1951   1.50  |  1.81  0.91
1952   1.17  |  2.05  0.79
1953   1.49  |  2.02  0.87
1954   1.65  |  2.05  0.90

La Liga comes in consistently close to the big leagues for quite some time in terms of the spread of its hitters’ performance. Drawing on a large native population that only rarely made it to the Big Leagues, taking the cream of the crop from the Negro Leagues, and pinching a few players in 1946–1947 from MLB and the US minors, Mexico reduced its overall spread in talent and performance. It rates as a little more tightly bunched than the NNL and NAL of the same period.

Here’s an overall look at the entire span of time for each of the leagues mentioned above. The MLB column includes only those seasons that correspond to the seasons with available data for each respective Negro or Latin league.

Average Standard Deviation 1902–1954
           YEARS | STDEV | MLB STDEV
=====================================
CWL   1902–1927* |  1.94 |  1.38
PV    1904–1908  |  2.26 |  1.18
EAST  1905–1931* |  3.58 |  1.48
WEST  1906–1919* |  5.25 |  1.26
NAC   1907–1908  |  2.13 |  1.06
NNL1  1920–1926  |  2.02 |  1.81
ECL   1923–1928* |  2.32 |  1.81
IND   1923–1936* |  4.00 |  1.81
GP    1923       |  1.76 |  1.89
EWL   1932       |  2.20 |  1.74
NNL2  1933–1944  |  2.60 |  1.70
MXL   1937–1954  |  2.01 |  1.56
NAL   1939–1944  |  2.11 |  1.59
*Indicates span includes discontinuous seasons
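To show how the matched MLB column works, here is a small Python sketch that averages a league’s STDEVs and then averages MLB over only the seasons that league has data for. The numbers are copied from the 1933–1944 table above. The function name is made up for illustration.

```python
from statistics import mean

# MLB STDEV by season, from the 1933–1944 table.
mlb = {1933: 1.73, 1934: 1.74, 1935: 1.67, 1936: 1.93, 1937: 1.94, 1938: 1.81,
       1939: 1.62, 1940: 1.56, 1941: 1.89, 1942: 1.59, 1943: 1.30, 1944: 1.60}

# NAL STDEV by season; the league has no data before 1939.
nal = {1939: 2.27, 1940: 1.83, 1941: 2.17, 1942: 1.96, 1943: 1.73, 1944: 2.71}

def matched_averages(league_by_year, mlb_by_year):
    """Average the league's STDEVs and MLB's STDEVs over the league's seasons only."""
    years = sorted(league_by_year)
    league_avg = mean(league_by_year[y] for y in years)
    mlb_avg = mean(mlb_by_year[y] for y in years)
    return league_avg, mlb_avg

league_avg, mlb_avg = matched_averages(nal, mlb)
print(round(league_avg, 2), round(mlb_avg, 2))   # 2.11 1.59
```

Those are the figures in the NAL row. MLB’s 1933–1938 seasons are left out of its average because the NAL has no data for them.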

Here we see the strong effect of a league structure. The East, West, and Independent teams show a far higher degree of variance (about 50%–100% higher) than the more structured league setups. Other than those three, however, the rest of the leagues show a fairly narrow range of STDEVs, with less than a run separating them (1.76 to 2.60). Setting aside the East, West, and independents for a moment, MLB shows a similar overall range but with a little more clumping around the 1.80 level.

Let’s remember that the spread of performance in a league shares many markers with the league’s overall quality of competition. But factors beyond those indicating quality of play influence variance, and others that influence quality may not affect STDEV as much. The long and short of it is this: Standard Deviation is a real thing, and it is a statistical thing. I adjust for it because as a statistical thing, all statistics derived from the league’s record will be influenced by the degree of variance. And that variance is outside an individual player’s immediate control, just like his park, his league, his run environment, the strength of the schedule he faces, and the many other factors that affect his numbers, subtly or not so subtly.

Next time out, we’ll check in on the Integration-era minor leagues to see how they compare to the big leagues. Then in a final article, we’ll recap by showing how adjusting for STDEV may change our perceptions of several Negro League stars.

Discussion

One thought on “Negro Leaguers and Standard Deviation, Part I”

  1. Great stuff Eric, looking forward to the next in this series!

    Posted by Ryan | June 14, 2017, 3:49 pm
