Negro Leagues: Measuring the Quality of Competition

How good were the Negro Leagues? If you’re considering translating Negro League statistics to a Major League setting, you have to have an answer to this question. If you want reasonable translations, you have to have a really good answer to the question. If you want all your translations to line up systematically, you have to answer that same question for many, many leagues. So today, that’s what we’re considering.

In some sense, organized baseball has answered parts of the question for us. Under the National Agreement, the minor leagues have been classified since the early 1900s. We’ll soon see how those classifications have changed repeatedly over the years, but fans today recognize this structure:

  • MLB: The AL and NL
  • AAA: The Pacific Coast and International Leagues (and the Mexican League)
  • AA: The Eastern, Southern, and Texas Leagues
  • Hi-A: The California, Carolina, and Florida State Leagues
  • Lo-A: The Midwest and South Atlantic Leagues
  • Short-Season A: The New York-Penn and Northwest Leagues
  • Rookie (non-complex): The Appalachian and Pioneer Leagues
  • Complex Rookie: The Arizona Summer and Gulf Coast Leagues
  • Foreign Rookie: The Dominican Summer League

Today, the classification of these leagues represents a ladder that young players climb on the way to The Show. In past eras, however, the National Agreement based classification on the size of the population the league served. But when you think about it, this was a strong proxy for quality. Of course, the larger the area you drew from, the more talent you could scout locally and the more tickets you could sell. But remember, unlike the organized minors of today, until the 1960s or so, most minor league teams were trying to win their league’s pennant. So fans of the time also exerted more pressure on minor league squads to win. The point is this: The ladder existed then and exists now.

So what’s this got to do with Black Ball? Simply put, if we can figure out the quality of play at each minor league level, we may be able to place the Negro Leagues and other independent leagues that signed dark-skinned players into the framework. It’s a method that can produce a reasonable and familiar estimate of play.

Here’s a timeline of minor league classifications presented for puzzlement/enjoyment. The hashed arrows indicate that a league shifted to a new level. It’s a pretty wacky timeline, so…this is a blues riff in B, watch me for the changes, and try to keep up. Okay?

What if we knew the discount (if you will) off of major league performance for each of these leagues? That is, if we knew that a person created 100 runs in AA, what percentage of those 100 runs created would he give back by moving up to the majors?

Luckily for you and me, others have plumbed these very depths and done the math. I pieced together information from an excellent article by Ben Lindbergh at the former Grantland as well as some of Clay Davenport’s work to reach some SWAGs for conversion rates for leagues in the current minor-league classification system. To the best of my ability to use Google, I haven’t been able to find an updated table that includes all levels and indy and international leagues and their conversion rates to MLB.

To provide some context, let’s see how the discount structure works using two players’ 2016 seasons. Mike Trout led the AL with 148 runs created, 64 more than average in his 681 plate appearances. Lorenzo Cain created 52 runs in 434 PAs, exactly average for a player in his playing time in the AL of 2016. What would we expect these guys’ major league performance to be if they had created the same number of runs in AAA? Or in AA?


                   CONV
LEVEL LEAGUES      RATE  TROUT CAIN
======================================
MLB   AL  NL       1.00   148   52
AAA   IL  PCL      0.80   118   42
AA    EL  SL  TXL  0.72   107   37
Hi-A  CAL CAR FSL  0.62    92   32
Lo-A  MWL SAL      0.58    86   30
SSA   NYP NWL      0.50    74   26
NCRK  APP PIO      0.50    74   26
CXRK  AZS GCL      0.50    74   26
FNRK  DMS VZS      0.50    74   26
AAA   MXL          0.49    73   25
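The arithmetic behind the table is a single multiplication per cell. Here is a minimal Python sketch of it, using the conversion rates and the two 2016 runs-created totals from the table above. The function name `mlb_equivalent` and the dictionary layout are mine, chosen for illustration, and rounding to whole runs is an assumption that happens to match the table.

```python
# Conversion rates copied from the table above (MXL is the Mexican League,
# listed separately because its rate differs from the other AAA leagues).
CONV_RATE = {
    "MLB": 1.00, "AAA": 0.80, "AA": 0.72, "Hi-A": 0.62, "Lo-A": 0.58,
    "SSA": 0.50, "NCRK": 0.50, "CXRK": 0.50, "FNRK": 0.50, "MXL": 0.49,
}


def mlb_equivalent(runs_created, level):
    """Discount runs created at `level` to an MLB-equivalent total."""
    return round(runs_created * CONV_RATE[level])


TROUT_RC, CAIN_RC = 148, 52  # 2016 runs created, from the text

for level, rate in CONV_RATE.items():
    trout = mlb_equivalent(TROUT_RC, level)
    cain = mlb_equivalent(CAIN_RC, level)
    print(f"{level:5s} {rate:.2f} {trout:5d} {cain:5d}")
```

Keeping the rates in one lookup makes it easy to swap in a different estimate for any level and see how far the translated totals move.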

When you are as good as Trout was in 2016, you can be playing as far down on the farm as Lo-A and still produce an approximately average MLB season. Even down in Rookie ball, you’re not yet at replacement. On the other hand, Cain plummets to roughly replacement level in Hi-A. Now, obviously, this bootstrapping-like method has limitations. Guys in the Arizona Summer League have probably never seen a great breaking ball and won’t until they hit A ball. But on the whole, it appears defensible because it’s telling us that a Trout-like season by a veteran player in Lo-A would only appear as about average in the bigs. As we’ll see below, this may make good sense.

Let’s bust things out a little further to include some foreign and independent leagues.


                          CONV
LEVEL LEAGUES             RATE  TROUT CAIN
============================================
MLB   AL NL               1.00   148   52
INT   NPB                 0.90   133   47
AAA   IL PCL              0.80   118   42
WINT  DMW                 0.80   118   42
AA    EL SL TXL           0.72   107   37
IND   ATLANTIC            0.72   107   37
WINT  AZF PRWL VZWL MXWL  0.72   107   37
INT   CUBA                0.63    93   33
Hi-A  CAL CAR FSL         0.62    92   32
IND   AA CANAM            0.62    92   32
Lo-A  MWL SAL             0.58    86   30
IND   FRONTIER            0.58    86   30
SSA   NYP NWL             0.50    74   26
NCRK  APP PIO             0.50    74   26
CXRK  AZS GCL             0.50    74   26
FNRK  DMS VZS             0.50    74   26
AAA   MXL                 0.49    73   25
INT   KBL                 0.49    73   25

That’s a pretty reasonable spread to work from, right? So how would the Negro Leagues fit into this? The Negro Leagues are variously described as anywhere between Nippon Pro Baseball and AA quality. Going by the table above, that would put them in the range of roughly 0.72–0.90 of MLB. I suspect the truth is they come in at both ends of this spectrum at different times in history. Did I mention that the Negro Leagues are complicated?

The Negro National League and Eastern Colored League of the 1920s were probably close to NPB level leagues. The talent was well concentrated in those leagues, and while the cream of the crop were Hall-level players, the very bottom end were probably Hi-A or Lo-A players. The spread of talent was larger than in MLB, but the cream got more playing time, and the really bad teams with mostly nobodies tended to play fewer games and/or fold quickly. Compare that to the early 1940s. At that time, the league’s biggest names jumped to Mexico and/or went to war. Pending further research, the combination of the two seems likely to me to have lowered the quality of play to AA quality. Once the color line was broken and the exodus of talent hastened, the quality of play sank rapidly.

On the flip side there’s the Mexican League. With so many black stars jumping to it, the league’s quality rose steeply in 1940 and 1941 and ebbed and flowed in the 1940s. It imported several quality MLB players in 1946–1947 before Happy Chandler started handing out suspensions for signing a contract with la Liga. Although this requires more investigation, we can make some initial guesses. Today’s Mexican League draws primarily from Mexico and surrounding countries. Despite its AAA classification, Clay Davenport’s studies show it’s at about a Rookie ball level. Mexico’s best have rarely proven to be superior quality major leaguers, Fernando Valenzuela being an exception that proves the rule. So, add to that Rookie-level league a couple dozen high-profile MLB stars and some veteran AA and AAA players, and what would happen? The ceiling would absolutely rise, but so would the floor because the lesser native players would garner fewer appearances. So the guess here is that the league of the 1940s rose to about an overall AA level. Maybe a tad more, maybe a tad less depending on how much talent it imported for any given season.

Now here’s a kicker. In some seasons, the Cuban Winter League might have been NPB level or better. The Cuban leagues only included three or four teams each season. Numerous Negro Leagues stars made the trip south (or returned to their homeland in Martin Dihigo’s case), and their numbers were augmented by the very best Cuban and Latin American players, as well as occasional white minor league or major league players (especially native sons Dolf Luque, Armando Marsans, Rafael Almeida, and Mike Gonzalez). The number of players who appeared for a team fluctuated from small (15) to large (25), and the championships were hotly contested. Talent burst at the seams of the league, though, like the Negro Leagues, this may have been more true some years than others.

Now, finally, we arrive at the organized minor leagues themselves. They are of concern for players who transitioned into organized baseball during integration. If Negro Leagues expat Marv Williams hit .401 at the age of 32 with 45 homers in the 1952 Arizona-Texas League, what does that mean? His stats (with an extrapolation for his walks and other peripherals at known career rates) probably compute to a runs-created total around 150, very close to Trout’s 2016 total. The Arizona-Texas League was a C-level league. Consulting my chart above on the history of league classifications, Williams probably played in a context around Hi-A or Lo-A level by our current nomenclature. It might well mean that Williams’ performance translates to very near the major league average despite the gaudy numbers, especially because the AZTX league was a very high-octane loop with 7.1 runs/game. (Yeah, you read that right, 7.1.) And that makes sense, doesn’t it? If Lorenzo Cain played all of 2016 in the Midwest League, wouldn’t we expect him to destroy it just like Williams did the AZTX?
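To put rough numbers on the Williams example, here is a small Python sketch that applies the Hi-A and Lo-A rates from the tables to his estimated 150 runs created. The comparison baseline is my own assumption: it treats Williams as if he had Trout’s 681 plate appearances, where an average 2016 AL hitter created 148 - 64 = 84 runs. It also ignores the 7.1 runs/game environment, which would pull the translation down further.

```python
# Rough runs-created estimate for Williams' 1952 season, from the text.
WILLIAMS_RC = 150

# Conversion rates for the two levels the text suggests, from the tables.
RATES = {"Hi-A": 0.62, "Lo-A": 0.58}

# Trout's 148 runs created were 64 above average in 681 PAs, so an average
# 2016 AL hitter with that playing time created about 84 runs.
# ASSUMPTION: Williams' playing time is treated as comparable to Trout's.
AL_2016_AVG_RC_IN_681_PA = 148 - 64


def translated_range(runs_created, rates):
    """Apply each candidate conversion rate and round to whole runs."""
    return {level: round(runs_created * rate) for level, rate in rates.items()}


estimates = translated_range(WILLIAMS_RC, RATES)
for level, rc in estimates.items():
    print(f"{level}: {rc} runs created vs. an average of {AL_2016_AVG_RC_IN_681_PA}")
```

Under those assumptions the translation lands at about 87 to 93 runs created, a little above the 84-run average and before any adjustment for the league’s scoring level.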

So at the very worst, bootstrapping from today’s minor league setup gives us a strong foundation to build conversion factors from. There are issues with it, though. Leagues, especially lower level leagues, from the Integration era were typically populated with older players than they are today, as much as two or three years older. That doesn’t mean, however, that they are as talented as today’s younger players. But it’s a thing. Also, with so many more and more localized leagues back then, we can’t say for sure that something like the Arizona-Texas League wasn’t worse (or better) than other leagues at the same classification. This is also true today in some measure, but the spread must have been much wider back then. Still, despite these issues, we can probably work with some confidence because baseball as a game hasn’t changed much. The minor leagues are minor for a reason, and the big leagues have always used them as a means for procuring and developing talent.

No one has ever said that the translation of Negro Leaguers’ stats into a major league context will provide highly accurate assessments of performance. That’s not possible given the limitations of the data. We would instead hope to achieve a reasonably accurate assessment. The definition of accurate remains open in this context, and details like the difference between A and AA ball require attention and a flexible concept of “correct.” But if we go down this path, we can only do our best to arrive at an answer that passes the sniff test and doesn’t have any glaring mathematical errors.


What makes the Negro Leagues hard to analyze

Continuing our series of think-alouds about Negro Leaguers, I want to respond to feedback from our great friend Verdun2 who strongly encourages us to dive on in. So let’s focus on why we haven’t committed to that electoral process yet. The most basic reason is this: The Negro Leagues can be frustratingly complicated and hard to keep simple.

To draw an analogy, think about building a bridge. We want to build something that spans a river so we can drive from the town on one side to the town on the other side. Simple! Well, not so fast. First we need to do a traffic study to know how many vehicles would likely use the bridge. We need to know what kinds too because tractor-trailers can’t use every bridge. We also want to study the economic impact on the area. How much would this bridge improve commerce on both sides of the span? Oh, and noise pollution. Whosever house is near the termini of the bridge might need to invest in earplugs and will certainly consider bringing suit to stall the project if property values will take a whack. We probably also need to do an environmental impact study for the fauna and flora in the waterway itself. Of course, we have to find just the right site, one narrow enough to make the project viable, on ground that can handle the pilings, but also situated such that nearby existing roads can handle the increased traffic. Depending on the location, we might also have to determine whether we’re obliged to do an archaeological study. Do we need to make a drawbridge to accommodate river traffic? Earthquake proofing? Hurricane proofing? Big thing is, who pays for all of it? Do the towns on both sides split the bill? What if one will benefit more than the other? Worse yet, what if the towns are in different states? Now we’ve got to get the project through two legislatures plus two town councils and maybe even two county commissions. Is this going to be a toll bridge?

Likewise, the Negro Leagues start with the simple idea that we elect the 29 best Negro Leaguers ever. Things go cray-cray almost immediately thereafter. Before 1920, the Negro Leagues weren’t even leagues! They were independent touring teams who might loosely affiliate to play for larger gates on the weekend in between barnstorming runs. After 1920, teams still barnstormed even with an official slate of league games. In the late 1930s, the Mexican League started siphoning off talent, and those guys might have played in better leagues in Mexico than in the US. When baseball finally integrated, all bets were off. Players dispersed to the four corners of organized baseball, into every league and classification imaginable.

To evaluate these players well, we need to be able to put all these players onto a level footing. Each season needs to be evaluated separately and then brought into synch with all the other seasons. Worse yet, because some later Negro Leaguers entered MLB (not worse for them but for analyzing them), we have to put all the seasons of all the players we look at onto an MLB footing. We can’t compare Oscar Charleston to Willard Brown to Minnie Miñoso without putting them onto a major-league scale. Furthermore, we can’t then compare them with MLB players across history without leveling them up to MLB.

No worries, we have a simple solution. We’ll just translate their stats from their league of origin to the majors! All we need to do that is to compare them to the average of their destination league and then apply a quality of competition discount, and voila. Well, and we need to adjust their numbers for their home park. And possibly for standard deviation in the destination league because it was very top-heavy. Oh, and we’ll need to rescale their translation to the destination league’s run-scoring environment. Oh, oh, and we’ll need to figure out how to assign playing time, and when we do that determine a fair way to extrapolate the translated stats to full-season because the Negro Leagues and minor leagues often played shorter schedules than the big leagues. Defense…that’ll be tricky.

Each of these items I’m jokingly mentioning here is real, and they each require data and a protocol to figure them. For one single season, we need at least the following background data:

  1. The player’s stats in the originating league
  2. The player’s originating team’s games
  3. The team’s park factor
  4. The originating league’s totals
  5. A translation factor for the originating league
  6. The runs/game for the destination league.
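The six inputs above can be sketched as one record per player-season plus a function that chains the adjustments together. This is an illustration under loud assumptions, not a worked-out protocol: the player’s stats are collapsed into a single runs-created number, the originating league’s totals are reduced to its runs/game, the order of the steps is one plausible choice, and every name here (`SeasonInputs`, `translate_season`, the 154-game default) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SeasonInputs:
    player_runs_created: float   # 1. player's stats (collapsed to runs created)
    team_games: int              # 2. originating team's games
    park_factor: float           # 3. team's park factor (1.00 = neutral)
    origin_runs_per_game: float  # 4. from the originating league's totals
    translation_factor: float    # 5. quality-of-competition discount
    dest_runs_per_game: float    # 6. runs/game for the destination league


def translate_season(s, dest_games=154):
    """Chain the adjustments; ASSUMES this simple multiplicative ordering."""
    park_adjusted = s.player_runs_created / s.park_factor
    discounted = park_adjusted * s.translation_factor
    # Rescale from the originating run environment to the destination's.
    rescaled = discounted * s.dest_runs_per_game / s.origin_runs_per_game
    # Naive straight-line extrapolation to a full destination schedule.
    return rescaled * dest_games / s.team_games


# Made-up numbers, purely to show the call; not any real player's season.
example = SeasonInputs(
    player_runs_created=100.0, team_games=77, park_factor=1.0,
    origin_runs_per_game=5.0, translation_factor=0.8, dest_runs_per_game=4.0,
)
print(round(translate_season(example), 1))
```

The straight-line extrapolation in the last step is the crudest piece. A real protocol would need something more careful for playing time, which is one of the open questions raised here.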

Depending on how specific we get, things become more complex yet. We may need to run comps for things like baserunning or fielding. We’ll need that extrapolation protocol of course. We also need a way to scale up innings for pitchers. If we want to adjust for standard deviation, that’s a whole nother conversation and batch of data. We’ll want a protocol for double checking a translation against real MLB players.

That park factor? We’re going to have to calculate our own, and we don’t have home/road splits, so we’re going to have to make a less precise calculation based on RS/RA, especially because some of the Negro League and minor league parks were quirkier than Fenway by orders of magnitude.
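One way a park factor based only on RS/RA might look is sketched below in Python. It compares the total scoring in a team’s games with the league norm. This is my assumption about the general shape of such a calculation, not the method we have settled on, and the function name and arguments are hypothetical. Without home/road splits the result mixes park effects with team quality, and it quietly assumes something close to a balanced home/road schedule, which many Negro League teams did not have.

```python
def crude_park_factor(team_rs, team_ra, team_games, league_runs_per_team_game):
    """Ratio of scoring in this team's games to the league norm.

    ASSUMPTION: with no home/road splits, all of a team's games are
    lumped together, so this is a full-season factor confounded by the
    team's own offense and pitching.
    """
    team_total_per_game = (team_rs + team_ra) / team_games
    # Two teams score in every game, so double the per-team league rate.
    league_total_per_game = 2 * league_runs_per_team_game
    return team_total_per_game / league_total_per_game


# Made-up totals: 540 runs scored and 450 allowed over 100 games,
# in a league averaging 4.5 runs per team per game.
print(round(crude_park_factor(540, 450, 100, 4.5), 2))
```

A value above 1.00 suggests a run-inflating context. Regressing the result toward 1.00 would be a reasonable hedge against the team-quality noise, though how much to regress is its own question.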

Are you getting the picture yet?

So this is why we haven’t yet decided whether we will proceed. We don’t yet have a sense of how long merely gathering up the data will take. Actually, let’s check in on that too. See, the thing is that while much of the data is now extant, it’s not always utterly complete, and sometimes it is simply not available. For some leagues we do not know league totals for important things like doubles, triples, and homers (the Mandak League) or anything at all (most sub-AAA minor leagues prior to 1950). Anyone out there got a lead on the Venezuelan or Dominican summer leagues of the 1950s?

Probably the most difficult information we’ll need to locate is another factor we’ll need to determine ourselves. The translation factor is the engine driving the bus here. There aren’t many guideposts to go from. While MLEs (minor league equivalencies) have been around since the mid-1980s (thank you, Bill James!), the calculations don’t go backward in time from there. We have no specifics on how good the Mexican League of 1940 and 1941 was, let alone the Negro Leagues. Working up studies for every league we’d be looking at would take years. We have some ideas for how this could work, which is a subject for the future, but for now, this is one of those issues we have to feel our way through before we undertake this process.

We’re thinking hard about how this could go. The ramping-up period could be a few months to gather all the data and test all the protocols. And with the Negro League Database growing by a few seasons each year, we have a strong incentive to take it slowly anyway. The more patience we have, the more data we’ll ultimately have at our disposal.

So you’ll have to keep waiting for an official announcement as we assess the viability of this part of the project for us and what kind of timeline we could accomplish it on. We warn you that it might roll out much slower than our other elections have, but if that’s the case, it’ll be because we are two complicated guys who want to get this very complicated task done as well as we are able. After all, our name is on each plaque too.
