
What makes the Negro Leagues hard to analyze

Continuing our series of think-alouds about Negro Leaguers, I want to respond to feedback from our great friend Verdun2, who strongly encourages us to dive on in. So let’s focus on why we haven’t committed to that electoral process yet. The most basic reason is this: the Negro Leagues are frustratingly complicated and resist every attempt to keep things simple.

To draw an analogy, think about building a bridge. We want to build something that spans a river so we can drive from the town on one side to the town on the other side. Simple! Well, not so fast. First we need to do a traffic study to know how many vehicles would likely use the bridge. We need to know what kinds too, because tractor-trailers can’t use every bridge. We also want to study the economic impact on the area. How much would this bridge improve commerce on both sides of the span? Oh, and noise pollution. Whoever lives near either end of the bridge might need to invest in earplugs and will certainly consider suing to stall the project if property values take a whack. We probably also need to do an environmental impact study for the fauna and flora in the waterway itself. Of course, we have to find just the right site, one narrow enough to make the project viable, on ground that can handle the pilings, but also situated such that nearby existing roads can handle the increased traffic. Depending on the location, we might also have to determine whether we’re obliged to do an archaeological study. Do we need to make a drawbridge to accommodate river traffic? Earthquake proofing? Hurricane proofing? The big thing is, who pays for all of it? Do the towns on both sides split the bill? What if one will benefit more than the other? Worse yet, what if the towns are in different states? Now we’ve got to get the project through two legislatures plus two town councils and maybe even two county commissions. Is this going to be a toll bridge?

Likewise, the Negro Leagues start with the simple idea that we elect the 29 best Negro Leaguers ever. Things go cray-cray almost immediately thereafter. Before 1920, the Negro Leagues weren’t even leagues! They were independent touring teams who might loosely affiliate to play for larger gates on the weekend in between barnstorming runs. After 1920, teams still barnstormed even with an official slate of league games. In the late 1930s, the Mexican League started siphoning off talent, and those guys might have played in better leagues in Mexico than in the US. When baseball finally integrated, all bets were off. Players dispersed to the four corners of organized baseball, into every league and classification imaginable.

To evaluate these players well, we need to put them all on a level footing. Each season needs to be evaluated separately and then brought into sync with all the other seasons. Worse yet, because some later Negro Leaguers entered MLB (not worse for them, but for analyzing them), we have to put every season of every player we look at onto an MLB footing. We can’t compare Oscar Charleston to Willard Brown to Minnie Miñoso without putting them on a major-league scale. Nor can we compare them with MLB players across history without leveling them up to MLB.

No worries, we have a simple solution. We’ll just translate their stats from their league of origin to the majors! All we need to do that is to compare them to the average of their destination league and then apply a quality-of-competition discount, and voilà. Well, and we need to adjust their numbers for their home park. And possibly for standard deviation in the destination league because it was very top-heavy. Oh, and we’ll need to rescale their translation to the destination league’s run-scoring environment. Oh, oh, and we’ll need to figure out how to assign playing time and, when we do that, determine a fair way to extrapolate the translated stats to a full season, because the Negro Leagues and minor leagues often played shorter schedules than the big leagues. Defense…that’ll be tricky.

Each of these items I’m jokingly mentioning here is real, and each requires data and a protocol to figure out. For a single season, we need at least the following background data:

  1. The player’s stats in the originating league
  2. The player’s originating team’s games
  3. The team’s park factor
  4. The originating league’s totals
  5. A translation factor for the originating league
  6. The runs per game for the destination league
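To make the list above a little more concrete, here is a minimal Python sketch of how those six inputs might chain together for one hitter’s season. It is an illustration only, not a settled protocol: the class and field names, the use of runs created as the stat being translated, the 0.85 translation factor, and the extra schedule-length input needed for extrapolation are all assumptions. It also skips the standard-deviation, baserunning, fielding, and pitcher adjustments entirely.

```python
from dataclasses import dataclass


@dataclass
class SeasonInputs:
    """Hypothetical bundle of one season's inputs; field names are illustrative."""
    player_runs: float         # (1) player's offensive value, e.g. runs created
    player_pa: int             # (1) player's plate appearances
    team_games: int            # (2) originating team's games
    park_factor: float         # (3) season-level park factor, 1.00 = neutral
    league_runs: float         # (4) originating league's total runs
    league_team_games: int     # (4) originating league's total team-games
    translation_factor: float  # (5) quality-of-competition discount (illustrative only)
    dest_runs_per_game: float  # (6) destination league's runs per team-game
    dest_schedule: int         # extra assumed input: destination's full schedule length


def translate_season(s: SeasonInputs) -> tuple[float, float]:
    """Return (translated runs, extrapolated PA) for one season. A rough sketch only."""
    # Per-PA rate in the originating league, neutralized for the home park.
    rate = s.player_runs / s.player_pa / s.park_factor
    # Apply the quality-of-competition discount.
    rate *= s.translation_factor
    # Rescale to the destination run environment
    # (assumes similar PA per game in both leagues).
    orig_runs_per_game = s.league_runs / s.league_team_games
    rate *= s.dest_runs_per_game / orig_runs_per_game
    # Naive extrapolation: stretch playing time to the destination schedule.
    full_pa = s.player_pa * s.dest_schedule / s.team_games
    return rate * full_pa, full_pa


if __name__ == "__main__":
    example = SeasonInputs(
        player_runs=60, player_pa=300, team_games=60, park_factor=1.05,
        league_runs=2000, league_team_games=400, translation_factor=0.85,
        dest_runs_per_game=4.5, dest_schedule=154,
    )
    runs, pa = translate_season(example)
    print(f"{runs:.1f} runs in {pa:.0f} PA")  # 112.2 runs in 770 PA
```

Keeping each adjustment as its own step makes it easy to swap in a better park factor or translation factor later without touching the rest of the chain.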

Depending on how specific we get, things become more complex yet. We may need to run comps for things like baserunning or fielding. We’ll need that extrapolation protocol of course. We also need a way to scale up innings for pitchers. If we want to adjust for standard deviation, that’s a whole nother conversation and batch of data. We’ll want a protocol for double-checking a translation against real MLB players.

That park factor? We’re going to have to calculate our own. We don’t have home/road splits, so it will have to be a less precise calculation based on runs scored and runs allowed (RS/RA). Getting it right matters all the more because some of the Negro League and minor league parks were quirkier than Fenway by orders of magnitude.
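Here is one crude way a totals-only park factor could work, as a Python sketch under loud assumptions: roughly half of a team’s games are at home, its road parks average out to neutral, and the team’s own hitting and pitching don’t distort its RS + RA too much. Under those assumptions the whole-season ratio of runs in the team’s games to runs in an average league game is already a season-level factor for full-season stats. The function name and the 50% regression toward neutral are hypothetical choices, not an established formula.

```python
def crude_park_factor(team_rs: float, team_ra: float, team_games: int,
                      league_runs: float, league_team_games: int,
                      regress: float = 0.5) -> float:
    """Season-level park factor from totals only (no home/road splits). Hypothetical sketch."""
    # Runs by both sides per game in this team's games, home and road combined.
    team_rpg = (team_rs + team_ra) / team_games
    # League-wide runs by both sides per game: every game counts as two team-games.
    league_rpg = 2 * league_runs / league_team_games
    raw = team_rpg / league_rpg
    # RS + RA also reflects the team's own hitting and pitching, not just its park,
    # so pull the raw ratio toward neutral (1.00). The 50% default is arbitrary.
    return 1.0 + (raw - 1.0) * (1.0 - regress)


if __name__ == "__main__":
    # 840 runs in 80 games vs. a league-wide 10.0 runs per game
    print(round(crude_park_factor(450, 390, 80, 2400, 480), 3))  # 1.025
```

The regression step is the weak point: a great-hitting, bad-pitching club will look like it plays in a hitter’s park, which is exactly the imprecision that home/road splits would remove.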

Are you getting the picture yet?

So this is why we haven’t yet decided whether we will proceed. We don’t yet have a sense of how long merely gathering the data will take. Actually, let’s check in on that too. See, the thing is that while much of the data is now extant, it’s not always complete, and sometimes it is simply not available. For some leagues we do not know league totals for important things like doubles, triples, and homers (the Mandak League) or anything at all (most sub-AAA minor leagues prior to 1950). Anyone out there got a lead on the Venezuelan or Dominican summer leagues of the 1950s?

Probably the most difficult piece of information we’ll need is one we’ll have to determine ourselves: the translation factor. It’s the engine driving the bus here, and there aren’t many guideposts to go by. While MLEs (minor league equivalencies) have been around since the mid-1980s (thank you, Bill James!), the calculations don’t extend backward in time from there. We have no specifics on how good the Mexican League was in 1940 and 1941, let alone the Negro Leagues. Working up studies for every league we’d be looking at would take years. We have some ideas for how this could work, which is a subject for the future, but for now, this is one of those issues we have to feel our way through before we undertake this process.

We’re thinking hard about how this could go. The ramp-up period, gathering all the data and testing all the protocols, could take a few months. And with the Negro League Database growing by a few seasons each year, we have a strong incentive to take it slowly anyway. The more patience we have, the more data we’ll ultimately have at our disposal.

So you’ll have to keep waiting for an official announcement as we assess the viability of this part of the project for us and the timeline on which we could accomplish it. We warn you that it might roll out much more slowly than our other elections have, but if that’s the case, it’ll be because we are two complicated guys who want to get this very complicated task done as well as we are able. After all, our name is on each plaque too.



9 thoughts on “What makes the Negro Leagues hard to analyze”

  1. I, for one, will wait. You two have a habit of getting it right.

    Posted by verdun2 | May 8, 2017, 7:40 am
  2. No rush, we will be excited for when the time comes to recognize these greats!

    Posted by Ryan | May 8, 2017, 12:18 pm
  3. You might want to wait – there will be some exciting new data available this year on ballparks, opponent strength, plus more data overall….

    Posted by KJOK | May 8, 2017, 4:05 pm
  4. You guys have done a tremendous job. I think that of the 29 to be elected there are probably, by my count, 18 that are more or less sure things (Gibson, Santop, Leonard, Lloyd, Wells, Home Run Johnson, Jud Wilson, Charleston, Torriente, Stearnes, Minoso, Irvin, Doby, SJ Williams, Paige, Dihigo, Rogan, Ray Brown) and a couple of others that are probables. I really look forward to those remaining few selections though, because that’s where we always see the really interesting selections from your thorough research and analysis, like Jose Cruz, Tony Phillips, Sal Bando, Chuck Finley & Joe Brown. Count me in as another reader who really hopes you undertake the Negro League challenge.

    Posted by Craig | May 8, 2017, 8:18 pm
    • Thanks, everyone, for the encouragement and the vote of confidence! We will keep writing about this process to flesh out our thoughts and see whether all y’all think our ideas make sense.

      Posted by eric | May 8, 2017, 9:02 pm
  5. I’ve rebuilt 1933 and 1943 for Seamheads, but I’d exercise extreme caution. The data we have available today is better than anything floating around ten years ago, during the Hall of Merit era. Down the road, if you have questions about those two specific seasons, let me know. I could serve as a guide. They’re even more complicated than you could imagine. Sometimes in ways which might make analysis easier, but more often than not in ways which make it tougher. Each year needs to be examined individually, because no two are alike, which will make for a grueling path forward. Going slow on this is the right move. And one note on your excellent Cool Papa Bell post: the stolen bases are horribly underreported for several seasons, depending on what city one played in (and the newspaper providing the coverage), and it clearly skews his speed profile. Take 1933, for example: after examining the box scores and the cities where they played, I figured the Pittsburgh Crawfords’ stolen bases were probably underreported by as much as 50%, meaning Bell likely had 22 SB in 66 games, not 11. But take that info with caution: the factor for underreported SBs for other teams in 1933 is probably different from the Crawfords’. (Different schedule, different opponents, different newspaper coverage: ah, the headaches!)

    Posted by SCOTT SIMKUS | May 9, 2017, 2:16 pm

