Major League Equivalencies for Negro Leagues Hitters

About a month ago, we told you about our Major League Equivalency (MLE) protocol for Negro Leagues pitchers. That 26-step protocol, swelled as it is by subroutines of all sorts, will seem genuinely straightforward compared to what we’re about to unleash. But stick with it: the truth is out there, and we’re trying to use every tool we can to get at it. And, hey, we’d like to know if we can make it better, so your feedback is super helpful. On the other hand, this monster of a post is about 6,500 words, so if you want to just trust us, you can. But this sucker is here for your reference and ours if we ever need it.

As we go along, discovering more nuances, nooks, and crannies, we may have occasion to edit this methodology. When we do, edits will appear in red, and those elements affected will be shown in gray to indicate that they are no longer up to date.

The Big Picture

Our goal in creating MLEs is twofold. First, we want every Negro Leagues player’s records to be recontextualized onto a level, neutral platform. That’s because in the Negro Leagues, the different teams, leagues, parks, seasons, whathaveyou, were not of uniform quality and behavior. In fact, they varied far more than the majors did. So just among the Negro Leaguers themselves, we need this to help us make wise electoral decisions. On top of that, however, we want to get a sense of how these guys compare to MLB players so that we can place their achievements into a context that’s more familiar to us.

We don’t have a lot of interest in creating component stat lines: homers, RBI, strikeouts, HBPs. That stuff is all fascinating in its way, but we don’t place specific importance on those figures here at the Hall of Miller and Eric. We prefer to look holistically at what the combination of those recorded events meant in context (value, if you prefer). So we’re after WAR, just as we are with pitchers.

The process of locating a Negro Leaguer’s value in his collected statistics (at the Negro Leagues Database and elsewhere) follows a similar path to what we did with pitchers, inasmuch as we find his rate of production, recontextualize it with z-scores, adjust for Quality of Play and park effects, then apply it to an MLB playing-time estimate. Simple enough, right? But we also want to pay attention to things like what position the player would have played in the majors, his fielding value, his baserunning value, and, depending on what seasons in MLB history we’re talking about, his double-play-avoidance value.

The work we do initially as we translate performance and arrive at an initial playing-time estimate requires very little manual intervention on our part. It’s when we get into running and fielding that we are forced to make some decisions on our own. That’s where the human element comes in and careful judgment becomes our watch-phrase.

But when we get done, we have an estimate of what kind of value a fellow would rack up in the big leagues. It’s not a perfect estimate, though it’s the best we currently know how to do. We would also express the caveat that MLEs are likely best absorbed first at the career level then, with careful discernment, at the seasonal level. Due to shorter schedules and concomitantly increased volatility in stat lines, and despite some effort on our part to dampen that volatility, year-by-year MLEs simply won’t be as reliable as a career value. Which itself is an estimate, not the holy gospel of the Negro Leagues.

But other than that, they are perfect indicators of performance and value….

Before we get going, let’s define our terms

This is just going to go down easier if we use the same lingo. And since I can’t hear you to agree to a particular jargon, you’ll just have to use mine. These terms will pop up a lot.

  • Originating League/Team: The team he actually played for
  • Destination League: The league we are translating his stats into and creating an equivalency for
  • Quality of Play (QOP): The strength of a league relative to MLB, which is set at 1.0; every other league is discounted from that
  • Translated: Stats that have been transformed from the originating league into the destination league’s run and league-quality context; an intermediate step en route to the fuller equivalent performance
  • Equivalent: Stats whose basis is in translated figures but that include further adjustments to place the player into a broader MLB context and ensure that small samples don’t overly skew the results.

There’ll be the usual alphabet soup along the way, and we’ll define the acronyms as we go.

The Process…in Prose

So let’s begin by my explaining this protocol in English, then we’ll run through a real-life example from the career of the great Oscar Charleston.

Translating Actual Performance

1) Find the rate of player’s offensive performance.
2) Compare to his own league.
3) Place into MLB context.
4) Adjust for the quality of his league.
5) Adjust for his park.
6) Adjust for his strength of schedule.

Creating an Initial Estimate of Playing Time and Batting Runs Above Average (Rbat)

7) Express as RC, then figure his translated Rbat, then express the result per PA.
8) Create a rolling average of Rbat to create the final MLE rate of batting performance.
9) Estimate the player’s games played into an MLB schedule based on his in-season and career durability records.
10) Apply the destination league’s PA/game to those estimated games.

So, now we have an initial estimate of the player’s MLB Rbat for the season. But before we go any further, we need to fine-tune our playing-time estimate because everything else after this depends upon it.

Fine-tuning Playing Time

11) First, we look at any batting seasons at the beginning or end of a player’s career. If they are well below average, we will not consider them part of his MLE—either he’d have been in the minors or, at the other end of things, aged out of the game. A general rule of thumb is that after age 38, two seasons well below average batting is probably enough to retire someone.
12) Next, we look at the player’s biographical material. If there are any injuries that would keep him out of the lineup, we see if our PA estimate reflects it appropriately. Because of the less stable nature of the Negro Leagues, we also look for league/team jumping, and other oddball movements that would affect playing time but didn’t occur in MLB. We adjust accordingly.
13) Now we look for players with similar careers and styles to our man. We look for players at his position (or a similar one), roughly during his time, with similar offensive and defensive profiles (OPS+ and Rfield are good barometers here), and who fall roughly within a similar number of seasons as a regular. Once we have a bunch of them identified, 10–12 is best when possible, we look at their playing time, especially for and across the ages that we are including in our MLE. We look for a rough number of plate appearances to shoot for.
14) Armed with the information in #11, we adjust very early and very young seasons first to fit both general trends among all players and specific trends among comps. This should get us pretty close to our goal. Beyond that, we can add or trim as necessary to get to a career with a reasonable playing-time approximation.

So, now we have an initial estimate of the player’s MLB Rbat for the season and a more realistic estimate of his playing time. We can move on.

Estimating Baserunning

BBREF uses a regression-based formula derived from stolen-base information to estimate baserunning for seasons prior to 1953 (after which they have complete play-by-play data to rely on). But we don’t have quite the same level of information they do for our Negro Leagues players. We do, however, have enough to take a swing at it in a different way. We can do something along the lines of the investigations of prewar baserunning we did for Sam Rice and others. Here’s how we’ll do it.

15) For each season with the necessary information, find a player’s stolen bases per opportunity.
16) Find the same for his teams.
17) Find the same for his originating leagues.
18) Adjust his rate for his own team’s tendency to run or not.
19) Compare the team-adjusted rate to his league and figure his percentage of steals above or below league average.
20) Now, find MLB players from the PBP era with similarly long careers and find similar percentages of SB above the league. Because Negro Leagues boxscores may not always have carried stolen base info, it’s OK to pad this by as much as double for players with very speedy reputations, so, for example, 125% of league becomes 150%. (This is kind of a pain, so we’ve only used four to six at a time, 10 would be better.)
21) Find the average Rbaser/PA of those MLB comps and apply it on a per-season basis to the player’s estimated PAs.
22) If the candidate has a pronounced decline in his net steals versus the league, sculpt the trajectory of his running runs appropriately.

Estimating Double-Play Avoidance

We only estimate runs from double-play avoidance (Rdp) from 1948 onward because this is when BBREF’s data kick in.

23) Identify lots of MLB batters of the same handedness, similar Rbaser, and similar career length, and calculate their Rdp/PA.
24) Apply the group’s average in Step 23 to each season of the player’s career.

Estimating Fielding Runs

The samples in the Negro Leagues are pretty small, so we need to mix together the DRA information we see in the Negro Leagues Database with a good dose of real MLB careers. This will give us a value to plug into the column called Rfield on BBREF, though we aren’t using Total Zone or Defensive Runs Saved as they do because the Negro Leagues Database helpfully uses DRA (Defensive Regression Analysis) instead. But first, we need to determine our man’s position.

25) Determine the player’s position for a given season or career by examining where he played in the originating league and how well he played there. If he started at shortstop, was bad at it, then moved to another position and was average or good, we might consider putting him at the latter position all the time. This is a subjective judgment, and we should look at real big-league careers for examples.
26) Find the player’s DRA/G rate in his originating league at whatever position or positions he will be placed at in our MLE.

November 26th, 2017: Please note that we’ve created a more objective means to generate fielding estimates, so step 26 is now out of date. Here’s what step 26 looks like now:

26) Find the player’s career DRA/154 games.

26a) Find the fifty players with the highest number of appearances in the Negro Leagues at the player’s position (because the Negro Leagues Database lacks DRA for some seasons where it lacks games, only count the appearances in seasons that include DRA), and for each figure their career DRA/154 at that position.

26b) Find the standard deviation of DRA/154 among the 50 players in 26a.

26c) Repeat the process in steps 26a and 26b for the major leagues for the same period of time the Negro Leagues Database covers (1887–1945 as of this writing), substituting BBREF’s Rfield for DRA. (Technically, I used the BBREF Play Index, setting it to seek shortstops with more than 50% of their games at the same position, and returning their career Rfield. This will have to be close enough because otherwise, we’d be at it for weeks.)

26d) Divide the player’s DRA/154 (step 26) by the standard deviation of the Negro Leagues players at his position (step 26b), then multiply by the standard deviation result for MLB players in step 26c. This is the player’s MLB career Rfield/154.

27) Apply that rate to the number of games in a season imputed by the estimated PAs we’ve assigned earlier in this process.
28) Check whether his defensive performance declined over time, and make any seasonal adjustments necessary to mirror that.
29) Double check against real MLB careers to see if the number of fielding runs generated is reasonable.

Determining Positional Runs

30) We do this exactly the same way that BBREF does here, based on the position we have assigned the player.

Calculating Runs and Wins Above Average

31) Now for each season we add the player’s Rbat, Rbaser, Rdp, Rfield, and Rpos to get his Runs Above Average (RAA).
32) To convert that to Wins Above Average, we follow BBREF’s instructions here.

Calculating Replacement Runs and Wins Above Replacement

33) We calculate replacement runs (Rrep) just as BBREF instructs us here.
34) Next we turn those Rrep into the player’s replacement-level wins per BBREF’s instructions here.
35) Finally, we add the WAA and replacement-level wins to get WAR.

The last thing we need to do is simply make sure that the MLE isn’t out of whack with real players and leagues. If we’ve estimated Josh Gibson for 1000 Rbat, we’re making a mistake. We are also making a mistake if we estimate him at 200. It’s always good to double check our work.

We’re now topping 1600 words. Are we there yet? No. Now we’ll get into the dirty-fingernail details. Let’s take Oscar Charleston for a spin, and see how this all plays out in reality. There are going to be a lot of moving parts, and if you’re following along at home, you’ll want to get another beer now.

A real example: Oscar Charleston, 1921

Now, we’ll run through this with Oscar Charleston’s 1921 season. This will reveal some nitty gritty details about what performance measures we use, and how we place players into an MLB run-context. I’ve only given you a framework, but you can use any old measurements or transformations you want to.

OSCAR CHARLESTON 1921
Originating league: Negro National League (NNL)
Originating team: St. Louis Giants
Destination league: 1921 NL

We have chosen our default destination league as the National League. We use the AL only when a player’s first appearance is in it.

1) Find the rate of player’s offensive performance

I’ve chosen to use Bill James’ Runs Created (the 2002 version) due to its relative simplicity. Not that any run estimators are all that simple, and we’re going to turn it into RC/PA. But first I need to address three small things: strikeouts, grounded into double plays (GIDP), and reached on error (ROE). BBREF creates estimates for these and/or simply includes them in its batting-runs estimate for players. Although we want to maintain some degree of compatibility with BBREF, very little of the data we will work with includes this information. To keep things as simple as we can, we will not assign a player an estimate of what the average hitter in his league would accumulate in those categories. By leaving them out, we simply assume the player in question and every other player in the league are average in these categories. That way, when we place the player into an MLB context, we won’t need to make any further adjustments for the lack of this information. We just assume once again that he is average in these regards in MLB. For most hitters, these are not a huge source of credits or debits, but leaving them out will help or hinder certain types of hitters. Sometimes you go to translation with the data you have. It’s worth noting, however, that we will estimate a player’s GIDP-avoidance value for seasons after 1948, as you’ll see below, so at least there they can recoup or de(?)coup some value.

We will only be using Charleston’s 1921 NNL season, and we will not include his play in five games against Major League players. That might sound odd, but with such small samples involving only two teams, and the MLB team not always composed solely of MLB-caliber players, it gets dicey fast.

 

Now we can run the RC2002 formula. BBREF does not include steals and caught stealing in batting runs (Rbat), nor intentional walks and sac bunts, so we don’t either. This has gone on plenty long, and you can find James’ equation at the link I shared. Charleston bashed his way to 93 runs created in 339 PA, or 0.273/PA. Hitting .433/.512/.736 (249 OPS+) will do that.

2) Compare to his own league

As we did with pitchers, we’re using z-scores. Charleston’s 0.273 RC/PA came in a league with a 0.109 mean RC/PA. The league’s Standard Deviation in that department was 0.092:

( 0.273 – 0.109 ) / 0.092 = 1.78

3) Place into MLB context

The NL of 1921 had a mean RC/PA of .099 and STDEV of .083, therefore:

( 1.78 * .083 ) + .099 = 0.246 RC/PA
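Steps 2 and 3 amount to standardizing a rate against one league and rescaling it into another. Here is a minimal Python sketch using the figures quoted above. The function names are ours, made up for illustration, and the result lands within rounding of the hand-worked 0.246.

```python
def z_score(rate, league_mean, league_stdev):
    """Standard deviations above or below the originating league's mean."""
    return (rate - league_mean) / league_stdev


def to_destination(z, dest_mean, dest_stdev):
    """Re-express a z-score in the destination league's run context."""
    return z * dest_stdev + dest_mean


# Charleston, 1921 NNL: 0.273 RC/PA; league mean 0.109, stdev 0.092
z = z_score(0.273, 0.109, 0.092)

# 1921 NL: mean 0.099 RC/PA, stdev 0.083
translated = to_destination(z, 0.099, 0.083)

print(f"z-score: {z:.2f}")
print(f"translated RC/PA: {translated:.3f}")
```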

4) Adjust for the quality of his league

We rate the NNL of 1921 at just above AAA level, 0.85 of MLB. But we use it at two-thirds strength. I ran a wimpy little regression (says the untrained statistician) in Excel that suggested the length of the schedule was perhaps responsible for a third or so of the variation observed in the Negro Leagues. Thus:

( 1 – ( ( 1 – 0.85 ) * 0.67 ) ) * 0.246 RC/PA = 0.221 RC/PA

5) Adjust for his park

Using the same park-factor calculations as we showed you in the article on our MLE process for pitching, we get a 1.13 park factor for the 1921 St. Louis Giants. We use it at half strength since teams typically play half their games at home.

0.221 RC/PA / ( ( ( 1.13 – 1 ) / 2 ) + 1 ) = 0.208 RC/PA
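Here is a short Python sketch of steps 4 and 5 that reproduces the 0.221 and 0.208 figures. It rests on two assumptions, both implied by those results: the gap between the league's QOP and 1.0 is applied as a discount at about two-thirds strength (0.67 here), and the half-strength park factor is divided out, since a hitter-friendly park inflates raw production. The function names are hypothetical.

```python
def apply_quality_of_play(rate, qop, strength=0.67):
    """Discount a rate by the league's gap to MLB (QOP 1.0) at partial strength.

    strength=0.67 is our reading of "two-thirds strength"; it is an assumption.
    """
    return rate * (1 - (1 - qop) * strength)


def remove_park_effect(rate, park_factor):
    """Divide out the park factor at half strength (half of games are at home)."""
    return rate / (1 + (park_factor - 1) / 2)


after_qop = apply_quality_of_play(0.246, qop=0.85)
after_park = remove_park_effect(after_qop, park_factor=1.13)

print(f"after QOP:  {after_qop:.3f} RC/PA")
print(f"after park: {after_park:.3f} RC/PA")
```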

6) Adjust for his strength of schedule

If this information is available, we multiply by the strength of schedule discount if the player’s RC/PA is positive, or we divide if it is negative. We don’t yet have that information available, so we needn’t take action here.

We have now translated Oscar Charleston’s 1921 batting performance into a neutral 1921 NL context. Now we need to create an initial estimate for his playing time.

7) Express as RC, figure his translated Rbat, then express the result per PA

Charleston came to the plate 339 times, so we multiply by his 0.208 RC/PA to get 70.6 RC. We turn that into Rbat by subtracting the number of runs an average 1921 NL hitter would accumulate in 339 PAs. The NL mean in step 3 was .099 RC/PA, which in 339 PA is 33.6. So Charleston picked up 37.1 Rbat in those 339 PA. That boils down to .109 Rbat/PA.
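The same arithmetic as a quick Python sketch. It starts from the rounded 0.208 and 0.099 rates, so the totals differ from the figures above by a tenth or two of a run, while the per-PA rate matches. Variable names are ours.

```python
pa = 339                      # Charleston's 1921 NNL plate appearances
translated_rc_per_pa = 0.208  # from step 5
nl_mean_rc_per_pa = 0.099     # 1921 NL mean from step 3

translated_rc = pa * translated_rc_per_pa  # runs created in NL context
average_rc = pa * nl_mean_rc_per_pa        # what an average NL hitter creates
rbat = translated_rc - average_rc          # batting runs above average
rbat_per_pa = rbat / pa

print(f"translated RC: {translated_rc:.1f}")
print(f"Rbat: {rbat:.1f}, Rbat/PA: {rbat_per_pa:.3f}")
```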

8) Create a rolling average of Rbat to create the final MLE rate of batting performance.

Same thing we did with our pitchers’ performance. 5-year rolling average centered on the year in question.

( Year N * 0.60 ) + ( Year N+1 * 0.15 ) + ( Year N-1 * 0.15 ) + ( Year N+2 * 0.05 ) + ( Year N-2 * 0.05 )

In Charleston’s case:

( .109 * 0.60 ) + ( 0.120 * 0.15 ) + ( 0.072 * 0.15 ) + ( 0.048 * 0.05 ) + ( 0.06 * 0.05 ) = 0.100 Rbat/PA
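The weighting can be sketched in Python as follows. This assumes all five seasons are available; how the weights are handled at the edges of a career is not covered here, so the sketch does not attempt it. The names are ours.

```python
# Weights by offset from the season in question (0 = Year N)
WEIGHTS = {0: 0.60, 1: 0.15, -1: 0.15, 2: 0.05, -2: 0.05}


def rolling_rbat_rate(rates_by_offset):
    """Five-year weighted average of Rbat/PA centered on Year N."""
    return sum(WEIGHTS[offset] * rates_by_offset[offset] for offset in WEIGHTS)


# Charleston's translated Rbat/PA around 1921, keyed by offset from 1921
charleston = {-2: 0.060, -1: 0.072, 0: 0.109, 1: 0.120, 2: 0.048}
mle_rate = rolling_rbat_rate(charleston)

print(f"MLE Rbat/PA: {mle_rate:.3f}")
```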

9) Estimate the player’s games played into an MLB schedule based on his in-season and/or career durability records.

We will credit the player all the games he actually played then apportion the rest of the destination league’s schedule based on his career games played per team game.

( ( 154 – team games ) * career games / total team games ) + games

The St. Louis Giants played 79 games, and Charleston appeared in 77 of them. At the career level, we only count those games in seasons we will include in his final MLE. We are not counting 1915 and 1916, so Charleston played 1132 games among his team’s 1233 contests, or 92%. Thus

( ( 154 – 79 ) * 0.92 ) + 77 = 146

We cap this at 95% of the destination league’s schedule to avoid having too many seeming iron men.

10) Apply the destination league’s PA/game to those estimated games

The 1921 NL had 4.256 PA per game per lineup slot, so 146 * 4.256 = 621 PA
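Steps 9 and 10 together, as a minimal Python sketch. The function and parameter names are hypothetical. The cap is applied as 95% of a 154-game schedule, as described in step 9.

```python
def estimate_games(games, team_games, career_games, career_team_games,
                   schedule=154, cap_share=0.95):
    """Credit actual games, then fill the rest of the schedule at the
    player's career games-per-team-game rate, capped at a share of the slate."""
    durability = career_games / career_team_games
    estimate = (schedule - team_games) * durability + games
    return min(estimate, cap_share * schedule)


# Charleston 1921: 77 of 79 team games; career 1132 of 1233 in included seasons
games = estimate_games(77, 79, 1132, 1233)

# 1921 NL: 4.256 PA per game per lineup slot
estimated_pa = round(games) * 4.256

print(f"estimated games: {round(games)}")
print(f"estimated PA: {round(estimated_pa)}")
```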

If we stopped here, Oscar would have 62.3 Rbat, which is a healthy total. Now we’re going to fine-tune our playing time estimates and then work on running, DP avoidance, and fielding.

11) First, we look at any batting seasons at the beginning or end of a player’s career. If they are well below average, we will not consider them part of his MLE—either he’d have been in the minors or, at the other end of things, aged out of the game. A general rule of thumb is that after age 38, two seasons well below average batting is probably enough to retire someone.

Charleston got a lot of playing time at ages 18–20. As we’ve demonstrated previously, almost no one appears very often at that age. But players do appear. Charleston’s equivalent production for those years isn’t bad, so we’ll include them, but instead of as a full-time player, we’ll give him a very small number of PAs at 18 and 19, with an increasing number at age 20, then roughly full-time play from age 21 onward.

Charleston also played deep into his forties. Most players start to sputter badly around 37 or 38. But his last productive equivalent year is probably age 37. So we’ll wind up his career at age 39 with sharp decreases in his playing time.

12) Next, we look at the player’s biographical material. If there are any injuries that would keep him out of the lineup, we see if our PA estimate reflects it appropriately. Because of the less stable nature of the Negro Leagues, we also look for league/team jumping, and other oddball movements that would affect playing time but didn’t occur in MLB. We adjust accordingly.

Charleston rarely missed a game, and we could find no evidence of extended absence due to injury. His 1919 season does have a weird two-team element to it that sheds light on how careful we need to be. The amazing Gary Ashwill told us that in 1919, Charleston played in 24 of the Chicago American Giants’ 24 league games through August 3rd. However, in the last days of July, race riots broke out in the city, and the American Giants’ home field (Schorling Park) was occupied by the Illinois state militia, forcing the cancelation of a series with the Atlantic City Bacharachs. The Giants were in Detroit when the riots erupted and stayed put but had no games scheduled elsewhere. Meanwhile, the Detroit Stars and Hilldale Club were about to kick off a series. So the three clubs ended up playing a three-way doubleheader on August 3rd. Charleston played in the first game for his own team versus Detroit, then for Detroit in game two, while other Chicagoans played for Detroit as well. He played in three others for Detroit as well. Charleston rejoined his own club for a contest with the Stars on August 9th, and overall played 17 out of their final 18. If you just looked at the stats, it would appear that Charleston played 41 of 42 for Chicago and 5 of 41 for Detroit, or 46 out of a possible 83 games. Or perhaps you’d split the difference and call it 46 of 63 or something. In fact, he played 24 of 24 for Chicago, 5 of 5 for Detroit, and 17 of 18 for Chicago, for a grand total of 46 of 47. Only in the Negro Leagues.

It would be good for us to quickly discuss winter league play here as well. We include it for sure, and we ultimately combine it with summer play. First, however, we work it through all the steps for translating offensive performance. Right before step 8, however, we combine any same-season stints by taking an average weighted by PAs. For purposes like this, we consider Opening Day of the Negro Leagues season as day one of a given calendar year. So the 1927–1928 winter ball season, or any winter ball played in the first three or four months of 1928 all counts toward 1927’s batting record.
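Combining same-season stints is a PA-weighted average of their translated rates. A minimal Python sketch follows. The two stints in it are hypothetical numbers chosen only to show the mechanics, not figures from any player's record, and the function name is ours.

```python
def combine_stints(stints):
    """PA-weighted average of per-PA rates for stints in the same season.

    stints: list of (rate_per_pa, plate_appearances) pairs.
    """
    total_pa = sum(pa for _, pa in stints)
    return sum(rate * pa for rate, pa in stints) / total_pa


# Hypothetical: a 300-PA summer stint and a 100-PA winter stint
combined = combine_stints([(0.100, 300), (0.120, 100)])

print(f"combined rate: {combined:.3f} per PA")
```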

13) Now we look for players with similar careers and styles to our man. We look for players at his position (or a similar one), roughly during his time, with similar offensive and defensive profiles (OPS+ and Rfield are good barometers here), and who fall roughly within a similar number of seasons as a regular. Once we have a bunch of them identified, 10–12 is best when possible, we look at their playing time, especially for and across the ages that we are including in our MLE. We look for a rough number of plate appearances to shoot for.

Charleston is a little difficult because the only centerfielders prior to the 1960s who hit like him are Cobb and Speaker. Charleston’s record doesn’t suggest that he was able to keep the high-octane performances of his twenties going into his thirties. More like Ken Griffey, Jr., in this regard than Cobb and Speaker. So centerfielders actually don’t provide a useable set of comps.

So we turned to heavy-hitting corner outfielders who aren’t on the Ruth level: Rbat greater than 300 but not above 600. Defensively, Charleston wasn’t great anywhere in the outfield, at least according to DRA. We take DRA’s arm runs out of the picture entirely because they aren’t all that trustworthy. But Charleston was a good first baseman. So, in general, we’re not overly worried about comping defensive performance, except we don’t want any slow-footed sluggers because Charleston was athletic, at least until he put on weight in his late twenties and thirties.

The list that BBREF’s Play Index spat back included: Al Simmons, Paul Waner, Sam Crawford, Jesse Burkett, Fred Clarke, Goose Goslin, Willie Keeler, Zack Wheat, Sherry Magee, Joe Kelley, Harry Heilmann, Joe Medwick, and Ed Delahanty. These guys had a median 9519 career PA. But we also need to account for differing schedules, especially prior to 1904. When we adjust everything to a 154 game slate, these come out at 9622 on average and a median of 9992. So our target looks like 9600–10000 PA.

14) Armed with the information in #11, we adjust very early and very young seasons first to fit both general trends among all players and specific trends among comps. This should get us pretty close to our goal. Beyond that, we can add or trim as necessary to get to a career with a reasonable playing-time approximation.

Among the comps we’ve selected, none played at age 18, four at age 19 (averaging 60 PA), seven at age 20 (average 145 PA, median 57). At age 21, they stepped up to about 350–400 PA, then from age 22 through 34 were full-time players. They began to sputter at 35; by age 36, as a group, their playing time was in full decline, and we can start to hear the death rattle at 37. Just three of them appeared after age 39. So this gives us a good idea of the shape of a career, and we’ll use Charleston’s actual production in concert with this information from comps and our info on very old and young players noted above to create estimates for his time. In 1921 specifically, Charleston was 24, and the 621 PA initial estimate we created is solid.

Now for baserunning.

15) For each season with the necessary information, find a player’s stolen bases per opportunity.

With no play-by-play information, we calculate opportunities as times on base minus extra-base hits. So for Charleston in 1921 we get:

32 SB / ( ( 123 H – 44 XBH ) + 41 BB + 5 HBP ) = 0.256 SB/OPP

16) Find the same for his teams.

By the same formula, the St. Louis Giants stole 0.161 bases per opportunity

17) Find the same for his originating leagues.

The league’s SB/OPP was 0.12.

18) Adjust his rate for his own team’s tendency to run or not.

( lgSB/OPP / tm SB/OPP ) * SB/OPP

For Charleston:

0.12 / 0.161 * 0.256 = 0.191 adjSB/OPP

19) Compare the team-adjusted rate to his league and figure his percentage of steals above or below league average.

0.191 adjSB/OPP / 0.12 lgSB/OPP = 159%

Charleston stole 59% more bases than his league.
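Steps 15 through 19 chain together neatly, so here they are as one Python sketch. It carries the 0.256 SB/OPP from step 15 through the team adjustment and arrives at the 159% figure. The function names are ours, not anything from BBREF or the Negro Leagues Database.

```python
def sb_per_opportunity(sb, hits, xbh, bb, hbp):
    """Steals per opportunity, with opportunities = times on base minus XBH."""
    return sb / ((hits - xbh) + bb + hbp)


def team_adjusted(player_rate, team_rate, league_rate):
    """Scale the player's rate by how much his team ran relative to the league."""
    return league_rate / team_rate * player_rate


league_rate = 0.12   # 1921 NNL SB/OPP
team_rate = 0.161    # 1921 St. Louis Giants SB/OPP

player_rate = sb_per_opportunity(sb=32, hits=123, xbh=44, bb=41, hbp=5)
adjusted = team_adjusted(player_rate, team_rate, league_rate)
relative = adjusted / league_rate

print(f"SB/OPP: {player_rate:.3f}")
print(f"team-adjusted SB/OPP: {adjusted:.3f}")
print(f"share of league rate: {relative:.0%}")
```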

20) Now, find MLB players from the PBP era with similarly long careers and find similar percentages of SB above the league. Because Negro Leagues boxscores may not always have carried stolen base info, it’s OK to pad this by as much as double for players with very speedy reputations, so, for example, 125% of league becomes 150%. (This is kind of a pain, so we’ve only used four to six at a time, 10 would be better.)

Charleston was known as a fast player, at least early on, so we’ll give him a little padding. We located a few long-career players in MLB’s play-by-play era who stole 70% to 90% more than their leagues: Barry Bonds (+79%), Paul Molitor (+88%), Craig Biggio (+74%), and Omar Vizquel (+82%).

21) Find the average Rbaser/PA of those MLB comps and apply it on a per-season basis to the player’s estimated PAs.

We figured this and expressed it per 10,000 PA to give us some context. As it turns out, these guys averaged 35 Rbaser/10,000 PA. We decided to push up to +40 because these fellows’ lines include a lot of seasons after age 37 (when Charleston is going to get his last big hit of playing time) and Gant broke a leg in the middle of his career.

22) If the candidate has a pronounced decline in his net steals versus the league, sculpt the trajectory of his running runs appropriately.

That said, Oscar’s stolen base totals went into the toilet after age 31, so we’re going to need to stack up most of his baserunning value in the first half of his career then decrement him after age 31. When we did this, we got 37 total, and for 1921 he earns 3 runs on the bases.
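For a sense of scale, here is what the +40 Rbaser per 10,000 PA rate yields when applied flat to the 621 PA estimated for 1921. This Python sketch covers only the flat baseline. The front-loading described above is a judgment call that we do not try to model here, which is why the baseline comes out a bit under the 3 runs he ends up with. The names are ours.

```python
RBASER_PER_10000_PA = 40  # comp-group rate, nudged up from 35 as described


def flat_rbaser(pa, rate_per_10000=RBASER_PER_10000_PA):
    """Baserunning runs for a season if the career rate were applied evenly."""
    return pa * rate_per_10000 / 10_000


baseline_1921 = flat_rbaser(621)

print(f"flat 1921 baseline: {baseline_1921:.1f} Rbaser")
```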

Now for GIDP avoidance.

23) Identify lots of MLB batters of the same handedness, similar Rbaser, and similar career length, and calculate their Rdp/PA.

We only take this step for seasons after 1948. BBREF uses play-by-play data to determine how many double play opportunities a player had and then compares the player to the league’s average. We can’t do that, so we do our best. Handedness, speed, ground ball/fly ball rates, and strikeout rates are the main determinants for GIDP rates. Most of the time, we only have the first two, so we find comps with the same career length based on handedness and our Rbaser estimation. We see how many Rdp the comps had per PA.

24) Apply the group’s average in Step 23 to each season of the player’s career.

We don’t need to do this for Charleston since 1921 is before our 1948 cutoff. In fact, in most cases, we won’t need to do this. But if BBREF should add more PBP and calculate Rdp for pre-1948 seasons, we’ll follow suit.

Now, for fielding, things are going to get really mushy.

25) Determine the player’s position for a given season or career by examining where he played in the originating league and how well he played there. If he started at shortstop, was bad at it, then moved to another position and was average or good, we might consider putting him at the latter position all the time. This is a subjective judgment, and we should look at real big-league careers for examples.

Charleston was known to play a very shallow centerfield and was often compared to Tris Speaker. However, he put on weight and was essentially done as a top-flight outfielder by his late 20s. Seeing this, we’ve kept him in centerfield until age 28, shifted him to leftfield until age 32, then to first base for the rest of his career.

26) Find the player’s DRA/G rate in his originating league at whatever position or positions he will be placed at in our MLE.

For an outfielder, we probably need to disregard DRA’s arm value. In fact, Charleston’s arm value is negative, but he was known for having a rifle, so we need to build that into our estimate. In addition, Charleston’s career range value for centerfield is negative, but we are missing six seasons of defensive stats for him. When the numbers and the stories of a career don’t match, we can’t dismiss the narrative. So we’ve given Charleston an average of 0.05 fielding runs per game, which figures to around 6 or 7 runs in a full season. For a sense of scope, Andruw Jones and Kevin Kiermaier rack up 20+ runs in their best seasons. Let’s run through the rest of his career too.

We are going to place him in left field beginning at age 29. We learned earlier that Charleston was slowing down rapidly around this age due to putting on weight, so in leftfield, he’ll start out above average by about the same rate he was in center, then quickly drop down to below average in four years. Finally when he moves to first base, where he had decent DRA totals, we’ll make him a little above average. He ends up with +51 defensive runs.

November 26th, 2017: Please note that we’ve created a more objective means to generate fielding estimates, so step 26 is now out of date. Here’s what step 26 looks like now for Charleston:

26) Find the player’s career DRA/154 games.

Charleston’s career DRA/154 games in centerfield through age 29 (when we’ll move him to left field) was 14.8 runs.

26a) Find the fifty players with the highest number of appearances in the Negro Leagues at the player’s position (because the Negro Leagues Database lacks DRA for some seasons where it lacks games, only count the appearances in seasons that include DRA), and for each figure their career DRA/154 at that position.

26b) Find the standard deviation of DRA/154 among the 50 players in 26a.

That standard deviation is 17.01.

26c) Repeat the process in steps 26a and 26b for the major leagues for the same period of time the Negro Leagues Database covers (1887–1945 as of this writing), substituting BBREF’s Rfield for DRA. (Technically, I used the BBREF Play Index, setting it to seek shortstops with more than 50% of their games at the same position, and returning their career Rfield. This will have to be close enough because otherwise, we’d be at it for weeks.)

For MLB it is 3.14.

26d) Divide the player’s DRA/154 (step 26) by the standard deviation of the Negro Leagues players at his position (step 26b), then multiply by the standard deviation for MLB players from step 26c. This is the player’s MLB career Rfield/154.

14.8 / 17.01 * 3.14 = 2.73 runs/154 games
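The rescaling in steps 26 through 26d can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet we actually use: the function names are made up for this example, and the choice of a population standard deviation (`pstdev`) rather than a sample one is an assumption, since the steps above don't specify which.

```python
from statistics import pstdev


def career_rate_sd(rates_per_154):
    """Spread of career fielding rates among the comparison group (steps 26b/26c).

    Assumption: population standard deviation. A sample standard deviation
    (statistics.stdev) would give a slightly larger figure.
    """
    return pstdev(rates_per_154)


def rescale_fielding_rate(dra_per_154, negro_leagues_sd, mlb_sd):
    """Step 26d: express a DRA/154 rate on the MLB Rfield/154 scale."""
    return dra_per_154 / negro_leagues_sd * mlb_sd


# Charleston's figures as given in the text above.
charleston_rfield_per_154 = rescale_fielding_rate(14.8, 17.01, 3.14)
print(round(charleston_rfield_per_154, 2))  # 2.73
```

The division puts the player in standard-deviation units within his own league, and the multiplication converts those units back into runs on the much narrower MLB scale.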

27) Apply that rate to the number of games in a season imputed by the estimated PAs we’ve assigned earlier in this process.

We mentioned just now that we’re giving him 0.05 Rfield per game, which in 1921 will net him 7 Rfield in 146 games.

With our new method, Charleston will get 2.73 runs / 154 games * 146 games = 2.6 Rfield
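For anyone following along at home, the proration in step 27 looks like this as a small Python sketch. The function name is hypothetical; the inputs are the figures quoted above.

```python
def prorate_rfield(rfield_per_154, games):
    """Step 27: scale a per-154-game fielding rate to a season's game total."""
    return rfield_per_154 / 154 * games


# Charleston in 1921: 2.73 runs per 154 games, 146 estimated games.
print(round(prorate_rfield(2.73, 146), 1))  # 2.6
```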

28) Check whether his defensive performance declined over time, and make any seasonal adjustments necessary to mirror that.

We covered all of this a little earlier.

29) Double check against real MLB careers to see if the number of fielding runs generated are reasonable.

Yes, we think so. From 1871–1960, among players who played at least 40 percent of their games in centerfield, +31 runs would place 24th, between Steve Brodie and Terry Moore. The true greats like Speaker, Carey, and Ashburn have more than 80 Rfield, and many of the players with higher Rfield totals have many fewer PAs. Additionally, some 19th Century players, playing under shorter schedules, would rank higher than Oscar given a 154-game slate, or if they already rank higher, would put much more distance between them and him.

Now we’ve got all the difficult stuff out of the way, and we’re on the WAR express.

30) Figure positional runs. We do this exactly the same way that BBREF does here, based on the position we have assigned the player.

In 1921, Oscar accumulates -3 of these. Centerfield was a much more offense-oriented position then than now.

31) Now for each season we add the player’s Rbat, Rbaser, Rdp, Rfield, and Rpos to get his Runs Above Average (RAA).

In 1921, we estimate equivalent values of 62 Rbat, 3 Rbaser, 3 Rfield, and -3 Rpos, which total to 65 RAA. Recall that we didn’t calculate Rdp, but if we did, we’d include it here.
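Step 31 is plain addition, and a quick Python sketch makes the bookkeeping explicit. The function name is ours for this example, and Rdp defaults to zero only because we didn't calculate it for Charleston.

```python
def runs_above_average(rbat, rbaser, rfield, rpos, rdp=0):
    """Step 31: sum the run components into Runs Above Average (RAA).

    rdp defaults to 0 because the MLE described here does not estimate it.
    """
    return rbat + rbaser + rdp + rfield + rpos


# Charleston's estimated 1921 components from the text.
print(runs_above_average(rbat=62, rbaser=3, rfield=3, rpos=-3))  # 65
```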

32) To convert that to Wins Above Average, we follow BBREF’s instructions here.

That reckons to 6.4 WAA.

33) We calculate replacement runs (Rrep) just as BBREF instructs us here.

21 of those for Charleston in 1921.

34) Next we turn those Rrep into the player’s replacement-level wins per BBREF’s instructions here.

2.2 of those.

35) Finally, we add the WAA and replacement-level wins to get WAR.

And 17 hours later, we’ve got him at 8.6 WAR. That figure would have placed second in the 1921 NL and third in MLB among hitters. Here’s the leaderboard if we include pitchers:

  1. Babe Ruth: 12.6
  2. Red Faber: 11.0
  3. Rogers Hornsby: 10.8
  4. Oscar Charleston: 8.6
  5. Urban Shocker: 8.5
  6. Burleigh Grimes: 8.0
  7. Dave Bancroft: 7.4
  8. Sad Sam Jones: 7.3
  9. Carl Mays: 7.5
  10. Frankie Frisch: 6.9
  11. Harry Heilmann: 6.8
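Stepping back to step 35 for a moment, the final sum can be checked with a short Python sketch. The function name is hypothetical, and the runs-to-wins conversions in steps 32 and 34 are not reproduced here because they follow BBREF's own instructions; we simply take the WAA and replacement-level wins reported above as inputs.

```python
def wins_above_replacement(waa, replacement_wins):
    """Step 35: WAR is Wins Above Average plus replacement-level wins."""
    return waa + replacement_wins


# Charleston's 1921 figures from steps 32 and 34.
print(round(wins_above_replacement(6.4, 2.2), 1))  # 8.6
```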

At the career level, among all hitters from 1871–1960, Charleston ends up with:

  • 9910 PA (22nd)
  • 626 Rbat (14th)
  • 62.8 WAA (13th)
  • 98.8 WAR (12th)

Given Charleston’s reputation as among the very elite of Negro Leagues players, this MLE could be conservative. And that’s OK if it is because we’re still missing a little bit of data for him (1929 summer plus some winter seasons), and because we’d always rather underpromise and overdeliver, at least metaphorically speaking. What I mean is that if we come in with numbers that are sky high, they aren’t going to be credible. We need to arrive at estimates that resemble real MLB players. It’s too easy to trumpet someone’s greatness on the basis of our figures then realize we’ve made an embarrassing error in logic or computation or data entry. When a player has less data attached to him than Charleston (for example, see our write-up on Bullet Rogan), we can’t go all-in on half a career. We need to temper our estimates in the absence of data that could just as easily deflate them as inflate them. In general, we’ve got to be careful. We want this to be about the players, not about us.

We’re exhausted from mansplaining all of this. We can only begin to imagine the depth of horror you’ve experienced reading along. If you have suggestions for improvements, we are all ears. We aren’t perfect at this, we’re just trying our best. Next, we’re going to evaluate position players whose Negro Leagues careers launched them into the Hall of Fame or Hall of Merit. We’ll begin with catchers, so fans of Josh Gibson, Biz Mackey, Louis Santop, and Quincy Trouppe should tune in for that one. And we’ll also give you a delightful bonus surprise.
