For those who like the gory details, this post’s for you. I’m going to go into exhausting detail about our latest MLE method. For those of you who don’t like the details, see you next Thursday. Or maybe the Thursday after, because I’m running the complete method for hitters this week and for pitchers next week.
However, I have given you a couple treats. Visit the Negro Leagues section of the site and you will find links to the brand new MLEs, both career value lines and yearly lines for selected players.
The following document outlines the process for creating major-league equivalencies for Negro Leagues position players. You’ll first see a short prose version for those who only want the gist. For those who want to know exactly how the sausage is made, a more fully elaborated version comes after that which includes an example from Oscar Charleston’s career.
This approach took several years to settle on, along with a great deal of support from many people, especially Gary Ashwill, Kris Gardner, Kevin Johnson, and Howard Miller. If you find any ways to improve this routine, please email me at firstname.lastname@example.org. I’m always looking for opportunities to improve.
Abbreviations and Assumptions
- G: Games
- PA: Plate appearances
- QoP: Quality of Play
- RAA: Runs above average
- Rbat: Batting runs above average
- SD: Standard deviation
- wOBA: Weighted on-base average
- wRC: Weighted runs created
- Z: Z-score
- Rbaser: Runs from baserunning
- SB/G: Stolen bases per game
- Rdp: Runs from avoiding grounded into double plays
- DRA: Defensive Regression Analysis figures from the Negro Leagues Database
- WAA: Wins above average
- WAR: Wins above replacement
The process will be described from a single-season point of view. We will refer to the single season in question as n. We will also mention these seasons:
- n+1: The season immediately after n
- n-1: The season immediately before n
- n+2: The season two years after n
- n-2: The season two years before n
- Find the player’s wOBA.
- Use his Z to recontextualize his wOBA into an MLB context.
- Turn his MLB wOBA into wRC and adjust that figure for his original league’s QoP.
- Determine how many RAA he would generate per PA in MLB.
- Estimate his MLB playing time in PA and G; multiply his PA by #4 to determine his MLB Rbat.
- Find the player’s SB in his original league; they will count toward his total.
- Subtract his G from 154 and multiply the difference times his career SB/G rate.
- Add #1 and #2 together and match that total with the Rbaser table in the appendix.
- Create a set of players of comparable handedness, playing time, and Rbaser and determine their Rdp/PA.
- Determine the player’s Rdp by multiplying #1 by his MLE PA.
- Find the player’s career DRA at all positions.
- At each position divide career DRA by games played at that position.
- Using the typical fielding trajectory of MLB players (see Appendix), assign a fielding value to each season of a player’s career based on the position he plays in each year and his career fielding rate.
- Use the WAR explainer at Baseball-Reference.com to determine the player’s positional value
- Calculate the player’s WAA and WAR as closely as possible to Baseball-Reference.com’s instructions.
We’ll work through Charleston’s fabled 1924 season. That will appear in blue.
1) Gather a player’s known statistics from the Negro Leagues, MLB, the minor leagues, and any foreign summer or winter leagues.
- Only use seasons where league-wide data is available
- Only use seasons where the following data is available for all players: G, AB, H, 2B, 3B, HR, BB; HBP, SB, IBB, and SH are also helpful but not required.
- Do not include All-Star games, playoff/post-season games, nor short winter series that pit white teams against Black or Cuban teams against Black
- It’s not possible to compare a player to a league-average player in a short series—to see why, imagine having five Toyotas and five Ferraris and trying to define the average car.
In 54 games, Charleston batted 205 times, picked up 83 hits, 22 doubles, 5 triples, 15 homers, 20 steals, 28 walks, and 3 sacrifices. Hit-by-pitch data is not available for this season.
2) For season n, find the player’s weighted on-base average (wOBA)
I scale all seasons, Negro Leagues or otherwise, to a .330 league-average wOBA for ease of interpretation using the wOBA calculator at Triples Alley. In addition, I’ve used park factors supplied by the amazing Kevin Johnson, and where he or BBREF doesn’t have any, I’ve created my own. Eric[dot]Chalek[at]gmail if you want more information on this process. Using the linear weights derived from Charleston’s 1924 Eastern Colored League plus the 0.9812 park factor from Kevin and scaled to .330, I calculate a nifty .5579 wOBA for Charleston.
3) Determine the player’s wOBA z-score
- Find the league’s SD for wOBA. The ECL had an SD of 0.0816 points of wOBA.
- Subtract the league’s mean wOBA from his wOBA. This means the mean used in calculating SD. In this case, that figure for the 1924 ECL is 0.3155. So, 0.5579 – 0.3155 = 0.2424
- Divide by the SD. 0.2424 / 0.0816 = 2.8243, which is absolutely massive. For those following at home who got 2.9706, I’m not carrying quite enough significant digits here. But don’t worry, I’m not making this up!
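For anyone who wants to check the arithmetic, here is a minimal Python sketch of step 3. The function name is made up for illustration, and the inputs are the rounded figures quoted above, so it lands on 2.9706 rather than the 2.8243 that comes from the unrounded numbers.

```python
def woba_z_score(player_woba: float, league_mean: float, league_sd: float) -> float:
    """How many league standard deviations the player's wOBA sits above the league mean."""
    return (player_woba - league_mean) / league_sd


# Charleston, 1924 ECL, using the rounded figures from the text.
z = woba_z_score(player_woba=0.5579, league_mean=0.3155, league_sd=0.0816)
print(round(z, 4))  # 2.9706 with these rounded inputs
```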
4) Adjust the sample to reach a minimum of 200 plate appearances
- If the player has more than 199 PA in season n, do not adjust the sample, and proceed. We won’t need to adjust for the sample this time.
- If the player has fewer than 200 PA combine the PA from season n, with the PA from seasons n+1 and n-1 this way:
((200-nPA) * (sum(nPA*nZscore, n+1PA*n+1Zscore, n-1PA*n-1Zscore) / sum(nPA, n+1PA, n-1PA)) + (nPA*nZscore)) / 200
- If the sample is still under 200 PA, add the PA and Zscores from seasons n+2 and n-2, weighting them at 0.6
- If the sample still doesn’t add to 200 PA, include the player’s career PA and career weighted average Zscore.
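The first tier of this adjustment can be sketched in Python as below. This is only an illustration under stated assumptions: the function name and the sample numbers are invented, and the later tiers (seasons n+2 and n-2 at a 0.6 weight, then the career figures) are left out because the text does not spell out exactly how their weights enter the sums.

```python
def regress_z_to_200_pa(season, neighbors, target_pa=200):
    """Fill a short season out to target_pa using the PA-weighted z-score of
    seasons n, n+1, and n-1.

    season and each item in neighbors are (PA, z_score) tuples.
    """
    n_pa, n_z = season
    if n_pa >= target_pa:
        return n_z  # enough PA already, so no adjustment

    pool = [season, *neighbors]
    pooled_z = sum(pa * z for pa, z in pool) / sum(pa for pa, _ in pool)

    # The missing PA are filled at the pooled rate; the real PA keep their own rate.
    return ((target_pa - n_pa) * pooled_z + n_pa * n_z) / target_pa


# Invented example: 120 PA at z = 1.5, flanked by 150 PA at 1.0 and 130 PA at 0.5.
adjusted = regress_z_to_200_pa((120, 1.5), [(150, 1.0), (130, 0.5)])
print(round(adjusted, 3))  # 1.295
```

With Charleston’s 236 PA in 1924, the function simply hands back his z-score unchanged.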
5) Determine a player’s initial MLB wOBA by placing him into a major-league setting: Default to the NL, but for players who debuted in the AL, start them in the AL. Multiply z-score (#3) by the MLB league SD then add the product to the mean MLB wOBA.
Again, we’re talking about the mean observed when generating SDs for the league. Charleston goes into the NL, which in 1924 had an SD of 0.1183 and a mean of 0.3019. Thus (2.8243 * 0.1183) + 0.3019 = .6360 (sig digs again). That’s what I’ll call Charleston’s twOBA or translated wOBA.
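As a quick check, step 5 is a single line of arithmetic. The sketch below uses a hypothetical function name and the figures quoted above for the 1924 NL.

```python
def translate_woba(z_score: float, mlb_mean: float, mlb_sd: float) -> float:
    """Place a z-score into the destination league's wOBA distribution."""
    return z_score * mlb_sd + mlb_mean


# Charleston's 1924 z-score dropped into the 1924 NL.
twoba = translate_woba(z_score=2.8243, mlb_mean=0.3019, mlb_sd=0.1183)
print(f"{twoba:.4f}")  # 0.6360
```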
6) Convert wOBA to weighted Runs Created (wRC) to determine the player’s total batting output.
We’re going to do this in order to make a QoP adjustment. We need to do this to a player’s runs created not his twOBA. The calculation goes like this:
((twOBA – 0.330) * PA) + (PA*lgR/PA)
In the first term, we find out how many runs above average he was (remember, we scaled everything to .330), and in the second term, we add in the runs a league-average hitter would create in the same number of PA. In Charleston’s case it’s ((0.6360 – 0.3300) * 236) + (236 * 0.1187) = 100.2275.
7) Adjust for quality of play (QoP) by multiplying #6 by the originating league’s QoP adjustment (see table for the ones I use).
For Charleston and the 1924 ECL the QoP is 0.8, so 100.2275 * 0.80 = 80.1820
8) Turn back into wOBA
This calculation is ((#7 – (PA * lgR/PA)) / PA) + 0.330, and for Charleston that works out to 0.5511.
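Steps 6 through 8 form a round trip: wOBA to runs, a QoP discount on the runs, then back to wOBA. Here is a rough Python sketch of that round trip, with function names that are my own labels rather than anything official. Because it starts from the rounded 0.6360, the intermediate wRC comes out near 100.23 rather than exactly 100.2275.

```python
BASELINE_WOBA = 0.330  # every league is scaled to this average


def woba_to_wrc(woba: float, pa: int, lg_runs_per_pa: float) -> float:
    """Step 6: runs above average plus the league-average runs for the same PA."""
    return (woba - BASELINE_WOBA) * pa + pa * lg_runs_per_pa


def wrc_to_woba(wrc: float, pa: int, lg_runs_per_pa: float) -> float:
    """Step 8: undo step 6."""
    return (wrc - pa * lg_runs_per_pa) / pa + BASELINE_WOBA


def qop_adjusted_woba(twoba: float, pa: int, lg_runs_per_pa: float, qop: float) -> float:
    """Steps 6-8 chained: discount total runs by QoP, then convert back to wOBA."""
    wrc = woba_to_wrc(twoba, pa, lg_runs_per_pa)
    return wrc_to_woba(wrc * qop, pa, lg_runs_per_pa)


# Charleston 1924: 236 PA, 0.1187 league runs per PA, 0.80 QoP for the ECL.
adjusted = qop_adjusted_woba(twoba=0.6360, pa=236, lg_runs_per_pa=0.1187, qop=0.80)
print(f"{adjusted:.4f}")  # 0.5511
```

Note that the discount applies to all of his runs, not just the runs above average, which is why the conversion to wRC comes first.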
9) Create initial playing time estimate
Here we combine in-season and career durability and adjust for the destination league’s PA/G
(G + (0.5 * (destination league’s scheduled games – G) * the player’s career ratio of G to team games) + (0.5 * (destination league’s scheduled games – G) * G / his team’s games)) * destination league’s PA/G/lineup slot
For Charleston that means
(54 + (0.5 * (154 – 54) * (1456/1571)) + (0.5 * (154 – 54) * (54/55))) * 4.254 = 636 PA (rounded to a whole number)
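The same playing-time formula as a small Python sketch. The parameter names are invented for readability; the numbers are Charleston’s from the line above.

```python
def initial_pa_estimate(
    games: int,
    team_games: int,
    career_games: int,
    career_team_games: int,
    scheduled_games: int,
    pa_per_game_per_slot: float,
) -> float:
    """Step 9: fill the unplayed part of the MLB schedule, half at the player's
    career durability rate and half at his in-season durability rate."""
    missing = scheduled_games - games
    career_half = 0.5 * missing * (career_games / career_team_games)
    season_half = 0.5 * missing * (games / team_games)
    return (games + career_half + season_half) * pa_per_game_per_slot


pa = initial_pa_estimate(
    games=54, team_games=55,
    career_games=1456, career_team_games=1571,
    scheduled_games=154, pa_per_game_per_slot=4.254,
)
print(round(pa))  # 636
```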
10) Determine player’s final playing time estimate by using trajectories of MLB players by career length and position.
- This is a multistep iteration. First, I start with the initial estimate in #9. With 636 PA in a league with 4.254 PA/G/lineup slot, Charleston would have 150 games worth of PA.
- But I want to err on the side of conservatism with playing time. It feels overenthusiastic to give players strings of 150+ game seasons, so in this iteration I limit them to 95 percent of games played. For Charleston that takes him down to 147 games, and his resulting PA are 623.
- Then I look at career trajectories. I use the PA by age of players at each position with short, medium, medium-long, and long careers. For a player with a long career like Charleston, I use a 70/30 combination of long-career centerfielders and all long-career non-catchers. Then I figure the average career bulk in PA for a player at a given position for each of our short, medium, medium-long, and long-career groups. I take the lowest PA in each group and subtract from the average. That allows PAs to vary upward from the average if necessary. This is the maximum variance we will allow from the average.
- Charleston’s cohort averages 650 PA, which Charleston doesn’t exceed, so I simply take those 623 PA and round to the nearest ten, 620. However, had Charleston exceeded 650, he could have claimed up to 50 more PA based on the maximum variance we calculated in the previous bullet point.
- Special note that in Charleston’s case I combined three positions (CF, LF, 1B) because Charleston spent a lot of time at each.
11) Create a final estimate for RAA/Rbat.
- This is a three-step process. First we turn our result in step #8 into RAA using our initial estimate of PA in the first bullet of step #10. We’ll subtract .330 from the wOBA we calculated in step #8, multiply that by those 636 PA, and then divide that product by something called the wOBA scale. It’s a factor that the Triples Alley wOBA worksheet helpfully figures for us, and it’s what brings everything back to scale.
((.5511 – .330) * 636) / 1.0236 = 137.3775 RAA
- Next, we divide those 137.3775 RAA by the initial PA estimate then multiply by the final PA estimate. For Charleston that’s (137.3775 / 636) * 620 = 133.9215
- Finally, we make a nod toward realism. No National Leaguer in 1924 created more than Rogers Hornsby’s 96 runs above average. Therefore, I adjust a player’s RAA down to the league leader’s if it exceeds that total. In this case, I dial down Charleston to 96 RAA. I understand that some people might find this unnecessary or controversial, but I’d rather undershoot and tell you “There might be more” than overshoot and say, “This probably isn’t accurate.” And, importantly, Charleston created these runs in about forty percent of a season. Babe Ruth in a full season produced 116 runs, tied with Barry Bonds for the highest total ever. Is it likely that if Charleston played a longer schedule he would keep that pace up? Might be, but that’s speculative territory. I prefer to be guided by the norms of the times. That doesn’t mean, however, that I’m right or someone else is wrong. You can do it however you want. I just don’t feel comfortable projecting anyone to totals that exceed the Babe’s best season by nearly twenty percent.
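The three bullets in step 11 can be sketched in a few lines of Python. The function name is hypothetical, and the league-leader cap is passed in as a plain number (96 for the 1924 NL, per the text).

```python
def final_rbat(
    adj_woba: float,
    initial_pa: int,
    final_pa: int,
    woba_scale: float,
    league_leader_raa: float,
) -> float:
    """Step 11: RAA at the initial PA estimate, rescaled to the final PA
    estimate, then capped at the destination league leader's total."""
    raa_initial = (adj_woba - 0.330) * initial_pa / woba_scale
    raa_final = raa_initial / initial_pa * final_pa
    return min(raa_final, league_leader_raa)


# Charleston 1924: the uncapped figure is about 133.9, so the cap of 96 applies.
rbat = final_rbat(adj_woba=0.5511, initial_pa=636, final_pa=620,
                  woba_scale=1.0236, league_leader_raa=96)
print(rbat)  # 96
```

Passing `float("inf")` as the cap shows the uncapped total, which is handy for seeing how much the cap removes.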
Quality of Play Tables
All quality-of-play multipliers are expressed as a percentage of MLB runs. For example, a player in the 1920 Negro Leagues would have 20% of his runs created removed from his MLE so that 80% of his runs created remained.
NEGRO LEAGUES
SPAN       QoP    LGS
1871-1904  0.720  EAS IND WES
1905       0.725  EAS WES
1906       0.730  EAS WES
1907       0.735  NAC WES
1908       0.740  NAC WES
1909       0.745  INT WES
1910       0.750  EAS WES
1911       0.755  EAS WES
1912       0.760  EAS WES
1913       0.765  EAS WES
1914       0.770  EAS WES
1915       0.775  EAS WES
1916       0.780  EAS WES
1917       0.785  EAS WES
1918       0.790  EAS WES
1919       0.795  EAS WES
1920–1939  0.800  ANL EAS ECL EWL IND NAL NNL NSL
1940       0.780  NAL NNL
1941       0.780  NAL NNL
1942       0.780  NAL NNL
1943       0.780  NAL NNL
1944       0.780  NAL NNL
1945-1948  0.800  NAL NNL

CUBAN LEAGUES
YEAR  LG   QoP
1899  PAR  0.620
1900  CUB  0.720
1901  CUB  0.720
1902  CUB  0.720
1903  CUB  0.720
1904  PV   0.720
1904  CUB  0.720
1905  PV   0.721
1905  CUB  0.724
1906  PV   0.723
1906  CUB  0.728
1907  PV   0.730
1907  CUB  0.732
1908  PV   0.731
1908  CUB  0.735
1909  CUB  0.738
1910  CUB  0.746
1911  CUB  0.751
1911  CGL  0.740
1912  CUB  0.756
1913  CUB  0.754
1914  CUB  0.756
1915  CUB  0.760
1916  CUB  0.759
1918  CUB  0.776
1920  CUB  0.770
1922  CUB  0.774
1923  CUB  0.771
1923  GP   0.775
1927  CUB  0.775

MEXICAN LEAGUE
YEAR  QoP
1937  0.63
1938  0.66
1939  0.67
1940  0.71
1941  0.73
1942  0.69
1943  0.70
1944  0.70
1945  0.68
1946  0.67
1947  0.68
1948  0.67
1949  0.64
1950  0.66
1951  0.66
1952  0.65
1953  0.65
1954  0.64

MINOR LEAGUES
LEVEL  QOP    LGS
AAA    0.800  AA IL PCL
AA     0.720  EL SALL TL WL
B      0.620  FLIN (1952-53) NORW SWLG WINT
C      0.580  BSTL CALL FLIN (1949) IIIL NENL
D      0.500  AZMX AZTX CMXL FLIN (1947-48) FLOR LONG PROV SWIL WNM

Please note that the classification codes for the minor leagues have changed repeatedly over time. The classifications above are simply how I remember them most easily. For more information read this.
The short description above will get you a long way. I make a few other tweaks to adjust for the style of play in the league.
1) Determine the player’s SB/G.
Charleston stole 20 bags in 54 games, or .3704 per game.
2) Determine his league’s SB/G.
The ECL stole 456 bases in 475 games or 1.0387 per game.
3) Compare his rate to the league’s.
0.3704 / 1.0387 = 0.3566
4) Determine the SB/G in the destination league.
The NL stole 754 bases in 1228 games, a rate of 0.6140 per game.
5) Estimate his SB in the destination league in his number of games by multiplying step #3 by step #4 and multiplying their product by his games.
(0.3566 * 0.6100) * 54 = 11.7452
6) Determine how many “remaining” games the player has by subtracting his games from the league’s scheduled games.
154 – 54 = 100
7) Find his estimated SB/G by dividing #5 by his games and then weight it by how many games he played versus his total MLE G
11.7452 / 54 = 0.2175
0.2175 * (54/148) = 0.0794
8) Find his career MLE SB/G by totaling his estimated career SB and dividing by his career MLE games, then multiplying that by (1 – (his actual games divided by his MLE games)).
(235 est SB / 1428 actual games) * (1 – (54/148)) = 0.1045
9) Sum steps #7 and #8. This will be the stolen base rate used for his remaining MLE games, weighted by his current season rate and his career rate.
0.0794 + 0.1045 = 0.1841 SB/G
10) Multiply step #9 times his remaining games in step #6.
0.1841 * 92 = 16.9372
11) Add step #10, the estimated steals in his remaining games, to #5, the estimate for MLB steals in his actual games and round to a whole number.
16.9372 + 11.7452 = 28.6824, which rounds to 29 steals.
12) Match that total on the chart below and look at the right column in the chart to find the Rbaser/G for the season.
29 steals corresponds to 0.0256 on the chart.
13) Multiply step #12 by his total MLE games to determine that season’s Rbaser.
0.0256 * 146 = 3.7376
We assign Charleston about 4 runs for his baserunning in 1924.
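Here is a rough Python sketch of the thirteen baserunning steps, offered as an illustration rather than the exact spreadsheet. All function and parameter names are invented. The MLE games and remaining games are passed in explicitly because the walkthrough uses slightly different game totals at different steps (148, 92, and 146), and the sketch takes the NL rate as 0.6140, so its intermediate decimals differ a little from the ones above while the steal total still rounds to 29.

```python
# Rbaser per game by season steal total, copied from the chart in this post.
# The list index is the steal total; index 30 stands for 30 or more.
RBASER_PER_GAME = [
    -0.005, -0.004, -0.003, -0.002, -0.001, 0.000, 0.000, 0.001, 0.002, 0.003,
    0.004, 0.005, 0.005, 0.006, 0.007, 0.007, 0.007, 0.008, 0.008, 0.009,
    0.010, 0.011, 0.012, 0.013, 0.014, 0.017, 0.019, 0.021, 0.023, 0.026,
    0.031,
]


def mle_steals(sb, games, lg_sb_per_game, dest_sb_per_game,
               career_sb_per_game, mle_games, remaining_games):
    """Steps 1-11: translate actual steals, then fill the remaining games at a
    rate blended from the season and the career."""
    relative_rate = (sb / games) / lg_sb_per_game               # steps 1-3
    sb_actual = relative_rate * dest_sb_per_game * games        # steps 4-5
    share = games / mle_games
    season_part = (sb_actual / games) * share                   # step 7
    career_part = career_sb_per_game * (1 - share)              # step 8
    blended_rate = season_part + career_part                    # step 9
    return sb_actual + blended_rate * remaining_games           # steps 10-11


def rbaser(steals, mle_games):
    """Steps 12-13: look up Rbaser per game for the steal total and scale by games."""
    index = min(max(round(steals), 0), len(RBASER_PER_GAME) - 1)
    return RBASER_PER_GAME[index] * mle_games


steals = mle_steals(sb=20, games=54, lg_sb_per_game=1.0387, dest_sb_per_game=0.6140,
                    career_sb_per_game=235 / 1428, mle_games=148, remaining_games=92)
print(round(steals))                 # 29
print(round(rbaser(steals, 146), 1)) # 3.8, about 4 runs
```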
The baserunning MLE is driven by the chart below. I found the G, PA, SB, and Rbaser for every MLB player with 200 or more PA from 1930 (the first year that BBREF bases Rbaser on play-by-play data) to 1960 (the last year before the stolen base really came back into the game). I found that the relationships of SB/PA and SB/G to Rbaser were very weak. Not worth relying on. On the other hand, when I grouped the seasons into buckets based on how many steals a player had, a clear pattern emerged that linked the number of steals in a season to Rbaser. So I created this table by drawing a line from the typical Rbaser for players with zero steals through those with 30 or more steals. All the increments along the way got a value, and the whole chart looks like this:
SB  RBSR
0   -0.005
1   -0.004
2   -0.003
3   -0.002
4   -0.001
5   0.000
6   0.000
7   0.001
8   0.002
9   0.003
10  0.004
11  0.005
12  0.005
13  0.006
14  0.007
15  0.007
16  0.007
17  0.008
18  0.008
19  0.009
20  0.010
21  0.011
22  0.012
23  0.013
24  0.014
25  0.017
26  0.019
27  0.021
28  0.023
29  0.026
30  0.031
I’ve been struggling to find a good baserunning estimate for years. This is probably as good as it gets. However, the great Gary Ashwill has reminded me that stolen bases were not necessarily recorded in box scores in every newspaper. Philadelphia, in particular, seems to be bad. Just keep in mind that there are error bars around these figures.
We’re now going to turn our attention to avoiding the dreaded GIDP, which Baseball-Reference.com includes among the sources of value for batters. It’s a small thing, but everything counts.
This is only applicable for seasons after 1929 because BBREF does not provide this information due to lack of play-by-play data prior to that.
1) Find the player’s batting handedness, his career MLE PA, and his career Rbaser estimate.
Charleston would not be allocated any Rdp for 1924 because it’s prior to the 1930 cutoff for Baseball-Reference.com’s play-by-play-based calculations, but we’ll proceed anyway for the sake of the example. For Charleston, these are left, 11890, and 42.
2) Draw up a list of hitters of the same handedness, rough career length, and career Rbaser total.
BBREF’s Stathead makes this simple. Subscribe today, it’s cheap. I’m always aiming for twenty or more comps, but with long-career lefties and switch-hitters, this proves quite difficult. I usually try to get within one thousand PAs on either side of the player’s MLE PAs, and within five of his Rbaser. But sometimes you have to expand outward. In this case, after numerous attempts, I had a list with just four names on it: Brett Butler, Chase Utley, Larry Walker, and Barry Bonds.
3) For each comp, find his Rdp per PA.
You’ll have to ask Stathead to provide the Rdp as it doesn’t come up by default.
4) Average all the comps’ Rdp per PA.
Butler, Utley, Walker, and Bonds average a rate of 0.0021 Rdp per PA.
5) Finally, for each season of the player’s career, multiply #4 by his MLE PA for the year.
In 1924, we estimate 620 PA, so 620 * 0.0021 = 1.302 Rdp. He gets about 9 runs for his career this way, primarily because most of his career took place prior to 1930.
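The Rdp steps boil down to an average of rates times playing time. The Python sketch below shows the shape of it. The function name is hypothetical, and the two comp lines are placeholders chosen only so the average comes out to the 0.0021 quoted above; they are not the real totals for Butler, Utley, Walker, or Bonds.

```python
def season_rdp(comps, mle_pa, season, first_season_with_rdp=1930):
    """Average the comps' Rdp per PA (steps 3-4) and apply it to one season's
    MLE PA (step 5). Seasons before the play-by-play cutoff get no Rdp.

    comps is a list of (career_rdp, career_pa) tuples.
    """
    if season < first_season_with_rdp:
        return 0.0
    rates = [rdp / pa for rdp, pa in comps]
    return sum(rates) / len(rates) * mle_pa


# Placeholder comps averaging 0.0021 Rdp per PA (NOT real player data).
placeholder_comps = [(20, 10000), (22, 10000)]

print(season_rdp(placeholder_comps, mle_pa=620, season=1924))            # 0.0
print(round(season_rdp(placeholder_comps, mle_pa=620, season=1930), 3))  # 1.302
```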
This may change. Baseball-Reference.com has very recently added Rfield calculations for all Negro Leagues players. Currently this method only includes DRA found at the Negro Leagues Database.
OK, this may seem a little convoluted, but it’s worth it because it pushes more fielding value into those seasons when players are normally at their fielding peak, which is appropriate. This example assumes only one position per player, but you can easily adapt this for players with more than one position.
1) Find the career fielding games and DRA for the player. For outfielders, use only his Range runs and skip the arm runs. Also, only include fielding games from among those seasons where a player’s DRA is calculated. Finally, if a player switched positions abruptly, use only the data from seasons he played the position.
Charleston played centerfield until about 1929. From 1915 until then he played 832 games there. He accrued 51.2 DRA.
2) Determine the player’s DRA/154 games.
Divide the DRA by the games and multiply by 154. For Charleston that makes 9.5 runs a year.
3) Adjust for sample. If the player has fewer than 308 games at the position (two seasons’ worth of games), multiply #2 by his defensive games at the position divided by 308.
Charleston exceeds 308 games handily in centerfield.
4) Transform into Rfield by using the first table below.
We want to create a presentation that’s familiar for people, so we’re going to take the less heralded DRA and turn it into BBREF’s Rfield. We’ll use our friend z-scores for this.
- Divide the player’s DRA/154 by the figure for his position in the middle column of the table below (nglSD) to find out how many SDs he is from average.
- Multiply that figure by the corresponding figure in the right column (mlbSD).
So we divide 9.5 / 17.0 to get 0.56, and we multiply that by 3.1, which gets us 1.7 Rfield per 154 games.
5) Multiply his career MLE games at the position by step #4, his Rfield per 154 games.
I assign players one position per year, which means I need to add together games from all seasons where I assigned him that position. In Charleston’s case, that’s 1678 games. Multiplying it by 1.7 Rfield per 154 games, I get 19.1 career field runs in centerfield.
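Fielding steps 1 through 5 can be sketched in Python like this. The names are my own for illustration, the SD pairs are copied from the DRA-to-Rfield table in this post, and the sample adjustment follows my reading of step 3 (scale by games divided by 308).

```python
# (nglSD, mlbSD) by position, from the NLDB DRA to Rfield table in this post.
SD_BY_POSITION = {
    "C": (10.02, 1.64), "1B": (8.3, 3.27), "2B": (18.42, 5.81), "3B": (18.4, 4.75),
    "SS": (18.47, 5.81), "LF": (15.12, 5.81), "CF": (17.01, 3.14), "RF": (14.14, 3.01),
}


def rfield_per_154(career_dra, games, position, season_games=154, min_games=308):
    """Steps 2-4: DRA per 154 games, shrunk for small samples, then restated
    in MLB Rfield terms through the two standard deviations."""
    dra_per_154 = career_dra / games * season_games
    if games < min_games:
        dra_per_154 *= games / min_games
    ngl_sd, mlb_sd = SD_BY_POSITION[position]
    return dra_per_154 / ngl_sd * mlb_sd


def career_rfield(rate_per_154, mle_games, season_games=154):
    """Step 5: spread the per-154 rate over the career MLE games at the position."""
    return rate_per_154 * mle_games / season_games


# Charleston in centerfield: 51.2 DRA in 832 games, 1678 MLE games.
rate = rfield_per_154(career_dra=51.2, games=832, position="CF")
career = career_rfield(rate, mle_games=1678)
print(round(rate, 1), round(career, 1))  # 1.7 19.1
```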
6) Distribute the player’s career runs based on the second table below.
This is harder to explain than to do.
- Find the median value on the table for all ages he played the position. The median for Charleston (ages 18 to 31) is 3.3 runs.
- Subtract that median from the value for his age in the season in question. Charleston was 27, and the value for that age is 3.3. Subtracting one from the other gives us 0 runs.
- Add that difference to the player’s career Rfield per 154 at the position: 0.0 + 1.7 = 1.7
- Multiply the estimated career Rfield at the position in step #5 by the ratio of the figure we found in step #6 and the figures derived from all seasons at the position. For Charleston that’s 19.1 career Rfield * (1.7 Rfield/19.1 Rfield) = 1.9 Rfield
- Finally, we need a step to correct the previous step in case it doesn’t generate the same career total we calculated in step #5. To do that we will divvy up any excess Rfield by playing time this way: The previous bullet’s total + ((our estimated career total minus the sum of Rfield at the position estimated in the previous step for all seasons at that position) * (his MLE games that year divided by his career MLE games at the position)). Charleston’s totals matched, so no worries. And he ends up with 1.9 Rfield for the season.
NLDB DRA TO RFIELD BY POSITION
POS  nglSD  mlbSD
C    10.02  1.64
1B   8.3    3.27
2B   18.42  5.81
3B   18.4   4.75
SS   18.47  5.81
LF   15.12  5.81
CF   17.01  3.14
RF   14.14  3.01
The following table charts career fielding trajectories for all players. It’s based on the average DRA per 154 games of long-time players at all positions.
FIELDING CAREER TRAJECTORY
Age  DRA/154
17   -1.0
18   -2.4
19   2.1
20   3.4
21   3.7
22   3.7
23   4.1
24   4.1
25   4.2
26   3.8
27   3.3
28   3.0
29   2.5
30   2.1
31   1.5
32   1.3
33   0.4
34   0.2
35   -0.5
36   -1.1
37   -1.3
38   -2.3
39   -2.1
40   -1.8
41   -2.0
42   -1.1
43   -2.0
44   -2.9
45   -6.4
46   -4.7
This is where I wave you goodbye. I try to stick as closely to Baseball-Reference.com’s value calculations as I can. That includes runs based on position, replacement runs, and all the background calculations for WAA and WAR. Please refer to their explainer for all details.
Once we do this for Charleston, we get an absolutely mammoth 11.6 WAR season, something straight out of the Babe Ruth catalog. He was one hell of a player.
Now you know how I do it. If it seems complicated to you, that’s because it is complicated. There’s a lot of steps. But those steps are there for good reasons, and I hope the reasoning behind them is at least somewhat visible. It’s all in my noggin, you know, and when it comes to MLEs, I’ll quote Bob Dylan in “From a Buick 6,” “I need a dump truck…to unload my head.” Please feel free to ask clarifying questions in the comments. Happy to answer.