We reckoned “missing” value that put Sam Rice and Bobby Doerr into the Hall of Miller and Eric at Vlad Guerrero’s expense. You probably would like to know how we did that. You’d probably also like to know how other important players might have been affected. You’re in luck, so read on.
This is the first of a three-part series on missing value from the 1930s and 1940s. This time, we’ll share how we are estimating that value. In part two, we’ll show you how we think it will affect our ratings and perceptions of the players we’ve figured so far. Finally, in part three, we’ll look at one player from an earlier era whose status might rise when play-by-play stats from his period become available, and we’ll identify a couple of others like him.
Why this is now possible
Thanks to a major Retrosheet data release last year, Baseball-Reference, the greatest site ever on the internets, displays play-by-play (PBP) information going back to 1930. Not all of it (more on that later), but an extensive enough amount. Despite the availability of this data, BBREF hasn’t yet recalculated its WAR values for the 1930s and early 1940s. But since the data is there, we can start making some estimations of our own, and that’s what the process we’re about to describe is for. You might say that our results represent rough predictions of what we believe will occur when BBREF is able to calculate WAR for all players back to 1930.
Three components of BBREF’s WAR values rely especially on play-by-play data:
- baserunning (rBaser)—BBREF provides a regression-based estimate prior to 1948
- double-play avoidance (rDP)—BBREF provides no estimate prior to 1948
- outfield throwing (rOF)—BBREF provides no estimate prior to 1953
We don’t have access to BBREF’s precise formulae, and neither Miller nor I know SQL, so we probably can’t replicate BBREF’s equations exactly. But we can get defensibly close, and here’s how.
Estimating baserunning value
As best we understand BBREF’s explanations, they sum several sub-component values into a final rBaser total. These runs seem to be based solely on five kinds of plays occurring after a batter has reached base and the next plate appearance has begun:
- Stolen bases
- Pickoffs
- Bases taken without a batted ball
- Extra bases taken on a base hit: first-to-third on a single, second-to-home on a single, and first-to-home on a double
- Outs on base
These are the data that BBREF shows; they may use other data they don’t present, and I may have some of this wrong. Anyway, each player’s rBaser is expressed as a value above the league average in each department. With just a couple of rough-edged hacks, we can make a pretty good estimate of each sub-component. By the way, I’m going to use two important abbreviations below:
- p = player (not probability)
- lg = league (specifically the league average)
I try to keep things simple, and this is very straightforward. Because these are all expressed relative to the league average, the simplest form for stolen bases goes like this:
rSB = (p steals runs – lg steals runs) + (p caught stealing runs – lg caught stealing runs)

(Caught stealing runs are negative, so the second term docks or credits the player relative to the league.)
Of course, we simply use the player’s raw steals and caught stealing, but we need to fill in three pieces of information:
A) the league’s steals and caught stealing
B) the run value of a steal
C) the run-value of a caught stealing
It’s relatively simple to determine A: we merely apply the league’s stolen base percentage to the number of steals attempted by the player. For caught stealing, we multiply the player’s attempts by 1 minus the league’s stolen base percentage.
Letters B and C are well beyond my pay grade. So, to keep it simple, I borrowed values from Jim Furtado’s Extrapolated Runs. This regression-based formula uses weights of 0.18 runs per steal and -0.32 runs per caught stealing. Many of you are getting ready to yell at me that Extrapolated Runs is only good for seasons 1955–1997. Yeah, I know, and it’s good enough for gub’ment work. We aren’t building Fort Knox here. So our final stolen base formula looks like this:
rSB = ((pSB * 0.18) – (lgSB% * pSBATT * 0.18)) + ((pCS * -0.32) – ((1-lgSB%) * pSBATT * -0.32))
Excel makes quick work of it. One important note: we don’t have caught stealing data for the NL until the late 1940s. So, for a given player, I took the years he played for which we have PBP data and averaged the AL’s stolen base/caught stealing percentage over that entire span. If someone played in the NL from 1930–1936, I used the AL’s averages from that specific period.
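If you’d rather not wrangle Excel, the whole stolen base estimate fits in a few lines of code. Here’s a sketch in Python; the function and variable names are ours, not BBREF’s, and the run values are Furtado’s XR weights:

```python
# Sketch of the rSB estimate described above. Names are our own
# invention, not BBREF's; run values come from Extrapolated Runs.
SB_RUN, CS_RUN = 0.18, -0.32

def r_sb(p_sb, p_cs, lg_sb_pct):
    """Stolen-base runs above a league-average runner, given the
    player's raw steals and caught stealing plus the league success
    rate (e.g. the AL average over the player's PBP seasons)."""
    attempts = p_sb + p_cs
    # Player's actual steal/caught-stealing runs
    p_runs = p_sb * SB_RUN + p_cs * CS_RUN
    # What the league would have done with the same attempts
    lg_runs = (lg_sb_pct * attempts * SB_RUN
               + (1 - lg_sb_pct) * attempts * CS_RUN)
    return p_runs - lg_runs
```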
For pickoffs, we adapt the same formula. Pickoffs are essentially caught stealings, and, of course, there’s no positive value to them; it’s all about avoiding pickoffs.
rPO = (pPO * -0.32) – ((lgPO / lgSBOPP) * pSBOPP * -0.32)
SBOPP = stolen base opportunities
Bases Taken (without a batted ball)
These include advancing on passed balls, wild pitches, defensive indifference, and similar events. Take the pickoff formula and treat the event like a stolen base.
rBT = (pBT * 0.18) – ((lgBT / lgSBOPP) * pSBOPP * 0.18)
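Since pickoffs and bases taken use the same above-average form, one little helper can handle both. A sketch, with made-up league and player totals purely for illustration:

```python
# Pickoffs and bases taken share the same shape: the player's events
# minus the league's expected events in his opportunities, times a
# run value. Helper name and sample numbers are ours, not BBREF's.
def above_avg_runs(p_events, lg_events, lg_opps, p_opps, run_value):
    """Runs above league average for one baserunning sub-component."""
    lg_expected = (lg_events / lg_opps) * p_opps
    return (p_events - lg_expected) * run_value

# Hypothetical inputs: a player with 100 SB opportunities in a league
# with 400 pickoffs and 2000 bases taken across 40,000 opportunities.
r_po = above_avg_runs(0, 400, 40000, 100, -0.32)  # pickoffs, CS value
r_bt = above_avg_runs(5, 2000, 40000, 100, 0.18)  # bases taken, SB value
```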
Extra Bases Taken
Like pickoffs and bases taken, this sub-component and the next (Outs on Base) are flip sides of one another. But we use a different denominator, and we have to figure out our own run values. BBREF’s baserunning stats include the number of opportunities a player had to go first to third, second to home, and first to home. They also include how many times he did so successfully. So we are going to sum the values of all three events. They each have a different run value, but they all have the same basic formula:
rXBT = r1to3 + r2toH + r1toH
r1to3 = (p1to3 * run value) – ((lg1to3 / lg1to3opp) * p1to3opp * run value)
r2toH = (p2toH * run value) – ((lg2toH / lg2toHopp) * p2toHopp * run value)
r1toH = (p1toH * run value) – ((lg1toH / lg1toHopp) * p1toHopp * run value)
- 1to3 = first to third on a single
- 2toH = second to home on a single
- 1toH = first to home on a double
- opp = opportunity
Easy enough. The hard part is the run values. To figure these, we turned to TangoTiger’s run expectancy chart. (Yeah, I know, it’s for 1955–2015, but close enough.) We averaged the change in run expectancy for a successful extra base taken in each base-out state matching the formulae above. Here are the values:
- 1to3: 0.19 runs
- 2toH: 0.43 runs
- 1toH: 0.38 runs
As usual, we were trying to keep things simple, so we didn’t get real deep into the weeds and adjust them for things like the frequency of each base-out state. We plug the values back in and get this:
(p1to3 * 0.19) – ((lg1to3 / lg1to3opp) * p1to3opp * 0.19)
+ (p2toH * 0.43) – ((lg2toH / lg2toHopp) * p2toHopp * 0.43)
+ (p1toH * 0.38) – ((lg1toH / lg1toHopp) * p1toHopp * 0.38)
Man, am I grateful for Excel.
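For the spreadsheet-averse, the same rXBT arithmetic sketched in Python. The dictionary keys mirror the abbreviations above; everything else is our own naming:

```python
# The rXBT sum: each scenario's successes above league expectation,
# times its run value (from the averaged run-expectancy changes).
XBT_RUN_VALUES = {"1to3": 0.19, "2toH": 0.43, "1toH": 0.38}

def r_xbt(p_adv, p_opp, lg_adv, lg_opp):
    """Each argument maps a scenario key to a count: player advances,
    player opportunities, league advances, league opportunities."""
    total = 0.0
    for key, run_value in XBT_RUN_VALUES.items():
        lg_rate = lg_adv[key] / lg_opp[key]
        total += (p_adv[key] - lg_rate * p_opp[key]) * run_value
    return total
```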
Outs on Base
Like I said, this is essentially the inverse of rXBT, except that we got a bit lazy by this point:
(pOOB * -0.32) – ((lgOOB / (lg1to3opp + lg2toHopp + lg1toHopp)) * (p1to3opp + p2toHopp + p1toHopp) * -0.32)
We used the sum of all three advancement opportunities as our denominator because we figured that most outs on base probably occur when a runner is trying to snag an extra base. We also used the XR caught stealing run value. If anything, it’s too low, since baserunner kills are very expensive for the batting team.
We tested this setup against players from the post-PBP era: excellent, average, and rotten baserunners spanning the decades from 1950 to 2010. We calculated each player’s career rBaser by our method, then compared the result to BBREF’s calculation. BBREF’s calculations summed to 778 rBaser; our method produced 781. The standard deviation of the difference between these 32 players’ BBREF and Miller and Eric rBaser was 7.9 runs.
The biggest outliers were Joe Morgan (23 runs better by our method), Mickey Mantle (+19), Ichiro (-15), and Willie Mays (+15). No player in the group of excellent runners was misidentified as average or below. No player in the average group zoomed or plummeted into the excellent or rotten categories. No player among the rotten runners was anything but awful. So this approach looks like a relatively quick, easy, and reasonable approximation: we expect to land within roughly 20% of BBREF’s eventual PBP-based figures, and usually within a win’s worth of runs or fewer.
It’s well worth noting, however, that our method yields very different results from BBREF’s current regression-driven pre-PBP estimates. On the 80 players we tried it on from the 1930s and 1940s, we reckoned 519 runs versus BBREF’s 10. Yes, one-oh. Is this difference explainable? We believe it is. First, because BBREF uses a regression formula, it naturally draws all players toward the mean, which, it appears to us, drastically flattens the results. Second, this much PBP data didn’t exist when BBREF came up with its regression formula; all that was available were stolen base figures of varying completeness for most of history, so the estimator naturally suffers from a lack of inputs that would spread the results out much more. To see why this matters, just look at someone like Brett Butler, whose stolen base percentage was well below the league average but whose rBaser is very good because he did the other PBP-based baserunning things very well. Finally, league stolen base percentages in the period we’re talking about were terrible; the league seldom stole at even a 60 percent clip, while today’s rates are nearer 75 percent. That means players with an iffy SB% could still make more positive (or less negative) contributions than they would now, which the BBREF estimator might not pick up.
Estimating Baserunning Before 1930
We have also estimated pre-PBP seasons for players with substantial careers (1,000+ PA) after 1930. BBREF’s amazing Play Index makes this possible and easy. Here’s a run-down of the process:
1. Determine the age range and PAs of the player in question for those years where we have PBP data
2. Using the Play Index, find the rBaser for 20 or so players:
   - whose entire careers transpired after 1947
   - who had a similar number of PAs during the same age range as the player in question
   - whose rBaser was in or near the range bounded by the BBREF and Miller and Eric rBaser of the player in question
3. For each comp, note his rBaser / PA
4. For each comp, find the rBaser / PA for seasons before the age range queried on, requiring 1,000+ prior PA
5. For each comp, subtract #3 above from #4 above
6. Find the group average for #5 above
7. Returning to the player in question, for each pre-PBP season, multiply his PA by the sum of #6 above and the player’s known Miller and Eric rBaser / PA.
For non-catchers, we don’t include catchers among the comps, nor anyone who played a lot at catcher such as Brian Downing or B.J. Surhoff. Catchers are just a whole different ball of wax. For catchers, we tried to use only catchers.
The upshot is that most players lose speed and baserunning value across time, and this method uses comps to figure out how much similar players lost, which allows us to add back that value to the player in question.
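Once the Play Index pulls are done, the comps arithmetic itself is tiny. Here’s a sketch; the function name is ours, and every number would come from your own pulls:

```python
# Sketch of the comps arithmetic for one pre-PBP season. Assumes
# you've already pulled each comp's rBaser/PA for (a) his earlier
# seasons and (b) the matched age range; all inputs are stand-ins.
def pre_pbp_rbaser(season_pa, known_rbaser_per_pa, comps):
    """comps: list of (early_rbaser_per_pa, matched_rbaser_per_pa)
    pairs, one per comp player."""
    # How much better, per PA, the comps ran in their earlier seasons
    # than in the matched age range.
    avg_gain = sum(early - matched for early, matched in comps) / len(comps)
    # Add that gain back to the player's known rate, scale by PA.
    return season_pa * (known_rbaser_per_pa + avg_gain)
```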
Estimating Double-Play Avoidance Value
After all that baserunning stuff, this one is easy. We once again turn to XR, which carries a run value of -0.37 per GIDP.
rDP = (pGIDP * -0.37) – ((lgGIDP / lgPA) * pPA * -0.37)
Now here’s another little workaround whose imprecision some folks might not like. BBREF calculates rDP as a function of GIDPs per GIDP opportunity. But we don’t have the luxury of that data in all cases: some GIDPopp data is incomplete, and, even more importantly, we don’t have full GIDP data for all teams or players over the entire 1930–1947 period we’re trying to account for. Plate appearances, on the other hand, are known across history, and using them as a denominator is close enough for now. We have good DP data from 1939 onward, so we can figure lgGIDP/PA accurately. Before that, the data looks a little funky, but for the few years prior to 1939 we can use the GIDP rate of 1939–1941 or something like that. Sometimes, however, we need to dig deeper.
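In code, the PA-denominator version of rDP looks like this (a sketch with our own naming, not BBREF’s):

```python
# rDP with plate appearances standing in for true GIDP opportunities.
# XR's GIDP run value is -0.37; fewer GIDPs than expected earns
# positive runs.
def r_dp(p_gidp, p_pa, lg_gidp, lg_pa):
    """Double-play avoidance runs above league average."""
    lg_expected = (lg_gidp / lg_pa) * p_pa
    return (p_gidp - lg_expected) * -0.37
```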
When we don’t have complete DP data or any, we can estimate once again by using the comps method. It’s a little simpler than with rBaser. Here’s the rundown:
1. Determine the player’s career PAs and his career Miller and Eric and BBREF rBaser
2. Using the Play Index, find the rBaser for 20 or so players:
   - whose entire careers transpired after 1947
   - who had the same handedness
   - who had a similar number of career PAs
   - whose career rBaser was in or near the range bounded by the BBREF and Miller and Eric rBaser of the player in question
3. For each comp, note his career rDP / PA
4. Find the group average for #3 above
5. Returning to the player in question, for each pre-PBP season, multiply his PA by #4.
Estimating Outfield Throwing
Like rXBT and rOOB above, outfield throwing relies on runner-advancement information that BBREF provides for each outfield position. So the big idea here is:
pRF throwing value – lgRF throwing value
+ pCF throwing value – lgCF throwing value
+ pLF throwing value – lgLF throwing value
But there’s a lot more granularity here, because 16 scenarios are in play instead of just three for running. Five of them are opportunity types:
- 1to3 (runner at first, batter singles)
- 2toH (runner at second, batter singles)
- 1toH (runner at first, batter doubles)
- FO2 (flyout, runner at second)
- FO3 (flyout, runner at third)
Each of those opportunities has three possible outcomes that BBREF provides data for:
- runner advances
- runner holds
- baserunner kill
And each of these three outcomes carries a different run value.
Finally, BBREF has a catch-all category called Other Assists for all outfield assists that don’t fall within the five scenarios above. Let’s make a quick chart; otherwise, we’re in for a long night:
```
        ADVANCES   HOLDS   KILLS   OTHER
=========================================
1to3      -0.18     0.19    0.24    ----
2toH      -0.51     0.48    0.87    ----
1toH      -0.42     0.38    0.99    ----
FO2       -0.11     0.11    0.27    ----
FO3       -0.59     0.59    0.64    ----
OTHER      ----     ----    ----    0.50
```
These run values are based on the run expectancy chart I talked about previously, this time augmented by Tango’s events-frequency chart (that is, the average run value of each of the five scenarios in all three out situations, adjusted for frequency). The Other Assists figure is, to be honest, a guess. I took the average of the run values for kills in the five scenarios above (0.70) and adjusted downward a bit to keep it conservative. That last part isn’t awesome, but it’s about the best I can do.
The cluster of equations is pretty ugly, so here’s the one for the first of the five base-out driven scenarios:
r1to3 = ((p1to3adv * -0.18) + (p1to3holds * 0.19) + (p1to3kills * 0.24)) – (((lg1to3adv / lg1to3opp) * p1to3opp * -0.18) + ((lg1to3holds / lg1to3opp) * p1to3opp * 0.19) + ((lg1to3kills / lg1to3opp) * p1to3opp * 0.24))
Same form for the other scenarios, just swap in the right data and the right run values. Then the Other Assists equation:
rOA = (pOA * 0.5) – ((lgOA / lgOPP) * pOPP * 0.5)
The OPP mentioned here is a column in BBREF’s outfield throwing table that represents the total number of opportunities the player had to hold or kill a baserunner. For seasons prior to 1930, we simply used the player’s known career rOF per game at position (RF/CF/LF) and multiplied it by the games at the position during the season in question.
Unfortunately, there’s one last step to take, and that’s to divide the rOF by two. After testing these formulas on about 50 players spread roughly equally among the three outfield spots, the numbers came back consistent at all three positions. Consistently about double what BBREF got. I won’t tell you I’m psyched about this. Obviously I’m doing something wrong, but I’m also doing something right: the guys who are good throwers according to BBREF are good throwers for us too. And the bad ones are bad. And the ones nearest average are again nearest average. Not as perfectly as with baserunning, but reasonably close. So that’s my solution. Divide by two. Inelegant, yes. Defensible, probably not. But it’s not entirely without merit, and for a ballpark estimate, it’ll do.
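Putting the throwing pieces together, including the divide-by-two fudge, looks something like this sketch. The run values come from the chart above; the function names and any counts you feed in are our own:

```python
# One throwing scenario plus the final assembly, with the halving
# calibration described above. Run values per the chart; all names
# are ours, not BBREF's.
OF_RUN_VALUES = {  # scenario: (advances, holds, kills)
    "1to3": (-0.18, 0.19, 0.24),
    "2toH": (-0.51, 0.48, 0.87),
    "1toH": (-0.42, 0.38, 0.99),
    "FO2":  (-0.11, 0.11, 0.27),
    "FO3":  (-0.59, 0.59, 0.64),
}

def scenario_runs(key, p_counts, p_opp, lg_counts, lg_opp):
    """p_counts/lg_counts are (advances, holds, kills) tuples for one
    scenario; p_opp/lg_opp are total opportunities in it."""
    run_values = OF_RUN_VALUES[key]
    p_runs = sum(n * rv for n, rv in zip(p_counts, run_values))
    lg_runs = sum((n / lg_opp) * p_opp * rv
                  for n, rv in zip(lg_counts, run_values))
    return p_runs - lg_runs

def r_of(scenario_totals, other_assist_runs):
    """Sum the five scenario results plus Other Assists, then halve
    so the result lands on BBREF's scale."""
    return (sum(scenario_totals) + other_assist_runs) / 2
```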
Let me run through one example to show you how it all fits together. Here’s Mel Ott’s 1939 season.
Ott stole 2 and was caught 3 times, which yields about -0.6 runs. Using the AL’s SB% during Ott’s career (1930–1947, 59%), we figure the league would have netted -0.1 runs. So there’s -0.4 runs for stealing.
Ott was never picked off in 1939, and we don’t dock him any runs. The league would have been picked off about once, for -0.32 runs. Ott claims +0.3 runs for pickoffs.
Bases Taken (non-hits)
Ott took 5 bases in 1939, accounting for about 0.9 runs. The league would have earned about 1.4 runs in his opportunities. Chalk up -0.5 runs for Bases Taken.
Extra Bases Taken on Hits
Ott went first-to-third 8 times in 29 opportunities; first-to-home 2 times in 4 opportunities; and second-to-home 13 times in 16 opportunities. That all results in about 7.9 runs. In the same opportunities, the league would have gained 7.4 runs. So Ott is +0.2 runs for Extra Bases Taken.
Outs on Base
Impressively, Ott made no outs on base and loses no runs. In the same opportunities, the league would have been debited -1.6 runs. Ott picks up +1.6 runs for Outs on Base.
So we estimate total baserunning value for Ott in 1939 at +1.2 runs versus the 0 that BBREF shows.
Double Play Avoidance
In 1939, Ott hit into 5 deuces. We would expect the league to rap into about 10. After we apply the run value and subtract, Ott comes out +2.0 runs ahead. BBREF does not calculate DP avoidance prior to 1948.
To spare you all the messy calculating: Ott earned about +1.2 throwing runs in right field. I’m not sure how BBREF gets assists information into its calculations. I use DRA and weight it 2:1 against BBREF’s rField, but I swap in BBREF’s rOF because DRA’s arm ratings aren’t good enough. DRA shows -0.9 runs.
Master Melvin picks up 4.4 runs overall. Against BBREF, he adds 3.2 runs and an unknown amount for throwing. Against DRA, he tacks on another 2.1 runs in throwing value. Rinse and repeat for all of Ott’s seasons, and he gains:
- 13 runs for rBaser and 9 runs against BBREF
- 34 runs for rDP
- 20 rOF runs, versus 11 in DRA
Now, I do a lot of mumbo-jumbo with all of this stuff, and it gives him an extra five or six wins in our book. That’s not chump change.
What Do We Do with All These Estimations?
What we do might be different from what you do. Basically, we substitute these estimates into our various formulae for adjusting WAR. Someday, when BBREF updates its WAR for 1930–1952, we’ll simply update our background data with its new calculations. Until then, we can work with these estimations. What will you do with them? We hope you’ll be able to use them somehow, as primitive and rough as they are. They’re pretty good overall, enough to tide you over until we get the real thing. Just be sure to remember that they aren’t supposed to be gospel.