This is the natural order of things. First you come up with an idea. Then you decide it’s pretty good; you really ought to do something with it. Then you start trying to do something with it, and you realize it’s a lot more complicated than you thought. Then you wonder if it’s that good an idea. Then you decide it is and get back at it. Then you’ve got it all worked out just great, and things roll along merrily. Until you realize there’s something you could, or worse *should*, be doing better! Then it’s time for serious thinking about serious tinkering.

That’s why I’m running away from how I’ve calculated baserunning runs for Negro Leaguers as part of my Major League Equivalency routine (MLE). I found that I couldn’t defend to myself the level of subjectivity that had entered my basepath ballparking. And I’m my most sympathetic audience….

I had to find a means by which to increase the objectivity in my estimations, and today I’ll tell you where I’ve arrived. But first, where did I start?

Initially, I used a player’s stolen bases per opportunity, adjusted for his team, compared to his league. I’d wind my way around to a figure that said, *Bob Smith stole 43% more often than his league, in the same opportunities.* Then I’d take that and go onto the BBREF play index and see if I could dig up a handful of dudes in the Retrosheet era who did roughly as well and assign a value that I plucked out of the air around those dudes. A good start, but lacking in rigor and objectivity.

So I started with the Negro Leagues Database. Where else? Among the trad stats they display, only three jump to mind as potential speed indicators: steals, runs scored, triples. We lack caught stealing information and any other baserunning info, so those three are it.

Next I went to BBREF’s Play Index. (Side note: Subscribing to it will regrow your bald spot, pay off your mortgage, make your love life better, help you quit smoking, and give you that new-car look and feel. You should subscribe right now.) I grabbed every retired player whose entire career took place in the Retrosheet era and who had 500 or more PA. It’s more than 2,000 careers. Then I set up scatter plots and regression trend lines for three x/y combinations:

X: SB / [ (H – HR – 3B – 2B) + BB + HBP ] compared to the MLB average of the same

Y: Rbaser / PA

R-Squared: 0.46 (kinda correlated)

X: (R – HR) / [ (H – HR – 3B) + BB + HBP ] compared to the MLB average of the same

Y: Rbaser / PA

R-Squared: 0.41 (less kinda correlated)

X: 3B / H compared to the MLB average of the same

Y: Rbaser / PA

R-squared: 4.2E-06 (not correlated)
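For anyone who wants to run this kind of check outside a spreadsheet, here’s a rough sketch in Python. The function names and the sample inputs are made up for illustration (they are not the Play Index sample), and it assumes an ordinary straight-line least-squares fit, like the trend line a spreadsheet draws.

```python
import numpy as np

def sb_opportunities(h, doubles, triples, hr, bb, hbp):
    """Times on first base: singles plus walks plus hit-by-pitch."""
    return (h - hr - triples - doubles) + bb + hbp

def r_squared(x, y):
    """R-squared of a straight-line least-squares fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # variation the line misses
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation in y
    return 1.0 - ss_res / ss_tot

# Illustrative season line only, NOT a real player.
opp = sb_opportunities(h=150, doubles=25, triples=5, hr=10, bb=50, hbp=5)
```

Here `x` would be each career’s SB/OPP relative to the MLB average and `y` his Rbaser/PA; `r_squared(x, y)` then plays the role of the R-squared figures quoted above.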

As I’ve said many times, I’m not a statistician, so if I’ve goofed something up, let me know.

I made an executive decision to only use SB/OPP. Here’s why: The regressions suggest that SB/OPP explains a little under half (46%) of the variation in Rbaser/PA, and that R/TOB explains about 40%. However, stolen bases put a runner into scoring position and likely affect R/TOB. There’s bound to be some entanglement there, so I’m just using SB/OPP.

But since only half of Rbaser/PA might be explainable by SB/OPP, I felt like I’d better try to augment it with something else too. I went back to my original notion of comping against baserunners with similar SB/OPP rates. I had used my hunt-and-peck method for similar MLB hitters, in part, because I’m also not a super database whiz, and I’m using Excel. I couldn’t figure out how to create dynamic ranges for different players that would allow me to compare any player to the MLB average of SB/OPP for his entire career (unweighted by PAs).

Well, it took a while, and it ain’t elegant, but I finally found a kooky workaround using the MATCH and OFFSET functions with inputs of a player’s first and last seasons. Now that I had those functions in conjunction, I could calculate a tailored lgSB/OPP rate for anyone in my sample. And I did.
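Outside of Excel, the same idea is just "average the league rate over the seasons between a player’s first and last year." A minimal sketch, assuming the yearly league SB/OPP rates sit in a dictionary keyed by season; the function name and the rates shown are hypothetical, for illustration only:

```python
def career_league_rate(league_rate_by_year, first_season, last_season):
    """Unweighted mean of yearly league SB/OPP across a player's career span."""
    rates = [rate for year, rate in league_rate_by_year.items()
             if first_season <= year <= last_season]
    if not rates:
        raise ValueError("no league seasons fall inside that span")
    return sum(rates) / len(rates)

# Hypothetical yearly league rates, NOT real MLB figures.
league = {1960: 0.04, 1961: 0.05, 1962: 0.06, 1963: 0.07}
tailored = career_league_rate(league, 1961, 1963)  # mean of 0.05, 0.06, 0.07
```

Each season counts once regardless of playing time, which matches the "unweighted by PAs" choice.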

Fortified with that info, I could create a custom comp set for any Negro Leaguer…only comped to what? If I only used the Negro Leaguer’s SB/OPP vs lg I might get a whole lot or very few comps, and what comps I got might not actually be very like the Negro Leaguer at all! So I cheated a little. I used height and weight as my comping stats. Within two inches and five pounds of the player, and you’re in the mix. If not, sorry. (A couple guys so far still didn’t have enough comps, in which case, I added an inch and five pounds to the criteria.) Physical attributes are not perfect criteria, but they often have a strong effect on speed. Just ask Cecil or Prince Fielder.

Now that I could create some robust comp sets, I took a Negro Leaguer’s SB/OPP vs lg, and located it or its nearest value within the comps. I took the six nearest guys above and below and found their median Rbaser/PA. Bada bing, bada boom.
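Here’s one way the comp lookup could be sketched in Python. It reads "six nearest guys above and below" as up to six on each side, which is an assumption, and the field names and the tiny pool of players are invented for illustration:

```python
from statistics import median

def comp_median_rbaser(target_ratio, height, weight, pool,
                       height_tol=2, weight_tol=5, n_each_side=6):
    """Median Rbaser/PA of the comps nearest the target SB/OPP vs league.

    pool: list of dicts with keys height, weight, sb_ratio, rbaser_per_pa.
    """
    # Keep only players within the height and weight tolerances.
    comps = [p for p in pool
             if abs(p["height"] - height) <= height_tol
             and abs(p["weight"] - weight) <= weight_tol]
    comps.sort(key=lambda p: p["sb_ratio"])
    # Up to n_each_side nearest comps below and at-or-above the target.
    below = [p for p in comps if p["sb_ratio"] < target_ratio][-n_each_side:]
    above = [p for p in comps if p["sb_ratio"] >= target_ratio][:n_each_side]
    chosen = below + above
    if not chosen:
        raise ValueError("no comps; widen the height/weight tolerances")
    return median(p["rbaser_per_pa"] for p in chosen)

# Invented players, for illustration only.
pool = [
    {"height": 72, "weight": 180, "sb_ratio": 0.8, "rbaser_per_pa": -0.001},
    {"height": 73, "weight": 183, "sb_ratio": 1.2, "rbaser_per_pa": 0.001},
    {"height": 72, "weight": 178, "sb_ratio": 1.6, "rbaser_per_pa": 0.003},
    {"height": 78, "weight": 230, "sb_ratio": 2.0, "rbaser_per_pa": 0.010},
]
result = comp_median_rbaser(1.0, height=72, weight=180, pool=pool)
```

Loosening the criteria for a player with too few comps is just a matter of calling it again with `height_tol=3, weight_tol=10`.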

I’ve allotted 50% of the Negro Leaguer’s Rbaser/PA value to the regression and 50% to the comp set. It looks like this with the regression equation plugged in:

{ 0.5 * [ (0.0022 * player’s SB/OPP v lg) – 0.0024 ] } + ( 0.5 * median Rbaser/PA of comps )

Great, right? There’s just one more little hitch. See, not every city’s box scores included stolen bases for Negro Leagues games. Sometimes even in the same city one team’s boxes tended to report steals more often than others! So we do need to do *a little bit* of subjective work to establish a player’s SB/OPP because if he played with a team in a low-reporting area, it drives down his SB/OPP since we have hits, walks, and usually hit-by-pitched-ball stats.

The subjective element now is determining which teams had low-reporting and which did not. Then we adjust their home and road opportunities as appropriate. Um, I can’t say this is really scientific, but it’s as good as I can do for now. You start to get a feel for it when you see that a few teams stole more bases than they played games and others didn’t. Or you see teams with drastically low totals compared to others.

1) Guess whether a player was on a high- or low-reporting team

2) Guess which other teams were low-reporters and add their total games played up

3) Guess which other teams were high-reporters and add their total games up

4) Estimate SB OPPs in low-reporting road games by splitting total SB OPPs in half (home and road), then multiplying the road half by the ratio of low to high reporting road teams.

5) Subtract 4 from half the total SB OPPs to get the high-reporting road games.

6) If he played for a low-reporting team, then his total SB OPPs are equal to the result of Step 5. If he’s on a high-reporting team, then add the result of Step 5 to the home half of the SB OPPs.

7) Divide SB by the result of Step 6 to get his SB/OPP

8) Got to figure his league’s reporting-adjusted SB/OPP too, but *the league’s OPPs* *may be different than the player’s*. So, if he’s on a low-reporting team, take the league’s low-reporting games (Step 2) and add them to the player’s team’s games, then divide by the league’s total games, then multiply by half the player’s OPPs. If he’s on a high-reporting team, then divide Step 2 by the league’s total games and multiply by half the player’s OPPs.

9) Subtract 8 from half the player’s OPPs, and add the result to half the player’s OPPs to get his league’s total OPPs adjusted for the reporting of SB.

10) Multiply the league’s known SB/OPP times the player’s total OPPs to get the league’s unadjusted SB in the player’s OPPs.

11) Divide Step 10 by Step 9 to get the league’s SB/OPP.

Got all that?
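If not, here is my best reading of Steps 4 through 11 as a Python sketch. Several things in it are assumptions rather than anything stated above: the function and parameter names, the reading of "the ratio of low to high reporting road teams" as the low-reporting share of road games (so it can never exceed 1), and the sample inputs, which are invented. Steps 1 through 3, the guessing, stay with the human.

```python
def player_sb_per_opp(sb, total_opps, on_low_team,
                      low_road_games, high_road_games):
    """Steps 4-7: the player's SB/OPP in games that likely reported steals."""
    home_half = road_half = total_opps / 2
    # ASSUMPTION: "ratio of low to high" means low's share of road games.
    low_share = low_road_games / (low_road_games + high_road_games)
    low_road_opps = road_half * low_share                 # Step 4
    high_road_opps = road_half - low_road_opps            # Step 5
    if on_low_team:                                       # Step 6
        adjusted_opps = high_road_opps
    else:
        adjusted_opps = high_road_opps + home_half
    return sb / adjusted_opps                             # Step 7

def league_sb_per_opp(league_rate, total_opps, on_low_team,
                      league_low_games, team_games, league_total_games):
    """Steps 8-11: the league's SB/OPP scaled to the player's opportunities."""
    half_opps = total_opps / 2
    low_games = league_low_games + (team_games if on_low_team else 0)
    step8 = low_games / league_total_games * half_opps    # Step 8
    league_opps = (half_opps - step8) + half_opps         # Step 9
    league_sb = league_rate * total_opps                  # Step 10
    return league_sb / league_opps                        # Step 11

# Invented inputs, for illustration only.
player = player_sb_per_opp(sb=20, total_opps=400, on_low_team=False,
                           low_road_games=100, high_road_games=300)
league = league_sb_per_opp(league_rate=0.04, total_opps=400, on_low_team=False,
                           league_low_games=100, team_games=80,
                           league_total_games=480)
vs_league = player / league  # the "SB/OPP v lg" figure fed to the regression
```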

Let’s look at Walter “Rev” Cannady as an example.

When I go through those 11 steps just above here, Cannady comes out at 177% of his league’s rate (18% v 10%). Plugging him into the regression equation, I get 0.0015 Rbaser/PA, which is about a run a year. Looking at his comps, they ring up at .0019 or 1.2 runs a year. Taking half of each and adding them, it’s .00168 Rbaser/PA, which is about 1.1 runs a year. Applying to Cannady’s estimated PA (10,130), we get about 17 runs for his career.
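The Cannady arithmetic can be checked with a few lines of Python. The function name is mine, and it assumes the 177% goes into the regression as 1.77, which is what reproduces the 0.0015 figure. Because it starts from the rounded numbers quoted above, it lands on roughly .0017 rather than exactly .00168; the career total still rounds to 17.

```python
def blended_rbaser_per_pa(sb_ratio_vs_league, comp_median,
                          slope=0.0022, intercept=-0.0024):
    """Half regression estimate, half comp-set median."""
    regression = slope * sb_ratio_vs_league + intercept
    return 0.5 * regression + 0.5 * comp_median

# Rounded inputs quoted in the text for Cannady.
rate = blended_rbaser_per_pa(1.77, 0.0019)
career_runs = rate * 10130  # estimated career PA
```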

Earlier, I told you that I wanted to *increase* the objectivity and reduce the subjectivity. By comparison to hunting and pecking for comps with stolen base info based on partially known opportunities, that’s an improvement. Using regression and a more complete comp set, I have increased my use of objective measures.

Of course, I have added some other kinds of subjectivity as well. There’s plenty of guess work in trying to determine how steals came in games with reported steals and those with unreported steals. I think, however, that there’s a precision increase that probably offsets some of that subjectivity since we’re at least acknowledging the reporting issue and making a conservative to middle-of-the-road attempt to compensate.

The other added subjectivity is the choice to use height and weight as factors for comparison. That’s a subjectivity of choice (editorial subjectivity, if you will), but at least it’s got some reasoning behind it.

Overall, I’m getting mildly more conservative numbers for baserunning, and that’s probably a good thing. It takes a lot of the decision making out of my brain and puts it in Excel’s hands, and in this case, that’s surely a good thing.

I’m working slowly through the 100 or so MLEs I’ve drawn up so far, and once I get through them, I’ll update those already on the site. All remaining players beginning with our left fielders next week will be based on this new baserunning protocol.
