Eric and I have been working on our HoME for going on five years now, and during that time I’ve revised my evaluation and ranking systems on many, many occasions. What I haven’t done, at least not with proper transparency, is to share the totality of my method with our readers. The primary reason for holding back, I suppose, is that even I wasn’t comfortable with where I was. Also, process posts aren’t so fun to read or write. But it’s time. More and more people are checking out the HoME (thank you!), and with exposure can come criticism. I’m quite critical of other systems that don’t explain how they arrive where they arrive, so I want to correct for that concern here.
What I’m going to do in this post is explain how I convert BBREF WAR to my seasonal WAR totals. Then I explain how I use seasonal WAR totals to rank players.
The two most commonly cited WARs come from our friends at Baseball Reference and Fangraphs. I used to employ a combination of the WAR ratings on each site. A few years ago, however, I stopped using Fangraphs. There are two reasons, both pretty simple. First, I have to admit that I don’t quite understand everything that goes into FG WAR. I’m not suggesting they’re hiding anything. I don’t know that they are. But I understand everything at BBREF and not at FG. It’s easier for me to justify that which I understand, so I use BBREF WAR only. The second reason lacks good science, but I hold to it anyway. The Fangraphs system produces results that I just don’t buy. For example, they say that Don Sutton is the 14th best pitcher ever by career WAR. Tommy John is 22nd, Jim Kaat 27th. Over at BBREF, Sutton is 30th, John 48th, and Kaat 130th. I buy what BBREF is selling; at FG, not so much. Perhaps if I understood exactly why Sutton, for example, grades as he does, I could be swayed. But I don’t think I do.
If you were to ask me which seasonal WAR is more predictive of future results, I’d go with FG. If you were to ask me which seasonal WAR better represents what should have happened, I’d again pick FG. But I don’t think either is what WAR should be about when determining the best players ever. I think it should measure results – what actually happened. And for me, BBREF does that better than FG.
Adjustments for All Players
When position players pitch, it counts. Similarly, hitting counts for pitchers. The first of those statements isn’t a very big deal except for a few guys like Monte Ward. But it’s a pretty huge deal for pitchers. Guys like Red Ruffing, Wes Ferrell, and Early Wynn add tremendous value with their bats. On the other hand, Pud Galvin, Lefty Grove, and Stan Coveleski are hurt quite a bit by theirs. This decision took a bit of time for me, but I think it’s clear cut in the end. I’m far more concerned with understanding which pitcher was more valuable to his team than I am with which player was a better pitcher. WAR is a stat about value. My adjustments need to be related to overall value.
Not Everything Counts Equally
We say the American Association, Union Association, Players League, and Federal League were major leagues. But not all leagues had the value of major leagues, so I make adjustments, not to every season, but to many.
League                 Year   Weight
American Association   1882   80%
American Association   1883   90%
American Association   1884   80%
American Association   1885   90%
American Association   1886   95%
American Association   1887   95%
American Association   1888   90%
American Association   1889   95%
American Association   1890   90%
American Association   1891   85%
Federal League         1914   75%
Federal League         1915   75%
Players League         1890   90%
Union Association      1884   65%
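A minimal sketch of how the league weights above could be applied, assuming each percentage is a straight multiplier on a season's WAR (the post lists the percentages but does not spell out the arithmetic). The league codes and the function name are labels made up for this example.

```python
# Quality weights from the list above, keyed by (league, year).
# "AA", "FL", "PL", and "UA" are shorthand codes chosen for this sketch.
LEAGUE_WEIGHTS = {
    ("AA", 1882): 0.80, ("AA", 1883): 0.90, ("AA", 1884): 0.80,
    ("AA", 1885): 0.90, ("AA", 1886): 0.95, ("AA", 1887): 0.95,
    ("AA", 1888): 0.90, ("AA", 1889): 0.95, ("AA", 1890): 0.90,
    ("AA", 1891): 0.85,
    ("FL", 1914): 0.75, ("FL", 1915): 0.75,
    ("PL", 1890): 0.90,
    ("UA", 1884): 0.65,
}


def league_adjusted_war(war, league, year):
    """Scale a season's WAR by its league weight.

    Assumption: the weight multiplies WAR directly. Seasons that are
    not listed (for example any AL or NL season) count in full.
    """
    return war * LEAGUE_WEIGHTS.get((league, year), 1.0)
```

Under that assumption, a 10.0 WAR season in the 1884 Union Association would count as 6.5, while the same total in the 1884 National League would stay at 10.0.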
I’ve considered adjustments to the AL and NL prior to integration, and to the AL for years after that because they lagged behind the NL. I haven’t adjusted those years, though I still may if I can come up with a good way.
Some folks don’t count seasons with negative WAR. They argue that teams should have known better and that the players shouldn’t have been in the majors. Those who don’t count negative seasons adjust for what should have happened rather than what actually did happen. So they should also adjust for time missed to military service. They should adjust for time spent in the minors when the player should have been in the majors. They should adjust for times when pitchers shouldn’t have been on the mound due to injury, or when they should have been lifted due to ineffectiveness. They should adjust for a ton of things if they’re just going to dump seasons where the player shouldn’t have been in the majors. I prefer to measure what happened, not to speculate about what might have been.
Adjustments for Military Service
I do not make any. Simply, I want to evaluate what happened, not what may have happened.
Adjustments for Position Players
Defensive Regression Analysis
I’ve written about this a bunch of times. But if you really want to learn about it, check out the Michael Humphreys articles at Fangraphs. Better yet, buy his book, Wizardry. In short, I integrate the findings of Humphreys, located at The Baseball Gauge, because they make sense. He doesn’t over-count errors. He uses freely available statistics. He starts thinking with the team level, not the player level. Most of all, his results pass the sniff test. For example, players generally see their DRAs decrease as they age, which is exactly what we expect.
But I could be wrong in my appreciation for DRA, so I don’t completely substitute it for Rfield. Rather, I use DRA at 70% weight and Rfield at 30% weight since the mound moved to 60’6”. Before that, I use them at 50% each. For catchers, I use 70% Rfield and 30% DRA for all seasons. When I’m trying to make adjustments like this, I really appreciate how BBREF’s WAR has components you can substitute if you prefer them.
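The blend described above can be sketched as a small function. It assumes 1893 is the first season at 60’6”, which matches the later note that the mound was at 50 feet before 1893; the function and parameter names are invented for illustration.

```python
def blended_fielding_runs(dra, rfield, year, is_catcher=False):
    """Blend DRA and Rfield using the weights given in the post.

    Catchers: 30% DRA, 70% Rfield in every season.
    Everyone else: 70% DRA, 30% Rfield from 1893 on, 50/50 before that.
    """
    if is_catcher:
        dra_weight = 0.30
    elif year >= 1893:
        dra_weight = 0.70
    else:
        dra_weight = 0.50
    return dra_weight * dra + (1.0 - dra_weight) * rfield
```

For a fielder with +10 DRA and 0 Rfield, this gives about +7 runs in 1950, +5 in 1885, and +3 if he is a catcher.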
Yankee Stadium LF Arm Adjustment
Continuing with our DRA adjustments, I follow Michael Humphreys’ recommendation and adjust the arm of Yankee left fielders down by half a run for every 145 innings. That’s because, simply, Yankee Stadium makes things easier on left fielders. I used to make adjustments for Fenway and for Coors as well, but @DanHirsch has done such a phenomenal job at The Baseball Gauge making those adjustments for us, so I no longer have to. If you’re interested in DRA, check out that site’s player pages under defense. Here’s Omar Vizquel’s page.
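As a rough sketch of the Yankee Stadium arithmetic only: half a run comes off the arm rating for every 145 innings in left field as a Yankee. Which seasons and which version of the ballpark qualify is not stated here, so the caller is assumed to pass in only the innings that should count. The names are hypothetical.

```python
def yankee_lf_arm_adjusted(arm_runs, yankee_lf_innings):
    """Lower a left fielder's arm runs by 0.5 per 145 qualifying innings."""
    return arm_runs - 0.5 * (yankee_lf_innings / 145.0)
```

With 1,450 qualifying innings, a +3.0 arm rating would drop by 5 runs to -2.0.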
Playing Time Adjustment
Seasons today typically have 162 games. But before 1961 in the AL and 1962 in the NL, it was a 154-game schedule. And in the game’s early days, it was different still. Plus, in seasons like 1995, 1994, 1981, 1972, and others, teams have played shorter schedules. I have some desire to adjust for changes essentially beyond the historical control of players. However, I do not make every season equal to 162 games. First, I am loath to credit players for things that didn’t happen. So I had to think about what’s worse, crediting players for things they didn’t actually do or comparing apples (seasons of 162 games) to things that aren’t quite apples (seasons of fewer than 162 games). I decided to give some extra credit. For two reasons, it can’t be full credit. First, it didn’t actually happen. And second, players have a non-zero chance of injury in those extra games. So I try to get players whose teams played nearly 162 games a greater percentage of the way toward 162 than those who played fewer.
Stay with me now if you can. I take a quarter of the difference between 100% and the percentage of 162 games the player’s team played. I add that number to the percentage of 162 games his team played. Then I take that combined percentage of the gap between what his yearly WAR would have been over 162 games and his actual yearly WAR, and I add it to his yearly WAR to get my updated yearly WAR.
Maybe an example would work? Let’s say a team has a schedule that was exactly 60% as long as a 162-game schedule. (Forget for a moment that such a schedule would be 97.2 games long. Just stick with the example.) That season is 40% shy of 100% of 162. So I take a quarter of that 40%, or 10%, and I add it to the 60% the team actually played to get 70%. What I do then is take 70% of the difference between the theoretical 162-game WAR and the actual WAR, and I add that to the actual WAR to come up with my adjusted WAR.
Basically, I want to add some credit, but not all. I want to respect the chance for injury, and I think I do that.
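The 60% example can be turned into a short function. This is a sketch that assumes the "theoretical 162-game WAR" is the actual WAR prorated by schedule length, and that seasons of 162 or more team games are left alone; neither detail is stated outright, and the names are made up for the example.

```python
def schedule_adjusted_war(war, team_games, full_schedule=162):
    """Give partial credit toward a full schedule, per the worked example."""
    if team_games <= 0:
        raise ValueError("team_games must be positive")
    share = team_games / full_schedule           # e.g. 0.60
    if share >= 1.0:
        return war                                # assumption: no change
    credit = share + 0.25 * (1.0 - share)         # e.g. 0.60 + 0.10 = 0.70
    prorated = war / share                        # theoretical full-schedule WAR
    return war + credit * (prorated - war)
```

For a 6.0 WAR season on a schedule 60% as long as normal, the prorated figure is 10.0, the credit is 70% of the 4.0 gap, and the adjusted WAR is about 8.8. For a 154-game schedule the credit is roughly 96% of a much smaller gap, which is the intended behavior: the closer the schedule is to 162, the larger the share of the gap that is filled.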
Additional Catching Adjustments
There was a time when I made pretty big adjustments for catchers, trying to see them like other position players. But the truth is they’re not like other position players. They play fewer games, have shorter careers, and move away from their position to protect their bodies more frequently than players of other positions. I’ve stopped making that adjustment for two reasons. First, it’s inconsistent with other things I believe in. I don’t want to give credit for something that didn’t happen. Second, the theory at the HoME that we should elect an approximately equal number of players at each position eliminates the need for an adjustment. We’re going to get enough catchers into the HoME without any artificial inflation.
If you’re paying close attention to position distribution at the HoME, however, you’d note that catchers lag behind other positions. There are reasons for that. The main reason is that catchers play other positions, but almost no non-catchers ever catch. Johnny Bench, for example, is 79% a catcher, 9% a third baseman, 7% a first baseman, 3% a left fielder, and 3% a right fielder. So catchers increase the number of HoMErs at positions other than catcher, though there are only eight non-catcher HoMErs who contribute anything at catcher. Anyway, while I’ve digressed from explaining my system, I think it’s an important tangent to have taken.
In 2012, Max Marchi wrote a series of posts at Baseball Prospectus relating to how catchers handle pitchers. You can read up on the detail if you choose. Let me just say that I accept his methodology, but I don’t accept it completely. For catchers from 1948 (the farthest back we have Retrosheet data) through 2011, Marchi has career runs saved through pitcher handling. I divide that number by career games caught to see how many runs were saved on a per-game basis. Then I multiply that number by the number of games caught in a season. However, since I only believe this system to be good, not perfect, not with any guarantee of being correct, I take just one quarter of that final number and add it to the catcher’s annual WAR. One reason Eric’s system and mine diverge is that he includes a higher percentage of the Marchi adjustment. I think we understand catcher value less well than that of any other position. I’m not quite willing to adopt anything in full before there’s widespread mainstream acceptance of it.
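Here is a sketch of the pitcher-handling arithmetic as described: career runs saved, spread evenly over career games caught, scaled to one season, then cut to a quarter. The post adds the result to annual WAR and does not mention a runs-to-wins conversion, so this example stops at the quarter-weighted figure in whatever units the input uses. The function and parameter names are hypothetical.

```python
def handling_credit(career_handling_runs, career_games_caught,
                    season_games_caught, weight=0.25):
    """Quarter-weighted share of career handling value for one season."""
    if career_games_caught <= 0:
        return 0.0
    per_game = career_handling_runs / career_games_caught
    return weight * per_game * season_games_caught
```

With made-up inputs of 100 career runs over 2,000 games caught, a 140-game season would get 0.25 x 0.05 x 140 = 1.75.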
Adjustments for Pitchers
Before 1893, the game was different. For starters, the mound was only 50 feet away from the plate. That meant pitchers could generate the power necessary with less effort. Thus, they could pitch more innings. To compensate for the huge pitcher inning totals in the 50-foot era, I count only 85% of their seasonal WAR.
Before 1883, pitchers essentially had to throw the ball underhand. They had a role of initiating play more than a role of trying to keep the batter from hitting. Thus, for pitchers before 1883, I take only 70% of the above number, or about 60% of their seasonal WAR.
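The two era discounts stack, which a few lines of code make explicit. This sketch multiplies 85% by 70% for seasons before 1883, giving 59.5%, which the text rounds to 60%. The function name is invented for the example.

```python
def era_adjusted_pitcher_war(war, year):
    """Apply the pre-1893 and pre-1883 pitching discounts to a season."""
    factor = 1.0
    if year < 1893:
        factor *= 0.85   # 50-foot era: count 85%
    if year < 1883:
        factor *= 0.70   # underhand era: 70% of the number above
    return war * factor
```

A 10.0 WAR pitching season counts as 10.0 in 1900, 8.5 in 1890, and 5.95 in 1880.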
Pitching is physically demanding. I believe that pitchers have a limited number of pitches in their arms before injury will set in. Some get lucky and age out before their arms fall off, but many are not so lucky. Because of this belief, I offer a bonus for playoff innings pitched. This isn’t a bonus based on quality; it’s an acknowledgment that innings during the playoffs can cost innings during subsequent regular seasons. I want to credit the wear and tear on the arm, which I think can be seen by innings, not by quality, which I think would even out over time if given enough innings.
To add seasonal credit, I determine a pitcher’s career WAR rate per 250 innings and multiply that rate by playoff innings before dividing by 250. For Eppa Rixey’s 6.67 innings in the 1915 World Series, for example, he gets 0.08 WAR to add to his seasonal total. In 1998, David Wells threw 30.67 playoff innings, which added 0.49 WAR to his seasonal total. And last October, Justin Verlander threw 36.67 innings, which temporarily adds 0.82 WAR to his total. As his career WAR rate improves or declines, that number will change as well.
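The playoff bonus can be sketched as follows, assuming the rate is career WAR over career regular-season innings (the remark that the bonus moves with the career rate points that way). The names and the sample inputs are illustrative, not taken from any player's record.

```python
def playoff_innings_bonus(career_war, career_innings, playoff_innings):
    """WAR credit for playoff innings at the pitcher's career rate."""
    if career_innings <= 0:
        return 0.0
    rate_per_250 = career_war / career_innings * 250
    return rate_per_250 * playoff_innings / 250
```

As a sanity check on scale, an assumed rate of 3.0 WAR per 250 innings applied to 6.67 playoff innings gives about 0.08, the same size as the Rixey figure quoted above.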
Determining a Player’s Position
There are lots of ways one could determine where to place a player when trying to determine the best players at a particular position. I choose the simplest, the place he played the most games, disregarding games at designated hitter. I choose games because it’s easier to defend than anything else. It doesn’t rely on feel. I don’t ever have to try to parse WAR in an individual season where someone played six different positions. For every Ernie Banks who I call a 1B when most others call a SS, there are dozens of other easy calls that I don’t have to worry about. Yes, I know, designated hitter is a position. The reason I disregard games at DH is that there is not a large enough group of great career designated hitters to measure those players against a group of peers.
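The position rule is simple enough to show in a few lines. This sketch takes a mapping of position to career games, drops DH, and returns the position with the most games. The position codes and the tie behavior (first one listed wins) are assumptions of the example.

```python
def primary_position(games_by_position):
    """Return the position with the most career games, ignoring DH."""
    fielded = {pos: g for pos, g in games_by_position.items() if pos != "DH"}
    if not fielded:
        raise ValueError("no games at a fielding position")
    # max() returns the first key encountered when totals are tied.
    return max(fielded, key=fielded.get)
```

With hypothetical totals of 900 games at 1B, 800 at SS, and 1,200 at DH, the function returns "1B".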
So with the adjustments above, I determined my seasonal WAR for 1511 players. That’s 454 pitchers and 1057 position players. For reference, there are 996 players in history with at least 5000 plate appearances. That’s how wide (or how narrow, if you prefer) my database is. With seasonal WAR, I can begin to create MAPES.
Like Eric’s CHEWS (CHalek’s Equivalent WAR System), MAPES (Miller’s Awesome Player Evaluation System) is a derivative of Jay Jaffe’s JAWS (Jaffe WAR Score system). I like my name most because it connects to Cliff Mapes, an outfielder who had a five-year run, mostly for the Yankees, from 1948-1952. Basically, I’m just trying to be cute.
When I started, I adjusted MAPES so it didn’t fall short where I thought Jaffe’s did. In my opinion, Jaffe trades simplicity for quality or accuracy. Let me explain.
JAWS is the average of career WAR and peak WAR, measured by a player’s seven best seasons, which don’t have to be consecutive. Why seven? What if someone has a peak of six years? Or eight? And why is it the average of career and the seven best? That’s necessarily weighted toward career (unless you produced negative WAR outside of your best seven seasons).
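For reference, the JAWS calculation as described in the paragraph above fits in one function: the average of career WAR and the total of the seven best seasons. The function name and the sample career are made up.

```python
def jaws(season_wars):
    """Average of career WAR and the sum of the seven best seasons."""
    career = sum(season_wars)
    peak = sum(sorted(season_wars, reverse=True)[:7])
    return (career + peak) / 2
```

For a sample career of 8, 7, 6, 5, 5, 4, 4, 3, 2, and 1 WAR, career is 45, peak is 39, and the score is 42.0. The seven best seasons are counted in both halves of the average, while the other three are counted only once.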
I created a system, which I’ve since dumped for reasons I’ll explain below, that gave credit to players for each season, weighing their best most, and their worst least, basically. Of course, while my system was more “accurate”, whatever that word means, the results it produced were hardly different at all from Jaffe’s. Jaffe and Eric would both say that my system dealt in an unnecessary area of minutiae, that the 45th best 2B is pretty much the same as the 48th best 2B. Further, none of us would ever say for certain that the guy we have ranked 45th best was actually the better player. No matter how specific, these are just estimates. They’re really just sorting systems. Even BBREF says that seasonal WAR that’s 1-2 apart shouldn’t be considered definitive.
While my position actually wasn’t any better than Jaffe’s, I continued to think we needed something less rigid than just the seven best seasons and the career.
Arguing for Consecutive
When JAWS began, it considered a player’s seven best consecutive seasons. Jaffe has since modified that to non-consecutive seasons. Eric agrees. And so do I, to an extent. While we cannot get a sense of a player’s greatness in a system that could easily not include his best or second best season, we also cannot do so if we don’t count the extended period of time when he was at his best, trying to ignore the essentially false construct of a baseball season. So I do consider consecutive seasons also.
Yes, more factors make my system more complex and less translatable. They also make it marginally more defensible.
Eric did the work in April and August with CHEWS+. Adam Darowski did it before we did with his Hall Rating. For the math, check out Eric’s excellent CHEWS+ post. A point of his system, Adam’s, and now MAPES+ is to index to 100. If you’re at 110, you’re 10% better than the level needed to be a HoMEr. If you’re at 90, you’re 10% worse. I think if you’re at 110+, you’re almost always going to be in. If you’re below 90, you’re almost always going to be out. And if you’re in between, there’s a discussion to be had. Basically.
As a peak voter, I include a player’s best more than I do his career. After all, it’s a player’s peak performance that does the most to drive his team toward the playoffs. My formula for position players is 37.5% peak + 12.5% prime + 12.5% consecutive + 37.5% career. To get the numbers for peak, prime, consecutive, and career, I take the median of the top X pitchers and top X players by defensive position. Since there are 226 players in the HoME and I believe in a 70/30 hitter/pitcher split, I’m looking at about 20 players per position in the HoME and about 70 pitchers. So I look at the median of the top 40 by defensive position and the top 140 pitchers to determine what numbers to use for peak, prime, consecutive, and career.
Defining Peak, Prime, Consecutive, and Career
I’ll start with career since it’s easiest. It’s the total career WAR with my adjustments. All seasons count. And all seasons count equally.
I continue to use my seasonal WAR adjustments for the three categories that follow.
Jay Jaffe calls the best seven non-consecutive seasons a player’s peak. Since there’s no good reason seven is the right number, I take the average of the medians explained above of the best 5, 6, 7, and 8 seasons. It’s hardly any additional calculation, and I think I encompass more of what reasonable people would call a peak.
Consecutive is easy enough. But since the more consecutive years are included, the more likely a down year is included, I look at a relatively small number of seasons for the score of consecutive. My consecutive score is the average of the medians of the 3, 4, and 5 best consecutive seasons.
Because I lean peak but don’t only reside there, I want to include another factor that will help to include players who were solid for longer than their peaks but didn’t necessarily tack on year after year of passable ball after their primes. For my prime score, I look at the average of the medians for the best 9, 10, 11, 12, and 13 seasons.
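The window calculations above can be sketched like this. The code reads "average of the medians" as: for each window size, take the median total among the benchmark group (top 40 at a position, or top 140 pitchers), then average those medians across window sizes. That reading, the helper names, and the choice to treat each list as one entry per season in chronological order are all assumptions of the example.

```python
from statistics import mean, median

# Window sizes named in the text for each component.
PEAK_WINDOWS = (5, 6, 7, 8)
CONSECUTIVE_WINDOWS = (3, 4, 5)
PRIME_WINDOWS = (9, 10, 11, 12, 13)


def best_n_total(season_wars, n):
    """Sum of the n highest seasons, consecutive or not."""
    return sum(sorted(season_wars, reverse=True)[:n])


def best_consecutive_total(season_wars, n):
    """Highest sum over any run of n adjacent seasons."""
    if len(season_wars) <= n:
        return sum(season_wars)
    return max(sum(season_wars[i:i + n])
               for i in range(len(season_wars) - n + 1))


def player_value(season_wars, windows, total_fn):
    """Average of one player's totals across the window sizes."""
    return mean([total_fn(season_wars, n) for n in windows])


def benchmark_value(group, windows, total_fn):
    """Average, across window sizes, of the group's median total."""
    return mean([median([total_fn(p, n) for p in group]) for n in windows])
```

A player's peak figure would then be `player_value(seasons, PEAK_WINDOWS, best_n_total)`, and the matching benchmark would be `benchmark_value(group, PEAK_WINDOWS, best_n_total)`. The consecutive figures swap in `CONSECUTIVE_WINDOWS` and `best_consecutive_total`. A career that goes 8, 1, 7, 1, 6 shows why the two totals differ: its best three seasons add up to 21, but its best three in a row add up to only 16.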
A Change for Pitchers
If you consider my reasoning for adding a playoff bonus for pitchers but not a similar bonus for hitters, you’ll understand why my formula for ranking pitchers is a little different than for hitters. Innings add value to the game’s best pitchers. They also add strain. Because too many pitchers have their best seasons followed by less great seasons, I shift the consecutive factor to peak. My formula for pitchers is 50.0% peak + 12.5% prime + 37.5% career.
I suppose you could call me out for inconsistency here, but the removal of consecutive feels right. And the results make sense. Every pitcher at 110+ is in or going. Every pitcher below 90 is out or wouldn’t get in if he retired today. And the guys in between represent a pretty wide borderline about whom we can debate.
Putting It All Together
Once I have my peak, prime, consecutive, and career scores, I weigh them as I explained above. I then multiply by 100 to get my MAPES+ score. Babe Ruth, as you might expect, comes out best. His 245.60 represents someone 145% over the borderline. The closest to exactly 100.00, the definition of the borderline, is Roy White at 100.01. The weakest guy I have ranked thus far is Addison Russell at 18.25, though he still has almost his entire career in front of him. The weakest retired guy overall is Charlie Comiskey at 25.02, or about 75% worse than the borderline. He got into our data set early in the process because he’s in the Hall of Fame, not because he should really be there.
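Putting the weights and the index together might look like the sketch below. It assumes each component score is the player's figure divided by the benchmark figure for his position (or for pitchers), so that a player exactly at the benchmark on every component lands at 100. The weights are the ones given earlier for hitters and pitchers; the dictionary layout and function name are made up for the example.

```python
HITTER_WEIGHTS = {"peak": 0.375, "prime": 0.125,
                  "consecutive": 0.125, "career": 0.375}
PITCHER_WEIGHTS = {"peak": 0.50, "prime": 0.125, "career": 0.375}


def mapes_plus(player, benchmark, weights):
    """Weighted average of player/benchmark ratios, indexed to 100.

    Assumption: each score is a simple ratio to the benchmark value.
    """
    return 100 * sum(w * player[k] / benchmark[k]
                     for k, w in weights.items())
```

Under this reading, a hitter who is 10% above the benchmark on all four components scores 110, and one who matches the benchmark on each scores 100.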
I like my MAPES+ system more than any I have used in the past. Indexing to 100 is smart. I like the idea that no one exact number represents peak. And I like tilting toward peak but not residing only there. I’m troubled by using different systems for hitters and pitchers though. And I don’t know that my playoff bonus for pitchers is exactly right. Further, I expect that I’ll shift my preference more or less toward peak or prime as we move forward, and I’ll tweak totals on occasion. Still, I like this system a lot. It’s the strongest I’ve put together, thanks in huge part to Eric doing a lot of the leg work and providing the inspiration.