[Editor’s note: This article was heavily revised in August 2017 for improved readability and clarity. It’s the same content but presented in a much better way. And also in October 2017 due to changes in my methodology. The original version weighted peak too high. The new version works much better.]
Beer and tacos. Chocolate and peanut butter. Scouts and stats. Better together! Well that’s where I’m at today.
I cribbed the original idea for CHEWS (CHalek’s Equivalent WAR System) from Jay Jaffe’s JAWS. With apologies to the mustachioed man, I tweaked it a little and gave it a nice punnish name to call my own. As time has gone on, however, I’m more and more drawn to Adam Darowski’s Hall Rating. It indexes Adam’s inputs against a positional average and anyone at 100 is in the Hall of Stats.
So today, I want to introduce my new sifting score. I call it CHEWS+. Here’s why I went to the bother of all this.
- Like JAWS, it relies on the player’s highest seven (nonconsecutive) seasonal WAR totals and his career WAR.
- Like Hall Rating, it indexes a player to 100.
- Like my original CHEWS, it is based on my own equivalent WAR (eqWAR) calculations.
- Unlike either of these systems, however, CHEWS+ adds a WAR rate component.
- CHEWS+ explicitly combines a positional component and an overall component into the figure it spits back.
- CHEWS+ allows me to more clearly see the building blocks of a player’s rating.
In addition, with CHEWS+ I can tell you nine pieces of information about any given player’s case for a given Hall and how he stacks up relative to other cases. For example, here’s what I can tell you about Dick Allen:
- Allen’s 48 WAR peak rates 20% better than a borderline first-base candidate’s (40 eqWAR)
- Allen’s 62 WAR career rates 2% better than a borderline first-base candidate’s (61 eqWAR)
- Allen’s 5.5 WAR/650 rate is 25% better than a borderline first-base candidate’s (4.4)
- Allen ranks 13th among first basemen with a 113 positional score
- Allen’s 48 WAR peak rates 20% better than a borderline hitter candidate’s (40 eqWAR)
- Allen’s 62 WAR career rates 7% better than a borderline hitter candidate’s (58 eqWAR)
- Allen’s 5.5 WAR/650 rate is 28% better than a borderline hitter candidate’s (4.3)
- Allen ranks 77th among all batters with a 116 overall score
- Allen’s 115 CHEWS+ ranks 86th among all batter candidates.
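The percentages in the Allen list are just his figure divided by the borderline figure, scaled so that 100 sits on the line. Here is a minimal Python sketch of that arithmetic using only the numbers quoted above; the function name `index_to_100` is made up for illustration, and rounding to the nearest whole number is an assumption.

```python
def index_to_100(value, baseline):
    """Express a value as a percentage of the borderline baseline (100 = on the line)."""
    return round(100 * value / baseline)

# (Allen's figure, borderline figure) pairs quoted in the list above
vs_first_basemen = {"peak": (48, 40), "career": (62, 61), "rate": (5.5, 4.4)}
vs_all_hitters = {"peak": (48, 40), "career": (62, 58), "rate": (5.5, 4.3)}

for label, parts in (("vs. first basemen", vs_first_basemen),
                     ("vs. all hitters", vs_all_hitters)):
    for name, (allen, borderline) in parts.items():
        # e.g. 48 / 40 -> 120, i.e. 20% better than the borderline candidate
        print(f"{label:<18} {name:<7} {index_to_100(allen, borderline)}")
```

An index of 120 reads as "20% better than borderline," which is the whole appeal of indexing over eyeballing a ratio like 42.5/38.6.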
I like to be able to express all of this if I want to. With JAWS, I can compare to the positional average but why compute 42.5/38.6 if I don’t have to? Indexing to 100 is so much more intuitive. On the other hand, with Hall Rating, I don’t get much of a sense about how the rating works or why. Now I can display that information more simply. When I tell you someone has a peak-oriented case, I can now show you more readily what that really looks like.
So let me tell you how I’m doing this, and then I’ll show you what it looks like. To be honest, not many players’ rankings changed considerably, but a few moves were notable and worth looking at. More will be revealed.
Making the right CHEWS
An important idea about various Halls of Fame that many folks don’t think about is how the positions balance out. Since you have to have nine guys in your batting order, any of them can be an asset or a liability. Seeking some balance is reasonable. So I’ve set the system up to reflect that belief.
1. Assume a 70/30 split between hitters and pitchers (or whatever split you deem optimal) and figure the number of total hitters and pitchers that yields. At our current total of 220 honorees, my 70/30 split gets you 154 hitters and 66 pitchers. For our purposes, we’re going to round overages to the nearest whole number.
2. Next, find the top n hitters per position by dividing the hitter total we calculated above by 8 fielding positions and then multiplying the number per position by 2: 154 / 8 = 19.25, which rounds to 19, and 19 * 2 = 38.
3. Do the same for pitchers, who are their own single position of course: 2 * 66 = 132.
4. At each position, choose the 38 best candidates by whatever your favorite measuring stick is. I used my old CHEWS to determine these 38, but pick your own poison. At pitcher, choose 132.
5. For each hitter and pitcher, determine his peak (add the seasonal WAR values of his seven best nonconsecutive years), his career WAR, and his career rate for accumulating WAR. Since they are easy to understand, I use WAR/650 PA and WAR/250 IP. NOTE: Because I make adjustments for schedule and other conditions, for the purpose of calculating WAR/650, I also adjusted a player’s PAs. Otherwise for early-days players I’d be dividing schedule-adjusted WAR by raw PAs, and the 19th-century guys would look crazy good. For pitchers, I use WAR/250 IP and my adjusted innings. I didn’t bother with relief pitchers; too few looked good in my system.
6. For each position, find the median peak, career, and rate of the 38 players or 132 pitchers.
7. Repeat step 6 for the combined pool of every hitter included at each position (don’t need to bother with pitchers).
8. For each player at each position and for pitchers, divide his peak, career, and rate by their respective medians at his own position.
9. Repeat step 8 with the all-hitter median values calculated in step 7.
10. For each player, add his peak and career values from step 8 to ½ his rate value, divide by 2.5, and multiply the quotient by 100. This is his positional score.
11. Repeat step 10 using the values calculated in step 9; this is his overall score.
12. The average of steps 10 and 11 is his initial CHEWS+.
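The steps above can be sketched in a few lines of Python. Treat this as one reading of the recipe rather than the actual spreadsheet behind CHEWS+: the function names are invented, the scoring step is taken to mean (peak + career + ½ rate) / 2.5, and the two three-man pools are made-up numbers chosen only so that their medians match the borderline figures quoted for Dick Allen earlier. Real pools would hold 38 players per position.

```python
from statistics import median

def quotas(total_honorees=220, hitter_share=0.70, positions=8):
    """Pool sizes from the opening steps: 154 hitters, 66 pitchers, 38 and 132."""
    hitters = round(total_honorees * hitter_share)
    pitchers = total_honorees - hitters
    per_position = round(hitters / positions)        # 19.25 rounds to 19
    return {"hitters": hitters, "pitchers": pitchers,
            "hitter_pool_per_position": 2 * per_position,
            "pitcher_pool": 2 * pitchers}

def component_ratios(player, pool):
    """Divide a player's peak, career, and rate by the pool medians."""
    return {key: player[key] / median(p[key] for p in pool)
            for key in ("peak", "career", "rate")}

def score(ratios):
    """Peak plus career plus half of rate, over 2.5, scaled so the median is 100."""
    return 100 * (ratios["peak"] + ratios["career"] + 0.5 * ratios["rate"]) / 2.5

def initial_chews_plus(player, position_pool, all_hitters_pool):
    """Average of the positional score and the overall score (before any bonus)."""
    positional = score(component_ratios(player, position_pool))
    overall = score(component_ratios(player, all_hitters_pool))
    return (positional + overall) / 2

# Hypothetical pools: medians are 40 / 61 / 4.4 and 40 / 58 / 4.3
first_base_pool = [{"peak": 35, "career": 55, "rate": 4.0},
                   {"peak": 40, "career": 61, "rate": 4.4},
                   {"peak": 45, "career": 70, "rate": 5.0}]
hitter_pool = [{"peak": 36, "career": 50, "rate": 3.9},
               {"peak": 40, "career": 58, "rate": 4.3},
               {"peak": 44, "career": 66, "rate": 4.7}]
allen = {"peak": 48, "career": 62, "rate": 5.5}

print(quotas())
print(round(initial_chews_plus(allen, first_base_pool, hitter_pool)))
```

Because the quoted Allen inputs are already rounded, the component scores this produces can land a point away from the published ones.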
I also want to reward top-level performance. To do so, I’ve instituted a small bonus based on the rate at which a player earned his WAR. It is based on two dimensions: the actual rate itself and how long a player’s career was. I give a maximum of 10 bonus points. This helps players like Larry Walker who had reasonably long careers despite missing a lot of games but whose performance was outstanding. There’s a little mixing and matching of mean and median here, but the world won’t end because of it.
1. Find the mean WAR/650 PA for all hitters in the sample and the mean WAR/250 IP for all the pitchers.
2. Find the standard deviation of WAR/650 PA for all hitters in the sample, and of WAR/250 IP for all the pitchers.
3. Divide the standard deviation by two and add to the mean. All players above this rate qualify to receive the rate bonus. This works out to approximately 20 to 30 percent of the sample.
4. Examine the rates for hitters and determine a practical maximum WAR/650 PA in the sample. Not the highest, but close. I used 8.5 WAR/650 PA, which is exceeded by just a few players. Let’s not make everyone have to be Babe Ruth here.
5. Divide a qualifying player’s WAR/650 PA by that practical maximum (8.5 as noted).
6. Find a practical maximum PAs for hitters. I use 12,000; just 8% of hitters in the sample exceeded that figure. Remember those PA are adjusted by me. Your PA total would vary.
7. Divide the player’s career PAs by 12,000.
8. Multiply the result of step 5 by the result of step 7, then multiply by 10.
9. Add to the initial CHEWS+ figure we determined above.
10. Repeat for pitchers using WAR/250 IP. Turns out that 8.5 WAR/250 IP happens to work out for hurlers. And 4,000 innings is the practical maximum here. That innings cap will, however, be different for you unless you adjust innings for usage exactly as I do.
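Here is a rough Python sketch of the bonus steps for hitters. Several things in it are assumptions rather than anything stated above: each ratio is read as the player's figure over the practical maximum and capped at 1 (so the bonus tops out at 10), the sample standard deviation is used (the population version would move the cutoff slightly), and the five-rate sample is invented purely to have something to run.

```python
from statistics import mean, stdev

def rate_bonus(rate, playing_time, sample_rates,
               max_rate=8.5, max_playing_time=12_000, max_bonus=10):
    """Bonus points for a hitter; pass max_playing_time=4_000 and IP for pitchers."""
    # Qualifying line: the mean plus half a standard deviation of the sample's rates
    cutoff = mean(sample_rates) + stdev(sample_rates) / 2
    if rate <= cutoff:
        return 0.0
    # Assumption: cap each share at 1 so nobody earns more than max_bonus
    rate_share = min(rate / max_rate, 1.0)
    time_share = min(playing_time / max_playing_time, 1.0)
    return max_bonus * rate_share * time_share

# Invented WAR/650 PA sample; a real one would cover every hitter in the pools
sample = [3.0, 4.0, 5.0, 6.0, 7.0]

print(rate_bonus(5.5, 9_000, sample))   # below the cutoff, so no bonus
print(rate_bonus(6.8, 9_000, sample))   # qualifies: 10 * (6.8 / 8.5) * (9000 / 12000)
print(rate_bonus(9.0, 13_000, sample))  # both shares capped, so the full bonus
```

Multiplying the two shares is what keeps a short, brilliant career from collecting the full bonus: a great rate over few plate appearances gets scaled down by the playing-time share.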
As you can see, CHEWS+ compares against the in/out line, not the average Hall member. I deliberately chose to do so. First, the Hall’s actual in/out line is far lower than the 19th-best player at a given position, or the 154th best hitter. If, for the sake of a thought experiment, we used WAR as our measure of overall value, at shortstop Hughie Jennings, Rabbit Maranville, Phil Rizzuto, and Travis Jackson fall well below the simple standard of being the 19th most valuable at their position. Another four—Joe Sewell, Dave Bancroft, Luis Aparicio, and Joe Tinker—cluster very close to the in/out line. So about 40% of the 21 shortstops inducted into the Hall of Fame don’t have an ironclad claim to being one of the position’s top 19 performers. Just taking a simple measure like career WAR, the median of the top 38 shortstops in history is 58.6 and the 21st highest career WAR is 48.7. But the 19th best Hall of Fame shortstop is at 42.8 career WAR, and the 21st and lowest Hall shortstop is 40.8. Of course, I would never use unadjusted career WAR by itself as my baseline for evaluation, but this thought experiment demonstrates the important point that the problem with the Hall isn’t necessarily that Joe Tinker or Joe Sewell make it in. These guys are simply borderline candidates whose cases, including any qualitative factors, may well be interchangeable with their nearest competitors’. No matter where you draw the line, there will always be a group of interchangeable borderline candidates. The problem, instead, is that players such as Jackson, Maranville, and Rizzuto fall well below the in/out line and drag the line down so far that it ceases to have much meaning.
Second, and just as importantly, many players between the in/out line and the Hall average at their position are fully qualified and easy to vote for. Back to our example, the positional average career WAR for Hall shortstops is 68.1. That figure is yanked upward a bit by Honus Wagner’s 131 career WAR. The following Hall of Fame shortstops fall below 68.1 career WAR but above the 58.6 median career WAR of the top 38 players at the position: Ernie Banks, Joe Cronin, Pee Wee Reese, Monte Ward. Who’s raising their hands to boot those guys out? But a system that matches them against the positional average will see them as below-average candidates who don’t raise the Hall’s standards.
So when you put these two points together, you see why I’m choosing to use the median of the top n shortstops in history for a given measurement. (And why we need multiple measurements of value and achievement, not just career WAR).
CHEWS+ is plug and play. Feel free to substitute your own analytics or stats into this framework as long as they are reasonable. You want to use a 5 year peak or a 10 year prime or not count negative seasons, go for it. You can eliminate a category or weight it differently than I do. Also, if you don’t like using the median n players at a position, feel free to use whatever number makes sense to you! But the important thing in this approach is to carefully select your top n candidates per position and top y pitchers and use their median as the basis of comparison.
Interpreting CHEWS+ is like anything else. It is not intended to populate the HoME like Hall Rating does. It serves as a benchmark. Context is always important, and we should always make mental allowances for potential imbalances and pertinent qualitative factors no matter what measurement we use.
CHEWSing the fat
Now that you see how it is computed, here’s some taste of how it works.
At the positional level, the score for hitters indicates 152 players at 99.5 or greater out of the 154 I was shooting for. Turning to the overall score, it shows 158 such players. And when combined into CHEWS+, we hit 158, just a few more than we should. Among pitchers, the figure is 63 (we’re looking for 66) with one other greater than 99.0 but less than 99.5. I feel good about these results.
Let’s break it down by position.
      Pos.    Overall
POS   Score   Score     CHEWS+
==============================
C     19      13        16
1B    19      24        22
2B    21      18        19
3B    20      18        21
SS    18      22        19
LF    17      21        20
CF    18      17        18
RF    20      25        23
P     63      63        63
------------------------------
      215     221       221
(Frank Thomas placed at first base, Edgar Martinez and Paul Molitor at third base.)
This distribution passes the sniff test. The positions that are generally underrepresented are here as well, and those generally overrepresented are too. Catcher and centerfield have good baseball reasons why they might be a little beneath the rest (catching destroys the body; the defensive spectrum is much shorter for lefty centerfielders without a good throwing arm than for other players).
Here are the players that CHEWS+ indicates as HoME worthy who aren’t in yet and who they would replace:
POS   IN   CHEWS+   OUT   CHEWS+
===========================================
CATCHER
Roger Bresnahan* 102    Ted Simmons 99.1    Bill Freehan 88
*Technically, Bresnahan is in as a pioneer/player combo

FIRST BASE
Harry Stovey 101    Jake Beckley 100
*Chance is in as a manager/player combo

SECOND BASE
Cupid Childs 106    Bobby Doerr 99    Jeff Kent 98

THIRD BASE
John McGraw* 103    Sal Bando 98    Ned Williamson 103
*McGraw is in as a manager/player combo

SHORTSTOP
Joe Sewell 99    Monte Ward 99    George Wright* 93
*Includes no credit for pre-1871 play

LEFT FIELD
Roy White 98    Jose Cruz 96

CENTER FIELD
George Gore 102    Pete Browning 100

RIGHT FIELD
Vlad Guerrero 100    Reggie Smith 98    Sam Rice* 98
*Rice is not credited here for running, double-play avoidance, or throwing-arm value that we’ve written about extensively.

PITCHER
Charlie Buffinton 105    Old Hoss Radbourn 99    Dizzy Dean 101    Bucky Walters 98    Eddie Rommel 101    Whitey Ford 96    Don Sutton 94    Mordecai Brown 94
*Griffith is in as a manager/player/exec combo
This is a pretty good record. Most of the players in either column fall into one of three categories:
- late cuts or selections that we deliberated over for months or years
- Nineteenth-century players, of whom we already have too many and whom we voted against for chronological balance
- Charlie Keller. Keller’s situation is really simple. He did a lot of damage in very little time. His peak is better than the average, his career well below, but his WAR rate is outstanding. Then again, the guy only accumulated 4600 PA in his real career, and I only adjust it up to 4840 or so. He missed time to World War II, and his body betrayed him, ending his career prematurely. But even so, he barely sneaks over the line by CHEWS+, and anyone below 105 is probably interchangeable with anyone over 95. Especially if they come from an overstuffed position or an overstuffed era (like, say, the 1890s).
Also, a few important caveats apply. Many of these borderline players don’t yet have official PBP data attached to them. Some may never have it, or won’t for years. Others like Rice might have that information soon for some or all of their campaigns. Our guesstimates for those guys probably lift several of them up over the line. But officially, this is what they look like now.
Finally, let’s zero in on a few players from the list above to see what’s driving their ratings.
                  POSITION                   | OVERALL                    |
NAME              Peak  Career  Rate  Score  | Peak  Career  Rate  Score  | CHEWS+
====================================================================================
Hughie Jennings   111   83      118   101    | 114   87      117   103    | 102
Charlie Keller    102   74      143   99     | 107   80      146   104    | 101
Jim O’Rourke      81    119     70    94     | 85    128     72    100    | 97
Ted Simmons       101   109     79    100    | 93    111     65    90     | 95
Ned Williamson    110   101     106   105    | 109   94      103   102    | 104
Dizzy Dean                                   | 107   90      138          | 106
Whitey Ford                                  | 83    111     101          | 98
If you detect an orientation toward peak performance, it’s because I have one. In the past I’ve been more cautious about it. But after reading this article and seeing the inclusion of rate-based performance, I felt it was important to include it as well. I used to weight peak at 22% higher than career. Now I rate them equally but also include that rate bonus of up to 10 CHEWS+ points.
We see how this influences CHEWS+ above with peak-first candidates such as Jennings, Keller, Williamson, and Dean getting better scores than O’Rourke, Simmons, and Ford. But we can see buried in all of this how the increased transparency of this system can support good decision making. Ted Simmons is at an even 100 among catchers. If I felt it was specifically important to add another catcher, I would have good justification to do so based on his score among those at his primary position. For someone like Dizzy Dean, I might find persuasive the idea that while his peak is above average, his ability to create value may actually be understated by his peak.
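To see how the three components roll up into a score, the sketch below recomputes four of the hitter rows from the table above. It assumes the combination is (peak + career + ½ rate) / 2.5 applied to the indexed components; since the printed components are already rounded, the recomputed score is only expected to land within a point of the published one. The name `combined_score` is made up for the example.

```python
def combined_score(peak, career, rate):
    """Combine indexed components: peak and career at full weight, rate at half."""
    return (peak + career + 0.5 * rate) / 2.5

# name: (positional components + published score, overall components + published score)
rows = {
    "Hughie Jennings": ((111, 83, 118, 101), (114, 87, 117, 103)),
    "Charlie Keller": ((102, 74, 143, 99), (107, 80, 146, 104)),
    "Jim O'Rourke": ((81, 119, 70, 94), (85, 128, 72, 100)),
    "Ned Williamson": ((110, 101, 106, 105), (109, 94, 103, 102)),
}

for name, halves in rows.items():
    for label, (peak, career, rate, published) in zip(("positional", "overall"), halves):
        recomputed = combined_score(peak, career, rate)
        # Rounded inputs mean we only expect agreement to within about a point
        print(f"{name:<16} {label:<10} recomputed {recomputed:6.1f}  published {published}")
```

Keller's row shows the peak-and-rate tilt most clearly: a career index of 74 is offset by a rate index of 143, which is enough to bring his positional score to about 99.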
Let’s linger for a moment on Simmons and Dean. I differ with many HOM voters and other writers who toss out seasons below replacement. One argument for doing so goes like this: The team should have known better and not continued to run him out there. I agree up to this point; it’s the conclusion drawn from it that’s problematic: So why should the player be penalized? In my opinion, the player is not penalized by counting everything he did on the field. Instead, we are trying to get an accurate picture of the player’s entire career. Everything counts. The player is accurately measured by including his entire body of MLB work.
The classic case where this comes into play is Pete Rose. From age 39 on, he racked up -1.4 WAR over 3694 PAs. But not every player earns negative numbers strictly during their baseball senescence. Ted Simmons, for example. The 1981 version of Ted Simmons, slugging catcher, hit .216/.262/.376 for a whopping 87 OPS+ and 0.3 WAR (BBREF style). He rebounded for 7.3 WAR over the next two years. The wheels came off again in 1984, when he “earned” -2.6 WAR, and from there the end came quickly. In 2016, we saw a younger player in mid-career do exactly what Simmons did. Coming off 38 WAR over 7 years, Andrew McCutchen served up -0.7 WAR.
Want some more? Early Wynn had a full season at age 28 where he posted -1.0 WAR. Burleigh Grimes pooped out a -0.5 season at age 25. Jimmy Wynn coughed up a -0.6 hairball at age 29. Lefty Grove would rather have forgotten 1934 (-0.3). Anyway, these seasons exist. They are rare among Hall-level players, of course, but they are there, and they cost their teams wins. To me, not counting those bad seasons is akin to ignoring the F on Johnny’s report card because he otherwise got As and Bs.
A reasonable argument against my position might be that someone like Dizzy Dean or Charlie Keller or Sandy Koufax benefits on a rate basis due to a sudden injury-forced departure rather than a parade of crappy decline seasons. I hear ya. If you think the deck is stacked against long-career players, well, maybe it is. But that brings me back to the important point that JAWS, Hall Rating, CHEWS+, Hall of Fame Monitor, what have you are not gospel. They are sifting mechanisms. Draw up your long list with them, then look closely to see what they fail to capture. Because there’s no bulletproof stat and there’s no silver-bullet number to end all arguments.
Instead, what we have is thoughtful people creating thoughtful tools to get us near to an answer quickly so that we can spend more time on the borderline where the tough decisions are. And that’s what I like about this improvement over CHEWS. It gives me a simpler number as well as more, and more understandable, details to form a decision on. I’ll soon start adding it to the HoME Stats you can find on our Honorees page.