[Editor’s note: This article was heavily revised in August 2017 for improved readability and clarity. It’s the same content but presented in a much better way.]
Beer and tacos. Chocolate and peanut butter. Scouts and stats. Better together! Well that’s where I’m at today.
I cribbed the original idea for CHEWS (CHalek’s Equivalent WAR System) from Jay Jaffe’s JAWS. With apologies to the mustachioed man, I tweaked it a little and gave it a nice punnish name to call my own. As time has gone on, however, I’m more and more drawn to Adam Darowski’s Hall Rating. It indexes Adam’s inputs against a positional average and anyone at 100 is in the Hall of Stats.
So today, I want to introduce my new sifting score. I call it CHEWS+. Here’s why I went to the bother of all this.
- Like JAWS, it relies on the player’s highest seven (nonconsecutive) seasonal WAR totals and his career WAR.
- Like Hall Rating, it indexes a player to 100.
- Like my original CHEWS, it is based on my own equivalent WAR (eqWAR) calculations.
- Unlike either of these systems, however, CHEWS+ adds a WAR rate component.
- CHEWS+ explicitly combines a positional component and an overall component into the figure it spits back.
- CHEWS+ allows me to more clearly see the building blocks of a player’s rating.
In addition, with CHEWS+ I can tell you nine pieces of information about any given player’s case for a given Hall and how he stacks up relative to other cases. For example, here’s what I can tell you about Dick Allen:
- Allen’s 48 WAR peak rates 20% better than a borderline first-base candidate’s (40 eqWAR)
- Allen’s 62 WAR career rates 2% better than a borderline first-base candidate’s (61 eqWAR)
- Allen’s 5.5 WAR/650 rate is 25% better than a borderline first-base candidate’s (4.4)
- Allen ranks 13th among first basemen with a 113 positional score
- Allen’s 48 WAR peak rates 20% better than a borderline hitter candidate’s (40 eqWAR)
- Allen’s 62 WAR career rates 7% better than a borderline hitter candidate’s (58 eqWAR)
- Allen’s 5.5 WAR/650 rate is 28% better than a borderline hitter candidate’s (4.3)
- Allen ranks 77th among all batters with a 116 overall score
- Allen’s 115 CHEWS+ ranks 86th among all batter candidates.
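As a rough check on how these nine figures fit together, the sketch below recomputes Allen’s indexes from the rounded numbers in the list, using the weighting spelled out later in this article (peak and career at full weight, rate at half weight). The function and variable names are illustrative assumptions, not part of any published tool. Because the inputs are rounded, the results land within about a point of the published 113, 116, and 115 rather than matching them exactly.

```python
# Rounded figures taken from the Dick Allen list above.
ALLEN = {"peak": 48, "career": 62, "rate": 5.5}
BORDERLINE_1B = {"peak": 40, "career": 61, "rate": 4.4}      # positional baseline
BORDERLINE_HITTER = {"peak": 40, "career": 58, "rate": 4.3}  # all-hitter baseline


def composite(player, baseline):
    """Index a player to a baseline: peak and career count fully, rate counts half."""
    peak = player["peak"] / baseline["peak"]
    career = player["career"] / baseline["career"]
    rate = player["rate"] / baseline["rate"]
    return 100 * (peak + career + 0.5 * rate) / 2.5


positional = composite(ALLEN, BORDERLINE_1B)
overall = composite(ALLEN, BORDERLINE_HITTER)
chews_plus = (positional + overall) / 2

print(f"positional {positional:.1f}, overall {overall:.1f}, CHEWS+ {chews_plus:.1f}")
```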
I like to be able to express all of this if I want to. With JAWS, I can compare to the positional average but why compute 42.5/38.6 if I don’t have to? Indexing to 100 is so much more intuitive. On the other hand, with Hall Rating, I don’t get much of a sense about how the rating works or why. Now I can display that information more simply. When I tell you someone has a peak-oriented case, I can now show you more readily what that really looks like.
So let me tell you how I’m doing this, and then I’ll show you what it looks like. To be honest, not many players’ rankings changed considerably, but a few moves were notable and worth looking at. More will be revealed.
Making the right CHEWS
An important idea about various Halls of Fame that many folks don’t think about is how the positions balance out. Since you have to have nine guys in your batting order, any of them can be an asset or a liability. Seeking some balance is reasonable. So I’ve set the system up to reflect that belief.
1. Assume a 70/30 split between hitters and pitchers (or whatever split you deem optimal) and figure the number of total hitters and pitchers that yields. At our current total of 220 honorees, my 70/30 split gets you 154 hitters and 66 pitchers. For our purposes, we’re going to round any fractions to the nearest whole number.
2. Next, find the top n hitters per position by dividing the hitter total by 8 fielding positions, rounding, and then multiplying the number per position by 2: 154 / 8 = 19.25, which rounds to 19, and 19 * 2 = 38.
3. Do the same for pitchers, who are their own single position of course: 2 * 66 = 132.
4. At each position, choose the 38 best candidates by whatever your favorite measuring stick is. I used my old CHEWS to determine these 38, but pick your own poison. At pitcher, choose 132.
5. For each hitter and pitcher, determine his peak (add the seasonal WAR values of his seven best non-consecutive years), his career WAR, and his career rate for accumulating WAR (I use WAR/650 PAs because it’s easy to understand). NOTE: Because I make adjustments for schedule and other conditions, for the purpose of calculating WAR/650, I also adjusted a player’s PAs. Otherwise, for early-days players I’d be dividing schedule-adjusted WAR by raw PAs, and the 19th-century guys would look crazy good. For pitchers, I use WAR/250 IP and my adjusted innings. I didn’t bother with relief pitchers; too few looked good in my system.
6. For each position, find the median peak, career, and rate values of the 38 players or 132 pitchers.
7. Repeat step 6 for all the hitters taken together, regardless of position (don’t need to bother with pitchers).
8. For each player at each position and pitchers, divide his peak, career, and rate by their respective medians at his own position.
9. Repeat step 8 with the all-position median values calculated in step 7.
10. For each player, add his peak and career ratios from step 8 to ½ his rate ratio, divide by 2.5, and multiply the quotient by 100. This is his positional score.
11. Repeat step 10 using the ratios calculated in step 9; this is his overall score.
12. The average of the scores from steps 10 and 11 is his CHEWS+.
As you can see, CHEWS+ compares against the in/out line, not the average Hall member. I deliberately chose to do so. First, the Hall’s actual in/out line is far lower than the 19th-best player at a given position, or the 154th best hitter. If, for the sake of a thought experiment, we used WAR as our measure of overall value, at shortstop Hughie Jennings, Rabbit Maranville, Phil Rizzuto, and Travis Jackson fall well below the simple standard of being the 19th most valuable at their position. Another four—Joe Sewell, Dave Bancroft, Luis Aparicio, and Joe Tinker—cluster very close to the in/out line. So about 40% of the 21 shortstops inducted into the Hall of Fame don’t have an ironclad claim to being one of the position’s top 19 performers. Just taking a simple measure like career WAR, the median of the top 38 shortstops in history is 58.6 and the 21st highest career WAR is 48.7. But the 19th best Hall of Fame shortstop is at 42.8 career WAR, and the 21st and lowest Hall shortstop is 40.8. Of course, I would never use unadjusted career WAR by itself as my baseline for evaluation, but this thought experiment demonstrates the important point that the problem with the Hall isn’t necessarily that Joe Tinker or Joe Sewell make it in. These guys are simply borderline candidates whose cases, including any qualitative factors, may well be interchangeable with their nearest competitors’. No matter where you draw the line, there will always be a group of interchangeable borderline candidates. The problem, instead, is that players such as Jackson, Maranville, and Rizzuto fall well below the in/out line and drag the line down so far that it ceases to have much meaning.
Second, and just as importantly, many players between the in/out line and the Hall average at their position are fully qualified and easy to vote for. Back to our example, the positional average career WAR for Hall shortstops is 68.1. That figure is yanked upward a bit by Honus Wagner’s 131 career WAR. The following Hall of Fame shortstops fall below 68.1 career WAR but above the 58.6 median career WAR of the top 38 players at the position: Ernie Banks, Joe Cronin, Pee Wee Reese, Monte Ward. Who’s raising their hands to boot those guys out? But a system that matches them against the positional average will see them as below-average candidates who don’t raise the Hall’s standards.
So when you put these two points together, you see why I’m choosing to use the median of the top n shortstops in history for a given measurement. (And why we need multiple measurements of value and achievement, not just career WAR).
CHEWS+ is plug and play. Feel free to substitute your own analytics or stats into this framework as long as they are reasonable. If you want to use a 5-year peak or a 10-year prime, or not count negative seasons, go for it. You can eliminate a category or weight it differently than I do. Also, if you don’t like the n I use to set the pool of players at a position, feel free to use whatever number makes sense to you! But the important thing in this approach is to carefully select your top n candidates per position and top y pitchers and use their median as the basis of comparison.
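To show what swapping in your own peak definition might look like, here is a small sketch. The function names, the `floor_negatives` switch, and the seasonal WAR list are all hypothetical; the list is invented purely to exercise the code and does not belong to any real player.

```python
def peak_war(seasons, n=7, floor_negatives=False):
    """Sum the n best seasonal WAR values (they need not be consecutive)."""
    values = [max(s, 0.0) for s in seasons] if floor_negatives else list(seasons)
    return sum(sorted(values, reverse=True)[:n])


def career_war(seasons, floor_negatives=False):
    """Sum every season, optionally treating below-replacement years as zero."""
    values = [max(s, 0.0) for s in seasons] if floor_negatives else list(seasons)
    return sum(values)


# Invented seasonal WAR totals, for illustration only.
seasons = [6.1, 5.4, 7.2, 3.0, -0.6, 4.8, 2.2, 5.9, 1.1, -1.3, 4.0]

print(round(peak_war(seasons), 1))        # seven-year peak, the default used here
print(round(peak_war(seasons, n=5), 1))   # a five-year peak instead
print(round(career_war(seasons), 1))      # everything counts
print(round(career_war(seasons, floor_negatives=True), 1))  # negatives tossed out
```

Note that flooring negative seasons rarely changes the peak, since a player’s best seasons are seldom below replacement; the effect shows up in the career total and, through it, in the rate.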
Interpreting CHEWS+ is like anything else. It is not intended to populate the HoME like Hall Rating does. It serves as a benchmark. Context is always important, and we should always make mental allowances for potential imbalances and pertinent qualitative factors no matter what measurement we use.
CHEWSing the fat
Now that you see how it works, here’s a taste of what it produces.
At the positional level, the score for hitters indicates 152 players at 99.5 or greater out of the 154 I was shooting for. Turning to the overall score, it shows 157 such players. And when combined into CHEWS+, we hit 154 exactly, as we should. Among pitchers, the figure is 65 (we’re looking for 66) with one other greater than 99.0 but less than 99.5. I feel good about these results.
Let’s break it down by position.
POS     Pos. Score   Overall Score   CHEWS+
===========================================
C           19            14           17
1B          19            23           22
2B          20            18           19
3B          20            18           19
SS          19            21           19
LF          17            23           20
CF          19            17           18
RF          19            23           20
P           65            65           65
-------------------------------------------
Total      217           222          219
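For anyone who wants to check the arithmetic, this short sketch totals the table’s columns and confirms the hitter counts quoted above. The dictionary layout and variable names are just one convenient way to hold the numbers.

```python
# Players at 99.5 or better, copied from the table above:
# position -> (positional score, overall score, CHEWS+).
COUNTS = {
    "C": (19, 14, 17), "1B": (19, 23, 22), "2B": (20, 18, 19),
    "3B": (20, 18, 19), "SS": (19, 21, 19), "LF": (17, 23, 20),
    "CF": (19, 17, 18), "RF": (19, 23, 20), "P": (65, 65, 65),
}

hitter_rows = [row for pos, row in COUNTS.items() if pos != "P"]
hitter_totals = tuple(sum(column) for column in zip(*hitter_rows))
grand_totals = tuple(sum(column) for column in zip(*COUNTS.values()))

print(hitter_totals)  # (152, 157, 154)
print(grand_totals)   # (217, 222, 219)
```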
This distribution passes the sniff test. The positions that are generally underrepresented are underrepresented here as well, and those generally overrepresented are too. Catcher and centerfield have good baseball reasons why they might be a little beneath the rest (catching destroys the body; the defensive spectrum is much shorter for lefty centerfielders without a good throwing arm than for other players). The one weirdo is the position score for leftfield, but this is rectified by the time we reach CHEWS+.
Here are the players that CHEWS+ indicates as HoME worthy who aren’t in yet and who they would replace:
POS            IN                  CHEWS+   OUT               CHEWS+
====================================================================
CATCHER        Gene Tenace          102     Ted Simmons         95
               Roger Bresnahan*     102
*Technically, Bresnahan is in as a pioneer/player combo

FIRST BASE     Will Clark           101
               Frank Chance*        101
               Harry Stovey         100
*Chance is in as a manager/player combo

SECOND BASE    Cupid Childs         106     Bobby Doerr         99
                                            Jeff Kent           96

THIRD BASE     John McGraw*         108     Sal Bando           99
               Ned Williamson       104
               Heinie Groh          101
*McGraw is in as a manager/player combo

SHORTSTOP      Hughie Jennings      102     Dave Bancroft       99
                                            Joe Sewell          99
                                            Monte Ward          97
                                            George Wright*      96
*Includes no credit for pre-1871 play

LEFT FIELD     Charlie Keller       101     Zack Wheat          99
                                            Jose Cruz           97
                                            Jim O’Rourke        97

CENTER FIELD   Pete Browning        105
               George Gore          102
               Mike Griffin         100

RIGHT FIELD    Vlad Guerrero        101     Sammy Sosa          99
                                            Willie Keeler       98
                                            Dave Winfield       98
                                            Harry Hooper*       97
                                            Sam Rice*           94
*Neither Hooper nor Rice is credited for running, double-play avoidance, or throwing-arm value that we’ve written about extensively.

PITCHER        Bob Caruthers        116     Whitey Ford         98
               Charlie Buffinton    107     Bucky Walters       97
               Dizzy Dean           106     Pud Galvin          95
               Eddie Rommell        106     Early Wynn          94
               Jim McCormick        103
               Nap Rucker           102
               Clark Griffith*      101
*Griffith is in as a manager/player/exec combo
This is a pretty good record. Most of the players in either column fall into one of three categories:
- late cuts or selections that we deliberated over for months or years
- nineteenth-century players, of whom we already have too many and whom we passed over for the sake of chronological balance
- Charlie Keller.
Keller’s situation is really simple. He did a lot of damage in very little time. His peak is better than the average, his career well below, but his WAR rate is outstanding. Then again, the guy only accumulated 4600 PA in his real career, and I only adjust it up to 4840 or so. He missed time to World War II, and his body betrayed him, ending his career prematurely. But even so, he barely sneaks over the line by CHEWS+, and anyone below 105 is probably interchangeable with anyone over 95. Especially if they come from an overstuffed position or an overstuffed era (like, say, the 1890s).
Also, a few important caveats apply. Many of these borderline players don’t yet have official PBP data attached to them. Some, like Keeler, may never have it or won’t for years. Others like Rice or Hooper might have that information soon for some or all of their campaigns. Our guesstimates for those guys probably lift several of them up over the line. But officially, this is what they look like now.
Finally, let’s zero in on a few players from the list above to see what’s driving their ratings.
                 |         POSITION         |         OVERALL          |
NAME             | Peak  Career  Rate  Score | Peak  Career  Rate  Score | CHEWS+
================================================================================
Hughie Jennings  |  111     83    118    101 |  114     87    117    103 |  102
Charlie Keller   |  102     74    143     99 |  107     80    146    104 |  101
Jim O’Rourke     |   81    119     70     94 |   85    128     72    100 |   97
Ted Simmons      |  101    109     79    100 |   93    111     65     90 |   95
Ned Williamson   |  110    101    106    105 |  109     94    103    102 |  104
Dizzy Dean       |                           |  107     90    138        |  106
Whitey Ford      |                           |   83    111    101        |   98
If you detect an orientation toward peak performance, it’s because I have one. In the past I’ve been more cautious about it. But after reading this article and seeing the inclusion of rate-based performance, I felt it was important to include it as well. I used to weight peak at 22% higher than career. Now I rate them equally but also include rate at 50%.
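To make the weighting concrete, here is a small sketch that converts the current weights (peak 1, career 1, rate ½) into shares of the final score and applies them to the two pitchers in the table above. It assumes the three figures listed for Dean and Ford are their peak, career, and rate indexes, in that order; the names `WEIGHTS`, `shares`, and `weighted_score` are illustrative only.

```python
WEIGHTS = {"peak": 1.0, "career": 1.0, "rate": 0.5}


def shares(weights):
    """Convert raw weights into each component's share of the final score."""
    total = sum(weights.values())  # 2.5 with the weights above
    return {name: weight / total for name, weight in weights.items()}


def weighted_score(indexes, weights=WEIGHTS):
    """Blend already-indexed components (100 = borderline) into one score."""
    split = shares(weights)
    return sum(indexes[name] * split[name] for name in split)


# Component indexes as read from the table above (assumed order: peak, career, rate).
dean = {"peak": 107, "career": 90, "rate": 138}
ford = {"peak": 83, "career": 111, "rate": 101}

print(shares(WEIGHTS))  # peak and career each carry 40%, rate carries 20%
print(round(weighted_score(dean), 1), round(weighted_score(ford), 1))
```

Seen this way, rate is a one-fifth tiebreaker rather than a co-equal pillar, which is why Dean’s modest career index doesn’t sink him and Ford’s strong one doesn’t quite carry him over 100.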
We see how this influences CHEWS+ above with peak-first candidates such as Jennings, Keller, Williamson, and Dean getting better scores than O’Rourke, Simmons, and Ford. But we can see buried in all of this how the increased transparency of this system can support good decision making. Ted Simmons is at an even 100 among catchers. If I felt it was specifically important to add another catcher, I would have good justification to do so based on his score among those at his primary position. For someone like Dizzy Dean, I might find persuasive the idea that while his peak is above average, his ability to create value may actually be understated by his peak.
Let’s linger for a moment on Simmons and Dean. I differ with many HOM voters and other writers who toss out seasons below replacement. One argument for doing so goes like this: The team should have known better and not continued to run him out there. I agree with that much; it’s the conclusion drawn from it that’s problematic: So why should the player be penalized? In my opinion, the player is not penalized by counting everything he did on the field. Instead, we are trying to get an accurate picture of the player’s entire career. Everything counts. The player is accurately measured by including his entire body of MLB work. The classic case where this comes into play is Pete Rose. From age 39 on, he racked up -1.4 WAR over 3694 PAs. But not every player earns negative numbers strictly during their baseball senescence. Take Ted Simmons, for example. The 1981 version of Ted Simmons, slugging catcher, hit .216/.262/.376 for a whopping 87 OPS+ and 0.3 WAR (BBREF style). He rebounded for 7.3 WAR over the next two years. The wheels came off again in 1984, when he “earned” -2.6 WAR, and from there the end came quickly. In 2016, we saw a younger player in mid-career do exactly what Simmons did. Coming off 38 WAR over 7 years, Andrew McCutchen served up -0.7 WAR. Want some more? Early Wynn had a full season at age 28 where he posted -1.0 WAR. Burleigh Grimes pooped out a -0.5 season at age 25. Jimmy Wynn coughed up a -0.6 hairball at age 29. Lefty Grove would rather have forgotten 1934 (-0.3). Anyway, these seasons exist. They are rare among Hall-level players, of course, but they are there, and they cost their teams wins. To me, not counting those bad seasons is akin to ignoring the F on Johnny’s report card because he otherwise got As and Bs.
A reasonable argument against my position might be that someone like Dizzy Dean or Charlie Keller or Sandy Koufax benefits on a rate basis due to a sudden injury-forced departure rather than a parade of crappy decline seasons. I hear ya. If you think the deck is stacked against long-career players, well, maybe it is. But that brings me back to the important point that JAWS, Hall Rating, CHEWS+, Hall of Fame Monitor, what have you are not gospel. They are sifting mechanisms. Draw up your long list with them, then look closely to see what they fail to capture. Because there’s no bulletproof stat and there’s no silver-bullet number to end all arguments.
Instead, what we have is thoughtful people creating thoughtful tools to get us near to an answer quickly so that we can spend more time on the borderline where the tough decisions are. And that’s what I like about this improvement over CHEWS. It gives me a simpler number as well as more, and more understandable, details on which to base a decision. I’ll soon start adding it to the HoME Stats you can find on our Honorees page.