Introducing CHEWS+

[Editor’s note: This article was heavily revised in August 2017 for improved readability and clarity. It’s the same content but presented in a much better way.]

Beer and tacos. Chocolate and peanut butter. Scouts and stats. Better together! Well that’s where I’m at today.

I cribbed the original idea for CHEWS (CHalek’s Equivalent WAR System) from Jay Jaffe’s JAWS. With apologies to the mustachioed man, I tweaked it a little and gave it a nice punnish name to call my own. As time has gone on, however, I’m more and more drawn to Adam Darowski’s Hall Rating. It indexes Adam’s inputs against a positional average and anyone at 100 is in the Hall of Stats.

So today, I want to introduce my new sifting score. I call it CHEWS+. Here’s why I went to the bother of all this.

  1. Like JAWS, it relies on the player’s highest seven (nonconsecutive) seasonal WAR totals and his career WAR.
  2. Like Hall Rating, it indexes a player to 100.
  3. Like my original CHEWS, it is based on my own equivalent WAR (eqWAR) calculations.
  4. Unlike either of these systems, however, CHEWS+ adds a WAR rate component.
  5. CHEWS+ explicitly combines a positional component and an overall component into the figure it spits back.
  6. CHEWS+ allows me to more clearly see the building blocks of a player’s rating.

In addition, with CHEWS+ I can tell you nine pieces of information about any given player’s case for a given Hall and how he stacks up relative to other cases. For example, here’s what I can tell you about Dick Allen:

  • Allen’s 48 WAR peak rates 20% better than a borderline first-base candidate’s (40 eqWAR)
  • Allen’s 62 WAR career rates 2% better than a borderline first-base candidate’s (61 eqWAR)
  • Allen’s 5.5 WAR/650 rate is 25% better than a borderline first-base candidate’s (4.4)
  • Allen ranks 13th among first basemen with a 113 positional score
  • Allen’s 48 WAR peak rates 20% better than a borderline hitter candidate’s (40 eqWAR)
  • Allen’s 62 WAR career rates 7% better than a borderline hitter candidate’s (58 eqWAR)
  • Allen’s 5.5 WAR/650 rate is 28% better than a borderline hitter candidate’s (4.3)
  • Allen ranks 77th among all batters with a 116 overall score
  • Allen’s 115 CHEWS+ ranks 86th among all batter candidates.
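
To make the arithmetic behind those bullets concrete, here is a minimal Python sketch that rebuilds Allen's figures from the rounded numbers quoted above. It assumes the weighting this article spells out in its method section (peak and career at full weight, rate at half weight, the sum divided by 2.5). The function and variable names are invented for illustration, and because the inputs are rounded, the results can land up to a point away from the published scores.

```python
# Rounded figures quoted in the bullets above.
ALLEN = {"peak": 48, "career": 62, "rate": 5.5}
BORDERLINE_1B = {"peak": 40, "career": 61, "rate": 4.4}
BORDERLINE_HITTER = {"peak": 40, "career": 58, "rate": 4.3}


def indexed(player, baseline):
    """Express each component as an index where 100 = the borderline candidate."""
    return {k: 100 * player[k] / baseline[k] for k in player}


def combined(idx):
    """Peak and career at full weight, rate at half weight (assumed weighting)."""
    return (idx["peak"] + idx["career"] + 0.5 * idx["rate"]) / 2.5


vs_position = indexed(ALLEN, BORDERLINE_1B)
vs_hitters = indexed(ALLEN, BORDERLINE_HITTER)
position_score = combined(vs_position)
overall_score = combined(vs_hitters)
chews_plus = (position_score + overall_score) / 2

for label, idx in (("vs 1B", vs_position), ("vs all hitters", vs_hitters)):
    print(label, {k: round(v) for k, v in idx.items()})
print(round(position_score, 1), round(overall_score, 1), round(chews_plus, 1))
```

The six component indexes line up with the percentages in the bullets, and the two scores and their average fall within a point of the 113, 116, and 115 quoted for Allen.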

I like to be able to express all of this if I want to. With JAWS, I can compare to the positional average but why compute 42.5/38.6 if I don’t have to? Indexing to 100 is so much more intuitive. On the other hand, with Hall Rating, I don’t get much of a sense about how the rating works or why. Now I can display that information more simply. When I tell you someone has a peak-oriented case, I can now show you more readily what that really looks like.

So let me tell you how I’m doing this, and then I’ll show you what it looks like. To be honest, not many players’ rankings changed considerably, but a few moves were notable and worth looking at. More will be revealed.

Making the right CHEWS

An important idea about various Halls of Fame that many folks don’t think about is how the positions balance out. Since you have to have nine guys in your batting order, any of them can be an asset or a liability. Seeking some balance is reasonable. So I’ve set the system up to reflect that belief.

  1. Assume a 70/30 split between hitters and pitchers (or whatever split you deem optimal) and figure the number of total hitters and pitchers that yields. At our current total of 220 honorees, my 70/30 split gets you 154 hitters and 66 pitchers. For our purposes, we’re going to round any fractions to the nearest whole number.
  2. Next, find the top n hitters per position by dividing the hitter total by 8 fielding positions, rounding, and then doubling the result: 154 / 8 = 19.25, which rounds to 19, and 19 * 2 = 38.
  3. Do the same for pitchers, who are their own single position of course: 2 * 66 = 132.
  4. At each position, choose the 38 best candidates by whatever your favorite measuring stick is. I used my old CHEWS to determine these 38, but pick your own poison. At pitcher, choose 132.
  5. For each hitter and pitcher, determine his peak (add the seasonal WAR values of his seven best non-consecutive years), his career WAR, and his career rate for accumulating WAR (I use WAR/650 PAs because it’s easy to understand). NOTE: Because I make adjustments for schedule and other conditions, for the purpose of calculating WAR/650, I also adjusted a player’s PAs. Otherwise for early-days players I’d be dividing schedule-adjusted WAR by raw PAs, and the 19th-century guys would look crazy good. For pitchers, I use WAR/250 IP and my adjusted innings. I didn’t bother with relief pitchers because too few looked good in my system.
  6. For each position, find the median peak, career, and rate values of the 38 players or 132 pitchers.
  7. Repeat step 6 for all the hitters chosen at every position, taken as a single group (don’t need to bother with pitchers).
  8. For each player at each position and pitchers, divide his peak, career, and rate by their respective medians at his own position.
  9. Repeat step 8 with the all-hitter medians calculated in step 7.
  10. For each player, add his peak and career values from step 8 to ½ his rate value, divide by 2.5, and multiply the quotient by 100. This is his positional score.
  11. Repeat step 10 using the values calculated in step 9; this is his overall score.
  12. The average of the scores from steps 10 and 11 is his CHEWS+.
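
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spreadsheet behind the published numbers: the function names are hypothetical, each player is assumed to be a dict with `peak`, `career`, and `rate` already computed from adjusted WAR and playing time, and the three-player pool at the bottom is made-up data that exists only to show the call shape.

```python
from statistics import median

WEIGHTS = {"peak": 1.0, "career": 1.0, "rate": 0.5}  # rate counts at half strength


def pool_sizes(total=220, hitter_pct=70, positions=8):
    """Split the honoree total, then double each quota to size the comparison pools."""
    hitters = round(total * hitter_pct / 100)
    pitchers = total - hitters
    # Note: Python's round() sends exact halves to the even neighbour.
    per_position = round(hitters / positions)
    return {"hitters": hitters, "pitchers": pitchers,
            "position_pool": 2 * per_position, "pitcher_pool": 2 * pitchers}


def war_rate(war, playing_time, per=650):
    """WAR per 650 adjusted PA (use per=250 with adjusted IP for pitchers)."""
    return war / playing_time * per


def medians(pool):
    """Median peak, career, and rate of a comparison pool."""
    return {k: median(p[k] for p in pool) for k in WEIGHTS}


def score(player, baseline):
    """Index each component to the baseline, weight it, and scale so the median is 100."""
    weighted = sum(w * player[k] / baseline[k] for k, w in WEIGHTS.items())
    return 100 * weighted / sum(WEIGHTS.values())


def chews_plus(player, position_pool, all_hitters):
    """Return (positional score, overall score, CHEWS+) for one hitter."""
    positional = score(player, medians(position_pool))
    overall = score(player, medians(all_hitters))
    return positional, overall, (positional + overall) / 2


# Made-up pool, purely to show usage; real pools hold 38 players per position.
toy_pool = [
    {"peak": 30, "career": 50, "rate": 3.0},
    {"peak": 40, "career": 60, "rate": 4.0},
    {"peak": 50, "career": 70, "rate": 5.0},
]
candidate = {"peak": 44, "career": 60, "rate": 5.0}

print(pool_sizes())
print(chews_plus(candidate, toy_pool, toy_pool))
```

Pitchers have only one comparison pool, so for them a single call to `score` against the pitcher medians does the whole job. Swapping in a different peak definition or different weights only means changing the inputs or the `WEIGHTS` table.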

As you can see, CHEWS+ compares against the in/out line, not the average Hall member. I deliberately chose to do so. First, the Hall’s actual in/out line is far lower than the 19th-best player at a given position, or the 154th best hitter. If, for the sake of a thought experiment, we used WAR as our measure of overall value, at shortstop Hughie Jennings, Rabbit Maranville, Phil Rizzuto, and Travis Jackson fall well below the simple standard of being the 19th most valuable at their position. Another four—Joe Sewell, Dave Bancroft, Luis Aparicio, and Joe Tinker—cluster very close to the in/out line. So about 40% of the 21 shortstops inducted into the Hall of Fame don’t have an ironclad claim to being one of the position’s top 19 performers. Just taking a simple measure like career WAR, the median of the top 38 shortstops in history is 58.6 and the 21st highest career WAR is 48.7. But the 19th best Hall of Fame shortstop is at 42.8 career WAR, and the 21st and lowest Hall shortstop is 40.8. Of course, I would never use unadjusted career WAR by itself as my baseline for evaluation, but this thought experiment demonstrates the important point that the problem with the Hall isn’t necessarily that Joe Tinker or Joe Sewell make it in. These guys are simply borderline candidates whose cases, including any qualitative factors, may well be interchangeable with their nearest competitors’. No matter where you draw the line, there will always be a group of interchangeable borderline candidates. The problem, instead, is that players such as Jackson, Maranville, and Rizzuto fall well below the in/out line and drag the line down so far that it ceases to have much meaning.

Second, and just as importantly, many players between the in/out line and the Hall average at their position are fully qualified and easy to vote for. Back to our example, the positional average career WAR for Hall shortstops is 68.1. That figure is yanked upward a bit by Honus Wagner’s 131 career WAR. The following Hall of Fame shortstops fall below 68.1 career WAR but above the 58.6 median career WAR of the top 38 players at the position: Ernie Banks, Joe Cronin, Pee Wee Reese, Monte Ward. Who’s raising their hands to boot those guys out? But a system that matches them against the positional average will see them as below-average candidates who don’t raise the Hall’s standards.

So when you put these two points together, you see why I’m choosing to use the median of the top n shortstops in history for a given measurement. (And why we need multiple measurements of value and achievement, not just career WAR).

CHEWS+ is plug and play. Feel free to substitute your own analytics or stats into this framework as long as they are reasonable. You want to use a 5 year peak or a 10 year prime or not count negative seasons, go for it. You can eliminate a category or weight it differently than I do. Also, if you don’t like using the median n players at a position, feel free to use whatever number makes sense to you! But the important thing in this approach is to carefully select your top n candidates per position and top y pitchers and use their median as the basis of comparison.

Interpreting CHEWS+ is like anything else. It is not intended to populate the HoME like Hall Rating does. It serves as a benchmark. Context is always important, and we should always make mental allowances for potential imbalances and pertinent qualitative factors no matter what measurement we use.

CHEWSing the fat

Now that you see how it works, here’s a taste of the results.

At the positional level, the score for hitters indicates 152 players at 99.5 or greater out of the 154 I was shooting for. Turning to the overall score, it shows 157 such players. And when combined into CHEWS+, we hit 154 exactly, as we should. Among pitchers, the figure is 65 (we’re looking for 66) with one other greater than 99.0 but less than 99.5. I feel good about these results.

Let’s break it down by position.

POS     Pos. Score   Overall Score   CHEWS+
C           19            14           17
1B          19            23           22
2B          20            18           19
3B          20            18           19
SS          19            21           19
LF          17            23           20
CF          19            17           18
RF          19            23           20
P           65            65           65
Total      217           222          219
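
For anyone who wants to check the table against the targets, here is a small Python sketch that totals each column and flags positions that stray from the per-position quota. The counts are copied from the table above; the names and the two-player tolerance are arbitrary choices made for this illustration.

```python
# Players at 99.5 or better, copied from the table above:
# (positional score, overall score, CHEWS+).
COUNTS = {
    "C": (19, 14, 17), "1B": (19, 23, 22), "2B": (20, 18, 19),
    "3B": (20, 18, 19), "SS": (19, 21, 19), "LF": (17, 23, 20),
    "CF": (19, 17, 18), "RF": (19, 23, 20), "P": (65, 65, 65),
}
TARGET_HITTERS, TARGET_PITCHERS = 154, 66  # 70/30 split of 220 honorees


def column_totals(rows):
    """Sum each of the three columns across the given rows."""
    return tuple(sum(col) for col in zip(*rows))


hitter_totals = column_totals(v for k, v in COUNTS.items() if k != "P")
all_totals = column_totals(COUNTS.values())

# Positions whose CHEWS+ count sits two or more players away from the quota.
quota = TARGET_HITTERS / 8
outliers = [pos for pos, (_, _, combined) in COUNTS.items()
            if pos != "P" and abs(combined - quota) >= 2]

print("hitters:", hitter_totals, "target", TARGET_HITTERS)
print("everyone:", all_totals, "target", TARGET_HITTERS + TARGET_PITCHERS)
print("furthest from quota:", outliers)
```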

This distribution passes the sniff test. The positions that are generally underrepresented are underrepresented here as well, and those generally overrepresented are too. Catcher and centerfield have good baseball reasons why they might be a little beneath the rest (catching destroys the body; the defensive spectrum is much shorter for lefty centerfielders without a good throwing arm than for other players). The one weirdo is the position score for leftfield, but this is rectified by the time we reach CHEWS+.

Here are the players that CHEWS+ indicates as HoME worthy who aren’t in yet and the players they would replace:

POS  IN                 CHEWS+  OUT             CHEWS+
C    Gene Tenace         102    Ted Simmons       95
     Roger Bresnahan*    102
*Technically, Bresnahan is in as a pioneer/player combo

1B   Will Clark          101
     Frank Chance*       101
     Harry Stovey        100
*Chance is in as a manager/player combo

2B   Cupid Childs        106    Bobby Doerr       99
                                Jeff Kent         96

3B   John McGraw*        108    Sal Bando         99
     Ned Williamson      104
     Heinie Groh         101
*McGraw is in as a manager/player combo

SS   Hughie Jennings     102    Dave Bancroft     99
                                Joe Sewell        99
                                Monte Ward        97
                                George Wright*    96
*Includes no credit for pre-1871 play

LF   Charlie Keller      101    Zack Wheat        99
                                Jose Cruz         97
                                Jim O’Rourke      97

CF   Pete Browning       105
     George Gore         102
     Mike Griffin        100

RF   Vlad Guerrero       101    Sammy Sosa        99
                                Willie Keeler     98
                                Dave Winfield     98
                                Harry Hooper*     97
                                Sam Rice*         94
*Neither Hooper nor Rice is credited for running, double-play avoidance, or throwing-arm value that we’ve written about extensively.

P    Bob Caruthers       116    Whitey Ford       98
     Charlie Buffinton   107    Bucky Walters     97
     Dizzy Dean          106    Pud Galvin        95
     Eddie Rommel        106    Early Wynn        94
     Jim McCormick       103
     Nap Rucker          102
     Clark Griffith*     101
*Griffith is in as a manager/player/exec combo

This is a pretty good record. Most of the players in either column fall into one of three categories:

  1. late cuts or selections that we deliberated over for months or years
  2. Nineteenth-century players, of whom we already have too many and went against for chronological balance
  3. Charlie Keller.

Keller’s situation is really simple. He did a lot of damage in very little time. His peak is better than the average, his career well below, but his WAR rate is outstanding. Then again, the guy only accumulated 4600 PA in his real career, and I only adjust it up to 4840 or so. He missed time to World War II, and his body betrayed him, ending his career prematurely. But even so, he barely sneaks over the line by CHEWS+, and anyone below 105 is probably interchangeable with anyone over 95. Especially if they come from an overstuffed position or an overstuffed era (like, say, the 1890s).

Also, a few important caveats apply. Many of these borderline players don’t yet have official PBP data attached to them. Some, like Keeler, may never have it, or won’t for years. Others like Rice or Hooper might have that information soon for some or all of their campaigns. Our guesstimates for those guys probably lift several of them up over the line. But officially, this is what they look like now.

Finally, let’s zero in on a few players from the list above to see what’s driving their ratings.

                 POSITION               | OVERALL                |
NAME             Peak Career Rate Score | Peak Career Rate Score | CHEWS+
Hughie Jennings   111     83  118   101 |  114     87  117   103 |    102
Charlie Keller    102     74  143    99 |  107     80  146   104 |    101
Jim O’Rourke       81    119   70    94 |   85    128   72   100 |     97
Ted Simmons       101    109   79   100 |   93    111   65    90 |     95
Ned Williamson    110    101  106   105 |  109     94  103   102 |    104

Dizzy Dean                              |  107     90  138       |    106
Whitey Ford                             |   83    111  101       |     98

If you detect an orientation toward peak performance, it’s because I have one. In the past I’ve been more cautious about it. But after reading this article and seeing the inclusion of rate-based performance, I felt it was important to include it as well. I used to weight peak at 22% higher than career. Now I rate them equally but also include rate at 50%.
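
To see how much each building block adds under that weighting, here is a small Python sketch that splits a score into its peak, career, and rate contributions. The indexed values are copied from the table above (positional columns for the hitters, the single set of columns for the pitchers); the function name is hypothetical, and since the published indexes are rounded, the totals are approximate.

```python
# (peak, career, rate) indexes copied from the table above.
ROWS = {
    "Charlie Keller": (102, 74, 143),
    "Jim O'Rourke": (81, 119, 70),
    "Dizzy Dean": (107, 90, 138),
    "Whitey Ford": (83, 111, 101),
}


def contributions(peak, career, rate):
    """Points each component adds: full, full, and half weight over a 2.5 divisor."""
    return peak / 2.5, career / 2.5, 0.5 * rate / 2.5


for name, idx in ROWS.items():
    p, c, r = contributions(*idx)
    print(f"{name:15s} peak {p:5.1f}  career {c:5.1f}  rate {r:5.1f}  total {p + c + r:5.1f}")
```

A player sitting exactly on the median gets 40 points from peak, 40 from career, and 20 from rate. Read that way, Keller's rate adds more than eight points above the baseline 20 while his career costs him about ten, and O'Rourke is the mirror image.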

We see how this influences CHEWS+ above with peak-first candidates such as Jennings, Keller, Williamson, and Dean getting better scores than O’Rourke, Simmons, and Ford. But we can see buried in all of this how the increased transparency of this system can support good decision making. Ted Simmons is at an even 100 among catchers. If I felt it was specifically important to add another catcher, I would have good justification to do so based on his score among those at his primary position. For someone like Dizzy Dean, I might find persuasive the idea that while his peak is above average, his ability to create value may actually be understated by his peak.

Let’s linger for a moment on Simmons and Dean. I differ with many HOM voters and other writers who toss out seasons below replacement. One argument for doing so goes like this: The team should have known better and not continued to run him out there. I agree up to that point. It’s what follows that’s problematic: So why should the player be penalized? In my opinion, the player is not penalized by counting everything he did on the field. Instead, we are trying to get an accurate picture of the player’s entire career. Everything counts. The player is accurately measured by including his entire body of MLB work.

The classic case where this comes into play is Pete Rose. From age 39 on, he racked up -1.4 WAR over 3694 PAs. But not every player earns negative numbers strictly during their baseball senescence. Ted Simmons, for example. The 1981 version of Ted Simmons, slugging catcher, hit .216/.262/.376 for a whopping 87 OPS+ and 0.3 WAR (BBREF style). He rebounded for 7.3 WAR over the next two years. The wheels came off again in 1984, when he “earned” -2.6 WAR, and from there the end came quickly. In 2016, we saw a younger player in mid-career do exactly what Simmons did. Coming off 38 WAR over 7 years, Andrew McCutchen served up -0.7 WAR.

Want some more? Early Wynn had a full season at age 28 where he posted -1.0 WAR. Burleigh Grimes pooped out a -0.5 season at age 25. Jimmy Wynn coughed up a -0.6 hairball at age 29. Lefty Grove would rather have forgotten 1934 (-0.3). Anyway, these seasons exist. They are rare among Hall-level players, of course, but they are there, and they cost their teams wins. To me, not counting those bad seasons is akin to ignoring the F on Johnny’s report card because he otherwise got As and Bs.

A reasonable argument against my position might be that someone like Dizzy Dean or Charlie Keller or Sandy Koufax benefits on a rate basis due to a sudden injury-forced departure rather than a parade of crappy decline seasons. I hear ya. If you think the deck is stacked against long-career players, well, maybe it is. But that brings me back to the important point that JAWS, Hall Rating, CHEWS+, Hall of Fame Monitor, or what have you, are not gospel. They are sifting mechanisms. Draw up your long list with them, then look closely to see what they fail to capture. Because there’s no bulletproof stat and there’s no silver-bullet number to end all arguments.

Instead, what we have is thoughtful people creating thoughtful tools to get us near to an answer quickly so that we can spend more time on the borderline where the tough decisions are. And that’s what I like about this improvement over CHEWS. It gives me a simpler number as well as more, and more understandable, details to base a decision on. I’ll soon start adding it to the HoME Stats you can find on our Honorees page.



3 thoughts on “Introducing CHEWS+”

  1. Great stuff Eric, looking forward to the full analysis!

    What type of positional adjustment does Tenace have with his 102 rank?
    Wow on Simmons dropping off, good points on his negative campaigns, looks like a chance at top 100 status if you remove them…do you guys also consider RE24 and clutch value as part of the picture…Ted adds almost 9 wins.

    Monte Ward fared well in the old CHEWS, he’s so tough to place for me as a position player/pitcher combo, interested in what your new research unveils.

    Sad to see O’Rourke collapse in the new CHEWS, I always thought of him as a mid-level HOFer.

    A bit surprised on how well Mike Griffin does, rises above the glut of 1880/90s OFs? Almost to George Gore’s level 🙂

    How do you handle league quality adjustments for 1880s hurlers?
    I’m a supporter of Pud Galvin but not of Bob Caruthers…Galvin has an extraordinarily long career (even excluding some great International Association work) with some negative value seasons dragging his rate low, while Caruthers’ career was quite short and he took a little (or a lot?) of advantage of some weaker AA leagues.
    McCormick is a B-R slam dunk and a B-G dud, how do you handle the 1884 Union Association?

    Rommel is interesting, the 106 doesn’t include an adjustment for relief work? He is terrible with FIP WAR, but not sure that matters too much from his time era.

    Rucker benefits tremendously from rate, very short career but fine peak.

    Ford is of interest, fares mediocre at Baseball Reference and Gauge, is solid at Fangraphs or Baseball Prospectus, and is very impressive under a WPA or Tom Thress Win-Loss lens.

    Posted by Ryan | April 17, 2017, 7:43 pm
    • Lots to talk about here. Going point by point.
      I’m not sure what you mean by positional adjustment. I adjust catchers as part of my WAR twiddling. They get extra love. But in terms of calculating CHEWS+, it’s the same as other positions, an average of their rating against position and against all hitters.

      Simmons and O’Rourke: As it turns out, both were at the bottom end of their respective positions in the old CHEWS anyway. It took me about 100 “years” of votes to elect O’Rourke.

      RE24 etc: I don’t currently include that information. Since about 40% of history doesn’t have the PBP to make that viable, I prefer not to use the info as part of the sifter. I am also a little leery of it and WPA. I am not certain enough that what is measured is a reflection of skill or variance. Anything highly context dependent sends that red flag up for me.

      Ward: WYSIWYG from me. There’s no additional research on him at this time. Ward is something of a borderliner for me as well. To paraphrase someone else, he is Maury Wills plus five or six years of decent to very good pitching. That’s enough I think.

      Griffin: He, Gore, Lemon, Cedeno, and Bernie were all clustered beneath the in/out line before and are clustered around it now. That said, we have plenty of 1890s guys, so he’s no threat to draw my vote. And Childs would be in line ahead of him, and I don’t think electing him is a good idea.

      1880s pitchers: You can get all the details in the Our Methods section of the site, but basically, the value between replacement and average for 1880s pitchers is humongous. So that’s where I debit them to bring them into some kind of alignment with latter day hurlers. Even so, my system is challenged by these dudes, as is JAWS. A lot of systems will identify some combo of the following as good candidates: Keefe, McCormick, Galvin, Caruthers, Welch, Buffinton, Whitney, King, Clarkson, Radbourn, Mullane, Will White. Oh, and a small portion of Ward. When you step back from that, it’s crazy to elect more than a handful of these guys. In the 1880s guys still threw underhand. As late as 1887, the leagues were still futzing with the rules on called balls and strikes. The foul-strike rule wasn’t around yet. There wasn’t a mound yet. The pitching box was fifty feet away and you could do all sorts of hijinks in it. Relief pitching hadn’t been invented. And pitchers threw 400-600 innings a year. Is WAR or any other system likely capturing the value of these pitchers effectively? And more important, there were very few long career pitchers then because the best pitchers gobbled up so many innings. You’d be electing a far higher proportion of the available pitchers than seems prudent to me. An objection to my line of thinking says that because the pitchers threw so many innings, they must also be more valuable than later eras and should be elected in higher quantities. Balderdash says I. By that logic, from about 1993 on, we would be electing far fewer pitchers than we do. And anyway, if every guy had a dude throwing 400-600 innings, how special is it really?

      UA: I use career-WAR/IP or WAR/PA and apply to the playing time. Seems like the fairest thing to do.

      Rommel: I equivocate on Rommel. BBREF likes him a lot. But he did an awful lot of relief pitching that feels inessential compared to Grove’s. I think we need more info.

      Rucker: Again, Rucker was just off my ballot and now he could be on it. At the borderline there are at least two dozen pitchers each with quality candidacies. Career voter? Got some TJ for you to love. Like peak? Say hello to Diz and Nap. A little short on Deadball Dudes? Wilbur Cooper all the way. I lean to peak, and Rucker seems like a reasonable person to vote up. But players like him are why I use WAR rate at half strength. Maybe it should be at 33%? I don’t know yet.

      We’ve written extensively on Ford here. I think that Ford was the right guy in the right park in front of the right defense, for the right manager, in the weaker league. I don’t like FG’s pitching WAR. I’m not convinced that pitchers have no ability to affect BIP. In fact, we know they do. FG themselves wrote years ago about how some pitchers get more pop-ups than others. Pitchers like TJ get more DPs due to sinkerballing. I’m not smart enough to know all the math, but as I recollect research has subsequently demonstrated that pitchers do have an effect on quality of contact. Anyway, I think Ford had as auspicious or propitious or fortunate a set of circumstances as any pitcher ever. He made good, did a little cheating to get there and never pitched at Fenway. His winning percentage and ERA+ reflect that.

      Posted by eric | April 17, 2017, 9:57 pm
      • Thanks for the quick and thorough reply.

        Regarding Tenace, what’s his catching bonus versus a full-time catcher (i.e. Tenace gets xx % boost and Fisk gets xx %). I ask, as a health bonus puts him over the edge, while a conservative figure usually keeps him at the bubble or below. Tenace gives back a couple of wins with RE24/clutch as a tie breaker of sorts too.

        Simmons I see at the bottom, while O’Rourke scored a 52…sandwiched with OF Evans, Magee, Lofton, Sheffield, and Snider.

        I bring up RE24 as it has a tangible impact on games, whether or not we can parse exactly how much skill/luck is involved. More of the contact/speed guys end up doing well on this, while GIDP/slow-afoot machines stink up the joint. It is unfortunate that we are missing ~40% of history on this.

        Good to see Childs rise above the heap of OF; side note that Lemon is awful in RE24/clutch, giving back almost 10 wins of value.

        For 1880s guys, Guy Hecker has a massive 1884 and some quality otherwise too. Nice points on this group in general, but my main interest is in how you handle discounts for the American Association versus the National League…this has a large impact on Caruthers, McPhee, Browning, Stovey, and possibly others. For Stovey, does it make some difference comparing him against OF, as he played 946 versus 550 at 1B.

        For the peaky/prime pitchers, how do Tommy Bond, Jim Whitney, Noodles Hahn, and Dizzy Trout fare compared with Rucker?

        For Ford, it just seems that Fangraphs or Tom Thress could/are picking up on something that Whitey excelled at that isn’t being captured by Baseball Gauge or Reference, but your arguments are certainly persuasive.

        Posted by Ryan | April 18, 2017, 9:29 am
