Introducing CHEWS+

[Editor’s note: This article was heavily revised in August 2017 for readability and clarity; the content is the same, just better presented. It was revised again in October 2017 after a change in my methodology. The original version weighted peak too heavily. The new version works much better.]

Beer and tacos. Chocolate and peanut butter. Scouts and stats. Better together! Well that’s where I’m at today.

I cribbed the original idea for CHEWS (CHalek’s Equivalent WAR System) from Jay Jaffe’s JAWS. With apologies to the mustachioed man, I tweaked it a little and gave it a nice punnish name to call my own. As time has gone on, however, I’m more and more drawn to Adam Darowski’s Hall Rating. It indexes Adam’s inputs against a positional average and anyone at 100 is in the Hall of Stats.

So today, I want to introduce my new sifting score. I call it CHEWS+. Here’s why I went to the bother of all this.

  1. Like JAWS, it relies on the player’s highest seven (nonconsecutive) seasonal WAR totals and his career WAR.
  2. Like Hall Rating, it indexes a player to 100.
  3. Like my original CHEWS, it is based on my own equivalent WAR (eqWAR) calculations.
  4. Unlike either of these systems, however, CHEWS+ adds a WAR rate component.
  5. CHEWS+ explicitly combines a positional component and an overall component into the figure it spits back.
  6. CHEWS+ allows me to more clearly see the building blocks of a player’s rating.

In addition, with CHEWS+ I can tell you nine pieces of information about any given player’s case for a given Hall and how he stacks up relative to other cases. For example, here’s what I can tell you about Dick Allen:

  • Allen’s 48 WAR peak rates 20% better than a borderline first-base candidate’s (40 eqWAR)
  • Allen’s 62 WAR career rates 2% better than a borderline first-base candidate’s (61 eqWAR)
  • Allen’s 5.5 WAR/650 rate is 25% better than a borderline first-base candidate’s (4.4)
  • Allen ranks 13th among first basemen with a 113 positional score
  • Allen’s 48 WAR peak rates 20% better than a borderline hitter candidate’s (40 eqWAR)
  • Allen’s 62 WAR career rates 7% better than a borderline hitter candidate’s (58 eqWAR)
  • Allen’s 5.5 WAR/650 rate is 28% better than a borderline hitter candidate’s (4.3)
  • Allen ranks 77th among all batters with a 116 overall score
  • Allen’s 115 CHEWS+ ranks 86th among all batter candidates.
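The percentages in that list are plain ratios against the borderline figure. Here is a minimal sketch that reproduces them from the rounded numbers quoted above; the function name `pct_better` is just an illustrative label, and the real inputs are unrounded eqWAR, so other players may come out a point off when worked from rounded figures.

```python
def pct_better(value, borderline):
    """Percent by which `value` exceeds the borderline figure, rounded."""
    return round(100 * value / borderline) - 100

# (Allen's figure, borderline figure), as quoted in the list above
allen_vs_first_basemen = {"peak": (48, 40), "career": (62, 61), "rate": (5.5, 4.4)}
allen_vs_all_hitters = {"peak": (48, 40), "career": (62, 58), "rate": (5.5, 4.3)}

for label, figures in [("1B", allen_vs_first_basemen),
                       ("all hitters", allen_vs_all_hitters)]:
    for component, (value, borderline) in figures.items():
        print(f"{label:12s} {component:7s} {pct_better(value, borderline):+d}%")
```

Run as is, this prints +20%, +2%, and +25% against first basemen and +20%, +7%, and +28% against all hitters, matching the list.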

I like to be able to express all of this if I want to. With JAWS, I can compare to the positional average but why compute 42.5/38.6 if I don’t have to? Indexing to 100 is so much more intuitive. On the other hand, with Hall Rating, I don’t get much of a sense about how the rating works or why. Now I can display that information more simply. When I tell you someone has a peak-oriented case, I can now show you more readily what that really looks like.

So let me tell you how I’m doing this, and then I’ll show you what it looks like. To be honest, not many players’ ranking changed considerably, but a few moves were notable and worth looking at. More will be revealed.

Making the right CHEWS

An important idea about various Halls of Fame that many folks don’t think about is how the positions balance out. Since you have to have nine guys in your batting order, any of them can be an asset or a liability. Seeking some balance is reasonable. So I’ve set the system up to reflect that belief.

  1. Assume a 70/30 split between hitters and pitchers (or whatever split you deem optimal) and figure the number of hitters and pitchers that yields. At our current total of 220 honorees, my 70/30 split gets you 154 hitters and 66 pitchers. For our purposes, we’re going to round any fractions to the nearest whole number.
  2. Next, find the top n hitters per position by dividing the hitter total by 8 fielding positions and then doubling the per-position figure: 154 / 8 = 19.25, which rounds to 19, and 19 * 2 = 38.
  3. Do the same for pitchers, who are their own single position, of course: 2 * 66 = 132.
  4. At each position, choose the 38 best candidates by whatever your favorite measuring stick is. I used my old CHEWS to determine these 38, but pick your own poison. At pitcher, choose 132.
  5. For each hitter and pitcher, determine his peak (the sum of his seven best seasonal WAR values, consecutive or not), his career WAR, and his career rate of accumulating WAR. Since they are easy to understand, I use WAR/650 PA for hitters and WAR/250 IP for pitchers. NOTE: Because I adjust WAR for schedule and other conditions, I also adjust a player’s PAs when calculating WAR/650. Otherwise, for early-days players I’d be dividing schedule-adjusted WAR by raw PAs, and the 19th-century guys would look crazy good. Likewise, for pitchers I use my adjusted innings. I didn’t bother with relief pitchers; too few looked good in my system.
  6. For each position, find the median peak, career, and rate of the 38 players or 132 pitchers.
  7. Repeat step 6 for all the hitters taken together, regardless of position (no need to bother with pitchers).
  8. For each player, divide his peak, career, and rate by the respective medians at his own position (or among pitchers) from step 6.
  9. Repeat step 8 using the all-hitter medians calculated in step 7.
  10. For each player, add his peak and career values from step 8 to ½ his rate value, divide by 2.5, and multiply the quotient by 100. This is his positional score.
  11. Repeat step 10 using the values calculated in step 9; this is his overall score.
  12. The average of steps 10 and 11 is his initial CHEWS+.
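For anyone who wants to plug in their own numbers, the steps above can be sketched in a few lines of Python. This is a rough sketch of how I read the recipe, not my actual spreadsheet: the function names and the dictionary keys (`peak`, `career`, `rate`) are made up for illustration, and each pool is assumed to be a list of players already carrying those three values.

```python
from statistics import median

def quotas(total_honorees=220, hitter_share=0.70, positions=8):
    """Pool sizes: twice the per-position quota and twice the pitcher quota."""
    hitters = round(total_honorees * hitter_share)   # 154
    pitchers = total_honorees - hitters              # 66
    per_position = round(hitters / positions)        # 19.25 rounds to 19
    return 2 * per_position, 2 * pitchers

def peak(seasonal_war, n=7):
    """Sum of the n best seasons, consecutive or not."""
    return sum(sorted(seasonal_war, reverse=True)[:n])

def war_rate(career_war, playing_time, per=650):
    """WAR per 650 adjusted PA (pass per=250 and adjusted IP for pitchers)."""
    return career_war / playing_time * per

def baselines(pool):
    """Median peak, career, and rate of a pool of candidates."""
    return {k: median(p[k] for p in pool) for k in ("peak", "career", "rate")}

def score(player, baseline):
    """Peak and career at full weight, rate at half weight, indexed to 100."""
    indexed = {k: player[k] / baseline[k] for k in ("peak", "career", "rate")}
    return 100 * (indexed["peak"] + indexed["career"] + 0.5 * indexed["rate"]) / 2.5

def initial_chews_plus(player, position_baseline, overall_baseline):
    """Average of the positional score and the overall score."""
    return (score(player, position_baseline) + score(player, overall_baseline)) / 2

# Dick Allen, using the rounded figures quoted earlier in the article
allen = {"peak": 48, "career": 62, "rate": 5.5}
first_base = {"peak": 40, "career": 61, "rate": 4.4}
all_hitters = {"peak": 40, "career": 58, "rate": 4.3}

print(quotas())
print(score(allen, first_base), score(allen, all_hitters))
print(initial_chews_plus(allen, first_base, all_hitters))
```

With those rounded inputs the scores land within a point of the 113, 116, and 115 quoted for Allen earlier; the small gaps come from working with rounded figures. Pitchers have only one pool, so for them the single score stands in for both halves.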

I also want to reward top-level performance. To do so, I’ve instituted a small bonus based on the rate at which a player earned his WAR. It is based on two dimensions: the actual rate itself and how long a player’s career was. I give a maximum of 10 bonus points. This helps players like Larry Walker, who missed a lot of games but still had a reasonably long career and whose performance was outstanding. There’s a little mixing and matching of mean and median here, but the world won’t end because of it.

  1. Find the mean WAR/650 PA for all hitters in the sample and the mean WAR/250 IP for all the pitchers.
  2. Find the standard deviation of WAR/650 PA for all hitters in the sample, and the standard deviation of WAR/250 IP for all the pitchers.
  3. Divide the standard deviation by two and add to the mean. All players above this rate qualify to receive the rate bonus. This works out to approximately 20 to 30 percent of the sample.
  4. Examine the rates for hitters and determine a practical maximum WAR/650 PA in the sample. Not the highest, but close. I used 8.5 WAR/650 PA, which is exceeded by just a few players. Let’s not make everyone have to be Babe Ruth here.
  5. Divide a qualifying player’s WAR/650 PA by that practical maximum (8.5 as noted).
  6. Find a practical maximum number of PAs for hitters. I use 12,000; just 8% of hitters in the sample exceeded that figure. Remember that those PAs are adjusted by me. Your PA total would vary.
  7. Divide the player’s career PAs by 12,000.
  8. Multiply #5 by #7 and multiply by 10.
  9. Add to the initial CHEWS+ figure we determined above.
  10. Repeat for pitchers using WAR/250 IP. Turns out that 8.5 WAR/250 IP happens to work out for hurlers. And 4,000 innings is the practical maximum here. That innings cap will, however, be different for you unless you adjust innings for usage exactly as I do.
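Here is a hedged sketch of the bonus. It reads steps 5 and 7 as fractions of the practical maximums (rate ÷ 8.5 and playing time ÷ 12,000), which is the reading that fits the 10-point ceiling. Two things are assumptions on top of the steps as written: each fraction is capped at 1 so that nobody tops 10 points, and the sample standard deviation (`statistics.stdev`) is used. The function and parameter names are illustrative only.

```python
from statistics import mean, stdev

def bonus_threshold(rates):
    """Qualifying line: the sample mean plus half a standard deviation."""
    return mean(rates) + stdev(rates) / 2

def rate_bonus(rate, playing_time, threshold,
               max_rate=8.5, max_playing_time=12_000, max_points=10):
    """Bonus points for a player whose WAR rate clears the qualifying line.

    For pitchers, pass WAR/250 IP as `rate`, adjusted innings as
    `playing_time`, and max_playing_time=4_000.
    """
    if rate <= threshold:
        return 0.0
    rate_share = min(rate / max_rate, 1.0)                  # assumed cap at 1
    time_share = min(playing_time / max_playing_time, 1.0)  # assumed cap at 1
    return max_points * rate_share * time_share

# Hypothetical hitter: 6.0 WAR/650 over 8,000 adjusted PA, qualifying line at 5.0
print(round(rate_bonus(6.0, 8_000, threshold=5.0), 1))
```

The hypothetical hitter gets 10 × (6.0 / 8.5) × (8,000 / 12,000), or about 4.7 points, added to his initial CHEWS+. A player at or below the qualifying line gets nothing.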

As you can see, CHEWS+ compares against the in/out line, not the average Hall member. I deliberately chose to do so. First, the Hall’s actual in/out line is far lower than the 19th-best player at a given position, or the 154th best hitter. If, for the sake of a thought experiment, we used WAR as our measure of overall value, at shortstop Hughie Jennings, Rabbit Maranville, Phil Rizzuto, and Travis Jackson fall well below the simple standard of being the 19th most valuable at their position. Another four—Joe Sewell, Dave Bancroft, Luis Aparicio, and Joe Tinker—cluster very close to the in/out line. So about 40% of the 21 shortstops inducted into the Hall of Fame don’t have an ironclad claim to being one of the position’s top 19 performers. Just taking a simple measure like career WAR, the median of the top 38 shortstops in history is 58.6 and the 21st highest career WAR is 48.7. But the 19th best Hall of Fame shortstop is at 42.8 career WAR, and the 21st and lowest Hall shortstop is 40.8. Of course, I would never use unadjusted career WAR by itself as my baseline for evaluation, but this thought experiment demonstrates the important point that the problem with the Hall isn’t necessarily that Joe Tinker or Joe Sewell make it in. These guys are simply borderline candidates whose cases, including any qualitative factors, may well be interchangeable with their nearest competitors’. No matter where you draw the line, there will always be a group of interchangeable borderline candidates. The problem, instead, is that players such as Jackson, Maranville, and Rizzuto fall well below the in/out line and drag the line down so far that it ceases to have much meaning.

Second, and just as importantly, many players between the in/out line and the Hall average at their position are fully qualified and easy to vote for. Back to our example, the positional average career WAR for Hall shortstops is 68.1. That figure is yanked upward a bit by Honus Wagner’s 131 career WAR. The following Hall of Fame shortstops fall below 68.1 career WAR but above the 58.6 median career WAR of the top 38 players at the position: Ernie Banks, Joe Cronin, Pee Wee Reese, Monte Ward. Who’s raising their hands to boot those guys out? But a system that matches them against the positional average will see them as below-average candidates who don’t raise the Hall’s standards.

So when you put these two points together, you see why I’m choosing to use the median of the top n shortstops in history for a given measurement. (And why we need multiple measurements of value and achievement, not just career WAR).

CHEWS+ is plug and play. Feel free to substitute your own analytics or stats into this framework as long as they are reasonable. You want to use a 5-year peak or a 10-year prime or not count negative seasons, go for it. You can eliminate a category or weight it differently than I do. Also, if you don’t like my choice of n at a position, feel free to use whatever number makes sense to you! But the important thing in this approach is to carefully select your top n candidates per position and top y pitchers and use their median as the basis of comparison.

Interpreting CHEWS+ is like anything else. It is not intended to populate the HoME like Hall Rating does. It serves as a benchmark. Context is always important, and we should always make mental allowances for potential imbalances and pertinent qualitative factors no matter what measurement we use.

CHEWSing the fat

Now that you see how it is computed, here’s a taste of how it works.

At the positional level, the score for hitters indicates 152 players at 99.5 or greater out of the 154 I was shooting for. Turning to the overall score, it shows 158 such players. And when combined into CHEWS+, we hit 158, just a few more than we should. Among pitchers, the figure is 63 (we’re looking for 66), with one other greater than 99.0 but less than 99.5. I feel good about these results.

Let’s break it down by position.

POS    Pos. Score  Overall Score  CHEWS+
C          19           13          16
1B         19           24          22
2B         21           18          19
3B         20           18          21
SS         18           22          19
LF         17           21          20
CF         18           17          18
RF         20           25          23
P          63           63          63
Total     215          221         221

(Frank Thomas placed at first base, Edgar Martinez and Paul Molitor at third base.)

This distribution passes the sniff test. The positions that are generally underrepresented are here as well, and those generally overrepresented are, too. Catcher and centerfield have good baseball reasons why they might be a little beneath the rest (catching destroys the body; the defensive spectrum is much shorter for lefty centerfielders without a good throwing arm than for other players).
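If you want to check the tallies yourself, a few lines will do it. This sketch simply copies the counts from the table above and compares them with the 154-hitter and 66-pitcher targets; the variable names are arbitrary.

```python
# (positional score, overall score, CHEWS+) counts copied from the table above
counts = {
    "C": (19, 13, 16), "1B": (19, 24, 22), "2B": (21, 18, 19),
    "3B": (20, 18, 21), "SS": (18, 22, 19), "LF": (17, 21, 20),
    "CF": (18, 17, 18), "RF": (20, 25, 23), "P": (63, 63, 63),
}
labels = ("positional", "overall", "CHEWS+")
HITTER_TARGET, PITCHER_TARGET = 154, 66

# Sum each column over the eight fielding positions, then over everyone
hitters = [sum(row[i] for pos, row in counts.items() if pos != "P") for i in range(3)]
totals = [sum(row[i] for row in counts.values()) for i in range(3)]

for label, h, t in zip(labels, hitters, totals):
    print(f"{label:10s} hitters {h} ({h - HITTER_TARGET:+d} vs. target), everyone {t}")
print(f"pitchers {counts['P'][2]} ({counts['P'][2] - PITCHER_TARGET:+d} vs. target)")
```

The hitter columns sum to 152, 158, and 158, and the grand totals to 215, 221, and 221, which agrees with the figures quoted above.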

Here are the players that CHEWS+ indicates as HoME worthy who aren’t in yet and who they would replace:

POS  IN                 CHEWS+  OUT                CHEWS+
C    Roger Bresnahan*   102     Ted Simmons        99.1
                                Bill Freehan       88
*Technically, Bresnahan is in as a pioneer/player combo

1B   Harry Stovey       101
     Jake Beckley       100
*Chance is in as a manager/player combo

2B   Cupid Childs       106     Bobby Doerr        99
                                Jeff Kent          98

3B   John McGraw*       103     Sal Bando          98
     Ned Williamson     103
*McGraw is in as a manager/player combo

SS                              Joe Sewell         99
                                Monte Ward         99
                                George Wright*     93
*Includes no credit for pre-1871 play

LF                              Roy White          98
                                Jose Cruz          96

CF   George Gore        102
     Pete Browning      100

RF   Vlad Guerrero      100     Reggie Smith       98
                                Sam Rice*          98
*Rice is not credited here for running, double-play avoidance, or throwing-arm value that we’ve written about extensively.

P    Charlie Buffinton  105     Old Hoss Radbourn  99
     Dizzy Dean         101     Bucky Walters      98
     Eddie Rommel       101     Whitey Ford        96
                                Don Sutton         94
                                Mordecai Brown     94
*Griffith is in as a manager/player/exec combo

This is a pretty good record. Most of the players in either column fall into one of three categories:

  1. late cuts or selections that we deliberated over for months or years
  2. nineteenth-century players, of whom we already have too many and whom we decided against for chronological balance
  3. Charlie Keller.

Keller’s situation is really simple. He did a lot of damage in very little time. His peak is better than the average, his career well below, but his WAR rate is outstanding. Then again, the guy only accumulated 4600 PA in his real career, and I only adjust it up to 4840 or so. He missed time to World War II, and his body betrayed him, ending his career prematurely. But even so, he barely sneaks over the line by CHEWS+, and anyone below 105 is probably interchangeable with anyone over 95. Especially if they come from an overstuffed position or an overstuffed era (like, say, the 1890s).

Also, a few important caveats apply. Many of these borderline players don’t yet have official PBP data attached to them. Some may never, or won’t for years. Others, like Rice, might have that information soon for some or all of their campaigns. Our guesstimates for those guys probably lift several of them up over the line. But officially, this is what they look like now.

Finally, let’s zero in on a few players from the list above to see what’s driving their ratings.

                 POSITION                | OVERALL                 |
NAME             Peak Career Rate Score  | Peak Career Rate Score  | CHEWS+
Hughie Jennings   111     83  118   101  |  114     87  117   103  |  102
Charlie Keller    102     74  143    99  |  107     80  146   104  |  101
Jim O’Rourke       81    119   70    94  |   85    128   72   100  |   97
Ted Simmons       101    109   79   100  |   93    111   65    90  |   95
Ned Williamson    110    101  106   105  |  109     94  103   102  |  104

Dizzy Dean                               |  107     90  138        |  106
Whitey Ford                              |   83    111  101        |   98

If you detect an orientation toward peak performance, it’s because I have one. In the past I’ve been more cautious about it. But after reading this article and seeing the inclusion of rate-based performance, I felt it was important to include it as well. I used to weight peak at 22% higher than career. Now I rate them equally but also include that rate bonus of up to 10 CHEWS+ points.

We see how this influences CHEWS+ above with peak-first candidates such as Jennings, Keller, Williamson, and Dean getting better scores than O’Rourke, Simmons, and Ford. But we can see buried in all of this how the increased transparency of this system can support good decision making. Ted Simmons is at an even 100 among catchers. If I felt it was specifically important to add another catcher, I would have good justification to do so based on his score among those at his primary position. For someone like Dizzy Dean, I might find persuasive the idea that while his peak is above average, his ability to create value may actually be understated by his peak.

Let’s linger for a moment on Simmons and Dean. I differ with many HOM voters and other writers who toss out seasons below replacement. One argument for doing so goes like this: The team should have known better and not continued to run him out there. I agree up to this point. It’s what follows from it that’s problematic: So why should the player be penalized? In my opinion, the player is not penalized by counting everything he did on the field. Instead, we are trying to get an accurate picture of the player’s entire career. Everything counts. The player is accurately measured by including his entire body of MLB work.

The classic case where this comes into play is Pete Rose. From age 39 on, he racked up -1.4 WAR over 3694 PAs. But not every player earns negative numbers strictly during their baseball senescence. Take Ted Simmons, for example. The 1981 version of Ted Simmons, slugging catcher, hit .216/.262/.376 for a whopping 87 OPS+ and 0.3 WAR (BBREF style). He rebounded for 7.3 WAR over the next two years. The wheels came off again in 1984, when he “earned” -2.6 WAR, and from there the end came quickly. In 2016, we saw a younger player in mid-career do exactly what Simmons did. Coming off 38 WAR over 7 years, Andrew McCutchen served up -0.7 WAR.

Want some more? Early Wynn had a full season at age 28 where he posted -1.0 WAR. Burleigh Grimes pooped out a -0.5 season at age 25. Jimmy Wynn coughed up a -0.6 hairball at age 29. Lefty Grove would rather have forgotten 1934 (-0.3). Anyway, these seasons exist. They are rare among Hall-level players, of course, but they are there, and they cost their teams wins. To me, not counting those bad seasons is akin to ignoring the F on Johnny’s report card because he otherwise got As and Bs.

A reasonable argument against my position might be that someone like Dizzy Dean or Charlie Keller or Sandy Koufax benefits on a rate basis due to a sudden injury-forced departure rather than a parade of crappy decline seasons. I hear ya. If you think the deck is stacked against long-career players, well, maybe it is. But that brings me back to the important point that JAWS, Hall Rating, CHEWS+, Hall of Fame Monitor, what have you are not gospel. They are sifting mechanisms. Draw up your long list with them, then look closely to see what they fail to capture. Because there’s no bulletproof stat and there’s no silver-bullet number to end all arguments.

Instead, what we have is thoughtful people creating thoughtful tools to get us near to an answer quickly so that we can spend more time on the borderline where the tough decisions are. And that’s what I like about this improvement over CHEWS. It gives me a simpler number as well as more, and more understandable, details to form a decision on. I’ll soon start adding it to the HoME Stats you can find on our Honorees page.



3 thoughts on “Introducing CHEWS+”

  1. Great stuff Eric, looking forward to the full analysis!

    What type of positional adjustment does Tenace have with his 102 rank?
    Wow on Simmons dropping off, good points on his negative campaigns, looks like a chance at top 100 status if you remove them…do you guys also consider RE24 and clutch value as part of the picture…Ted adds almost 9 wins.

    Monte Ward fared well in the old CHEWS, he’s so tough to place for me as a position player/pitcher combo, interested in what your new research unveils.

    Sad to see O’Rourke collapse in the new CHEWS, I always thought of him as a mid-level HOFer.

    A bit surprised on how well Mike Griffin does, rises above the glut of 1880/90s OFs? Almost to George Gore’s level 🙂

    How do you handle league quality adjustments for 1880s hurlers?
    I’m a supporter of Pud Galvin but not of Bob Caruthers…Galvin has an extraordinary long career (even excluding some great International Association work) with some negative value seasons dragging his rate low, while Caruthers career was quite short and he took a little or a lot? advantage of some weaker AA leagues.
    McCormick is a B-R slam dunk and a B-G dud, how do you handle the 1884 Union Association?

    Rommel is interesting, the 106 doesn’t include an adjustment for relief work? He is terrible with FIP WAR, but not sure that matters too much from his time era.

    Rucker benefits tremendously from rate, very short career but fine peak.

    Ford is of interest, fares mediocre at Baseball Reference and Gauge, is solid at Fangraphs or Baseball Prospectus, and is very impressive under a WPA or Tom Thress Win-Loss lens.

    Posted by Ryan | April 17, 2017, 7:43 pm
    • Lots to talk about here. Going point by point.
      I’m not sure what you mean by positional adjustment. I adjust catchers as part of my WAR twiddling. They get extra love. But in terms of calculating CHEWS+, it’s the same as other positions, an average of their rating against position and against all hitters.

      Simmons and O’Rourke: As it turns out, both were at the bottom end of their respective positions in the old CHEWS anyway. It took me about 100 “years” of votes to elect O’Rourke.

      RE24 etc: I don’t currently include that information. Since about 40% of history doesn’t have the PBP to make that viable, I prefer not to use the info as part of the sifter. I am also a little leery of it and WPA. I am not certain enough that what is measured is a reflection of skill or variance. Anything highly context dependent sends that red flag up for me.

      Ward: WYSIWYG from me. There’s no additional research on him at this time. Ward is something of a borderliner for me as well. To paraphrase someone else, he is Maury Wills plus five or six years of decent to very good pitching. That’s enough I think.

      Griffin: He, Gore, Lemon, Cedeno, and Bernie were all clustered beneath the in/out line before and are clustered around it now. That said, we have plenty of 1890s guys, so he’s no threat to draw my vote. And Childs would be in line ahead of him, and I don’t think electing him is a good idea.

      1880s pitchers: You can get all the details in the Our Methods section of the site, but basically, the value between replacement and average for 1880s pitchers is humongous. So that’s where I debit them to bring them into some kind of alignment with latter day hurlers. Even so, my system is challenged by these dudes, as is JAWS. A lot of systems will identify some combo of the following as good candidates: Keefe, McCormick, Galvin, Caruthers, Welch, Buffinton, Whitney, King, Clarkson, Radbourn, Mullane, Will White. Oh, and a small portion of Ward. When you step back from that, it’s crazy to elect more than a handful of these guys. In the 1880s guys still threw underhand. As late as 1887, the leagues were still futzing with the rules on called balls and strikes. The foul-strike rule wasn’t around yet. There wasn’t a mound yet. The pitching box was fifty feet away and you could do all sorts of hijinks in it. Relief pitching hadn’t been invented. And pitchers threw 400-600 innings a year. Is WAR or any other system likely capturing the value of these pitchers effectively? And more important, there were very few long career pitchers then because the best pitchers gobbled up so many innings. You’d be electing a far higher proportion of the available pitchers than seems prudent to me. An objection to my line of thinking says that because the pitchers threw so many innings, they must also be more valuable than later eras and should be elected in higher quantities. Balderdash says I. By that logic, from about 1993 on, we would be electing far fewer pitchers than we do. And anyway, if every team had a dude throwing 400-600 innings, how special is it really?

      UA: I use career-WAR/IP or WAR/PA and apply to the playing time. Seems like the fairest thing to do.

      Rommel: I equivocate on Rommel. BBREF likes him a lot. But he did an awful lot of relief pitching that feels inessential compared to Grove’s. I think we need more info.

      Rucker: Again, Rucker was just off my ballot and now he could be on it. At the borderline there are at least two dozen pitchers each with quality candidacies. Career voter? Got some TJ for you to love. Like peak? Say hello to Siz and Nap. A little short on Deadball Dudes? Wilbur Cooper all the way. I lean to peak, and Rucker seems like a reasonable person to vote up. But players like him are why I use WAR rate at half strength. Maybe it should be at 33%? I don’t know yet.

      We’ve written extensively on Ford here. I think that Ford was the right guy in the right park in front of the right defense, for the right manager, in the weaker league. I don’t like FG’s pitching WAR. I’m not convinced that pitchers have no ability to affect BIP. In fact, we know they do. FG themselves wrote years ago about how some pitchers get more pop-ups than others. Pitchers like TJ get more DPs due to sinkerballing. I’m not smart enough to know all the math, but as I recollect research has subsequently demonstrated that pitchers do have an effect on quality of contact. Anyway, I think Ford had as auspicious or propitious or fortunate a set of circumstances as any pitcher ever. He made good, did a little cheating to get there and never pitched at Fenway. His winning percentage and ERA+ reflect that.

      Posted by eric | April 17, 2017, 9:57 pm
      • Thanks for the quick and thorough reply.

        Regarding Tenace, what’s his catching bonus versus a full-time catcher (i.e. Tenace gets xx % boost and Fisk gets xx %). I ask, as a health bonus puts him over the edge, while a conservative figure usually keeps him at the bubble or below. Tenace gives back a couple of wins with RE24/clutch as a tie breaker of sorts too.

        Simmons I see at the bottom, while O’Rourke scored a 52…sandwiched with OF Evans, Magee, Lofton, Sheffield, and Snider.

        I bring up RE24 as it has a tangible impact on games, whether or not we can parse exactly how much skill/luck is involved. More of the contact/speed guys end up doing well on this, while GIDP/slow-afoot machines stink up the joint. It is unfortunate that we are missing ~40% of history on this.

        Good to see Childs rise above the heap of OF; side note that Lemon is awful in RE24/clutch, giving back almost 10 wins of value.

        For 1880s guys, Guy Hecker has a massive 1884 and some quality otherwise too. Nice points on this group in general, but my main interest is in how you handle discounts for the American Association versus the National League…this has a large impact on Caruthers, McPhee, Browning, Stovey, and possibly others. For Stovey, does it make some difference comparing him against OF, as he played 946 versus 550 at 1B.

        For the peaky/prime pitchers, how do Tommy Bond, Jim Whitney, Noodles Hahn, and Dizzy Trout fare compared with Rucker?

        For Ford, it just seems that Fangraphs or Tom Thress could/are picking up on something that Whitey excelled at that isn’t being captured by Baseball Gauge or Reference, but your arguments are certainly persuasive.

        Posted by Ryan | April 18, 2017, 9:29 am
