Fooling Hitters, 2014

Over at Camden Chat I graphed this year’s starters on how much they got hitters to chase pitches out of the zone, with a focus on Sir Walks-a-Lot Ubaldo Jimenez.

I had a lot of fun planning and writing this article; I’m happy it was so well-received!

Reflections on Learning about Sabermetrics

This week is the third week of the Sabermetrics 101: Introduction to Baseball Analytics course at edX. So far I’m really enjoying it. The course is about being a data scientist as that discipline pertains to baseball. They talk about how being a data scientist is about the convergence of three areas: domain expertise (knowledge of the subject matter you are analyzing), computer skills & hacking, and math & statistical knowledge. (The actual slide presented is this one, developed by Drew Conway.) I’m enjoying learning SQL, why certain metrics are better than others, and in general how to approach baseball analysis scientifically.

I knew a small bit about sabermetrics going into the course. Sites like FanGraphs are invaluable for not only providing lots of data, but leading you to data that are meaningful by highlighting certain ones in their blog posts. Because of these and other sites I have gone from using OPS+ to wOBA/wRC+ as my go-to hitting metrics. I have seen the importance of walk rate, strikeout rate, and home run to fly ball ratio for both pitchers and hitters (but primarily pitchers). I’ve had experience using to see (literally, using the wonderful graphs on the site) how a pitcher’s pitch selection has changed over time as they age and how they attack certain hitters over time. I’ve seen how BABIP is deployed and used in analysis and am starting to get a sense of its limitations. I have started to understand concepts like correlation/causation, linear weights, regression to the mean, win probability, leverage, and others that help me sift through the noise and find a signal, to borrow a phrase from Nate Silver. Speaking of Silver, it’s through his book that I came to find Bayesian inference as a tool that I both understand and find useful.

I’ve even had a chance to develop some Python skills. I am not a programmer but I play one on TV, having taken some classes in high school/college and worked in technical writing for software engineering companies for 10 years now. I’ve also been lucky to have the chance to put a lot of this research/knowledge in writing.

So when I came across the Sabermetrics 101 course I was intrigued. My scattershot approach was good from an autodidactic point of view but the straight line of a class is appealing to me. I came to the class hoping to get a foundational knowledge of the principles and concepts used in sabermetrics so I could really understand why they are used and not just that they are used. I feel this knowledge will help inform my writing, not only helping me write about interesting/informative things but also making sure I write about them accurately and precisely.

I also think that the principles of sabermetrics are applicable in other areas, which is another reason I am pretty gung-ho about getting deep into the class. The whole idea of using objective analysis to find out what metric(s) are important, the situations in which they are important or can be de-emphasized, and what factors contribute most to these metrics is fascinating to me. So is learning to become a better writer. And so is learning to become a more informed baseball fan, which helps me enjoy the game. An analogy: when I was a techno DJ, many people asked me if knowing how to spin records made it harder to listen to other DJs, since I could pick up on their mistakes. I always thought this was true, but I preferred to focus on the fact that I could also appreciate when another DJ was doing well and explain that to others (should they be interested in it).

I feel the same way about learning sabermetrics … I can now feel the impact of seeing someone’s wRC+ compared to another player’s, because I know what that means. This information informs the way I watch games and, as a big Orioles fan, how I feel when something happens. Like when Nelson Cruz faces a left-handed pitcher this year, I get excited, because he is destroying them in 2014, and I can help quantify by how much he is tearing them apart. I can also understand just how meaningful it is when Adam Jones walks because, well, he hardly ever does that. I can also do things like estimate how much he is impacting the team by not walking more often.

I guess it all comes down to mastery of skills, a desire that everybody has. Who doesn’t enjoy mastering a skill?

Speaking of which, as I talk more and more with friends and acquaintances about learning sabermetrics, the question they often ask me is: what will I do with this skillset? Most bring up gambling, but I am not a risk-taker by nature (especially not with money) so that would be tough for me to swallow. I don’t have a desire to work in a major-league front office but I’d be curious what that’s like. At this point I’d say I want to use sabermetrics along with writing to learn how to explain the game better to others, to help them feel what I feel when I look at Chris Davis’s heat map, to help them understand what it means when Ubaldo Jimenez relies on his fastball instead of his sinker. I’m currently doing this at Camden Chat, which is awesome because the site combines my fan interest in the Orioles with writing and sabermetrics.

Someday I’d love to get a job at FanGraphs, The Hardball Times, and/or another site/publication, or start my own site. Who knows. Until I figure that out, I have Module 3 of the course to complete this week!

Demonic Velocity

In 2014 it seems like every promising young pitcher is going under the knife for Tommy John surgery. Baseball fans, teams, staff, writers, and analysts have spilled a lot of digital ink about what the causes of this seeming epidemic may be.

One writer, Joe Sheehan, piqued my interest in the May 28th edition of his newsletter. In it he explores the odds that velocity is a strong predictor of whether a pitcher has had or will have Tommy John surgery. He wonders in particular whether starting pitchers whose fastballs average 95+ MPH are at an increased risk vs. their counterparts. Hence his use of the word “demon” to describe the phenomenon:

I’m thinking of “The Right Stuff” here. The movie — it’s one of the few Tom Wolfe books I haven’t read — opens with a narration about “the demon” that lives at Mach 1, the speed of sound. Maybe, and I am spitballing here, 95 mph is where the demon lives when it comes to elbows.

(p.s. Joe, I’ll happily lend you the book if you like. It’s great.)

In reading his post and looking at his research and conclusions I immediately thought of the problem in Bayesian terms. We can ask ourselves the following questions:

  1. Given that a pitcher throws 95 MPH+, what is the probability that pitcher has experienced (or will experience) Tommy John surgery?
  2. Given that a pitcher throws 93-94.9 MPH, what is the probability that pitcher has experienced (or will experience) Tommy John surgery?

The goal here is to say, if all you know about a pitcher is that he throws 95+ MPH regularly, how likely is it that this nameless, faceless, team-less pitcher has had (or will ever have) Tommy John surgery? And how much more likely is he to experience this surgery than if he throws 93-94.9 MPH?

Here are the important numbers that Joe found with his research:

Velo      Pitchers   TJ post   TJ post %   TJ any   TJ any %
95+          22         5         23%        10       45%
93-94.9      98        13         13%        22       22%


  • Velo is the average four-seam fastball velocity
  • Pitchers is the number of pitchers who throw this hard. Note that Joe researched only starting pitchers here.
  • TJ Post is the number of pitchers who had Tommy John surgery soon after recording the velocity
  • % is TJ Post / Pitchers
  • TJ any is the number of pitchers who had Tommy John surgery either before or after recording the velocity. Joe’s reasoning here is that post-surgery pitchers generally return to throwing as hard as they did before the surgery. So if a pitcher throws 95+ MPH after the surgery, it’s safe to say he threw that hard before it.
  • % (the second one) is TJ any / Pitchers.

Regarding the % columns, you can see how he would conclude that it appears throwing 95+ MPH roughly doubles a pitcher’s chances of undergoing Tommy John surgery.
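
As a quick check on the table, the two % columns and the "roughly doubles" reading can be recomputed from the raw counts. This is a minimal Python sketch; the dictionary layout and the names in it are my own, and only the counts come from Joe's table.

```python
# Counts copied from the table above: (pitchers, TJ post, TJ any).
groups = {
    "95+": (22, 5, 10),
    "93-94.9": (98, 13, 22),
}


def rate(count, total):
    """Share of pitchers in a velocity group, as a fraction."""
    return count / total


rates = {
    velo: {"post": rate(post, n), "any": rate(any_tj, n)}
    for velo, (n, post, any_tj) in groups.items()
}

for velo, r in rates.items():
    print(f"{velo:8} TJ post {r['post']:.0%}   TJ any {r['any']:.0%}")

# How many times higher each rate is for the 95+ group than for the 93-94.9 group.
ratio_any = rates["95+"]["any"] / rates["93-94.9"]["any"]
ratio_post = rates["95+"]["post"] / rates["93-94.9"]["post"]
print(f"TJ any ratio: {ratio_any:.1f}x, TJ post ratio: {ratio_post:.1f}x")
```

With only 22 pitchers in the 95+ group, one or two surgeries either way would move these ratios noticeably.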

In my analysis I only concerned myself with the TJ any column. Let’s define our Bayesian variables for the first question:

  • x is the base probability of any pitcher having Tommy John surgery. This, according to another data point in Joe’s newsletter, is .20.
  • y is the probability of experiencing Tommy John surgery, given a pitcher throws 95+ MPH. In this dataset that is .45 (as Joe found).
  • z is the probability of experiencing Tommy John surgery, given a pitcher throws 93 – 94.9 MPH. In this dataset that is .22 (again, as Joe found).

Put it all together with (xy) / ((xy) + z(1-x)) and you have .34. That’s the answer to our first question. For the second question, you run the same math but swap y and z as defined above. You get .11. In each case the probability is slightly lower than what Joe found. But the questions we’re asking differ slightly.

Dividing .34 by .11 gives a ratio of a bit more than 3. This analysis shows that pitchers who throw 95+ MPH are not twice as likely, but more than three times as likely to experience (or to have experienced) Tommy John surgery as pitchers who average between 93 and 94.9 MPH. Put another way, if all you know about a pitcher is that he throws 95+ MPH and you guess that he’s had (or will have) Tommy John surgery, you will be right about three times as often as you would be in making the same guess about a pitcher who throws 93-94.9 MPH.
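
For anyone who wants to rerun the arithmetic, here is a small Python sketch of the expression above. The function name `combine` is just a label I picked; x, y, and z are the values defined in the list above.

```python
def combine(x, y, z):
    """Evaluate (x*y) / ((x*y) + z*(1 - x)), the expression used above."""
    return (x * y) / ((x * y) + z * (1 - x))


x = 0.20  # base rate of Tommy John surgery
y = 0.45  # TJ-any rate for the 95+ MPH group
z = 0.22  # TJ-any rate for the 93-94.9 MPH group

q1 = combine(x, y, z)  # first question: 95+ MPH
q2 = combine(x, z, y)  # second question: y and z swapped

print(f"95+ MPH:     {q1:.2f}")
print(f"93-94.9 MPH: {q2:.2f}")
print(f"ratio:       {q1 / q2:.2f}")
```

The exact ratio shifts a little depending on whether you plug in the rounded percentages, as here, or the raw fractions (10/22 and 22/98), but it stays a little above 3 either way.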

Further research is needed, but the numbers above seem like more evidence for a hypothesis that in general, the harder you throw your fastball as a starting pitcher, the higher the probability you’ll have Tommy John surgery. Indeed, the American Sports Medicine Institute recently released a position paper with their findings on the rash of surgeries. Their last reminder was that “Pitchers with high ball velocity are at increased risk of injury.” We’re starting to get an idea of just how much the risk increases.

Thanks again to Joe for inspiring this research!


Recent Writings: 1-0 Chances, Ubaldo’s First Start

I put up two articles recently at Camden Chat. The first was me doing some Bayesian analysis on what could happen given the O’s record of 1-0 (they are now 2-3). The second is a look at Ubaldo Jimenez’s first start with the O’s.



A hale and hearty welcome goes out from Birdland (well, from me, anyway) to the newest Oriole Ubaldo Jimenez, who signed a four-year, $50 million deal (in addition to costing the O’s the #17 pick in the draft) with the team this week. He’ll improve the O’s starting rotation, which until now didn’t really have a fifth starter (Brian Matusz’s hopes and dreams notwithstanding).

Jimenez is good enough to be #3 and (on this staff anyway) maybe even #2, given Wei-Yin Chen’s well-documented problems with working late into games. Bud Norris likely moves into the fifth-starter role, given his struggles against lefties. And although Kevin Gausman is probably kicking the dirt in frustration now, he should take comfort in the fact that he’ll likely be the first man called up from AAA when the rotation needs help.

The O’s did about as well as they could here. Jimenez isn’t elite, but he doesn’t have to be. He only has to be better than the Jason Hammel / Freddy Garcia / Jake Arrieta horrorshow that combined for 280 innings of 4.47 xFIP ball in 2013. Although many wrote him off due to a poor second half in 2011 and a poor 2012, in 2013 Jimenez showed he can adapt enough to be good, with occasional peaks of very good, when needed. In addition, he just turned 30 and he’s never hit the disabled list.

With all that said, let’s see what we might expect from Jimenez in an O’s uniform.

Strikeout Rate

Jimenez has a high strikeout rate relative to the Orioles staff. In 2013 he struck out 25% of batters, the 6th-highest rate among the 35 qualified AL starters. This easily would’ve been the highest rate on the team; Chris Tillman had the next-highest K rate at 21.2%.

2013 was easily his career high, so in 2014 I’d expect something closer to his career rate of 21.5%. Still, that’s very good, even more so when you consider how much Oriole Park at Camden Yards promotes the long ball.

Groundball Rate

If you can’t strike out a batter, you’ll want to get a ground ball. Grounders go for hits more often than fly balls do but are far less damaging since they rarely go for extra bases, especially when you have Manny Machado and J.J. Hardy playing defense behind you!

Jimenez’s GB rate last year was 43.9%, which would be the best on the O’s staff in 2014 (Scott Feldman’s GB rate was higher last year, but he’s no longer with the team). His career GB rate is 47.6%, slightly higher than last year’s AL average of 43.5%.

Since he’s changed his pitch mix a few times in his career, I would expect something closer to 2013 than his career rate. So while a 43% GB rate is neither elite nor good, it’s at least average, which is more than you can say for any other O’s starting pitcher.

HR/FB Rate

Another reason I’m hopeful for Jimenez as an Oriole is that he is good at preventing home runs, a skill the O’s sorely lack and whose absence was felt all last year. In 2013 only 9% of his fly balls became home runs. Again this isn’t elite, but it’s not only better than the AL average of 11.2%, it’s better than any other O’s starter managed last year.

The low HR/FB rate looks particularly enticing when you also consider Jimenez’s average ground ball rate. Relative to the other O’s pitchers, Jimenez not only allows fewer fly balls, but fewer of those fly balls become home runs. All good stuff.


Let’s pause here and note that for the Orioles, K rate, GB rate, and HR rate should be the go-to stats when acquiring and developing pitchers. Some media thought that four years and $50 million was too long a commitment for a free-agent pitcher, but teams in offensive parks (Texas, Colorado) should pay a premium for strikeouts, groundballs, and HR suppression. Those skills matter so much more to them than to teams with dead zone stadiums (San Diego, Oakland). So in this context, I don’t consider the Jimenez signing an overpay.
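
For reference, all three of those rates are simple ratios of counting stats. Here is a minimal Python sketch of the usual definitions (strikeouts per batter faced, ground balls per batted ball, home runs per fly ball). The `PitchingLine` class and the sample numbers are made up for illustration and do not belong to any real pitcher.

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    """Counting stats for one pitcher-season (hypothetical container)."""
    batters_faced: int
    strikeouts: int
    ground_balls: int
    fly_balls: int
    line_drives: int
    home_runs: int

    def k_rate(self) -> float:
        # Strikeouts as a share of all batters faced.
        return self.strikeouts / self.batters_faced

    def gb_rate(self) -> float:
        # Ground balls as a share of all batted balls.
        batted_balls = self.ground_balls + self.fly_balls + self.line_drives
        return self.ground_balls / batted_balls

    def hr_per_fb(self) -> float:
        # Home runs as a share of fly balls.
        return self.home_runs / self.fly_balls


# Made-up season line, chosen only so the arithmetic is easy to follow.
sample = PitchingLine(
    batters_faced=800,
    strikeouts=200,
    ground_balls=220,
    fly_balls=180,
    line_drives=100,
    home_runs=18,
)

print(f"K%:    {sample.k_rate():.1%}")
print(f"GB%:   {sample.gb_rate():.1%}")
print(f"HR/FB: {sample.hr_per_fb():.1%}")
```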

Walk Rate

Now for the not-so-good: Jimenez walks a lot of batters. The average AL starter walked 8.3% of batters last year, whereas Jimenez walked 10.3%. This would’ve easily been the worst on the O’s staff. So it’s fair to say that Jimenez needs to be better-than-average in getting strikeouts and grounders and limiting home runs. Those tendencies are what keep him in the majors despite his rather high walk rate. Heck, even in his amazing 2010 season, he walked 10.3% of batters. So clearly he’s learned to survive despite this weakness. But O’s fans will do a lot of swearing at the TV this year.

Strand Rate

Jimenez stranded over 76% of baserunners last year. Those ducks left on the pond never got to score, which kept his ERA low. Don’t look for him to repeat that rate in 2014; 76% is all-time-great territory, and no one thinks Jimenez is an all-time great.

I’d look for his strand rate to fall closer to the AL average of 72.6%. His career strand rate is 71.6%, which is a tick below that average, but this rate includes a poor partial season in 2007, an abysmal 2011 (65% strand rate) and a poor 2012 (68.5% strand rate). I believe Jimenez has adapted such that his strand rate won’t be that low in 2014.
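
If you want to compute a strand rate yourself, the version FanGraphs publishes as LOB% is (H + BB + HBP - R) / (H + BB + HBP - 1.4 × HR). Below is a small Python sketch of that formula. The function name and the sample season line are assumptions for illustration, not Jimenez’s actual totals.

```python
def strand_rate(hits, walks, hit_by_pitch, runs, home_runs):
    """LOB% as (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)."""
    baserunners = hits + walks + hit_by_pitch
    return (baserunners - runs) / (baserunners - 1.4 * home_runs)


# Made-up season totals, used only to show the calculation.
lob = strand_rate(hits=150, walks=80, hit_by_pitch=5, runs=75, home_runs=15)
print(f"LOB%: {lob:.1%}")
```

Because runs allowed sit in the numerator, a handful of extra runners scoring is enough to move the rate by a few points, which is part of why it bounces around from year to year.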

Change in Pitching Style

Stats aside, the biggest reason I’m optimistic about Jimenez is that he’s shown he can adapt his pitching style to atone for his struggles in late 2011 and throughout 2012.

In 2010 hitters slugged just .327 against his fastball. That’s no surprise, as it blew by them at an average of 96 MPH. But in 2011 and 2012, the heat fell to around 93 MPH and the pitch looked much more hittable. 3 MPH in one year is a serious drop, and I suspect that’s why hitters slugged .434 against his fastball in 2011 and an otherworldly .559 against it in 2012. But then suddenly in 2013, his fastball was dynamite; opponents managed just a .266 SLG against it.

[Figure: Brooks Baseball chart]

What happened? He didn’t find a fountain of youth; his fastball velocity dipped to just 92 MPH. Instead, Jimenez changed his approach and became a sinker/slider guy. Look at how frequently he throws those two pitches now, relative to his heater:

[Figure: Ubaldo Jimenez, fastball, sinker, and slider usage (%), 2011-2013]

My theory is that in 2013, hitters sat on his fastball but frequently got caught off guard with a sinker or slider instead. This change in usage made them flail at actual fastballs, especially considering his sinker travels at roughly the same speed as his four-seamer does.

If this is the case, though, Jimenez may need to adapt yet again as word gets around the league that he throws cheese less often, especially if batters start out 2014 by waiting for his sinker or slider to dip out of the zone. But it’s encouraging to me that he’s adapted once already and is still young and healthy. All told, it seems Jimenez is set for a good year in 2014.

 Stats from FanGraphs. Pitch graphics from


Jackie Robinson in Austin

My SABR buddy Eric recently visited a local middle school to talk about the connection between Jackie Robinson and Austin. Check it out!


Team Similarity Scores, now with Errors and Stolen Bases

I had a lot of fun with my article Comparing Baseball Teams Throughout History; specifically, writing the program to calculate the numbers and then digging through them to see what the results were. I circulated the article to a few friends and the local SABR chapter and got into a discussion about other ways to measure teams’ similarities.

As a result, I updated the algorithm to take stolen bases (as a proxy for speed) and errors (as a proxy for defense) into account. These numbers are readily available in the Lahman database, making it easy to factor them into the score.

I thought I’d post an update here as the new information has drastically altered the comparables for all of the teams I posted about.
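
For readers curious how a score like this can fold in the two new stats, here is a rough Python sketch in the spirit of Bill James-style similarity scores: start at 1000 and subtract a penalty for each difference between two teams. It is not the program behind the numbers below. The 1000-point starting value, the stat names, the weights, and both sample teams are assumptions made up for illustration.

```python
# Hypothetical weights: one point is lost per this many units of difference.
UNITS_PER_POINT = {
    "wins": 1,
    "runs": 10,
    "home_runs": 5,
    "stolen_bases": 5,  # proxy for speed
    "errors": 3,        # proxy for defense
}


def similarity(team_a, team_b, weights=UNITS_PER_POINT):
    """Start at 1000 and subtract a weighted penalty for each stat difference."""
    score = 1000.0
    for stat, units in weights.items():
        score -= abs(team_a[stat] - team_b[stat]) / units
    return max(score, 0.0)  # never report a negative score


# Two invented teams, not real seasons.
team_a = {"wins": 100, "runs": 800, "home_runs": 200, "stolen_bases": 150, "errors": 90}
team_b = {"wins": 90, "runs": 750, "home_runs": 180, "stolen_bases": 100, "errors": 105}

print(similarity(team_a, team_b))
```

Adding a stat to a scheme like this can only lower scores or leave them unchanged, so if the real program works similarly, that would be one reason rankings shuffle once stolen bases and errors are included.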

2001 Seattle Mariners

Originally the ’67 Tigers came in as the #1 comp to the powerhouse 2001 Mariners. No longer; now, the 2002 Mariners take the top spot with a similarity score of 840. Also, the ’98 Yankees were way down on the similarity list at #7. They’ve jumped five spots to number 2. And whereas the ’94 Astros were the tenth-most similar team, now they are in the third spot.

The full list:

  1. 2002 Seattle Mariners, 840
  2. 1998 New York Yankees, 812
  3. 1994 Houston Astros, 808
  4. 2000 Atlanta Braves, 793
  5. 2009 Minnesota Twins, 791
  6. 2006 Los Angeles Dodgers, 789
  7. 2003 Seattle Mariners, 780
  8. 1995 Cincinnati Reds, 762
  9. 1987 New York Mets, 757
  10. 1987 Milwaukee Brewers, 753

The 1967 Tigers drop all the way down to a score of 726; I’d guess that is 13th or 14th.

2003 Detroit Tigers

In my first article, the 1963 Mets took the top spot with a score of 781. Now, the 2001 Pirates (originally second place) emerge as the most similar team. In fact, whereas the original list had three Mets teams from the ’60s on it, the updated list now has only the 1966 team.

The top ten in full:

  1. 2001 Pittsburgh Pirates, 758
  2. 1962 Chicago Cubs, 716
  3. 2004 Kansas City Royals, 710
  4. 1964 Washington Senators, 689
  5. 1966 New York Mets, 685
  6. 1988 Chicago White Sox, 668
  7. 2002 Tampa Bay Devil Rays, 666
  8. 1969 San Diego Padres, 665
  9. 1969 Cincinnati Reds, 663
  10. 2001 Detroit Tigers, 662

With stolen bases and errors factored in, the similarity between the ’03 Tigers and the ’01 Mariners drops all the way down to 95.

1986 New York Mets

  1. 2006 Los Angeles Dodgers, 822
  2. 1983 Philadelphia Phillies, 795
  3. 1991 Los Angeles Dodgers, 795
  4. 1972 Houston Astros, 787
  5. 2005 San Diego Padres, 772
  6. 1993 Atlanta Braves, 769
  7. 1989 Montreal Expos, 768
  8. 1994 Los Angeles Dodgers, 751
  9. 1965 Detroit Tigers, 748
  10. 2000 Atlanta Braves, 740

There’s the 2000 Atlanta Braves again — they were 79.3% similar to the 2001 Mariners.

2008 Tampa Bay Rays

  1. 2011 Tampa Bay Rays, 838
  2. 2009 Tampa Bay Rays, 831
  3. 2009 Colorado Rockies, 809
  4. 2011 Colorado Rockies, 805
  5. 2010 Cincinnati Reds, 804
  6. 2008 Milwaukee Brewers, 793
  7. 2006 Philadelphia Phillies, 792
  8. 2011 Cincinnati Reds, 764
  9. 2010 New York Yankees, 757
  10. 2011 Toronto Blue Jays, 757

(Aside: Notice how the #9 and #10 teams above are actually tied. My program doesn’t do a good job taking that into account yet.)
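
One way to handle ties is standard competition ranking, where tied teams share a rank and the next rank is skipped (1, 2, 2, 4, ...). Here is a minimal Python sketch of that idea using the list above; the function name and data layout are my own, not what the program currently does.

```python
def rank_with_ties(scored):
    """Return (rank, name, score) tuples using competition ranking (1, 2, 2, 4, ...)."""
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
    ranked = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous team: share its rank
        else:
            rank = position
        ranked.append((rank, name, score))
    return ranked


comps = [
    ("2011 Tampa Bay Rays", 838),
    ("2009 Tampa Bay Rays", 831),
    ("2009 Colorado Rockies", 809),
    ("2011 Colorado Rockies", 805),
    ("2010 Cincinnati Reds", 804),
    ("2008 Milwaukee Brewers", 793),
    ("2006 Philadelphia Phillies", 792),
    ("2011 Cincinnati Reds", 764),
    ("2010 New York Yankees", 757),
    ("2011 Toronto Blue Jays", 757),
]

for rank, name, score in rank_with_ties(comps):
    print(f"{rank}. {name}, {score}")
```

With this approach the Yankees and Blue Jays would both print as 9.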

1975 Cincinnati Reds

  1. 1976 Cincinnati Reds, 769
  2. 1989 Baltimore Orioles, 732
  3. 1957 Chicago White Sox, 691
  4. 1976 Baltimore Orioles, 684
  5. 1979 Cincinnati Reds, 678
  6. 1973 Cincinnati Reds, 667
  7. 1973 Baltimore Orioles, 663
  8. 1975 Baltimore Orioles, 655
  9. 1974 Cincinnati Reds, 641
  10. 1985 California Angels, 631

Ah, love seeing the Orioles so much on this list, even if they are less than 3/4 similar to the powerhouse Big Red Machine.

1994 Montreal Expos

  1. 2002 Boston Red Sox, 777
  2. 1987 Houston Astros, 749
  3. 2006 Los Angeles Angels of Anaheim, 738
  4. 2010 Chicago White Sox, 725
  5. 1986 Houston Astros, 721
  6. 2004 Minnesota Twins, 716
  7. 1994 Cincinnati Reds, 710
  8. 1990 Los Angeles Dodgers, 708
  9. 2011 St. Louis Cardinals, 705
  10. 1988 New York Mets, 704

The ’02 Red Sox remain as the team most similar to the ’94 Expos.

2010 Texas Rangers

Just for giggles – the first Rangers World Series team.

  1. 1995 San Diego Padres, 895
  2. 2006 Arizona Diamondbacks, 806
  3. 1964 Cleveland Indians, 805
  4. 2003 Florida Marlins, 804
  5. 1997 Los Angeles Dodgers, 802
  6. 2009 Houston Astros, 790
  7. 2007 Chicago Cubs, 786
  8. 2009 Oakland Athletics, 785
  9. 2000 Arizona Diamondbacks, 782

The ’64 Indians?! That team finished in sixth place at 79-83. And the ’09 Astros finished 5th in their division at 74-88. Hm.
