Processing some Lessons Learned

I recently posted an article on Jonathan Schoop’s poor season at the plate and got ripped a new one in the comments. I stand by the overall information presented in the article; I don’t think my analysis was inaccurate or unfair. But I was disappointed by the negative tone of most of the feedback. And to make matters worse, Schoop hit a game-tying home run that night against Dellin Betances. Okay, that made matters better :-)

I tried to calm down and think things through. I think I made a few mistakes:

  • I hadn’t written an analytical piece in a few months (owing to other time commitments) so I was itching to write something and get it out there ASAP.
  • I started the article intending to explore why Schoop was surprisingly low on the RE24 leaderboards; I think at the time he was second-to-last among players with at least 300 PA. He’s not doing well at the plate, but I originally wanted to focus just on his context-dependent numbers and explore those a bit, while making it clear that I’m not judging him overall on those. But I got lazy and, when I noticed that his overall numbers were pretty bad, ended up making the piece focus on Schoop’s overall line, which ended up seeming like a hatchet job instead of an exploration of why his RE24 numbers are so low. The RE24/WPA part ended up being a small paragraph in a sea of words.

    It wasn’t a conscious decision on my part but, like I said above, I was in a hurry to write an article after a minor hiatus.

  • I didn’t give enough weight, in terms of word count, to the idea that I understand why the Orioles are playing Schoop (they have no significantly better alternatives, and he is a prospect so he’s got to get some playing time).
  • I realized I was using the word “bad” a lot and so searched for synonyms so I wasn’t repeating myself. I ended up using adjectives like “pitiful”, “awful”, and “terrible”, which probably made the article sound nastier than I intended.
  • I compared his batting line to two pitchers. That was unfair :-)
  • I didn’t pay enough attention to the fact that the season isn’t over yet and he’ll have a chance to boost his numbers, particularly his fWAR.
  • I finished up the article in a tired state, after I’d been up late a few nights in a row, and when I had a bad case of “computer vision syndrome” and really just wanted to go to bed. It was in this state that I touched up a few sentences and added the closing paragraph or two.
  • I posted the article the day after Manny sprained his knee. Adding more negativity to a frustrating time was probably not a good idea.

Overall the whole thing came off as “Schoop is awful” when in reality I wanted it to be “Schoop has hurt the team more than you’d expect for someone with even his batting line, but we shouldn’t judge him on the context stats.” That’s a more nuanced point that required more research; I didn’t put in the effort to do either.

Just a drop in the bucket in terms of life importance, but it helps me to think through these things.

A few SABR 44 pics

Jimmy Wynn and other Houston Colt .45 members signing autographs after their panel.

Bob Aspromonte signing a baseball.

Hornsby chapter members posing for a quick photo.

Mike Vance, Banks-Bragan chapter chair C. Paul Rogers, (unidentified), and Dierker chapter chair Bob Dorrill.

Hornsby chapter member Eric Robinson giving a talk on the brief history of the Houston Eagles.

Hornsby+1, who placed second in the team trivia contest. From L to R: Bill Gilbert, Jan Larson, David Kaiser, and Tom Thayer.

Quick Reflections on SABR 44

I just returned from Houston where I attended the 44th annual Society for American Baseball Research (SABR) convention. It was my first national conference, and I had a great time! I attended a lot of really interesting presentations, met some people whose work I admire, and saw an awesome ballgame.

One of my favorite presentations was where Dave Smith of Retrosheet investigated why the home team tends to score more runs in the first inning than the visiting team does. Along the way, he investigated lineup construction and when your number 9 hitter is better than your number 7 hitter.

Another presentation I liked was Michael Haupert’s talk about William Hulbert and the founding of (what is essentially) professional baseball, and professional sports, as we know it today. And of course I can’t forget my friend Eric Robinson’s presentation, which focused on the terrible time the Eagles (a Negro League team) had when they moved from Newark to Houston. I am not a historian but I found these talks really interesting.

In addition to the talks, the event featured a number of interesting panels. This being Houston, many former Astros were present. Jimmy Wynn, Bob Aspromonte, Hal Smith, and Carl Warwick represented the Colt .45s. Enos Cabell, Jose Cruz, and Deacon Jones (with Tal Smith also present) represented the 1980 pennant-winning Astros team, the first in franchise history to go to the playoffs. And Larry Dierker, Alan Ashby, and Art Howe represented as well. (Many of the audience questions Howe got were related to his portrayal in Moneyball by the late Philip Seymour Hoffman.)

These guys talked a lot about their playing time and shared various stories. I appreciated the historical impact, but they weren’t super meaningful to me personally, having been either not alive or not cognizant of baseball when these guys were playing. (I realize I wasn’t the target audience for these particular panels.) But hearing what goes into constructing and playing for a pennant-winning team was interesting. And of course it’s always fun to just hear war stories straight from the horses’ mouths, so to speak. Cabell in particular was hilarious when talking about how manager Bill Virdon would discipline players for not running to first base every time.

I did enjoy the panel with Bob Watson, Eddie Robinson, and Dr. Bobby Brown. Robinson and Brown shared some entertaining stories about Yogi Berra, Joe DiMaggio, Marilyn Monroe, and the like. You won’t get to hear stuff like this for much longer, folks. I tried to take it all in as much as I could.

There was also a pretty interesting panel with Astros GM Jeff Luhnow, assistant GM David Stearns, and Director of Decision Sciences Sig Mejdal. They didn’t reveal any team secrets, but they did continue to explain how they set up their organization, how running a team has changed just in the past 10 years, and the kinds of data they are interested in.

They seemed to try to put a human face on the organization by talking about the human element of what they are doing, and I wonder if that’s because there has been so much bad press about them recently. The team has been accused of running science experiments on its players, trade information was leaked from their organization, and they had an acrimonious fallout with 1-1 pick Brady Aiken a couple of weeks ago. That, plus the recent SI cover story that celebrated their data-driven ways, has probably combined to make the organization appear too robotic, too heartless, and too unfeeling.

And then there was Saturday night’s 8-2 win over the Blue Jays. I was rooting hard for the Astros, since the Blue Jays are chasing my Orioles in the AL East, and I wasn’t disappointed. The game featured several home runs, including an inside-the-park job by Jon Singleton (that was awarded when replay overturned the “out” call) and a mammoth dinger by Chris Carter. It featured an almost-home-run when Rob Grossman leapt and robbed Juan Francisco of a game-tying two-run shot. It featured speedy Jose Altuve scoring the go-ahead run from first on two wild throws by Blue Jays defenders. And if that wasn’t enough, it featured the MLB debut of Mike Foltynewicz in which he struck out Jose Bautista. I kept joking with other SABR members that if only we’d seen an unassisted triple play, we would’ve had baseball bingo.

Whew. In addition to all of that, I met a ton of nice people, including several whose blogs I follow or whose work I have enjoyed. I attended several committee meetings and was encouraged by all the talk and ideas I heard. And I thoroughly enjoyed a rousing trivia competition (as a spectator, mind you).

My biggest takeaway is that this is just a group of people who love to get together and talk about baseball, whether it’s the 1910 Triple Crown Race or the current exploits of Mike Trout. It took me a while to realize that though, because despite hanging out with the local chapter often, I’m not fully used to being around such a group. The first day I was there I didn’t really know what to do or how to fit in. I felt like I wanted to contribute and that I wanted to shout my baseball interest from the top of the rafters. I was meeting not only new people, which is not something I am terribly comfortable with, but also people whose work I’ve enjoyed and whose names I’d recognize anywhere, which gave me that giggling-fan feeling.

It left me kind of jittery. But by the second day I settled down and relaxed. It helped to have members from my local chapter, people whom I already knew and was friendly with, present and grinning. But mainly it helped to just sink into all the talk, to be present for the discussions, and pay attention to the research. Everyone I met was also extremely pleasant and welcoming. I ended up making several new friends and getting some great information about the kind of baseball work I enjoy doing.

There’s a lot of good stuff going on in baseball, both the game and on the sidelines with this research, so to speak. There are innumerable facets of the game to research, analyze, consider, and debate. And there are a lot of smart and passionate fans around to discuss these things. In addition to having so much fun at the conference, I feel I’ve only begun to scratch the surface of what this community can offer and what I can contribute to it. SABR 45 is in Chicago and I plan on doing all I can to attend!

 


Fooling Hitters, 2014

Over at Camden Chat I graphed this year’s starters on how much they got hitters to chase pitches out of the zone, with a focus on Sir Walks-a-Lot Ubaldo Jimenez.

I had a lot of fun planning and writing this article; I’m happy it was so well-received!

Reflections on Learning about Sabermetrics

This week is the third week of the Sabermetrics 101: Introduction to Baseball Analytics course at edX. So far I’m really enjoying it. The course is about being a data scientist as that discipline pertains to baseball. They talk about how being a data scientist is about the convergence of three areas: domain expertise (knowledge of the subject matter you are analyzing), computer skills & hacking, and math & statistical knowledge. (The actual slide presented is this one, developed by Drew Conway.) I’m enjoying learning SQL, why certain metrics are better than others, and in general how to approach baseball analysis scientifically.

I knew a small bit about sabermetrics going into the course. Sites like FanGraphs are invaluable for not only providing lots of data, but leading you to data that are meaningful by highlighting certain ones in their blog posts. Because of these and other sites I have gone from using OPS+ to wOBA/wRC+ as my go-to hitting metrics. I have seen the importance of walk rate, strikeout rate, and home run to fly ball ratio for both pitchers and hitters (but primarily pitchers). I’ve had experience using BrooksBaseball.net to see (literally, using the wonderful graphs on the site) how a pitcher’s pitch selection has changed over time as they age and how they attack certain hitters over time. I’ve seen how BABIP is deployed and used in analysis and am starting to get a sense of its limitations. I have started to understand concepts like correlation/causation, linear weights, regression to the mean, win probability, leverage, and others that help me sift through the noise and find a signal, to borrow a phrase from Nate Silver. Speaking of Silver, it’s through his book that I came to find Bayesian inference as a tool that I both understand and find useful.

I’ve even had a chance to develop some Python skills. I am not a programmer but I play one on TV, having taken some classes in high school/college and worked in technical writing for software engineering companies for 10 years now. I’ve also been lucky to have the chance to put a lot of this research/knowledge in writing.

So when I came across the Sabermetrics 101 course I was intrigued. My scattershot approach was good from an autodidactic point of view but the straight line of a class is appealing to me. I came to the class hoping to get a foundational knowledge of the principles and concepts used in sabermetrics so I could really understand why they are used and not just that they are used. I feel this knowledge will help inform my writing, not only helping me write about interesting/informative things but also making sure I write about them accurately and precisely.

I also think that the principles of sabermetrics are applicable in other areas, which is another reason I am pretty gung-ho about getting deep into the class. The whole idea of using objective analysis to find out what metric(s) are important, the situations in which they are important or can be de-emphasized, and what factors contribute most to these metrics is fascinating to me. So is learning to become a better writer. And so is learning to become a more informed baseball fan, which helps me enjoy the game. An analogy: when I was a techno DJ, many people asked me if knowing how to spin records made it harder to listen to other DJs, since I could pick up on their mistakes. I always thought this was true, but I preferred to focus on the fact that I could also appreciate when another DJ was doing well and explain that to others (should they be interested in it).

I feel the same way about learning sabermetrics … I can now feel the impact of seeing someone’s wRC+ compared to another player’s, because I know what that means. This information informs the way I watch games and, as a big Orioles fan, how I feel when something happens. Like when Nelson Cruz faces a left-handed pitcher this year, I get excited, because he is destroying them in 2014, and I can help quantify by how much he is tearing them apart. I can also understand just how meaningful it is when Adam Jones walks because, well, he hardly ever does that. I can also do things like estimate how much he is impacting the team by not walking more often.

I guess it all comes down to mastery of skills, a desire that everybody has. Who doesn’t enjoy mastering a skill?

Speaking of which, as I talk more and more with friends and acquaintances about learning sabermetrics, the question they often ask me is: what will I do with this skillset? Most bring up gambling, but I am not a risk-taker by nature (especially not with money) so that would be tough for me to swallow. I don’t have a desire to work in a major-league front office but I’d be curious what that’s like. At this point I’d say I want to use sabermetrics along with writing to learn how to explain the game better to others, to help them feel what I feel when I look at Chris Davis’s heat map, to help them understand what it means when Ubaldo Jimenez relies on his fastball instead of his sinker. I’m currently doing this at Camden Chat, which is awesome because the site combines my fan interest in the Orioles with writing and sabermetrics.

Someday I’d love to get a job at FanGraphs, The Hardball Times, and/or another site/publication, or start my own site. Who knows. Until I figure that out, I have Module 3 of the course to complete this week!

Demonic Velocity

In 2014 it seems like every promising young pitcher is going under the knife for Tommy John surgery. Baseball fans, teams, staff, writers, and analysts have spilled a lot of digital ink about what the causes of this seeming epidemic may be.

One writer, Joe Sheehan, piqued my interest in the May 28th edition of his newsletter. In it he explores the odds that velocity is a strong predictor of whether a pitcher has had or will have Tommy John surgery. He wonders in particular whether starting pitchers whose fastballs average 95+ MPH are at an increased risk vs. their counterparts. Hence his use of the word “demon” to describe the phenomenon:

I’m thinking of “The Right Stuff” here. The movie — it’s one of the few Tom Wolfe books I haven’t read — opens with a narration about “the demon” that lives at Mach 1, the speed of sound. Maybe, and I am spitballing here, 95 mph is where the demon lives when it comes to elbows.

(p.s. Joe, I’ll happily lend you the book if you like. It’s great.)

In reading his post and looking at his research and conclusions I immediately thought of the problem in Bayesian terms. We can ask ourselves the following questions:

  1. Given that a pitcher throws 95 MPH+, what is the probability that pitcher has experienced (or will experience) Tommy John surgery?
  2. Given that a pitcher throws 93-94.9 MPH, what is the probability that pitcher has experienced (or will experience) Tommy John surgery?

The goal here is to say, if all you know about a pitcher is that he throws 95+ MPH regularly, how likely is it that this nameless, faceless, team-less pitcher has had (or will ever have) Tommy John surgery? And how much more likely is he to experience this surgery than if he throws 93-94.9 MPH?

Here are the important numbers that Joe found with his research:

Velo    Pitchers   TJ post    %    TJ any     %
95+       22          5     23%      10     45%
93-94.9   98         13     13%      22     22%

Where:

  • Velo is the average four-seam fastball velocity
  • Pitchers is the number of pitchers who throw this hard. Note that Joe researched only starting pitchers here.
  • TJ post is the number of pitchers who had Tommy John surgery soon after recording the velocity
  • % (the first one) is TJ post / Pitchers
  • TJ any is the number of pitchers who had Tommy John surgery either before or after recording the velocity. Joe’s reasoning here is that post-surgery pitchers generally return to throwing as hard as they did before the surgery. So if a pitcher throws 95+ MPH after the surgery, it’s safe to say he threw that hard before it.
  • % (the second one) is TJ any / Pitchers.

Regarding the % columns, you can see how he would conclude that it appears throwing 95+ MPH roughly doubles a pitcher’s chances of undergoing Tommy John surgery.
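As a quick check on the % columns, here is a minimal Python sketch that recomputes them from the counts in the table. The names `counts`, `rate`, `rates`, and `any_ratio` are my own, made up for illustration; nothing here comes from Joe's newsletter beyond the six counts.

```python
# Counts copied from the table above: (pitchers, TJ post, TJ any).
counts = {
    "95+": (22, 5, 10),
    "93-94.9": (98, 13, 22),
}


def rate(part, whole):
    """Return part / whole as a fraction between 0 and 1."""
    return part / whole


# Surgery rates per velocity group, keyed the same way as the table rows.
rates = {
    velo: {"tj_post": rate(post, n), "tj_any": rate(any_, n)}
    for velo, (n, post, any_) in counts.items()
}

for velo, r in rates.items():
    print(f"{velo:8s} TJ post {r['tj_post']:.0%}   TJ any {r['tj_any']:.0%}")

# How many times higher the "TJ any" rate is for the 95+ group.
any_ratio = rates["95+"]["tj_any"] / rates["93-94.9"]["tj_any"]
print(f"TJ any ratio: {any_ratio:.2f}")
```

The ratio of the two "TJ any" rates comes out just above 2, which is where the "roughly doubles" reading comes from.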

In my analysis I only concerned myself with the TJ any column. Let’s define our Bayesian variables for the first question:

  • x is the base probability of any pitcher having Tommy John surgery. This, according to another data point in Joe’s newsletter, is .20.
  • y is the probability of experiencing Tommy John surgery, given a pitcher throws 95+ MPH. In this dataset that is .45 (as Joe found).
  • z is the probability of experiencing Tommy John surgery, given a pitcher throws 93 – 94.9 MPH. In this dataset that is .22 (again, as Joe found).

Put it all together with (xy) / ((xy) + z(1-x)) and you have .34. That’s the answer to our first question. For the second question, you run the same math but swap y and z as defined above. You get .11. In each case the probability is slightly lower than what Joe found. But the questions we’re asking differ slightly.
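For anyone who wants to follow the arithmetic, here is a small Python sketch that evaluates the expression above with the x, y, and z values already defined. The function name `combine` and the variables `first` and `second` are hypothetical names I picked for illustration; the code simply plugs numbers into the expression as written.

```python
def combine(x, y, z):
    """Evaluate (x*y) / ((x*y) + z*(1 - x)), the expression used in the text."""
    return (x * y) / ((x * y) + z * (1 - x))


x = 0.20  # base rate of Tommy John surgery among pitchers
y = 0.45  # "TJ any" rate for the 95+ MPH group
z = 0.22  # "TJ any" rate for the 93-94.9 MPH group

first = combine(x, y, z)   # question 1
second = combine(x, z, y)  # question 2: same math with y and z swapped

print(f"{first:.2f}")
print(f"{second:.2f}")
```

Rounded to two decimals, the two results match the figures quoted above.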

.34/.11 is about 3.1. This analysis shows that pitchers who throw 95+ MPH are not twice as likely, but more than three times as likely to experience (or to have experienced) Tommy John surgery as pitchers who average between 93 and 94.9 MPH. Put another way, if all you know about a pitcher is that he throws 95+ MPH and you guess that he’s had (or will have) Tommy John surgery, you will be right about three times as often as you would be in making the same guess about a pitcher who throws 93-94.9 MPH.

Further research is needed, but the numbers above seem like more evidence for a hypothesis that in general, the harder you throw your fastball as a starting pitcher, the higher the probability you’ll have Tommy John surgery. Indeed, the American Sports Medicine Institute recently released a position paper with their findings on the rash of surgeries. Their last reminder was that “Pitchers with high ball velocity are at increased risk of injury.” We’re starting to get an idea of just how much the risk increases.

Thanks again to Joe for inspiring this research!


Recent Writings: 1-0 Chances, Ubaldo’s First Start

I put up two articles recently at CamdenChat. The first was me doing some Bayesian analysis on what could happen given the O’s record of 1-0 (they are now 2-3). The second is a look at Ubaldo Jimenez’s first start with the O’s.
