I put up two articles recently at CamdenChat. The first was me doing some Bayesian analysis on what could happen given the O’s record of 1-0 (they are now 2-3). The second is a look at Ubaldo Jimenez’s first start with the O’s.
A hale and hearty welcome goes out from Birdland (well, from me, anyway) to the newest Oriole Ubaldo Jimenez, who signed a four-year, $50 million deal (in addition to costing the O’s the #17 pick in the draft) with the team this week. He’ll improve the O’s starting rotation, which until now didn’t really have a fifth starter (Brian Matusz’s hopes and dreams notwithstanding).
Jimenez is good enough to be #3 and (on this staff anyway) maybe even #2, given Wei-Yin Chen’s well-documented problems with working late into games. Bud Norris likely moves into the fifth-starter role, given his struggles against lefties. And although Kevin Gausman is probably kicking the dirt in frustration now, he should take comfort in the fact that he’ll likely be the first man called up from AAA when the rotation needs help.
The O’s did about as well as they could here. Jimenez isn’t elite, but he doesn’t have to be. He only has to be better than the Jason Hammel / Freddy Garcia / Jake Arrieta horrorshow that combined for 280 innings of 4.47 xFIP ball in 2013. Although many wrote him off due to a poor second half in 2011 and a poor 2012, in 2013 Jimenez showed he can adapt enough to be good, with occasional peaks of very good, when needed. In addition, he just turned 30 and he’s never hit the disabled list.
With all that said, let’s see what we might expect from Jimenez in an O’s uniform.
Jimenez has a high strikeout rate relative to the Orioles staff. In 2013 he struck out 25% of batters, the 6th-highest rate among the 35 qualified AL starters. This easily would’ve been the highest rate on the team; Chris Tillman had the next-highest K rate at 21.2%.
2013 was easily his career high, so in 2014 I’d expect something closer to his career rate of 21.5%. Still, that’s very good, even more so when you consider how much Oriole Park at Camden Yards promotes the long ball.
If you can’t strike out a batter, you’ll want to get a ground ball. Grounders go for hits more often than fly balls do but are far less damaging since they rarely go for extra bases, especially when you have Manny Machado and J.J. Hardy playing defense behind you!
Jimenez’s GB rate last year was 43.9%, which would be the best on the O’s staff in 2014 (Scott Feldman’s GB rate was higher last year, but he’s no longer with the team). His career GB rate is 47.6%, slightly higher than last year’s AL average of 43.5%.
Since he’s changed his pitch mix a few times in his career, I would expect something closer to 2013 than his career rate. So while a 43% GB rate is neither elite nor good, it’s at least average, which is more than you can say for any other O’s starting pitcher.
Another reason I’m hopeful for Jimenez as an Oriole is that he is good at preventing home runs, a skill the O’s sorely lack and whose absence was felt all last year. In 2013 only 9% of his fly balls became home runs. Again this isn’t elite, but it’s not only better than the AL average of 11.2%, it’s better than any other O’s starter managed last year.
The low HR/FB rate looks particularly enticing when you also consider Jimenez’s average ground ball rate. Relative to the other O’s pitchers, Jimenez not only allows fewer fly balls, but fewer of those fly balls become home runs. All good stuff.
Let’s pause here and note that for the Orioles, K rate, GB rate, and HR rate should be the go-to stats when acquiring and developing pitchers. Some media thought that $50m/4 years was too many years for a free-agent pitcher, but teams in offensive parks (Texas, Colorado) should pay a premium for strikeouts, groundballs, and HR suppression. Those skills matter so much more to them than to teams with dead-zone stadiums (San Diego, Oakland). So in this context, I don’t consider the Jimenez signing an overpay.
Now for the not-so-good: Jimenez walks a lot of batters. The average AL starter walked 8.3% of batters last year, whereas Jimenez walked 10.3%. This would’ve easily been the worst on the O’s staff. So it’s fair to say that Jimenez needs to be better-than-average in getting strikeouts and grounders and limiting home runs. Those tendencies are what keep him in the majors despite his rather high walk rate. Heck, even in his amazing 2010 season, he walked 10.3% of batters. So clearly he’s learned to survive despite this weakness. But O’s fans will do a lot of swearing at the TV this year.
Jimenez stranded over 76% of baserunners last year. Those ducks left on the pond never got to score, which kept his ERA low. Don’t look for him to repeat that rate in 2014; 76% is all-time-great territory, and no one thinks Jimenez is an all-time great.
I’d look for his strand rate to fall closer to the AL average of 72.6%. His career strand rate is 71.6%, which is a tick below that average, but this rate includes a poor partial season in 2007, an abysmal 2011 (65% strand rate) and a poor 2012 (68.5% strand rate). I believe Jimenez has adapted such that his strand rate won’t be that low in 2014.
Stats aside, the biggest reason I’m optimistic about Jimenez is that he’s shown he can adapt his pitching style to atone for his struggles in late 2011 and throughout 2012.
In 2010 hitters slugged just .327 against his fastball. That’s no surprise, as it blew by them at an average of 96 MPH. But in 2011 and 2012, the heat fell to around 93 MPH and the pitch looked much more hittable. 3 MPH in one year is a serious drop, and I suspect that’s why hitters slugged .434 against his fastball in 2011 and an otherworldly .559 against it in 2012. But then suddenly in 2013, his fastball was dynamite; opponents managed just a .266 SLG against it.
What happened? He didn’t find a fountain of youth; his fastball velocity dipped to just 92 MPH. Instead, Jimenez changed his approach and became a sinker/slider guy. Look at how frequently he throws those two pitches now, relative to his heater:
My theory is that in 2013, hitters sat on his fastball but frequently got caught off guard with a sinker or slider instead. This change in usage made them flail at actual fastballs, especially considering his sinker travels at roughly the same speed as his four-seamer does.
If this is the case, though, word will get around the league that he throws cheese less often. If batters start out 2014 by waiting on Jimenez’s sinker or slider to dip out of the zone, he’ll have to adapt yet again. But it’s encouraging to me that he’s done so already and is still young and healthy. All told it seems Jimenez is set for a good year in 2014.
I had a lot of fun with my article Comparing Baseball Teams Throughout History; specifically, writing the program to calculate the numbers and then digging through them to see what the results were. I circulated the article to a few friends and the local SABR chapter and got into a discussion about other ways to measure teams’ similarities.
As a result, I updated the algorithm to take stolen bases (as a proxy for speed) and errors (as a proxy for defense) into account. These numbers are readily available in the Lahman database, making it easy to factor them into the score.
I thought I’d post an update here as the new information has drastically altered the comparables for all of the teams I posted about.
Originally the ’67 Tigers came in as the #1 comp to the powerhouse 2001 Mariners. No longer; now, the 2002 Mariners take the top spot with a similarity score of 840. Also, the ’98 Yankees were way down on the similarity list at #7. They’ve jumped four spots to number 3. And whereas the ’94 Astros were the tenth-most similar team, now they are in the third spot.
The full list:
The 1967 Tigers drop all the way down to a score of 726; I’d guess that is 13th or 14th.
In my first article, the 1963 Mets took the top spot with a score of 781. Now, the 2001 Pirates (originally second place) emerge as the most similar team. In fact, whereas the original list had three Mets teams from the ’60s on it, the updated list now has only the 1966 team.
The top ten in full:
With stolen bases and errors factored in, the similarity between the ’03 Tigers and the ’01 Mariners drops all the way down to 95.
There’s the 2000 Atlanta Braves again — they were 79.3% similar to the 2001 Mariners.
(Aside: Notice how the #9 and #10 teams above are actually tied. My program doesn’t do a good job taking that into account yet.)
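One way the tie could be handled is standard competition ranking, where teams with equal scores share a rank and the next rank is skipped. The Python sketch below is only an illustration: the function name `rank_with_ties` is hypothetical, and the team labels and scores are made up rather than taken from the lists here.

```python
def rank_with_ties(scores):
    """Return (rank, team, score) tuples, highest score first.

    Teams with equal scores share a rank ("1, 2, 2, 4" style).
    `scores` maps a team label to its similarity score.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranked = []
    for position, (team, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous team: share its rank
        else:
            rank = position
        ranked.append((rank, team, score))
    return ranked


# Illustrative data only; these are not scores from the article.
example = {"Team A": 812, "Team B": 790, "Team C": 790, "Team D": 775}
for rank, team, score in rank_with_ties(example):
    print(rank, team, score)
```

With this scheme the two tied teams would both be listed at #9 and the list would have no #10.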
Ah, love seeing the Orioles so much on this list, even if they are less than 3/4 similar to the powerhouse Big Red Machine.
The ’02 Red Sox remain as the team most similar to the ’94 Expos.
Just for giggles – the first Rangers World Series team.
The ’64 Indians?! That team finished in sixth place at 79-83. And the ’09 Astros finished 5th in their division at 74-88. Hm.
Update 1/25/2014: This article lays the groundwork for the Team Similarity Scores, but more accurate results are available here.
Many moons ago, Bill James came up with a way of mathematically classifying how similar two players are to one another. But to my knowledge no one has applied this logic to teams. So when Tom Tango asked me why I used the 1974 Orioles to make predictions about future Orioles teams, I asked myself: what teams should I be using as a comparison point? I didn’t quite answer that question (yet) but I did end up with a project that was as fun as that, if not more so.
There are certainly other dimensions on which to compare teams, but I think these aspects are a good start. Note that outside of runs scored & allowed, which measure a team’s talent more accurately and precisely than wins and losses, the other stats I used are the components of FIP.
For an example of how this works, let’s look at some teams!
The ’01 Mariners tied the major-league record for regular-season wins when they went 116-46 to capture the AL West crown. Although they beat the Indians 3-2 in the ALDS, they lost the ALCS in five games to the Yankees (who, I must point out, eventually lost to the Diamondbacks). On offense, the team was led by Bret Boone, who hit .331/.372/.578 and played stellar defense to rack up 7.8 fWAR. It was also the debut of a Japanese player who wore his first name on his jersey: Ichiro Suzuki was a force to be reckoned with, hitting .350/.381/.457 and compiling 6 fWAR. Mike Cameron (5.5 fWAR), Edgar Martinez (4.7 fWAR), John Olerud (4.6 fWAR), and David Bell (2.6 fWAR) also contributed excellent seasons.
Pitching-wise, Freddy Garcia gave the team 34 excellent starts. He benefitted from an abnormally low .255 BABIP, but the fact is his FIP was 3.48 that year and he accrued 5.3 fWAR. Arthur Rhodes and Joel Pineiro also excelled at run prevention, finishing the season with a 2.14 and 2.86 FIP, respectively.
Which team throughout history was most like this powerhouse?
Enter the 1967 Detroit Tigers, who went 91-71 and finished one game behind the “Impossible Dream” Red Sox in the American League. That year the Tigers received 7 fWAR from right fielder Al Kaline (.308/.411/.541) and 5.8 from catcher Bill Freehan. Dick McAuliffe chipped in for 4.8 fWAR and Norm Cash, at age 32, handled himself nicely with 3.8 fWAR. On the mound, Mickey Lolich had an excellent season (2.65 FIP), as did Joe Sparma and Earl Wilson.
It may not seem like it because of the large disparity in wins, but despite playing over 30 years apart the two teams are extremely similar to each other:
Using the algorithm described above, 1000 – (24 + 4 + 2 + 8 + 18 + 20 + 4 + 10) = 1000 – 90 = 910.
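The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the actual program: the function name `similarity_score` is hypothetical, and the penalty points are simply copied from the sum above without assuming which stat each one belongs to.

```python
def similarity_score(penalties, start=1000):
    """Subtract each component's penalty points from the starting score."""
    return start - sum(penalties)


# Penalty points copied from the 1967 Tigers vs. 2001 Mariners sum above.
tigers_vs_mariners = [24, 4, 2, 8, 18, 20, 4, 10]

score = similarity_score(tigers_vs_mariners)
print(score)         # 910
print(score / 1000)  # 0.91
```

Two identical teams would have no penalties and score a perfect 1000.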
Thus the 1967 Detroit Tigers are 91% similar to the 2001 Seattle Mariners. What other teams are most similar to the ’01 Mariners?
Those are pretty high similarity scores. It’s interesting to me how there is such a wide range of win-loss records here; from 114 to 87. Just goes to show how the relationship between wins and talent isn’t exact, I guess.
Let’s flip things up a little bit and talk about the team that holds the American League record for most losses in a season. The ’03 Tigers went 43-119. Only the ’62 Mets and the 1899 Cleveland Spiders lost more games in a season.
The ’03 Tigers were ‘led’ by Dmitri Young. He posted an excellent 136 wRC+ (.297/.372/.537, 29 HR) but since he was primarily a DH and was awful in the outfield and at the infield corners, he notched only 2.0 fWAR. The other regulars on the team were execrable, barely above replacement level if they were even above it. Collectively the team hit .240/.300/.375 for a wRC+ of 80.
From a pitching standpoint, Nate Cornejo led the staff with 1.6 fWAR and Jeremy Bonderman followed close behind with 1.2. Their respective 4.70 and 4.89 FIPs didn’t portend much talent. Perhaps the brightest spot on the team was a young Fernando Rodney, who notched 0.5 fWAR in just 29.2 innings. His ERA was a ghastly 6.07 but his 3.50 FIP foretold some staying power.
With all that “talent” you can understand why the team scored just 591 runs and allowed 928. Who else in major league history is similar to this dreck?
No team is very similar, but the 1963 New York Mets (record: 51-111) come the closest with a similarity score of 781. Let’s run the numbers:
Rounding out the top 10 teams similar to the 2003 Tigers:
Oh, those poor Mets fans in the ’60s. In fact it’s interesting to me how many of these poor teams come in the ’60s. I wonder if Tigers fans in ’03 knew they were watching a brand of baseball that hadn’t been played, with a couple of exceptions, in 40 years.
Now the kicker: how similar were the ’01 Mariners to the ’03 Tigers? As you can already guess, not very. The similarity score between the two teams is a measly 226.
Here are a few more interesting teams:
1986 New York Mets:
2008 Tampa Bay Rays:
1975 Cincinnati Reds:
1994 Montreal Expos:
… the list goes on! You can see for yourself by downloading the set of top 10 scores here.
As I said above, I realize there are many more dimensions on which teams can be compared. I just started off with direct measurements of simple fields. In the future I’d like to make the following changes:
I’m also certain that the “one point per strikeout/walk/HR” logic needs some more rigor applied.
We’ll see how far I get on this, but hey, it’s a start. In the meantime, what comparisons would you like to see made? And are there any teams you’d like to know the similarity score for?
In my previous article, I used Bayes’s theorem to ask the question “if the O’s are at or above .500 on July 1, what is the probability they will finish at or above .500?” I found that if the O’s are winning on July 1, they’ll likely finish the season that way; conversely, if the O’s are losing on July 1, they’ll likely finish the season that way. It’s not the most useful question, since predicting whether the O’s will have a winning or losing season isn’t the thing that most people truly care about, but it was a way for me to practice with this methodology and see if there was merit in it. And, as a fan who recently endured 14 losing seasons, I am allowed to simply hope for a winning season.
I sent the article to Tom Tango, who provided me with some feedback and a couple more avenues of research. Today I want to follow one of those avenues by examining the following class of question:
If, after n games, the Orioles have won m games, what is the probability they will win at least 94 games in the season?
I’m not finished with this research project yet, but I want to share what I have done so far.
To do this study, first I converted all seasons to 162-game equivalents, since that’s what we care about today. The (current) O’s have been around since 1954, when the season was 154 games, plus they played in a few strike-shortened seasons (1972, 1981, 1994, 1995) and a few years they played only 161 games.
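A minimal Python sketch of that conversion follows. It assumes simple proportional scaling, which is my guess at the simplest approach rather than a confirmed description of the spreadsheet, and the function name `wins_per_162` and the example win total are hypothetical.

```python
def wins_per_162(wins, games_played, full_season=162):
    """Scale a win total to a 162-game schedule (assumes proportional scaling)."""
    return wins * full_season / games_played


# Hypothetical example: 90 wins in a 154-game season.
print(round(wins_per_162(90, 154), 1))  # 94.7
```

Under this assumption a 90-win team from the 154-game era clears the 94-win bar only after scaling, which is why the conversion matters for the older seasons.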
Then for each season, first I assessed whether they won at least 94 games that season. Easy enough. Then I examined mid-season records for the Orioles at six points: after 27 games, 54 games, 81 games, 108 games, 135 games, and (of course) 162 games. These points each correspond to one-sixth of each season and fall roughly on the end-of-month boundary marks for April, May, June, July, August, and (of course) September. (For shorter seasons, I adjusted the measuring points accordingly.)
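The measuring points can be generated rather than typed in. The sketch below assumes the adjustment for shorter seasons means "the nearest whole game to each sixth of the schedule"; that rounding rule and the function name `checkpoints` are assumptions for illustration.

```python
def checkpoints(season_games, parts=6):
    """Game numbers marking each sixth of a season, rounded to whole games."""
    return [round(season_games * k / parts) for k in range(1, parts + 1)]


print(checkpoints(162))  # [27, 54, 81, 108, 135, 162]
print(checkpoints(154))  # measuring points for a 154-game season
```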
At each point I noted how many wins the Orioles had. Then I computed the following variables:
Then I used Bayes’s theorem to calculate the posterior probability for each n game / m win combination. Thank goodness for spreadsheets.
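For readers who would rather see it outside a spreadsheet, here is a rough Python sketch of the same kind of calculation. It takes A as "won at least 94 games" and B as "had exactly m wins at the checkpoint" and computes P(A|B) = P(B|A) × P(A) / P(B) from season counts. The function name `posterior` and the toy seasons are made up for illustration, and my actual variables may be defined differently.

```python
from fractions import Fraction


def posterior(seasons, m, threshold=94):
    """P(final wins >= threshold | exactly m wins at the checkpoint).

    `seasons` is a list of (wins_at_checkpoint, final_wins) pairs.
    Returns None when no season had exactly m wins at the checkpoint.
    """
    total = len(seasons)
    with_m = [s for s in seasons if s[0] == m]
    if not with_m:
        return None  # no evidence at all for this win total
    hits = [s for s in seasons if s[1] >= threshold]
    if not hits:
        return Fraction(0)  # no season ever reached the threshold
    prior = Fraction(len(hits), total)                                   # P(A)
    likelihood = Fraction(sum(1 for s in hits if s[0] == m), len(hits))  # P(B|A)
    evidence = Fraction(len(with_m), total)                              # P(B)
    return likelihood * prior / evidence


# Toy data only: (wins after 27 games, final wins). Not real Orioles seasons.
toy = [(16, 95), (16, 90), (17, 85), (12, 94), (19, 98), (14, 80)]
print(posterior(toy, 16))  # 1/2
print(posterior(toy, 17))  # 0
print(posterior(toy, 19))  # 1
```

Even in this toy data, a win total that shows up in only one season produces a posterior of exactly 0 or 1, so small samples can swing the results hard.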
For what it’s worth, I chose to examine the 94-win threshold because that’s the average number of games the AL Wild Card winners (or, in the past two years, Coin Flip Game participants) have won (again accounting for the strike-shortened season of 1995). If you win that many games you’ve had a damn fine season and have a good shot at making the playoffs. So the 94-win proxy is really a threshold for “making the playoffs” via any fashion, without accounting for the fact that you have to actually win the Coin Flip Game in order to advance.
To start, let’s look at the results from my measurements at the 27-game mark:
You say you’ve started the season 10-17? Well fear not, you still have a non-zero shot at 94 wins, as the 1982 Orioles prove to you. They ended up finishing 94-68. It can happen to you, too!
Other than that little surprise, it makes sense that if you start the season under .500, your probability of winning 94+ games is fairly low; the further you get over .500, your probability obviously rises.
There are some other surprises, like the probability at 16 wins being over .50 but 0 at 17 wins, and the probability being 1 at 19 wins. These are symptomatic of problems in the dataset & methodology that I will discuss below.
There are two main problems with the results:
Anyway, it doesn’t make sense that you can win 17 games and yet have a lower probability of winning 94 than you’d have if you’d won 16 games. I would believe that only if no team had ever done it, and I don’t have the data to prove that.
I have a couple thoughts on how to rectify these issues:
Despite the research needed, I thought including data from the other five measuring points would be instructive.
After 108 games, if you have not won at least 40 games, winning 94 will be impossible.
After 135 games, if you haven’t won at least 67 games, winning 94 will be impossible.
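Those cutoffs follow from the number of games left on the schedule: even a team that wins out can add only as many wins as there are games remaining. A small Python sketch, with the hypothetical function name `min_wins_needed`:

```python
def min_wins_needed(games_played, target=94, season_games=162):
    """Fewest wins a team must already have for `target` to stay reachable."""
    remaining = season_games - games_played
    return max(0, target - remaining)


for n in (27, 54, 81, 108, 135):
    print(n, min_wins_needed(n))
# 27 0
# 54 0
# 81 13
# 108 40
# 135 67
```

Passing a different `target` gives the matching cutoffs for any other win threshold.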
Before I go, what happens if we lower the threshold to 88+ games?
Here we see that winning 88+ games is easier than winning 94+ games (betcha didn’t need a long blog post to tell you that!). And, although I just got through saying that I need to do more research, it seems there’s a chance that the O’s record after the 27th game can tell us something useful about how they will fare in the rest of the season.
More to come!