Rogue Statisticians
Even a non-fan like myself is aware that Roger Clemens is currently testifying before Congress, responding to the Mitchell Report’s allegation of steroid use. His decision to testify is a bold move, to be sure.
To aid in his defense, the Clemens team has assembled the Clemens Report, which (surprise, surprise) concludes that Clemens’ late-career performance is not out of the ordinary. In particular, they examine his ERA over the course of his entire career and conclude that its behavior is comparable to that of other great pitchers, Nolan Ryan for example.
I have not read the entire 45-page report, but I did catch this article in yesterday’s New York Times, which calls the Clemens Report’s findings into question. I’ve also read this blog post, written by one of the article’s four authors, all of whom are professors with a serious background in statistics. As much as I love the idea of four numerically inclined professors getting together, crunching some numbers, and submitting their analysis to the New York Times, I am unconvinced.
Courtesy of the Times, here is Clemens’ ERA over time:

The professors argue that ERA (earned run average) is not a particularly reliable statistic. For one, it depends on a lot of external factors, like the quality of a team’s defense. They argue that WHIP (walks plus hits per inning pitched) is more reliable. Here is Clemens’ WHIP data, which they fit with a smooth curve:

I’m no statistics professor, but the WHIP data didn’t look all that reliable either. It turns out, however, that if you take both sets of data and divide each by its own mean (so they are on the same scale), the ERA data has five times the variance of the WHIP data. In this sense, the WHIP data varies less from year to year, so it probably is a better choice than ERA when looking for a long-term trend.
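For anyone who wants to repeat that comparison, here is a minimal Python sketch of the idea: divide each series by its own mean, then compare variances. The function name `scaled_variance` and both lists of numbers are my own placeholders, invented purely for illustration; they are not Clemens’ actual statistics, so substitute the real season-by-season values.

```python
from statistics import mean, pvariance


def scaled_variance(values):
    """Variance of a series after dividing every value by the series mean."""
    m = mean(values)
    return pvariance([v / m for v in values])


# Made-up numbers for illustration only; NOT Clemens' real ERA or WHIP.
era_like = [2.0, 4.0, 3.0, 5.0, 2.5, 4.5]
whip_like = [1.0, 1.3, 1.1, 1.4, 1.05, 1.35]

# A ratio above 1 means the first series is noisier relative to its own scale.
ratio = scaled_variance(era_like) / scaled_variance(whip_like)
print(f"scaled ERA-like variance is {ratio:.1f}x the scaled WHIP-like variance")
```

Dividing by the mean first matters because ERA and WHIP live on different scales; comparing raw variances would mostly reflect that difference in units rather than year-to-year noise.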
Still, look at the red curve. It’s a parabola fit to the data using least squares (meaning the parabola is chosen so as to minimize the average squared vertical distance between the curve and each point). The professors claim that this parabola indicates a sudden improvement in Clemens’ performance halfway through his career (for both ERA and WHIP, lower numbers are better). Furthermore, they compare the curve with those of other pitchers and conclude that its hump shape is unusual.
My problem with their approach is that it appears to be very sensitive to outliers. Clemens’ two highest WHIPs appear to have a very significant influence on the shape of the curve. To test this, I generated the same parabola using MATLAB, then I generated a parabola with just the largest outlier removed. I also generated a third parabola with just the second-largest outlier removed (coincidentally, this outlier came from a season when Clemens played only 23 games). In this third curve in particular, the alleged improvement in Clemens’ career all but disappears.
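The same kind of sensitivity check can be sketched in a few lines of Python. This is not the MATLAB script I actually ran, and the WHIP-like series below is synthetic (a nearly flat career with one bad season), not Clemens’ real numbers. The helper `fit_quadratic` is a hypothetical name of mine; it solves the ordinary least-squares normal equations for y = a*x^2 + b*x + c directly.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    # Build the 3x3 normal equations (A^T A) p = A^T y, with rows [x^2, x, 1].
    rows = [(x * x, x, 1.0) for x in xs]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Synthetic WHIP-like series, invented for illustration; NOT real data.
seasons = list(range(10))
whip = [1.20, 1.18, 1.22, 1.19, 1.21, 1.60, 1.20, 1.18, 1.22, 1.20]

a_full, b_full, c_full = fit_quadratic(seasons, whip)

# Drop the single largest value and refit.
worst = max(range(len(whip)), key=lambda i: whip[i])
kept = [i for i in range(len(whip)) if i != worst]
a_trim, b_trim, c_trim = fit_quadratic([seasons[i] for i in kept],
                                       [whip[i] for i in kept])

print(f"curvature with all points:     {a_full:+.5f}")
print(f"curvature without the outlier: {a_trim:+.5f}")
```

The coefficient `a` controls how pronounced the hump is. In this toy series, one season is responsible for nearly all of the curvature, which is the kind of fragility I am worried about in the professors’ fit.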

Obviously my approach of selectively removing data points can be called into question, but removing outliers is often standard practice when fitting noisy data. Regardless, the fact that their results heavily rely on only a single year’s WHIP is enough to make me skeptical. The Clemens team has also provided a more critical response.
Anyway, I don’t want to come across as overly critical of the professors’ work. I like that they took an interest in the Clemens Report. Their challenge to the report’s seemingly selective use of ERA statistics seems reasonable, but in the end, their conclusion appears equally tenuous. If we were to apply their methods to many other pitchers, using a variety of metrics, I wonder how many great pitchers would have suspicious-looking careers. I doubt it would be only Clemens. I’m also genuinely surprised that four professors decided to announce their statistical findings directly in the New York Times sports section. It’s nice to know this is an option should I ever become a professor. For now, though, I remain a lowly PhD student with a not particularly well-read blog.
February 14th, 2008 at 10:23 am
Sorry, the quadratic doesn’t cut it for the ERA. WHIP is obviously the same. There are at least three phases, which shoots down the theory that statistics support a simple drug use pattern.
http://img197.imagevenue.com/img.php?image=12578_clemens_era_122_527lo.jpg
February 14th, 2008 at 3:42 pm
Drew, that’s a good point. I only used a quadratic because that’s what the professors used, but the choice clearly biases the curve toward dividing a pitcher’s career into “two phases”. Looking at the actual data, and the image you posted, a higher-degree polynomial is probably more appropriate. Unfortunately, to get the three-phase behavior you’ve suggested, a polynomial of at least degree six is required. I just tried this, but it didn’t reveal much. It was basically overfitting the data.