Monday, March 5, 2012

College Baseball Minor League Equivalencies

The following is a paper I wrote in college about college baseball MLEs. Please note that this data was collected before the NCAA bat changes took place.


In 1985 the father of sabermetrics and current Red Sox advisor, Bill James, introduced a new concept in his annual Baseball Abstract book. What James invented was a new way to measure and evaluate minor league players’ performances. He came up with the idea of Major League Equivalencies (MLEs), in which he adjusts for a number of external factors such as run environment, park factors, and level of competition. For example, James found that on average a player loses about 18 percent of his offensive production when moving from Triple-A to the Major Leagues. These translations are not predictions of what the player will do in the Major Leagues but rather an indicator of what he has already done, expressed in major league terms. Bill James believed this to be one of his most important and influential research breakthroughs.[1] Having reliable translations of minor league performance may change how front office decision makers decide whom to acquire or promote. Since James first wrote about MLEs, many others have worked to duplicate and improve his work. There are now translations for Japanese leagues and all levels of the minor leagues. There are also some translations for pitchers, but these results are prone to much greater error than hitting stats.
The objective of this research was to take Major League Equivalencies to the next level: specifically, to introduce NCAA Division 1 college statistics and determine minor league translations, in order to figure out which statistics correlate and translate best into the professional ranks. There is simply too much noise and variation for one to translate college stats all the way to the major league level. However, once the numbers are translated into the minor league levels it is conceivable that these numbers could then be adjusted using MLEs. I set out to translate college stats into Rookie level equivalencies and Low-A level equivalencies because these are the two levels where most college draft picks end up playing their first year.
The particular statistics I was interested in observing were walk rate (BB%), strikeout rate (K%), isolated power (ISO) and weighted on-base average (wOBA). A player’s walk rate is simply his total number of walks divided by his plate appearances. The strikeout rate is the total number of strikeouts divided by at-bats. Isolated power is a player’s slugging percentage minus his batting average. It is a simple yet effective way to capture a player’s true power. Weighted on-base average is a linear weights statistic designed to measure total offensive performance[2]. It is scaled to on-base percentage, which means at the professional levels .330 is around the average. I used the formula (slugging percentage plus 1.75 times on-base percentage) divided by three to estimate wOBA.
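The four definitions above can be written out as a short Python sketch. The function name and the sample stat line are made up for illustration; only the formulas come from the text.

```python
def rate_stats(pa, ab, hits, walks, strikeouts, obp, slg):
    """Return BB%, K%, ISO and estimated wOBA as decimals."""
    avg = hits / ab  # batting average
    return {
        "BB%": walks / pa,               # walks per plate appearance
        "K%": strikeouts / ab,           # strikeouts per at-bat, as defined above
        "ISO": slg - avg,                # slugging minus batting average
        "wOBA": (slg + 1.75 * obp) / 3,  # the estimate used in this paper
    }


# Hypothetical stat line, not a real player.
line = rate_stats(pa=250, ab=200, hits=60, walks=40, strikeouts=30,
                  obp=0.400, slg=0.500)
for stat, value in line.items():
    print(f"{stat}: {value:.3f}")
```

Note that the wOBA here is the shortcut estimate from slugging and on-base percentage, not the full linear weights version.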
The website www.boydsworld.com has an in-depth database of college baseball statistics, which I used to obtain the college numbers. The first step was to apply park and strength of schedule factors for all players. To do this, I followed a methodology that writer and researcher Kent Bonham laid out in a series of online articles. To apply the park factor I multiplied the three-year weighted Park Factor for each team by the square root of 100 divided by the PF. Then to apply the strength of schedule rating I multiplied the SOS-number for every team by the square root of the SOS divided by 100. Basically, a Park Factor over 100 implies the team played in favorable hitting environments while a PF under 100 suggests the opposite. Likewise, a strength of schedule rating over 100 indicates the team faced an above-average level of difficulty and under 100 means a weaker schedule was played. These neutralizing
I then took all college position players drafted in the first 30 rounds of the 2009, 2008, and 2007 drafts who accumulated at least 50 plate appearances both in college and in Rookie ball or Low-A the same year they were drafted. I decided to use only same-year statistics because this would eliminate noise and other variables such as player improvement in skill, strength, or age. The next step was to equalize the plate appearances, since players rarely accumulate the exact same number of plate appearances in college and the minor leagues the same year they were drafted. To do this one must create a “plate appearance factor” to apply to each player. I divided the lesser of each player’s college and minor league plate appearance totals for that season by the greater, and used that factor to weight the statistics from the larger sample down to the lesser number of plate appearances. So if a player had 125 plate appearances in college but only 100 in the minors that same year, I would weight all of his college stats by .8 (100 divided by 125) so that all things were held equal. This ensured that the total plate appearances would be the same for both sample groups.
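A minimal sketch of the plate appearance factor in Python, using the 125 and 100 plate appearance example from the paragraph above. The function names and the walk and strikeout counts are hypothetical.

```python
def pa_factors(college_pa, minor_pa):
    """Return (college_factor, minor_factor) that scale both samples
    down to the lesser plate appearance total."""
    lesser = min(college_pa, minor_pa)
    return lesser / college_pa, lesser / minor_pa


def weight_counts(counts, factor):
    """Scale every counting stat in a dict by the same factor."""
    return {stat: value * factor for stat, value in counts.items()}


# 125 PA in college, 100 PA in the minors, as in the example above.
college_factor, minor_factor = pa_factors(125, 100)
college = {"PA": 125, "BB": 20, "SO": 25}  # hypothetical counting stats
weighted = weight_counts(college, college_factor)
print(college_factor, minor_factor)
print(weighted)
```

Only the larger sample is actually scaled. The smaller one gets a factor of 1 and is left alone.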
My Rookie level sample consisted of 222 players and the Low-A sample had 324 players. With sample sizes this large I felt comfortable moving forward with the research. I had originally intended to use five years of draft data, but I noticed that with three years the numbers had already begun to flatten out and I did not think any more data would be necessary.
To create these factors I first had to sum up all statistics. These statistics included plate appearances, at bats, hits, singles, doubles, triples, home runs, total bases, slugging percentage, walks, hit by pitch, strikeouts, ground into double plays, on-base percentage, sacrifice flies, sacrifice hits, stolen bases, stolen base success rate, batting average on balls in play, isolated power, walk percentage, strikeout percentage, walk to strikeout rate, and weighted on-base average. This was done with four different data sets: Rookie level stats, NCAA-Rookie stats, Low-A stats, and NCAA-Low-A stats. The totals were added up for each respective draft class and then the minor league numbers were divided by the collegiate numbers to create the factors for both the Rookie level and the Low-A level (please see attached spreadsheets for specific results).
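As a rough illustration of the step above, the Python sketch below sums counting stats across a sample, derives rates from the totals, and divides minor league numbers by college numbers. The two-player samples are invented, and deriving the rate stats from the summed totals is an assumption about how the spreadsheet was built.

```python
def sum_counts(players):
    """Add up counting stats across a list of per-player dicts."""
    totals = {}
    for player in players:
        for stat, value in player.items():
            totals[stat] = totals.get(stat, 0) + value
    return totals


def with_rates(totals):
    """Append rate stats computed from the summed totals (an assumption)."""
    out = dict(totals)
    out["BB%"] = totals["BB"] / totals["PA"]
    out["K%"] = totals["SO"] / totals["AB"]
    return out


def level_factors(college_players, minor_players):
    """Minor league total divided by college total, stat by stat."""
    college = with_rates(sum_counts(college_players))
    minors = with_rates(sum_counts(minor_players))
    return {s: minors[s] / college[s] for s in college if college[s] != 0}


# Hypothetical PA-equalized samples of two players each.
college_sample = [
    {"PA": 100, "AB": 85, "BB": 12, "SO": 15, "HR": 5},
    {"PA": 100, "AB": 90, "BB": 8, "SO": 20, "HR": 3},
]
minor_sample = [
    {"PA": 100, "AB": 88, "BB": 10, "SO": 22, "HR": 2},
    {"PA": 100, "AB": 92, "BB": 7, "SO": 26, "HR": 2},
]
factors = level_factors(college_sample, minor_sample)
for stat in ("HR", "BB%", "K%"):
    print(f"{stat}: {factors[stat]:.3f}")
```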
The factors are for a typical player and are not representative of each individual’s skill. Naturally some players will overperform and others underperform the factors, but they are designed as a guideline for the average Division 1 collegiate player’s transition into the minor leagues. The factors make intuitive sense: players going from NCAA Division 1 to the Rookie level lose less of their offensive value than players going to Low-A, because the competition increases at each level of the minor leagues and Rookie leagues are the lowest rung of the minors. According to my factors, Rookie level players lose roughly 16 percent of their offensive value (wOBA factor of .842) while Low-A players’ production is reduced by roughly 27 percent (wOBA factor of .727). Walk rates remain fairly consistent at each level, with a Rookie factor of .879 and a Low-A factor of .866. Strikeout rates increase at both levels, by factors of 1.313 and 1.378 respectively. One interesting observation was the dramatic decrease in power experienced as a player transitions from college to the minor leagues. A typical player will lose roughly 33 percent of his isolated power jumping to the Rookie leagues and about 45 percent going from the NCAA to Low-A.
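To show how the factors are meant to be used, here is a small Python sketch that applies them to an adjusted college line. The wOBA, BB% and K% factors are the ones quoted above. The ISO factors of .67 and .55 are only approximations inferred from the "roughly 33 percent" and "about 45 percent" losses, and the college line is hypothetical.

```python
FACTORS = {
    "Rookie": {"wOBA": 0.842, "BB%": 0.879, "K%": 1.313, "ISO": 0.67},  # ISO approximate
    "Low-A": {"wOBA": 0.727, "BB%": 0.866, "K%": 1.378, "ISO": 0.55},   # ISO approximate
}


def translate(college_line, level):
    """Multiply each adjusted college stat by the factor for the given level."""
    level_factors = FACTORS[level]
    return {stat: value * level_factors[stat]
            for stat, value in college_line.items()}


# Hypothetical park- and schedule-adjusted college line (rates in percent).
college_line = {"wOBA": 0.450, "BB%": 12.0, "K%": 15.0, "ISO": 0.200}
rookie = translate(college_line, "Rookie")
low_a = translate(college_line, "Low-A")
print("Rookie:", {k: round(v, 3) for k, v in rookie.items()})
print("Low-A: ", {k: round(v, 3) for k, v in low_a.items()})
```

The output is an equivalent line for a typical player, not a forecast for any individual.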
In order to test my results I ran regressions of the actual minor league numbers against some of my predicted statistics. I decided to run regressions for the key statistics I mentioned on page two at the Low-A level. To do this I multiplied each of the sample’s college statistics by the respective factor. This gave me my x-variable, the predicted results. I then ran Excel regressions against their actual minor league statistics for that season (please see attached for Excel regressions). The results were encouraging. For strikeout rates I got the equation actual K% = 15.248 + .389(predicted K%). The t-statistic was 8.00 and the p-value was below .05, so the result is statistically significant. The regression equation for walk rates was actual BB% = 5.22 + .489(predicted). The t-statistic was over seven and the p-value was below .05, indicating the result was significant. For ISO the equation was actual = .057 + .516(predicted) and again the values were highly significant. The equation for wOBA was actual = .212 + .200(predicted) and the results were significant (t-statistic of 2.6 and p-value of .009). I have also included charts of the actual versus predicted statistics.
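The fitted lines reported above can be evaluated directly. The Python sketch below only plugs values into those four equations. The input values are hypothetical, and K% and BB% are taken to be in percentage points, since the intercepts of 15.248 and 5.22 only make sense on that scale.

```python
# (intercept, slope) pairs as reported for the Low-A regressions.
REGRESSIONS = {
    "K%": (15.248, 0.389),
    "BB%": (5.22, 0.489),
    "ISO": (0.057, 0.516),
    "wOBA": (0.212, 0.200),
}


def fitted_actual(stat, predicted):
    """Value on the fitted line for a given factor-based prediction."""
    intercept, slope = REGRESSIONS[stat]
    return intercept + slope * predicted


# Hypothetical factor-based predictions for one player.
predictions = {"K%": 20.0, "BB%": 10.0, "ISO": 0.150, "wOBA": 0.330}
fitted = {stat: fitted_actual(stat, value) for stat, value in predictions.items()}
for stat, value in fitted.items():
    print(f"{stat}: predicted {predictions[stat]}, fitted {value:.3f}")
```

Every slope is well below 1, so the fitted values move much less than the predictions do. That is worth keeping in mind when reading the significance results.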
There are some causes for concern in this study. First off, there is selection bias in separating the samples into Rookie and Low-A pools. It is often the case that better players are drafted and placed in higher levels (usually Low-A, but sometimes Single-A) while weaker picks might go straight to the Rookie leagues. The numbers seem to support this argument, as Rookie players posted an average weighted on-base average of .420 in college while Low-A players averaged 35 points higher, for a cumulative average of .455. This would indicate a gap in talent level between the two groups and may skew the numbers accordingly. Also, by using 30 rounds of draft picks we see widely varying results in the minor league levels. Typically, higher draft picks will outperform others, especially at the lower levels of the minor league system, because they are more advanced and more talented than their counterparts. Using 30 rounds as a cutoff may cast too wide a net, and a further study may reduce that number. There also could be a problem with using 50 plate appearances as a minimum. Baseball statistics accumulated in only 50 plate appearances are not especially reliable and are subject to very wide confidence intervals because the sample is so small. In the future it may be beneficial to increase the minimum. Also, some of the NCAA data was incomplete. Not every school tallies sacrifice hits, sacrifice flies, or stolen base attempts, which may alter the numbers somewhat.
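The concern about a 50 plate appearance minimum can be made concrete with a back-of-the-envelope calculation. The Python sketch below uses the normal approximation to a binomial proportion, which is a simplification and not something the study itself computed. The 10 percent walk rate is hypothetical.

```python
import math


def margin_of_error(rate, n, z=1.96):
    """Approximate 95% margin of error for a rate observed over n trials."""
    return z * math.sqrt(rate * (1 - rate) / n)


# Hypothetical 10% walk rate at two sample sizes.
walk_rate = 0.10
margin_50 = margin_of_error(walk_rate, 50)
margin_200 = margin_of_error(walk_rate, 200)
print(f"50 PA:  {walk_rate:.3f} +/- {margin_50:.3f}")
print(f"200 PA: {walk_rate:.3f} +/- {margin_200:.3f}")
```

Under this approximation, quadrupling the plate appearances cuts the margin of error in half.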
Something that could be interesting for a future study is to include collegiate pitchers. Pitching is, by nature, subject to more variability due to injuries and other uncertainties, and this may be difficult for a researcher to overcome, but I think I’ve laid out a baseline for one to follow. Similar methods could be applied to pitchers to create factors as well.

[1] http://baseballanalysts.com/archives/2004/11/abstracts_from_20.php
[2] http://www.insidethebook.com/woba.shtml

Monday, October 25, 2010

College Baseball Equivalencies

This is a project I've been working on, off and on, for several months now. I plan to delve into these results in much more depth and expand the sample size, but I decided to post my preliminary results here for everyone to see.

Basically what I did was take the top 75 offensive college players who were drafted and played rookie ball in 2009. I weighted their stats by the lowest PA total (in college or in RK) for all players. Once the PA were equal I compared all the college numbers to the RK league numbers and created factors, which we will call Rookie League Equivalencies (RKLE).

The factors can be applied to a player's adjusted college stats to give a rough projection of how he will perform in Rookie-ball.

The factors are as follows:
HR- .48
2B- .80
3B- .89
BB%- .88
HBP- .73
K%- 1.32
SB- .93
ISO- .59
wOBA- .85

The key stats I put in bold. As I had hypothesized walks carry over pretty highly. Strikeouts are likely to increase, so a player who strikes out a lot in college will strike out even more in the low levels of the minors.
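Here is a minimal Python sketch of applying the RKLE factors listed above to a player's adjusted college numbers. The factor values are the ones in the list. The college line and the function name are hypothetical.

```python
RKLE = {
    "HR": 0.48, "2B": 0.80, "3B": 0.89, "BB%": 0.88, "HBP": 0.73,
    "K%": 1.32, "SB": 0.93, "ISO": 0.59, "wOBA": 0.85,
}


def rookie_equivalent(college_line):
    """Multiply each stat that has an RKLE factor by that factor."""
    return {stat: value * RKLE[stat]
            for stat, value in college_line.items() if stat in RKLE}


# Hypothetical adjusted college line, already weighted to equal PA.
college_line = {"HR": 10, "2B": 15, "BB%": 12.0, "K%": 18.0, "wOBA": 0.400}
projection = rookie_equivalent(college_line)
for stat, value in projection.items():
    print(f"{stat}: {value:.3f}")
```

In this made-up line an 18 percent strikeout rate in college becomes nearly 24 percent in Rookie ball, which is the pattern described above.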

Obviously one season of players is not a huge sample size, and ideally I will add to this database. I also plan to expand to Low-A and other levels if there are large enough samples.

Monday, June 7, 2010

Red Sox select Kolbrin Vitek

The Boston Red Sox took Ball State's Kolbrin Vitek with the 20th overall pick in the 2010 MLB First-Year Player Draft.

Vitek is one of the more highly regarded college bats in this year's draft class and there were strong rumors suggesting the Padres were very interested in him with the 9th pick.

Vitek posted an impressive adjusted-wOBA of .476. He had 33 walks and 36 strikeouts and has solid bat speed and a short compact stroke. His speed is above average as is his arm strength. He was also one of the top pitchers for Ball State this season.

The concern about Vitek is his defense. He played second base this season but doesn't profile very well there. He may be best suited for third or a corner outfield position.

Thursday, June 3, 2010

Felix Doubront Scouting Report

I finally got the chance to see Boston prospect Felix Doubront throw last night versus the Charlotte Knights.

Doubront has a deceptive overhand delivery. He has a three-pitch mix that includes a fastball in the 91-93 mph range. The pitch runs in on left-handed batters and he broke three bats in the 5.1 innings that I saw. His change-up is above-average with a chance to be a good major league pitch. He throws it in the low 80s and gets good downward action. He also flashes a high-70s curve. This pitch still needs work but could become average.

Doubront was dominant last night. He struck out 3 and walked 1 while scattering 2 hits over 5.1 innings. He recorded 9 groundballs (vs 5 flies) and really only allowed three hard hit balls. I came away very impressed with his stuff. I can see him developing into a #3 starter down the road.

Wednesday, May 26, 2010

MLB Draft | WAR Values


Here is a visual representation of MLB draft picks by WAR (over first 6 seasons). You can see a considerable downward trend beginning around pick 10. Nothing groundbreaking here but something nice to look at.

Sunday, May 16, 2010

MLB draft scouting reports

I attended several Boston College baseball games this weekend against Florida State and Connecticut. The Seminoles are traditionally a baseball powerhouse and little has changed this season. UCONN has assembled the premier team in the Big East and they have several notable prospects of their own on the roster. Here is a quick rundown of some of the prospects I watched play.

Tyler Holt- Jr, OF (Fl St)

Holt has a small, compact frame. He hits out of a crouched stance and triggers his swing with a pretty big leg kick. His approach at the plate is highly advanced. He works deep into the count and has excellent bat control. This year he has 44 walks and 34 strikeouts. He is slugging .658 with 11 home runs on the year but is more of a line-drive gap hitter. He has good speed and is a solid base-stealer. Defensively, Holt's range is a plus and his arm is average. He should be able to stick in center in the pros. Holt is a prototypical high-OBP top of the order hitter and should go in the top five or so rounds in June.

Mickey Wiswall- Jr, 1B (BC)

Wiswall exploded onto the scene last year after he followed up a solid spring by being named an All-Star in the Cape Cod league over the summer. Wiswall's spring has been a bit of a disappointment, although he has still flashed his raw power by hitting two home runs vs. UCONN today, giving him 16 on the year. He has a muscular build and thick trunk. As I said, his raw power is a plus and he generates good loft on his swing. He is an aggressive hitter but his swing is long and he is prone to strikeouts (45 on the year compared to 16 walks). He has some versatility defensively. His arm is below average and he profiles best at first, but some teams may be willing to give him a shot as a corner outfielder.

Pat Dean- Jr, LHP (BC)

Dean is a projectable 6'1 175 pounds with long limbs. He features a fastball, slider, curveball, and change-up. He works in the 88-90 mph range but can dial it up to 92 on occasion. His fastball and slider are above-average pitches and his curve has a chance to be an average pitch as well. His command is good (51 K/10 BB on the year) and he changes speed well. He struck out nine and walked just one against Florida State but was punished on two mistakes which led to four runs on two long balls. Durability may be a concern for him as well. He was forced to miss a start this year with a sore shoulder and has been counted on to go deep into almost every game he pitches.


Michael Olt- Jr, 3B (UCONN)

Olt is a powerful third base prospect with plus raw power. He has a strong, athletic build and generates good bat speed. He has a long leg stride and a tendency to swing through balls, which results in high strikeout totals. When he does make good contact he can really do some damage. I saw him lace a triple into the right-center gap. A converted shortstop, he has good actions at third base and an above-average arm. He made a great barehanded pick and throw on a bunt attempt early in the game. Although his speed is below average he should be able to stick at the hot corner. The question I have is whether he will be able to hit for enough average to stick in the professional ranks. He swings and misses much more often than you would like to see from a potential draft pick. That being said, you won't find many third basemen with this type of power potential.

George Springer- Soph, OF (UCONN)

I came away from the weekend most impressed with this sophomore. Springer has a lean and athletic build. He has plus power potential to all fields. Today, he hit a towering fly ball home run 400 plus feet to dead center (his first of two on the day). He is long to the ball and strikes out way too much (29% K rate this year) but he also walks a fair amount. His speed is above average and his athleticism and arm could allow him to develop into an above average defender. Today he made an excellent diving catch on a sinking liner in right-center. He is certainly a prospect to follow for the 2011 draft class.

Saturday, May 15, 2010

Anthony Rizzo gets promoted

Last week one of my favorite prospects, Anthony Rizzo, was promoted to Double-A Portland to fill the void of Lars Anderson.

Rizzo had a really solid '09 campaign between Greenville and Salem. He was off to a slower start in 2010 but slugged .491 in 114 at bats for Salem this year.

Rizzo is a plus defender at first base and at just 20 years old his bat is coming along nicely. Keep in mind he is just a year and a half removed from chemotherapy as well.

Although the sample size is small, Rizzo's LD rate is way down and FB rate up from his career numbers. Could this be a change in approach to loft more balls or simply random noise? The only way for us to know is to watch him in person or simply wait for him to accumulate more at bats.