Baseball hacks database

April 7, 2009

For the past few days I’ve been building a baseball database of everyone who played from 1871 to 2008. The tricky part in building such a database is gathering statistics for the current season and merging them with the Lahman baseball database. A book called Baseball Hacks shows you how to gather current-season statistics by pulling data from http://mlb.mlb.com and inserting it into a MySQL database.

One of the difficulties in merging this data is cross-referencing a player’s playerID in the Lahman database with his mlb.com ID. A playerID is built from the first five letters of a player’s last name and the first two letters of his first name. A number is added to the end of the ID to keep it unique when several players share the same letters. The playerID for Chipper Jones, for example, is jonesch06. His mlb.com ID is 116706. Since I know how playerIDs are generated in the Lahman database, I thought I could use that pattern to link the Lahman data to the mlb.com data. The name alone doesn’t tell you which number a player was assigned, though, so this method could end up being too inaccurate.
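To make the pattern concrete, here is a rough Python sketch of the naming rule described above. The function names are my own, and stripping punctuation and spaces from names is an assumption about edge cases rather than a documented Lahman rule.

```python
def lahman_id_prefix(first_name: str, last_name: str) -> str:
    """Return the letter part of a Lahman-style playerID.

    Uses the first five letters of the last name plus the first two
    letters of the first name, as described in the post.
    """
    # Assumption: punctuation and spaces are dropped before truncating.
    last = "".join(ch for ch in last_name.lower() if ch.isalpha())
    first = "".join(ch for ch in first_name.lower() if ch.isalpha())
    return last[:5] + first[:2]


def lahman_style_id(first_name: str, last_name: str, sequence: int) -> str:
    """Append the two-digit number that makes the ID unique."""
    return f"{lahman_id_prefix(first_name, last_name)}{sequence:02d}"


# The example from the post: Chipper Jones is jonesch06.
print(lahman_style_id("Chipper", "Jones", 6))
```

The prefix is easy to compute, but the trailing number is not: nothing in an mlb.com record says that Chipper Jones is the sixth "jonesch" rather than the first, which is why guessing IDs from names is unreliable.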

Luckily, I stumbled upon the forums at http://www.baseball-fever.com. In the Statistics, Analysis, & Sabermetrics area, several people were asking how to link these IDs together. The author of The Book: Playing the Percentages in Baseball posted a file that contains the playerIDs and mlb.com IDs of all players. I should be able to use this information to merge the Lahman database with the current season! Hopefully, I’ll have a working database of past seasons and the current season soon.
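With a mapping file like that, the merge becomes a plain join on the mlb.com ID. The sketch below uses Python’s built-in sqlite3 module instead of MySQL so it runs anywhere. The table and column names (id_map, current_batting, games) are hypothetical, and apart from the Chipper Jones ID pair every value is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one row per player in the ID mapping file.
    CREATE TABLE id_map (lahman_id TEXT PRIMARY KEY, mlb_id INTEGER UNIQUE);
    -- Hypothetical schema: current-season rows keyed by mlb.com ID.
    CREATE TABLE current_batting (mlb_id INTEGER, games INTEGER);
""")

# The only real ID pair here is the one quoted in the post.
conn.execute("INSERT INTO id_map VALUES (?, ?)", ("jonesch06", 116706))
# Placeholder stats; 999999 is a made-up ID with no mapping entry.
conn.executemany(
    "INSERT INTO current_batting VALUES (?, ?)",
    [(116706, 1), (999999, 1)],
)


def current_season_by_lahman_id(conn):
    """Attach a Lahman playerID to each current-season row.

    LEFT JOIN keeps rows with no mapping so they can be reviewed by hand.
    """
    cursor = conn.execute("""
        SELECT m.lahman_id, b.mlb_id, b.games
        FROM current_batting AS b
        LEFT JOIN id_map AS m ON m.mlb_id = b.mlb_id
        ORDER BY b.mlb_id
    """)
    return cursor.fetchall()


rows = current_season_by_lahman_id(conn)
# Players missing from the mapping file come back with lahman_id = None.
unmatched = [mlb_id for lahman_id, mlb_id, _ in rows if lahman_id is None]
print(rows)
print("unmatched mlb.com IDs:", unmatched)
```

Keeping the unmatched rows matters in practice, since players new to the current season may not be in the mapping file yet.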


{ 2 trackbacks }

New baseball database
07.09.09 at 4:44 am
How to maintain an up-to-date baseball database
04.15.10 at 4:51 pm

{ 4 comments… read them below or add one }

Ron 04.26.09 at 10:41 pm

Great idea, here. Thanks for suggesting the book, I hadn’t heard of it and I’ll have to check it now.

Jim McCurdy 01.21.11 at 9:34 pm

I know this is an old post, but why do you want to tie the Lahman database to MLB.com? Is there data on MLB.com that you can’t find in the Lahman database?

Thanks,
Jim McCurdy

Rick 03.30.11 at 7:55 pm

Hello,

We are looking to find a way to download MLB boxscores and build a database format from this so we are able to analyze that database. Can you help with this? A feed service or someone that can set this up for us?

THX,

Rick

Zasiedzenie Warszawa 11.23.11 at 6:19 pm

Great goods from you, man. I’ve understand your stuff previous to and you’re just extremely fantastic. I actually like what you’ve acquired here, certainly like what you are saying and the way in which you say it. You make it entertaining and you still care for to keep it sensible. I cant wait to read much more from you. This is really a wonderful website.
