For the past few days I’ve been building a baseball database covering every player who played from 1871 to 2008. The tricky part is gathering statistics for the current season and merging them with the Lahman baseball database. The book Baseball Hacks shows how to pull current-season statistics from http://mlb.mlb.com and load them into a MySQL database.
One of the difficulties in merging this data is cross-referencing a player’s playerID in the Lahman database with his mlb.com ID. A playerID is built from the first five letters of the player’s last name and the first two letters of his first name, followed by a number that keeps the ID unique in case of duplicates. The playerID for Chipper Jones, for example, is jonesch06; his mlb.com ID is 116706. Since I know how the Lahman playerIDs are generated, I thought I could use that pattern to link the Lahman data to the mlb.com data, but the trailing number can’t be worked out from a name alone, so this method could end up being too inaccurate.
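To make the naming pattern concrete, here is a minimal Python sketch of it. The function names are my own, and stripping punctuation and spaces before slicing is an assumption about how names like O’Neill are handled, as is the two-digit zero padding (inferred from jonesch06). Note that the number still has to be supplied by hand, which is exactly why the pattern alone is not enough for matching.

```python
import re


def lahman_id_prefix(first_name: str, last_name: str) -> str:
    """First five letters of the last name plus first two of the first name."""
    def letters(s: str) -> str:
        # Assumption: lowercase and drop anything that is not a-z.
        return re.sub(r"[^a-z]", "", s.lower())

    return letters(last_name)[:5] + letters(first_name)[:2]


def lahman_id(first_name: str, last_name: str, number: int) -> str:
    """Append the disambiguating number, zero-padded to two digits."""
    return f"{lahman_id_prefix(first_name, last_name)}{number:02d}"


print(lahman_id("Chipper", "Jones", 6))
```

With the inputs above this reproduces the jonesch06 ID from the example, but only because I already knew the 6.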
Luckily, I stumbled upon the forums at http://www.baseball-fever.com. In the Statistics, Analysis, & Sabermetrics area, several people were asking how to link these IDs together. The author of The Book: Playing the Percentages in Baseball posted a file that contains the playerIDs and mlb.com IDs of all players. I should be able to use this information to merge the Lahman database with the current season! Hopefully, I’ll have a working database of past seasons and the current season soon.
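Here is a rough Python sketch of how a crosswalk file like that could be used. I am assuming it is a CSV with columns named lahman_id and mlb_id, which may not match the real file, and the season rows are placeholders with made-up stat values. Only the Chipper Jones pairing comes from the example above.

```python
import csv
import io

# Hypothetical crosswalk contents; the column names are assumptions.
CROSSWALK_CSV = """lahman_id,mlb_id
jonesch06,116706
"""


def load_crosswalk(fileobj):
    """Return a dict mapping mlb.com ID (int) to Lahman playerID."""
    return {int(row["mlb_id"]): row["lahman_id"]
            for row in csv.DictReader(fileobj)}


def attach_lahman_ids(season_rows, crosswalk):
    """Split rows into matched (playerID added) and unmatched."""
    matched, unmatched = [], []
    for row in season_rows:
        player_id = crosswalk.get(row["mlb_id"])
        if player_id is None:
            unmatched.append(row)  # keep these for manual review
        else:
            matched.append({**row, "playerID": player_id})
    return matched, unmatched


crosswalk = load_crosswalk(io.StringIO(CROSSWALK_CSV))
# Placeholder current-season rows; the hit totals are not real data.
season_rows = [{"mlb_id": 116706, "hits": 0}, {"mlb_id": 999999, "hits": 0}]
matched, unmatched = attach_lahman_ids(season_rows, crosswalk)
print(matched)
print(unmatched)
```

Keeping the unmatched rows in their own list makes it easy to spot players missing from the crosswalk before anything is inserted into MySQL.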








Great idea, here. Thanks for suggesting the book, I hadn’t heard of it and I’ll have to check it now.
I know this is an old post, but why do you want to tie the Lahman database to MLB.com? Is there data on MLB.com that you can’t find in the Lahman database?
Thanks,
Jim McCurdy
Hello,
We are looking to find a way to download MLB boxscores and build a database format from this so we are able to analyze that database. Can you help with this? A feed service or someone that can set this up for us?
THX,
Rick