New baseball database

July 9, 2009

In April, I wrote about building a baseball database with statistics for every player, including those playing in the current 2009 season. After several weeks of work, I have finally completed it. I had no idea building such a database would be so complicated, but I learned a lot.

My first hurdle was learning about .htaccess. I’ve forgotten the details of everything .htaccess is for, but I used it to redirect player query results so that each player has their own unique page URL.

I then had to find a way to separate the batting and pitching queries so that a search for a pitcher returns his pitching stats rather than his batting stats. Later, I’ll be able to show all of these stats on each player page.

I thought it was important to be able to access both Babe Ruth’s batting and pitching stats, so I added pitching stats for position players who also pitched during their careers. Check out Jose Canseco’s and Steve Finley’s pitching stats.

If you are familiar with databases, you know that every record needs a unique ID. For the current 2009 season, one of the challenges was generating a unique ID for every rookie player. First, I needed a way to determine whether a player is indeed a rookie. Then, I needed an automated way to generate the ID. I spent a few days working on a stored procedure that scans the entire master database for duplicates and, if it finds any, increments the number at the end of the ID.
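To make the ID step concrete, here is a minimal Python sketch of the "increment the number at the end" idea. It is not the stored procedure itself. The ID format (up to five letters of the last name, up to two letters of the first name, and a two-digit counter) and the function name `make_player_id` are assumptions chosen for illustration, as is passing the existing IDs in as a set.

```python
def make_player_id(first_name, last_name, existing_ids):
    """Return an ID that is not in existing_ids, bumping the numeric suffix on a collision."""
    # Assumed format: 5 letters of last name + 2 letters of first name + 2-digit counter.
    last_part = "".join(ch for ch in last_name.lower() if ch.isalpha())[:5]
    first_part = "".join(ch for ch in first_name.lower() if ch.isalpha())[:2]
    base = last_part + first_part

    counter = 1
    # Keep incrementing the trailing number until the candidate ID is unused.
    while f"{base}{counter:02d}" in existing_ids:
        counter += 1
    return f"{base}{counter:02d}"


# Hypothetical IDs already in the master table.
existing = {"smithjo01", "smithjo02"}
new_id = make_player_id("John", "Smith", existing)
print(new_id)
```

With the two hypothetical IDs above already taken, the next John Smith gets the suffix 03. In a real database, the uniqueness check would be a query against the master table rather than an in-memory set.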

My method for determining rookies had a flaw: I was searching for players who were playing in the 2009 season but hadn’t played in 2008. The drawback was that some players, like Kris Benson, had been out of baseball for a few years, so they showed up as new even though they weren’t actually rookies. I had to go back, find the existing IDs for these players, and ensure they weren’t treated as new rookie players.
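The difference between the two checks can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `find_new_players`, the `(player, year)` tuple layout, and the sample rows are all hypothetical, not the actual tables or queries behind the database.

```python
def find_new_players(appearances, season, lookback_only_previous=False):
    """Return players who appear in `season` but in no earlier season.

    appearances: iterable of (player, year) tuples (assumed layout).
    lookback_only_previous=True reproduces the flawed check that only
    compares against the season immediately before.
    """
    current = {player for player, year in appearances if year == season}
    if lookback_only_previous:
        earlier = {player for player, year in appearances if year == season - 1}
    else:
        earlier = {player for player, year in appearances if year < season}
    return current - earlier


# Illustrative rows only, not real database contents.
rows = [
    ("Returning Veteran", 2006),
    ("Returning Veteran", 2009),
    ("Regular Player", 2008),
    ("Regular Player", 2009),
    ("True Rookie", 2009),
]

flawed = find_new_players(rows, 2009, lookback_only_previous=True)
fixed = find_new_players(rows, 2009)
print(sorted(flawed))
print(sorted(fixed))
```

The flawed check flags the returning veteran as new because it only looks at 2008, while the full-history check flags only the true rookie. Matching players across seasons by name is itself an assumption here, since two different players can share a name.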

Every day I work on the database, I seem to find a mistake somewhere. The other day, I noticed Scott Rolen’s 2009 AB totals were a little off. For some reason, I had missed the stats from one day in June, so I had to go back and fix that.

Another thing I noticed was that Miguel Tejada’s batting average in my database didn’t match the batting average on other sites. After some research, I found that I wasn’t updating the database when a game was suspended due to rain.

Another gap is that I don’t have any 2009 stats for the HBP, BK, and WP categories for pitchers. I still need to find a way to gather this data.

I’m sure I’ve overlooked some other things. If anyone notices any errors, please let me know.


Joe 10.07.09 at 2:03 pm

Would you be willing to share your database and php source code (under an open source license)?

I’d love to take a look at this and help out if I can. I’ve been working on something similar, but would prefer to add to an existing program rather than recreate it from scratch, and I’m sure there are others who would as well.
