From the category archives:

Database

In MLB The Show Diamond Dynasty you can customize your Diamond Dynasty players’ appearance, full name, and even their uniform number.  Player customization is definitely a good feature especially considering the generic random names MLB the Show has come up with for these Dynasty players.  With names like “Ron Makepeace”, it’s a little difficult to be able to form an identity with your team.  It’s hard to keep track of who’s who on your team unless you’ve played with them for a few games.  Before you know it, a Diamond Dynasty player’s 30-40 game contract will have expired by the time you became fully familiarized with the names.

To help form an identity with my team, I’ve devised an MLB player database that can be used as a reference when customizing new Diamond Dynasty players before they are activated.  The database grades each MLB player from A through F.  Although, it’s not a perfect rating system it’s still a pretty useful resource for customizing a Diamond Dynasty player based on his aptitude.  The grading that was used is based on ESPN’s player ratings.  Right now, home runs and run production seem to play a big factor in how good a batter’s rating is.  The rating system should probably be refined to be a little more accurate but for now the current system is in workable shape.   Again, the current rating system not perfect but it can still help give you pretty good idea on naming your Diamond Dynasty players after MLB players.

I’ve also included baseball card images on the player pages so one can get a better idea on how to customize their Diamond Dynasty player’s appearance to look like the MLB player version.

To use, go to the Buy/Sell/Manage Cards screen on PS3.  Before activating a card to your roster, search for an MLB player that fits the same criterea as your Diamond Dynasty player.  Let’s take Chris Chavez, for instance, he’s a third baseman with a B- aptitude so we will need to search for B rated MLB third basemen.  Chris Chavez also bats left-handed so we can filter the search to include left-handed hitters only.  I’ll also filter the search to include players who played for the New York Yankees.  It looks like Chris Chavez might be best fitted to perform like a Graig Nettles type.

Diamond Dynasty

I think this database look up makes Diamond Dynasty a bit more fun when you are able to recognize the names of your players.  So far, as a Giants fan, I’ve named a platinum card after Juan Marichal.  I’ve also had a gold lefty starter named after Steve CarltonJesus Alou, who is my current left fielder, is actually turning into quite a star in Diamond Dynasty.

Hope you Enjoy!  MLB Player Baseball Database for Diamond Dynasty.


{ 0 comments }

Last year about this time I wrote about building a baseball player database using information provided by a book called Baseball Hacks. I’ve finally managed to write up a list of steps required to create such a database of player statistics and keep it up-to-date as the season progresses.

I’m by no means a SQL expert so the steps I took to get this far in building an automated database may not have been the best but it seems to be fully functional and working.  Please feel free to comment if improvements can be made.   Take a look at my steps below:

1.  Each year since 1995, Sean Lahman has provided a complete database of baseball statistics for every team and player in Major League history.  You can download the 2009 version here:  http://www.baseball1.com.

2.  MySQL will be used as the database.  Download and install MySQL (http://dev.mysql.com/downloads/).

3.  Download and create 5 tables which will be used to store data from the current season. Download: current_season.sql.

4. Download and install Perl: http://www.perl.org/get.html

5. A few things on the MLB.com Gameday servers have changed since the Baseball Hacks book was released in 2006 so I’ve made some changes to the Perl scripts provided by the book. Download modified scripts: (load_db_all_days.pl, update_db.pl, save_to_db.pm).

6. These scripts require some Perl modules in order for them to work. Most of these modules are already installed when you install Perl but you may need to add the DBD-mysql model. You can do this by running the Perl Package Manager from Start>Programs>Perl.

Here’s a list of modules used by these scripts:

DBI
XML::Simple
LWP
Time::Local
Data::Dumper
DBD-mysql

7. Run load_db_all_days.pl to gather data from opening day to games that were played yesterday. When you run load_db_all_days.pl it will begin inserting data from games that were played on April 5th and will complete after data from all games that were played yesterday have been inserted.

8. This year opening day actually started on April 4th, 2010 (http://gdx.mlb.com/components/game/mlb/year_2010/month_04/day_04/) with the Yankees and Red Sox. There was also an exhibition played between the Mariners and Giants. In order to gather data for the game between the Yankees and Red Sox we’ll need to modify load_db_all_days.pl.

The starting dates in the script will need to be changed each season. In load_db_all_days.pl.

Change the following lines below:

my $mintime = timegm(0,0,0,4,3,110);
my $mintimestr = gmtime(timegm(0,0,0,4,3,110));

The $mintime variable is set to timegm(second, minute, hour, day, month, year). The number 4 is the day and 3 is the month. In Perl, months range from 0-11 (0=January, 11=December). So, the number 3 = April. The year is from 1900 (e.g., 1900 + 110 = 2010).

The dates for $mintime and $mintimestr should always match.

To force the script to run just for one game uncomment line 63 so it looks like this:

die;

If for some reason the exhibition game was processed you can always go into your database and delete data for the exhibition game from the 5 tables that were created in step 3.

To run a Perl script in Windows open a command prompt and change the directory to the location where load_db_all_days.pl resides and run “Perl load_db_all_days.pl

9. Create a new table for batters. This table will store a combination of statistics from the Lahman database and current season statistics.

CREATE TABLE batting_combined (
  playerID varchar(9) NOT NULL DEFAULT '',
  stint smallint(2) NOT NULL,
  teamID char(3) DEFAULT NULL,
  lgID char(2) DEFAULT NULL,
  nameFirst varchar(50) DEFAULT NULL,
  nameLast varchar(50) DEFAULT NULL,
  yearID smallint(4) NOT NULL DEFAULT '0',
  g smallint(3) DEFAULT NULL,
  ab smallint(3) DEFAULT NULL,
  r smallint(3) DEFAULT NULL,
  h smallint(3) DEFAULT NULL,
  2B smallint(3) DEFAULT NULL,
  3B smallint(3) DEFAULT NULL,
  hr smallint(3) DEFAULT NULL,
  rbi smallint(3) DEFAULT NULL,
  sb smallint(3) DEFAULT NULL,
  cs smallint(3) DEFAULT NULL,
  bb smallint(3) DEFAULT NULL,
  so smallint(3) DEFAULT NULL,
  sf smallint(3) DEFAULT NULL,
  hbp smallint(3) DEFAULT NULL,
  title varchar(50) DEFAULT NULL,
  url varchar(255) DEFAULT NULL,
  message varchar(255) DEFAULT NULL,
  sponsor text,
  PRIMARY KEY (stint,playerID,yearID)
);

10. Create a new table for pitchers.

CREATE TABLE pitching_combined (
  playerID varchar(9) NOT NULL DEFAULT '',
  stint char(2) NOT NULL DEFAULT '',
  teamID char(3) DEFAULT NULL,
  lgID char(2) DEFAULT NULL,
  nameFirst varchar(50) DEFAULT NULL,
  nameLast varchar(50) DEFAULT NULL,
  yearID smallint(4) NOT NULL DEFAULT '0',
  g smallint(3) unsigned DEFAULT NULL,
  gs smallint(2) DEFAULT NULL,
  cg smallint(2) DEFAULT NULL,
  sho smallint(2) DEFAULT NULL,
  bf smallint(2) DEFAULT NULL,
  outs smallint(2) DEFAULT NULL,
  hr smallint(2) DEFAULT NULL,
  bb smallint(2) DEFAULT NULL,
  so smallint(2) DEFAULT NULL,
  er smallint(2) DEFAULT NULL,
  runs smallint(2) DEFAULT NULL,
  hits smallint(2) DEFAULT NULL,
  wins smallint(2) DEFAULT NULL,
  losses smallint(2) DEFAULT NULL,
  holds smallint(2) DEFAULT NULL,
  hopps smallint(2) DEFAULT NULL,
  saves smallint(2) DEFAULT NULL,
  sopps smallint(2) DEFAULT NULL,
  HBP smallint(3) DEFAULT NULL,
  BK smallint(3) DEFAULT NULL,
  WP smallint(3) DEFAULT NULL,
  title varchar(50) DEFAULT NULL,
  url varchar(255) DEFAULT NULL,
  message varchar(255) DEFAULT NULL,
  sponsor text,
  PRIMARY KEY (playerID,yearID,stint)
);

11. Run following insert statement for the batting_combined table:

insert into batting_combined
(playerID,yearID, teamID,stint,lgID ,nameFirst, nameLast, g, ab, r, h, `2B`, `3B`, hr, rbi, sb, cs, bb, so, sf, hbp)
SELECT
ma.playerID,
ba.yearID,
ba.teamID,
ba.stint,
ba.lgID,
ma.nameFirst,
ma.nameLast,
ba.G,
ba.AB,
ba.R,
ba.H,
ba.`2b`,
ba.`3b`,
ba.HR,
ba.RBI,
ba.SB,
ba.CS,
ba.BB,
ba.SO,
ba.sf,
ba.hbp
FROM master AS ma , batting AS ba
WHERE ma.playerID = ba.playerID;

12. Run following insert statement for the pitching_combined table:

insert into pitching_combined
(playerID,stint,teamID,lgID,nameFirst, nameLast, yearID, g,gs,cg,sho, bf,outs, hr, bb, so,hbp,bk,wp, er, runs, hits, wins, losses, saves)
SELECT ma.playerID,
pi.stint,
pi.teamID,
pi.lgID,
ma.nameFirst,
ma.nameLast,
pi.yearID,
pi.G,
pi.gs,
pi.cg,
pi.sho,
pi.BFP,
pi.ipouts,
pi.HR,
pi.BB,
pi.SO,
pi.hbp,
pi.bk,
pi.wp,
pi.ER,
pi.R,
pi.H,
pi.W,
pi.L,
pi.SV
FROM master AS ma , pitching AS pi
WHERE ma.playerID = pi.playerID;

13. Tangotiger of http://www.insidethebook.com has provided a list of cross referenced player ids for each player . This list allows us to map a player’s ‘Lahman’ database id with his ‘mlb.com’ id.

Create the following table:

CREATE TABLE export_id_map (
  MLBAM_ID int(11) DEFAULT NULL,
  retro_id varchar(11) DEFAULT NULL,
  bdb_id varchar(11) DEFAULT NULL,
  stats_id int(11) DEFAULT NULL,
  bis_id int(11) DEFAULT NULL,
  source_id int(11) DEFAULT NULL
);

You can refer to the example below to see how to insert the csv file from Tangotiger’s site into the export_id_map table:

load data local infile 'C:\\Files\\EXPORT_ID_MAP\\EXPORT_ID_MAP.txt' into table export_id_map
fields terminated by ','
enclosed by '"'
lines terminated by '\r\n'
(MLBAM_ID, retro_id, bdb_id, stats_id, bis_id, source_id);

14. Create following table export_id_map2. In this table I added two columns nameFirst and nameLast:

CREATE TABLE export_id_map2 (
  MLBAM_ID int(11) NOT NULL,
  retro_id varchar(11) DEFAULT NULL,
  bdb_id varchar(11) DEFAULT NULL,
  stats_id int(11) DEFAULT NULL,
  bis_id int(11) DEFAULT NULL,
  source_id int(11) DEFAULT NULL,
  nameLast varchar(50) DEFAULT NULL,
  nameFirst varchar(50) DEFAULT NULL,
  PRIMARY KEY (MLBAM_ID)
);

15. Step 14 was necessary because the list provided by Tangotiger last year did have player names.  I haven’t adapted to this year’s list which did not have columns for player names.  If I decided to leave out the names I probably would have had to build a 3 table join which can get complicated especially with my lack of thorough knowledge of MySQL joins. Run the following insert:

insert into export_id_map2
(MLBAM_ID,retro_id, bdb_id,stats_id,bis_id ,source_id, nameLast, nameFirst)
SELECT
ex.MLBAM_ID,
ex.retro_id,
ex.bdb_id,
ex.stats_id,
ex.bis_id,
ex.source_id,
ma.nameLast,
ma.nameFirst
FROM master AS ma , export_id_map AS ex
WHERE ma.playerID = ex.bdb_id
group by ex.bdb_id

16. Create following table for new players that are not currently found in the Lahman database:

CREATE TABLE rookies (
  playerID varchar(10) NOT NULL,
  gameDate date DEFAULT NULL,
  teamID char(3) DEFAULT NULL,
  eliasID int(6) NOT NULL,
  nameFirst varchar(32) DEFAULT NULL,
  nameLast varchar(32) DEFAULT NULL,
  nickName varchar(32) DEFAULT NULL,
  jersey smallint(2) DEFAULT NULL,
  throws char(1) DEFAULT NULL,
  PRIMARY KEY (playerID)
);

17. Create a view (vw_lastplayed) which will return a list of players that didn’t play last year:

CREATE VIEW vw_lastplayed AS select
ma.nameFirst AS nameFirst,
ma.nameLast AS nameLast,
max(ba.yearID) AS myMax,
ba.G AS G,
ba.AB AS AB,
ba.R AS R,
ba.H AS H,
ba.2B AS 2b,
ba.3B AS 3b,
ba.HR AS HR,
ba.RBI AS RBI,
ba.SB AS SB,
ba.CS AS CS,
ba.BB AS BB,
ba.SO AS SO,
ma.playerID AS playerID
from (`master` ma join batting ba) where (ma.playerID = ba.playerID) group by ma.playerID having (max(ba.yearID) = '2008');

18. Create following view to determine which batters have played for the last 20 years.  This helps determine whether or not a player should be considered a rookie:

CREATE VIEW vw_20years AS select
concat(ma.nameFirst,' ',ma.nameLast) AS `CONCAT(ma.nameFirst, ' ', ma.nameLast)`
from (`master` ma join batting ba) where ((ma.playerID = ba.playerID) and (ba.yearID between 1988 and 2008)) group by concat(ma.nameFirst,' ',ma.nameLast);

19. Create following view to determine all pitchers that have played within the last 20 years:

CREATE VIEW vw_pitching_20years AS select
concat(ma.nameFirst,' ',ma.nameLast) AS `CONCAT(ma.nameFirst, ' ', ma.nameLast)`
from (`master` ma join batting ba) where ((ma.playerID = ba.playerID) and (ba.yearID between 1988 and 2008)) group by concat(ma.nameFirst,' ',ma.nameLast);

20. Create following procedure which will scan ids in the master table and compare ids in rookies table for duplicates. If duplicates are found, the number at the end of the ID will be incremented.

CREATE  PROCEDURE `idGen`()
BEGIN
   DECLARE counter INT DEFAULT 1;
            simple_loop: LOOP
              SET counter=counter+1;
              select counter;
              update rookies set playerid = lower(concat(left(nameLast,5), left(nameFirst,2),concat('0', counter))) where playerid in (Select playerid FROM master);
              IF counter=20 THEN
                 LEAVE simple_loop;
              END IF;
            END LOOP simple_loop;
            update rookies set playerid = lower(playerid);
END

21. Run update_db.pl nightly. I have it scheduled to run at about 5 am Pacific time each day.

The update_db.pl script will gather data for games that were played the day before.  This script also does the following:

  • Updates the batting_combined & pitching_combined tables to combine current season statistics with all-time stats.
  • Inserts data into rookies table so we have a list of all known rookies.
  • Inserts data into batting_combined & pitching_combined of all players who exist in the rookies table list but aren’t really rookies (i.e. they missed the 2009 season).
  • Deletes non-rookies from the rookies table.
  • Runs procedure created in step 19.

That’s it!  Here is the baseball database.  Please leave a comment if you have any questions.

{ 6 comments }

Baseball Card Database

March 23, 2010

Baseball Card Database

Over 37000 unique cards can be viewed, searched and “flipped” in my new baseball card database.  I hope you enjoy these cards.  Maybe you’ll find something interesting.  I’ve spent hours looking through these cards.

Funny Baseball Cards

Take a look at Oscar Gamble’s large Afro hair or Wally Moon’s ‘eyebrow’ or Don Mossi’s ears.  Ross Grimsley also had a nice hairdo.  John Pacella of the dreadful Mets of the early 80’s lost his hat.

Check out Jason Michael’s big glove and Kirby Puckett carried a big bat.

Larry Haney appears to be a left-handed catcher in this reverse negative.   Bob Uecker is a right-handed batter but posed left-handed.

My Collection

I currently have about 15,000 cards in my baseball card collection. I have a least one card from every year from 1968 to 1995.

The oldest card in my collection is a 1968 Al Weis card. Al played a major role for the 1969 Miracle Mets in the World Series. In Game 2 his 9th-inning single drove in the winning run in a 2-1 victory. In the clincher, Game 5 at Shea Stadium, he homered off McNally in the seventh inning to tie the game at 3-3. The Mets scored two runs in the eighth to complete their improbable World Series victory.

Of my older baseball cards, I have quite a few from the 1974 set. I happen to have a Mike Schmidt and Charlie Hough rookie card. I also have Dan Fife’s only baseball card ever.

I started purchasing card packs in 1978. This set had a game on the back called “PLAY BALL. Played by two”. The instructions were as follows:

Player has 50 player cards. Toss coin for who goes first. First player turns cards over one at a time. Attempting to score runs until 3 outs have been made. After 3 outs, second player begins. Game is played with 9 innings. In case of tie, play extra innings.

If you turned up Paul Blair’s baseball card you would have a home run.

I always thought Mark McGwire’s United Stated Baseball Team rookie card would be worth a lot some day but its value has likely taken a major hit since the steroid scandal. I have four of these cards.  I also found a couple
Barry Bonds
rookie cards.

{ 3 comments }

New baseball database

July 9, 2009

In April, I wrote about building a baseball database of statistics for every player including those playing in the current 2009 season.  After several weeks working on it I have finally completed the database.   I had no idea it would be so complicated to build such a database but I learned a lot.

My first hurdle was learning about .htaccess .  I’ve forgotten exactly what .htaccess is for but I used it to redirect query results of players so each player will have their own unique URL page.

I then had to determine a way to separate the batting and pitching queries so when a pitcher is being searched you will get his pitching stats rather than batting stats.   Later, I’ll be able to have all of these stats on each player page

I thought it was important to be able to access Babe Ruth’s batting and pitching stats.  So, I added pitching stats to position players who also had pitched in their career.  Check out Jose Canseco and Steve Finley’s pitching stats.

If you are familiar with databases you know that every record must have a unique ID.  When dealing with the current 2009 season, one of the challenges was trying to find a way to generate a unique ID for all rookie players.  First, I needed to find a way to determine if a player is indeed a rookie.  Then, I had to find an automated way to generate a unique ID.  I spent a few days working on a stored procedure.  This stored procedure will scan the entire master database for duplicates and if duplicates are found, the number at the end of the ID will be incremented.

The method I was using for determining rookies had a flaw because I was searching for players who were playing in the 2009 season but hadn’t played in 2008.  The drawback to this was that some players, like Kris Benson, had been out of baseball for a few years so they weren’t actually rookies.  I had to go back and find the IDs for these players and ensure they weren’t considered new rookie players.

Each day I work on the database I always seem to find a mistake somewhere.  The other day, I noticed Scott Rolen’s 2009 AB totals were a little off.  For some reason, I missed the stats from one day in June so I had to go back and fix that.

Another thing I noticed was Miguel Tejada’s batting average in my database didn’t match the batting average from other sites.  After some research, I found that if a game was suspended due to rain I wasn’t updating the database.

Another glitch is I don’t have any 2009 stats for HBP, BK, and WP categories for pitchers.  I still need to find a way to gather this data.

I’m sure I’ve overlooked some other things.  If anyone notices any errors please let me know.

{ 1 comment }

Baseball hacks database

April 7, 2009

For the past few days I’ve been working on building a baseball database of all of the players who have played baseball from 1871 to 2008.  The tricky part in building such a database is gathering statistics of the current season and merging it with the Lahman baseball database.  A book called Baseball Hacks shows you how to gather statistics from the current season by using  data from http://mlb.mlb.com and inserting it into a MySQL database.

One of the drawbacks in merging this data is trying to find a way to cross-reference a player’s playerID in the Lahman database with his mlb.com ID.  A playerID is generated by using the first five letters of a player’s last name and first two letters of his first name.  A number is added to the end of the ID to make it unique in case of duplicates. The playerID for Chipper Jones, for example, is jonesch06.  His mlb.com ID is 116706.  I was thinking since I know the pattern of how the playerIDs are generated in the Lahman database I could somehow use that to link the Lahman database data to the mlb.com data but this method could end up being too inaccurate.

Luckily, I stumbled upon the forums at http://www.baseball-fever.com.  In the Statistics, Analysis, & Sabermetrics area there are some individuals asking how to link these IDs together.   The author of THE BOOK — Playing The Percentages In Baseball posted a file that contains the playerIDs mlb.com IDs of all players.  I should be able to use this information to merge the Lahman database with this current season!  Hopefully, I’ll have a working database of past seasons and the current season soon.

{ 10 comments }