logo
 
Assignment - SQLing
CS 5800 - Database Systems
Utah State University
Home
Calendar
Homework   
Lectures
Syllabus
Resources
People
Due date: Tuesday, Oct. 24 (23:59), late turnins will have 20% deducted..

Turnin: Use Canvas.

Overview

In this assignment you will create a schema, populate it, and then write some queries. The database you create will be managed using MySQL, PostgreSQL (or some SQL compliant DBMS of your choosing). The data is from www.seanlahman.com/baseball-archive/statistics. More information on the data is available from that site. The fields are defined here http://seanlahman.com/files/database/readme2012.txt, though I have changed the names of some of the fields. Here is a screen grab of the part of the ER diagram that shows how tables, such as the Master and Appearances tables, are related. A bigger picture of how the tables are related can be found here.

Choosing a DBMS

Using MySQL

If you would like to use MySQL, it can be downloaded and installed on your home machine. To create the database and populate it you may use the script baseball.sql.

Using Postgres

You may alternatively use Postgres. To create the database and populate it you may use the script postgres.sql.

Using SQL Server

SQL Server limits inserts to 1000 rows at a time, so the sqlserver.sql script breaks up the data inserts into 999 rows in each transaction. You will have to create the tables using the postgres.sql script.

Other DBMSs

You may use a DBMS of your choosing. The SQL scripts to load the data should work on your chosen system. Use the postgres.sql script for the data (it is the generic script). If you have trouble with this script, please e-mail me to let me know.

Creating the Database

An SQL script (specific to MySQL) to create the database is available in the file baseball.sql. In the Query Workbench, first create a schema as follows.
  create schema baseball;
This will create the baseball schema. Make this the default schema. Next open the script and execute it. If you ever need to recreate the database just drop the schema, recreate it, and re-run the script. The database that will be created has twenty-five tables.

Deliverables and Submission Instructions

Create a file called queries.sql, which will be a text file containing your queries for each of the questions given below. Each query will be (at least one) SELECT statement. You may create "helper" views for the queries, but if you do, be sure to give the SQL for the created views. I have added an "answer" for each query. The answers are merely to help you check if your query could be correct, but be careful because an incorrect query might produce a correct answer, and even a correct query may produce a slightly different answer.
  1. LA Dodgers - List the first name and last name of every player that has played at any time in their career for the Los Angeles Dodgers. List each player only once. (answer)
  2. LA Dodgers Only - List the first name and last name of every player that has played only for the Los Angeles Dodgers (i.e., they did not play for any other team including the Brooklyn Dodgers, note that the Brooklyn Dodgers became the Los Angeles Dodgers in the 1950s). List each player only once. (answer)
  3. Expos Pitchers - List the first name and last name of every player that has pitched for the team named the "Montreal Expos". List each player only once. (answer)
  4. Error Kings - List the name of the team, year, and number of errors (the number is the "E" column in the "teams" table) for every team that has had 160 or more errors in a season. (answer)
  5. USU batters - List the first name, last name, year played, and batting average (h/ab) of every player from the school named "Utah State University". (answer)
  6. Yankee Run Kings - List the name, year, and number of home runs hit for each New York Yankee batter, but only if they hit the most home runs for any player in that season. (answer)
  7. Bumper Salary Teams - List the total salary for two consecutive years, team name, and year for every team that had a total salary which was 1.5 times as much as for the previous year. (answer)
  8. Montreal Expos Three - List the first name and last name of every player that has batted for the Montreal Expos in at least three consecutive years. List each player only once. (answer)
  9. Home Run Kings - List the first name, last name, year, and number of HRs of every player that has hit the most home runs in a single season. Order by the year. Note that the "batting" table has a column "HR" with the number of home runs hit by a player in that year. (answer)
  10. Third best home runs each year - List the first name, last name, year, and number of HRs of every player that hit the third most home runs for that year. Order by the year. (answer)
  11. Triple happy team mates - List the team name, year, names of player, the number of triples hit (column "3B" in the batting table), in which two or more players on the same team hit 10 or more triples each. (answer)
  12. Ranking the teams - Rank each team in terms of the winning percentage (wins divided by (wins + losses)) over its entire history. Consider a "team" to be a team with the same name, so if the team changes name, it is considered to be two different teams. Show the team name, win percentage, and the rank. (answer)
  13. Pitchers for Mangaer Casey Stengel - List the year, first name, and last name of each pitcher who was a on a team managed by Casey Stengel (pitched in the same season on a team managed by Casey). (answer)
  14. Two degrees from Yogi Berra - List the name of each player who appeared on a team with a player that was at one time was a teamate of Yogi Berra. So suppose player A was a teamate of Yogi Berra. Then player A is one-degree of separation from Yogi Berra. Let player B be related to player A because A played on a team in the same year with player A. Then player A is two-degrees of separation from player A. (answer)
  15. Median team wins - For the 1970s, list the team name for teams in the National League ("NL") that had the median number of total wins in the decade (1970-1979 inclusive). (answer)
The queries.sql file should have the following format (note, two dashes "--" at the start of a line indicates a comment). Please add other comments as needed.
...
-- Query 5 - USU batters
-- List the first name, last name, year played, 
-- and batting average (h/ab) of every player
-- from the school named "Utah State University".
SELECT ...
FROM ... 
WHERE ...
-- Query 6 ...
-- ...
...
Be sure to include the number for each query, the English text describing the query and the SQL text for the query. If the query does not work or is incomplete, please comment out the SQL text and put in appropriate comments as to why the query does not work. Our plan is to run the queries on some tables to ensure that they perform correctly.

Grading

Good programming style is always important. Your code should be clear, concise, and have appropriate comments.

Groups

The assignment permits you to work in groups of at most two. Forming a group will usually result in less work for each group member, but you may choose to do the assignment individually if you so desire. Groups will be graded the same as individuals. To register your group, please e-mail me (I will assume partners working together on the ER model will continue in a group for this assignment unless otherwise notified). Ideally, each group will be happy and harmonious, but if not, then you are responsible for the entire assignment.
                                                                                                                                                                                                                                                                                                                                             
  Copyright © 2017 by Curtis Dyreson. All rights reserved.