Homework 3: SQL
Due: October 20, 11:59pm
To be done
individually!!
PART-I: SQL
1. Introduction
In this assignment, you will write SQL queries that answer questions about a database containing information about players, teams and games from the National Basketball Association (NBA). Yahoo Sports (http://sports.yahoo.com/nba/) and ESPN (http://sports.espn.go.com/nba/) are examples of web sites that use such a database. We have simplified the schema by maintaining information for only part of the 2003 season[1].
Some notes to keep in mind as you go:
From your home directory, run the command:
gtar -zxvf /home/cc/cs186/fa05/hw/hw3/hw3.tar.gz
This will create a directory Hw3/ in your home directory
with the following scripts:
Script |
Description |
initdb.sh |
Initialize
database directory at $PGDATA_HW3. To reinitialize, you have to delete
existing $PGDATA_HW3 first. |
startpg.sh |
Starts Postgres master process for database directory
at $PGDATA_HW3 |
loadnba.sh loadbignba.sh |
Creates
database ‘hw3’, and loads in some test data. The test data from loadbignba.sh is based on real data from this
year’s NBA roster. We have also provided a smaller data set that you
can load using loadnba.sh. Feel free to
insert in your own data for your own testing. Read loaddata.sql to see how to use the copy command to bulk load data. |
loadxml.sh |
Creates database hw3_xml, creates a table named AuctionSites, and loads in some test data. |
runquery.sh <QUERY_FILE> |
Runs
query stored in <QUERY_FILE> using psql
and outputs to screen. We have provided a sample query file query0.sql |
runxml.sh <XPATH_FILE> |
Runs
xpath query stored in <XPATH_FILE> using psql
and outputs to screen. We have provided a sample xml query file xml0.sql |
stoppg.sh |
Stops
Postgres master process for the
database directory at $PGDATA_HW3 before you log off. |
To set up the initial
database for the very first time,
you should run:
There are 4 relations in the schema, which are described below along with their integrity constraints. Columns in the primary key are underlined.
Player(playerID: integer, name : varchar(50), position : varchar(10), height : integer, weight : integer, team: varchar(30))
Each Player is assigned a unique playerID. The position of a player can either be Guard, Center or Forward. The height of a player is in inches while the weight is in pounds. Each player plays for only one team. The team field is a foreign key to Team.
Team (name: varchar(30),
city : varchar(20))
Each Team has a unique name associated with it. There can be multiple teams from the same city.
Game (gameID: integer, homeTeam: varchar(30), awayTeam : varchar(30), homeScore : integer, awayScore : integer)
Each Game has a unique gameID. The fields homeTeam and awayTeam are foreign keys to Team. Two teams may play each other multiple times each season. There is an integrity check to ensure homeTeam and awayTeam are different.
GameStats (playerID
: integer, gameID: integer,
points : integer, assists :
integer,
rebounds : integer)
GameStats records the performance statistics of a player within a game. A player may not play in every game, in which case it will not have its statistics recorded for that game. gameID is a foreign key to Game. playerID is a foreign key to Player. Assume that two assertions are in place. The first is to ensure that the player involved belongs to either the involving home or away teams, and the second is to ensure that the total score obtained by a team (in Game) is consistent with the total sum (in GameStats) of individual players in the team playing in the game[2].
3. Queries
Please make sure you include the ORDER BY clause as stated in or queries. If your query output differs from ours because of a missing or mistaken ORDER BY clause, you do not get any credit! Also, make sure you name the aggregate columns and other output columns in the SELECT lists as indicated using the AS clause (e.g., "... AS average_points, ..." for Query 2).
1.
[1 point] Find the names of the shortest player(s) playing the "Guard" position
for each team. (ORDER BY Players.name)
select list: Player.name
ordering: Player.name
ascending
2. [1 point] List the names of all players and the average number of points and rebounds for all games that they played in, ignoring players who did not play in any games. (ORDER BY Player.name)
select list: Player.name, average_points, average_rebounds
ordering: Player.name ascending
3. [1 point] List all distinct (player name, team name) pairs of players and their teams, where the player name contains "Mark" and the team is located in a city whose name starts with "B" or "C" or "L". (ORDER BY Player.name, Team.name) Hint: You'll need to use the 'LIKE' predicate operator.
select list: Player.name, Team.name
ordering:
Player.name ascending
select list: Team.city, max_center_height
ordering: Team.city ascending
select list: Player.name, Player.height, Player.weight
ordering: Player.name ascending, Player.height
ascending
6. [2points] List the names and teams of all players that have averaged at least 10 points for their teams' home games thus far, but have not scored 10 points or more in any of their teams' away games. The result should be ordered by team names, and within each team, player names should be presented in descending order.
select list: Player.team, Player.name
ordering: Player.team ascending, Player.name descending
7. [2 points] List the IDs, names, and teams for all players who have scored at least 15 points in every home game for their team. Do not output players whose teams have not played any home games. (ORDER BY Player.playerID)
select list: Player.playerID, Player.name, Player.team
ordering: Player.playerID
8. [2 points] List the names of all teams with a winning percentage of 60% or better at home games. Output the team name (calling it 'team'), and the number of home wins and losses. (ORDER BY team)
select list: team, num_of_home_wins, num_of_home_losses
ordering: team
9.
[2 points] For each win of the "ChicagoBulls", list the game ID and the
name(s) of player(s) who scored the maximum number of points for the Bulls in
that win, along with that maximum number of points. (ORDER BY Game.gameID,
Player.name)
select list: gameID, Player.name, max_num_of_points
ordering: gameID, Player.name
10. [2 points] List the names of all teams that have a winning record thus far in the season (total number of wins > total number of losses). (ORDER BY team_name)
select list: team_name
ordering: team_name ascending
PART-II: XML & XPath
1. Introduction & Schema Information
For this part of the assignment, you will experiment with the XML support provided by PostgreSQL by posing XPath queries over XML documents describing online-auction-site databases. The DTD schema for such auction-site XML data is given below. (The element names should be self-explanatory.)
<!ELEMENT site
(regions, categories, catgraph, people, open_auctions, closed_auctions)>
<!ELEMENT categories (category+)>
<!ELEMENT category (name, description)>
<!ATTLIST category id ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT description (text | parlist)>
<!ELEMENT text (#PCDATA | bold | keyword | emph)*>
<!ELEMENT bold (#PCDATA | bold | keyword | emph)*>
<!ELEMENT keyword (#PCDATA | bold |
keyword | emph)*>
<!ELEMENT emph (#PCDATA | bold | keyword | emph)*>
<!ELEMENT parlist (listitem)*>
<!ELEMENT listitem (text | parlist)*>
<!ELEMENT catgraph (edge*)>
<!ELEMENT edge EMPTY>
<!ATTLIST edge from IDREF #REQUIRED to IDREF #REQUIRED>
<!ELEMENT regions (africa, asia, australia, europe, namerica, samerica)>
<!ELEMENT africa (item*)>
<!ELEMENT asia (item*)>
<!ELEMENT australia (item*)>
<!ELEMENT namerica (item*)>
<!ELEMENT samerica (item*)>
<!ELEMENT europe (item*)>
<!ELEMENT item (location, quantity, name, payment, description,
shipping, incategory+, mailbox)>
<!ATTLIST item id ID #REQUIRED
featured CDATA #IMPLIED>
<!ELEMENT location (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT payment (#PCDATA)>
<!ELEMENT shipping (#PCDATA)>
<!ELEMENT reserve (#PCDATA)>
<!ELEMENT incategory EMPTY>
<!ATTLIST incategory category IDREF #REQUIRED>
<!ELEMENT mailbox (mail*)>
<!ELEMENT mail (from, to, date, text)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT itemref EMPTY>
<!ATTLIST itemref item IDREF #REQUIRED>
<!ELEMENT personref EMPTY>
<!ATTLIST personref person IDREF #REQUIRED>
<!ELEMENT people (person*)>
<!ELEMENT person (name, emailaddress, phone?, address?, homepage?,
creditcard?, profile?, watches?)>
<!ATTLIST person id ID #REQUIRED>
<!ELEMENT emailaddress (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT address (street, city, country, province?, zipcode)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT province (#PCDATA)>
<!ELEMENT zipcode (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT homepage (#PCDATA)>
<!ELEMENT creditcard (#PCDATA)>
<!ELEMENT profile (interest*, education?, gender?, business, age?)>
<!ATTLIST profile income CDATA #IMPLIED>
<!ELEMENT interest EMPTY>
<!ATTLIST interest category IDREF #REQUIRED>
<!ELEMENT education (#PCDATA)>
<!ELEMENT income (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT business (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT watches (watch*)>
<!ELEMENT watch EMPTY>
<!ATTLIST watch open_auction IDREF #REQUIRED>
<!ELEMENT open_auctions (open_auction*)>
<!ELEMENT open_auction (initial, reserve?, bidder*, current, privacy?,
itemref, seller, annotation, quantity, type, interval)>
<!ATTLIST open_auction id ID #REQUIRED>
<!ELEMENT privacy (#PCDATA)>
<!ELEMENT initial (#PCDATA)>
<!ELEMENT bidder (date, time, personref, increase)>
<!ELEMENT seller EMPTY>
<!ATTLIST seller person IDREF #REQUIRED>
<!ELEMENT current (#PCDATA)>
<!ELEMENT increase (#PCDATA)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT interval (start, end)>
<!ELEMENT start (#PCDATA)>
<!ELEMENT end (#PCDATA)>
<!ELEMENT time (#PCDATA)>
<!ELEMENT status (#PCDATA)>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT closed_auctions (closed_auction*)>
<!ELEMENT closed_auction (seller, buyer, itemref, price, date, quantity, type,
annotation?)>
<!ELEMENT buyer EMPTY>
<!ATTLIST buyer person IDREF #REQUIRED>
<!ELEMENT price (#PCDATA)>
<!ELEMENT annotation (author, description?, happiness)>
<!ELEMENT author EMPTY>
<!ATTLIST author person IDREF #REQUIRED>
<!ELEMENT happiness (#PCDATA)>
PostrgreSQL stores entire XML documents as text BLOBs (Binary Large Objects), inside text-valued columns of relational tables, and provides functions that support (in-memory) processing of XPath queries over the documents. For the purposes of this assignment, we will provide you with a simple relational table named "AuctionSites" with schema: AuctionSites(auctionSiteID: integer, auctionSiteXML:text), where "auctionSiteID" is an integer site key and "auctionSiteXML" is the XML document for the corresponding site. You can initialize and load the data for the AuctionSites table using the loadxml.sh script. The runxml.sh script takes a file (containing your XPath query) as argument and runs is using psql.
2. XPath Queries
You should obviously use the DTD-schema information given above in formulating your queries. The queries can be expressed using the simple xpath_bool() and xpath_nodeset() functions supported by PostgreSQL. (See the README file under the contrib/xml2/ directory of the PostgreSQL distribution for more details, or this web page for a few examples.) Again, please make sure to rename the output columns in your query result as requested.
11. [1 point] List the address information for the person with id "person0" in each of the auction site tuples. (Output columns: auctionSiteID, person0_addresses)
12. [2 points] For each auction site, list the interval information/elements for each open auction where the type contains the substring "Dutch" and the quantity of auctioned items is greater than one. (Output columns: auctionSiteID, open_dutch_auctions)
13. [2 points] We classify a person as wealthy, if the income listed in his/her profile is more than 60000. For each auction site that contains at least one wealthy person, list the name information for all wealthy persons. Auction sites without any wealthy people should not appear in the output. (Output columns: auctionSiteID, wealthy_folks)
SUBMISSION INSTRUCTIONS
For both parts of the homework, submit each of your queries inside a separate file. Query X should go into a file called ‘queryX.sql’. E.g. query 1 goes into query1.sql, query 2 goes into query2.sql, and so on. You should be able to run ‘./runquery.sh queryX.sql’ to see the output of your query X. You are welcome to use view definitions within your query, as long as you ensure that runquery.sh command generates the correct output. This output will be used to be compared with our solutions to see if your query is written correctly.
.