Lab 13: SeQueL
Starter Files
Download lab13.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.
A 61A Thanksgiving
This lab is optional, but highly recommended for practice.
In this lab, we will move past the basics of SQL and start practicing recursive select statements and aggregation functions. Keep in mind that questions towards the end will require extensive use of both!
Recursive SQL
A table defined within a with
clause may have a single recursive case
that defines output rows in terms of other output rows.
sqlite> with
...> fib(previous, current) as (
...> SELECT 0, 1 UNION
...> SELECT current, previous + current FROM fib
...> WHERE current <= 20
...> )
...> SELECT previous FROM fib;
0
1
1
2
3
5
8
13
Also recall that you can perform string concatenation with the || operator:
sqlite> select "hello" || " " || "world"
hello world
Question 1: Going Home
It's Thanksgiving! You want to book your flight home, but there's just one problem: everybody knows that as we get closer to Thanksgiving, the prices of airplane tickets shoot up for no good reason.
Assume that on November 1st, 2nd, and 3rd, your tickets home cost $20, $30, and $40 dollars respectively. What we want to do is find the prices you'll have to pay if you wait.
You have a friend in the airline business, so you know the following information about ticket prices in November: on an day after the 3rd, the price of a ticket is equal to the average of the previous two days, which is then added to 5 times the value of the numerical date (i.e 5th, 6th, etc) mod 7 (to account for the fact that demand rises through the week).
For example, consider the price of tickets on November 16. If the price of a ticket on the 14th was $10 (this is not true in our model), and on the 15th it was $20, the average of those two prior days is $15. We then add that to 5*(16%7), which is 10. Thus, on the 16th tickets should be $25 (Yikes!).
You may notice that this definition is recursive, which is why we're going to solve this problem with recursive selects in SQL!
Your end goal is to return a table flight_costs
with the dates in November, from November 1st
to 25th (the day before Thanksgiving), along with the ticket price on each day:
Day | Price |
---|---|
1 | 20 |
2 | 30 |
3 | 40 |
And so on.
Hint: You CAN use the % operator in SQL! :)
Hint: You might notice that your output numbers are all whole numbers if you use the "/" operator to do your division (which is what we want you to do in this part). This is because SQL will only produce an integer data type when it divides two integer data types. Don't worry about that for this question: having only integer outputs is OK. Just make sure you get the recursive idea correct, and you should be fine.
CREATE TABLE flight_costs as
-- REPLACE THIS LINE
SELECT 'YOUR CODE HERE';
with flights(day, cur, prev, prev2) as (
SELECT 1, 20, 0, 0 UNION
SELECT 2, 30, 20, 0 UNION
SELECT 3, 40, 30, 20 UNION
SELECT day + 1, (cur + prev)/2 + (5 * ((day+1) % 7)), cur, prev FROM flights
WHERE day >= 3 and day < 25
)
SELECT day as Day, cur as Price FROM flights;
Use OK to test your code:
python3 ok -q flights
Question 2: A Friend in Need Requires Turkey Indeed
One of your friends has made the terrible mistake of booking Thanksgiving plane tickets late! However, upon seeing how well you did on your 61A SQL lab, your friend asked you to help find the best plane tickets for flying home in this darkest of hours.
Of course, you'd like to find the cheapest flight possible, but regardless of the savings, you would also like to make sure you don't send your friend on too many flight transfers.
Therefore, to help your friend out, find the cheapest set of flights from SFO to PDX but do not include options with more than two flights! You should generate a table with the following columns:
- The set of airports that the flights pass through.
- Total cost of a set of flights.
Be sure to order your table from the cheapest to most expensive option.
All of the available flights as well as their prices can be found in the
flights
table.
You should get the following output:
sqlite> SELECT * FROM schedule;
SFO, SLC, PDX|176
SFO, LAX, PDX|186
SFO, PDX|192
Hint: As with before, you may find it helpful to create a table using recursive select. What are all the things you need to keep track of? For example, it might be helpful to save the number of flights taken on the current path.
Hint: If your table is taking a long time to generate, it might be stuck in a loop somewhere. Notice that there are some flight paths that loop, e.g SLC to LAX to SLC again. To handle this, consider when you should stop adding rows to your table.
CREATE TABLE schedule as
-- REPLACE THIS LINE
SELECT 'YOUR CODE HERE';
with trips(path, ending, flights, cost) as (
SELECT departure || ", " || arrival, arrival, 1, price FROM flights
WHERE departure = "SFO" UNION
SELECT path || ", " || arrival, arrival, flights + 1, cost + price
FROM trips, flights
WHERE ending = departure AND flights < 2
)
SELECT path, cost FROM trips WHERE ending = "PDX" ORDER BY cost;
Use OK to test your code:
python3 ok -q schedule
Question 3: Shopping Cart
Seeing as how you are now a responsible college student, it has finally fallen to you to do the Thanksgiving grocery shopping! You have been given a $60 budget, but you can't make up your mind on what to buy, so you decide to consult the all-knowing SQL.*
You have access to all the possible things you could purchase as well as their
costs in the supermarket
table. Write a SQL query that creates a table
shopping_cart
that lists of all possible ways you could fill your budget with
delicious Thanksgiving eats. The final table should have 2 columns:
- Comma separated list of items, from least to most expensive.
- Amount of your budget left over.
Finally, order your results in ascending order of leftover budget. For lists that have the same remaining budgets, order them alphabetically.
You should get the following output:
sqlite> SELECT * FROM shopping_cart LIMIT 15;
CAKE!|0
cornbread, cornbread, cornbread, cornbread, cornbread|0
cranberries, cranberries, cranberries, cornbread, cornbread, pumpkin pie|0
cranberries, cranberries, cranberries, cranberries, cornbread, tofurky|0
cranberries, cranberries, cranberries, cranberries, cranberries, potatoes, pumpkin pie|0
cranberries, cranberries, cranberries, cranberries, potatoes, potatoes, cornbread|0
cranberries, cranberries, potatoes, cornbread, cornbread, cornbread|0
potatoes, potatoes, potatoes, potatoes, potatoes, potatoes|0
potatoes, potatoes, potatoes, potatoes, tofurky|0
potatoes, potatoes, potatoes, pumpkin pie, pumpkin pie|0
potatoes, potatoes, potatoes, turkey|0
potatoes, potatoes, tofurky, tofurky|0
potatoes, pumpkin pie, pumpkin pie, tofurky|0
potatoes, tofurky, turkey|0
pumpkin pie, pumpkin pie, pumpkin pie, pumpkin pie|0
Hint: To order by more than one column, separate them with commas and put them
after ORDER BY
. Keep in mind that ORDER BY col_a, col_b
will order by
col_a
and then col_b
.
CREATE TABLE shopping_cart as
-- REPLACE THIS LINE
SELECT 'YOUR CODE HERE';
with cart(list, last, budget) as (
SELECT item, price, 60 - price FROM supermarket WHERE price <= 60 UNION
SELECT list || ", " || item, price, budget - price FROM cart, supermarket
WHERE price <= budget AND price >= last
)
SELECT list, budget FROM cart ORDER BY budget, list;
* Of course, SQL is very good at enumerating ALL possible shopping sprees. However, as you will see, they don't always make sense! As bonus challenge, try modifying your table so that you never pick more than two of any item.
Use OK to test your code:
python3 ok -q shopping-cart
SQL Aggregation
Previously, we have been dealing with queries that process one row at a time.
When we join, we make pairwise combinations of all of the rows. When we use
WHERE
, we filter out certain rows based on the condition. Alternatively,
applying an aggregate function
such as MAX(column)
combines the values in multiple rows.
By default, all rows are combined together. What if we wanted to group together
the values in similar rows and perform the aggregation operations within those
groups? We use a GROUP BY
clause.
Here's an example using our flights
table.
For each unique departure, collect all the rows having the same departure
airport into a group. Then, select the price
column and apply the MIN
aggregation to recover the price of the cheapest departure from that group. The
end result is a table of departure airports and the cheapest departing flight.
sqlite> SELECT departure, MIN(price) FROM flights GROUP BY departure;
AUH|932
LAS|50
LAX|89
SEA|32
SFO|40
SLC|42
Just like how we can filter out rows with WHERE
, we can also filter out
groups with HAVING
. Important: A HAVING
clause should use an aggregate
function. Suppose we want to see all airports with at least two departures:
sqlite> SELECT departure FROM flights GROUP BY departure HAVING COUNT(*) >= 2;
LAX
SFO
SLC
Note that the COUNT(*)
aggregate just counts the number of rows in each group.
Say we want to count the number of distinct airports instead. Then, we could
use the following query:
sqlite> SELECT COUNT(DISTINCT departure) FROM flights;
6
This enumerates all the different departure airports available in our flights
table (in this case: SFO, LAX, AUH, SLC, SEA, and LAS).
Self Restraint
Tragically, many people find themselves overeating during Thanksgiving. With your new knowledge from 61A, you resolve to use SQL and plan a healthier meal!
You are given a table main_course
where each row corresponds to a possible
Thanksgiving meal with two components: the meat and the side dish (in an amazing
display of restraint, you are limiting yourself to just one side dish). You are
also given a second table pies
containing different types of pies as well as
their caloric content. The idea is that you will pair the two items consisting
of your main course (a row of the table main_course
) with a pie that you will
have for dessert. Use SQL's aggregation features to answer the following
questions.
Question 4: Self Restraint, Part I
For this first part, we want to know how many selections of meats we have for our meal. Use a select statement to see how many different types of meats we have in our list of main courses. (We would like to point out that the 61A staff is inclusive, and we have included tofurky as the "meat" in some meals.)
Store this answer in a one column, one row table called number_of_options
.
CREATE TABLE number_of_options as
-- REPLACE THIS LINE
SELECT 'YOUR CODE HERE';
SELECT COUNT(DISTINCT meat) from main_course;
Use OK to test your code:
python3 ok -q meals-part1
Question 5: Self Restraint, Part II
Use aggregation in a select statement to count the number of "full" meals (i.e main course plus a pie) we can make with under 2500 calories total. For example, if you have turkey and cranberries along with pumpkin pie, you will have 2000 + 500 = 2500 calories total (2000 from the main course, 500 from the pie).
Store this answer in a one column, one row table called calories
.
CREATE TABLE calories as
-- REPLACE THIS LINE
SELECT 'YOUR CODE HERE';
SELECT COUNT(*) FROM main_course as m, pies as p
WHERE m.calories + p.calories < 2500;
Use OK to test your code:
python3 ok -q meals-part2
Question 6: Self Restraint, Part III
We are mainly concerned with what meat is in our planned meal. For every type of
meat, we want to see how healthy a meal exists with this meat. Include this information
for each meat in a table healthiest_meats
.
Also, if it is possible to make ANY full meal of more than 3000 calories (even just one) using a certain type of meat, then temptation will take over. For this reason, exclude such types of meat from your table.
The healthiest_meats
table should have two columns: meat and total calories.
Each row should correspond to the caloric content of the healthiest meal involving
each type of meat (excluding meats that the above condition filters out).
Hint: You shouldn't need to do anything special to choose among several possible healthiest meals, but for completeness, choose the side with the cranberries.
CREATE TABLE healthiest_meats as
-- REPLACE THIS LINE
SELECT 'YOUR CODE HERE';
SELECT meat, MIN(m.calories + p.calories) as calories
FROM main_course as m, pies as p
GROUP BY meat HAVING MAX(m.calories + p.calories) < 3000;
Use OK to test your code:
python3 ok -q meals-part3
Shopping Spree
Question 7: Price Check
After you are full from your Thanksgiving dinner, you realize that you still need to buy gifts for all your loved ones over the holidays. However, you also want to spend as little money as possible.
Let's start off by surveying our options. Using the products
table, write a
query that creates a table average_prices
that lists categories and the
average price of items in the category.
You should get the following output:
sqlite> SELECT * FROM average_prices;
computer|109.09
games|349.99
phone|89.99
CREATE TABLE average_prices as
-- REPLACE THIS LINE
SELECT 'YOUR CODE HERE';
SELECT category, AVG(MSRP) FROM products GROUP BY category;
-- alternate solution
-- SELECT category, SUM(MSRP)/COUNT(*) FROM products GROUP BY category;
Use OK to test your code:
python3 ok -q cyber-monday-part1
Question 8: The Price is Right
Now, you want to figure out with stores sell each item in products for the
lowest price. Write a SQL query that uses the inventory
table to create a
table lowest_prices
that lists items, the stores that sells that item for the
lowest price, and the price that the store sells that item for.
You should expect the following output:
sqlite> SELECT * FROM lowest_prices;
GameStation|Hallmart|298.98
QBox|Targive|390.98
iBook|Targive|110.99
kBook|RestBuy|94.99
qPhone|Hallmart|85.99
rPhone|Hallmart|69.99
uPhone|RestBuy|89.99
wBook|RestBuy|114.29
CREATE TABLE lowest_prices as
-- REPLACE THIS LINE
SELECT 'YOUR CODE HERE';
SELECT item, store, MIN(price) FROM inventory GROUP BY item;
Use OK to test your code:
python3 ok -q cyber-monday-part2
Question 9: Bang for your Buck
You want to make a shopping list by choosing the item that is the best deal possible for every category. For example, for the "phone" category, the uPhone is the best deal because the MSRP price of a uPhone divided by its ratings yields the lowest cost. That means that uPhones cost the lowest money per rating point out of all of the phones.
Write a query to create a table shopping_list
that lists the items that you
want to buy from each category.
After you've figured out which item you want to buy for each category, add another column that lists the store that sells that item for the lowest price.
Hint: You should use the lowest_prices
table you created in the previous
question.
You should expect the following output:
sqlite> SELECT * FROM shopping_list;
GameStation|Hallmart
uPhone|RestBuy
wBook|RestBuy
CREATE TABLE shopping_list as
-- REPLACE THIS LINE
SELECT 'YOUR CODE HERE';
with shopping_list_helper (name, price) as (
SELECT name, min(MSRP/rating) FROM products GROUP BY category
)
SELECT s.name as item, l.store as store
FROM lowest_prices as l, shopping_list_helper as s
WHERE l.item = s.name;
Use OK to test your code:
python3 ok -q cyber-monday-part3
Question 10: Driving the Cyber Highways
Using the MiBs (megabytes) column from the stores
table in
data.sql
, write a query to calculate the total amount of bandwidth needed to
get everything in your shopping list.
Hint: You should use the shopping_list
table you created in the
previous question.
CREATE TABLE total_bandwidth as
-- REPLACE THIS LINE
SELECT 'YOUR CODE HERE';
SELECT SUM(s.MiBs) FROM stores as s, shopping_list as sl WHERE s.store = sl.store;
Use OK to test your code:
python3 ok -q cyber-monday-part4