Lab 13: SeQueL
Starter Files
Download lab13.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.
A 61A Thanksgiving
This lab is optional, but highly recommended for practice.
In this lab, we will move past the basics of SQL and start practicing recursive select statements and aggregation functions. Keep in mind that questions towards the end will require extensive use of both!
Some helpful notes:
- Check out Lab 12 for SQL setup and usage instructions.
- The
.tables
command can be used to get a list of all the currently loaded tables.- Likewise, the
.schema
command can be used to get all the currently loaded tables and their column names.
Recursive SQL
A table defined within a with
clause may have a single recursive case
that defines output rows in terms of previous output rows. The table is only visible
to the line immediately after and can be thought of as "helper" tables.
sqlite> with
...> fib(previous, current) as (
...> select 0, 1 union
...> select current, previous + current from fib
...> where current <= 20
...> )
...> select previous from fib;
0
1
1
2
3
5
8
13
So what is really happening here?
We first build a table called fib
that has two columns, previous
and
current
. The first row consists of 0 and 1. The second row is then created
from the first row, with the two values 1, 1 (current
and previous +
current
). The third row is then built from the second row with the current
and previous + current
so 1, 2. This continues until the condition current <=
20
evaluates to False. Note 21 is NOT included in our result table, meaning our
recursion stops at the first value where the condition is false, but does not
include it.
Also recall that you can perform string concatenation with the || operator:
sqlite> select "hello" || " " || "world"
hello world
Question 1: Going Home
It's Thanksgiving! You want to book your flight home, but there's just one problem: everybody knows that as we get closer to Thanksgiving, the prices of airplane tickets shoot up for no good reason.
Assume that on November 1st, 2nd, and 3rd, your tickets home cost $20, $30, and $40 dollars respectively. What we want to do is find the prices you'll have to pay if you wait.
You have a friend in the airline business, so you know the following information about ticket prices in November: on an day after the 3rd, the price of a ticket is equal to the average of the previous two days, which is then added to 5 times the value of the numerical date (i.e 5th, 6th, etc) mod 7 (to account for the fact that demand rises through the week).
For example, if the price of a ticket on the 14th was $10, and on the 15th it was $20, the average of those two prior days is $15. We then add that to 5*(16%7), which is 10. Thus, on the 16th tickets should be $25 (Yikes!).
You may notice that this definition is recursive, which is why we're going to solve this problem with recursive selects in SQL!
Your end goal is to return a table flight_costs
with the dates in November, from November 1st
to 25th (the day after Thanksgiving), along with the ticket price on each day:
Day | Price |
---|---|
1 | 20 |
2 | 30 |
3 | 40 |
And so on.
Note: You might notice that your output numbers are all whole numbers if you use the "/" operator to do your division (which is what we want you to do in this part). This is because SQL will only produce an integer data type when it divides two integer data types. Don't worry about that for this question: having only integer outputs is OK. Just make sure you get the recursive idea correct, and you should be fine.
create table flight_costs as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
with temp_costs(day, cur, prev) as (
select 1, 20, 0 union
select 2, 30, 20 union
select 3, 40, 30 union
select day + 1, (cur + prev)/2 + ((day+1) % 7) * 5, cur from temp_costs where day < 25 and day > 2
)
select day as day, cur as price from temp_costs;
Hint You CAN use the % operator in SQL! :)
Use OK to test your code:
python3 ok -q flights
Question 2: A Friend in Need Requires Turkey Indeed
One of your friends has made the terrible mistake of booking Thanksgiving plane tickets late! However, upon seeing how well you did on your 61A SQL lab, your friend asked you to help find the best plane tickets for flying home in this darkest of hours.
Of course, you'd like to find the cheapest flight possible, but regardless of the savings, you would also like to make sure you don't send your friend on too many flight transfers.
Therefore, to help out your friend, find the cheapest set of flights from SFO to PDX but do not include options with more than two flights! You should generate a table with the following columns:
- The set of airports that the flights pass through.
- Total cost of a set of flights.
Be sure to order your table from the cheapest to most expensive option.
All of the available flights as well as their prices can be found in the
flights
table.
You should get the following output:
sqlite> SELECT * FROM schedule;
SFO, SLC, PDX|176
SFO, LAX, PDX|186
SFO, PDX|192
create table schedule as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
with trips(path, ending, flights, cost) as (
select departure || ", " || arrival, arrival, 1, price from flights
where departure = "sfo" union
select path || ", " || arrival, arrival, flights + 1, cost + price
from trips, flights
where ending = departure and flights < 2
)
select path, cost from trips where ending = "pdx" order by cost;
Hint: As with before, you may find it helpful to create a table using recursive select. What are all the things you need to keep track of? For example, it might be helpful to save the number of flights taken on the current path.
Hint: If your table is taking a long time to generate, it might be stuck in a loop somewhere. Notice that there are some flight paths that loop, e.g SLC to LAX to SLC again. To handle this, consider when you should stop adding rows to your table.
Use OK to test your code:
python3 ok -q schedule
Question 3: Shopping Cart
Seeing as how you are now a responsible college student, it has finally fallen to you to do the Thanksgiving grocery shopping! You have been given a $60 budget, but you can't make up your mind on what to buy, so you decide to consult the all-knowing SQL.1
You have access to all the possible things you could purchase as well as their
costs in the supermarket
table. Write a SQL query that creates a table
shopping_cart
that lists of all possible ways you could fill your budget with
delicious Thanksgiving eats. The final table should have 2 columns:
- Comma separated list of items, from least to most expensive.
- Amount of your budget left over.
Finally, order your results in ascending order of leftover budget. For lists that have the same remaining budgets, order them alphabetically.
You should get the following output:
sqlite> SELECT * FROM shopping_cart LIMIT 15;
CAKE!|0
cornbread, cornbread, cornbread, cornbread, cornbread|0
cranberries, cranberries, cranberries, cornbread, cornbread, pumpkin pie|0
cranberries, cranberries, cranberries, cranberries, cornbread, tofurky|0
cranberries, cranberries, cranberries, cranberries, cranberries, potatoes, pumpkin pie|0
cranberries, cranberries, cranberries, cranberries, potatoes, potatoes, cornbread|0
cranberries, cranberries, potatoes, cornbread, cornbread, cornbread|0
potatoes, potatoes, potatoes, potatoes, potatoes, potatoes|0
potatoes, potatoes, potatoes, potatoes, tofurky|0
potatoes, potatoes, potatoes, pumpkin pie, pumpkin pie|0
potatoes, potatoes, potatoes, turkey|0
potatoes, potatoes, tofurky, tofurky|0
potatoes, pumpkin pie, pumpkin pie, tofurky|0
potatoes, tofurky, turkey|0
pumpkin pie, pumpkin pie, pumpkin pie, pumpkin pie|0
create table shopping_cart as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
with cart(list, last, budget) as (
SELECT item, price, 60 - price FROM supermarket WHERE price <= 60 UNION
SELECT list || ", " || item, price, budget - price FROM cart, supermarket
WHERE price <= budget AND price >= last
)
SELECT list, budget FROM cart ORDER BY budget, list;
1 Of course, SQL is very good at enumerating ALL possible shopping sprees. However, as you will see, they don't always make sense! As bonus challenge, try modifying your table so that you never pick more than two of any item.Hint: To order by more than one column, separate them with commas and put them after
order by
. Keep in mind thatorder by col_a, col_b
will order bycol_a
and thencol_b
.
Use OK to test your code:
python3 ok -q shopping-cart
SQL Aggregation
Previously, we have been dealing with queries that process one row at a time.
When we join, we make pairwise combinations of all of the rows. When we use
where
, we filter out certain rows based on the condition. Alternatively,
applying an aggregate function
such as max(column)
combines the values in multiple rows.
By default, we combine the values of the entire table. For example, if we
wanted to count the number of flights from our flights
table, we could use:
sqlite> select count(*) from flights;
13
What if we wanted to group together the values in similar rows and perform the
aggregation operations within those groups? We use a group by
clause.
Here's another example. For each unique departure, collect all the rows having
the same departure airport into a group. Then, select the price
column and
apply the min
aggregation to recover the price of the cheapest departure from
that group. The end result is a table of departure airports and the cheapest
departing flight.
sqlite> select departure, min(price) from flights group by departure;
AUH|932
LAS|50
LAX|89
SEA|32
SFO|40
SLC|42
Just like how we can filter out rows with where
, we can also filter out
groups with having
. Important: A having
clause should use an aggregate
function. Suppose we want to see all airports with at least two departures:
sqlite> select departure from flights group by departure having count(*) >= 2;
LAX
SFO
SLC
Note that the count(*)
aggregate just counts the number of rows in each group.
Say we want to count the number of distinct airports instead. Then, we could
use the following query:
sqlite> select count(distinct departure) from flights;
6
This enumerates all the different departure airports available in our flights
table (in this case: SFO, LAX, AUH, SLC, SEA, and LAS).
Self Restraint
Tragically, many people find themselves overeating during Thanksgiving. With your new knowledge from 61A, you resolve to use SQL and plan a healthier meal!
You are given a table main_course
where each row corresponds to a possible
Thanksgiving meal with two components: the meat and the side dish (in an amazing
display of restraint, you are limiting yourself to just one side dish). You are
also given a second table pies
containing different types of pies as well as
their caloric content. The idea is that you will pair the two items consisting
of your main course (a row of the table main_course
) with a pie that you will
have for dessert. Use SQL's aggregation features to answer the following
questions.
Question 4: Self Restraint, Part I
For this first part, we want to know how many selections of meats we have for our meal. Use a select statement to see how many different types of meats we have in our list of main courses. (We would like to point out that the 61A staff is inclusive, and we have included tofurky as the "meat" in some meals.)
Store this answer in a one column, one row table called number_of_options
.
create table number_of_options as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select count(distinct meat) from main_course;
Use OK to test your code:
python3 ok -q meals-part1
Question 5: Self Restraint, Part II
Use aggregation in a select statement to count the number of "full" meals (i.e main course plus a pie) we can make with under 2500 calories total. For example, if you have turkey and cranberries along with pumpkin pie, you will have 2000 + 500 = 2500 calories total (2000 from the main course, 500 from the pie).
Store this answer in a one column, one row table called calories
.
create table calories as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select count(*) from main_course as m, pies as p
where m.calories + p.calories < 2500;
Use OK to test your code:
python3 ok -q meals-part2
Question 6: Self Restraint, Part III
We are mainly concerned with what meat is in our planned meal.
For every type of meat, we want to see how healthy a meal exists with this meat
(healthier means fewer calories for our purposes). Include this information for
each meat in a table healthiest_meats
.
Also, if it is possible to make ANY full meal of more than 3000 calories (even just one) using a certain type of meat, then temptation will take over. For this reason, exclude such types of meat from your table.
In summary:
- The
healthiest_meats
table should have two columns: meat and total calories. - Each row should have a meat and the number of calories of the full meal with the least calories (full meals include pie) involving that type of meat. Finally, exclude exclude meats that are in any full meals of more than 3000 calories.
create table healthiest_meats as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select meat, min(m.calories + p.calories) as calories
from main_course as m, pies as p
group by meat having max(m.calories + p.calories) < 3000;
Use OK to test your code:
python3 ok -q meals-part3
Shopping Spree
Question 7: Price Check
After you are full from your Thanksgiving dinner, you realize that you still need to buy gifts for all your loved ones over the holidays. However, you also want to spend as little money as possible (you're not cheap, just looking for a great sale!).
Let's start off by surveying our options. Using the products
table, write a
query that creates a table average_prices
that lists categories and the
average price of items in the category.
You should get the following output:
sqlite> SELECT * FROM average_prices;
computer|109.09
games|349.99
phone|89.99
create table average_prices as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select category, avg(msrp) from products group by category;
-- alternate solution
-- select category, sum(msrp)/count(*) from products group by category;
Use OK to test your code:
python3 ok -q cyber-monday-part1
Question 8: The Price is Right
Now, you want to figure out with stores sell each item in products for the
lowest price. Write a SQL query that uses the inventory
table to create a
table lowest_prices
that lists items, the stores that sells that item for the
lowest price, and the price that the store sells that item for.
You should expect the following output:
sqlite> SELECT * FROM lowest_prices;
GameStation|Hallmart|298.98
QBox|Targive|390.98
iBook|Targive|110.99
kBook|RestBuy|94.99
qPhone|Hallmart|85.99
rPhone|Hallmart|69.99
uPhone|RestBuy|89.99
wBook|RestBuy|114.29
create table lowest_prices as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select name, store, price from inventory, products where name = item
group by name having min(price);
Use OK to test your code:
python3 ok -q cyber-monday-part2
Question 9: Bang for your Buck
You want to make a shopping list by choosing the item that is the best deal possible for every category. For example, for the "phone" category, the uPhone is the best deal because the MSRP price of a uPhone divided by its ratings yields the lowest cost. That means that uPhones cost the lowest money per rating point out of all of the phones.
Write a query to create a table shopping_list
that lists the items that you
want to buy from each category.
After you've figured out which item you want to buy for each category, add another column that lists the store that sells that item for the lowest price.
You should expect the following output:
sqlite> SELECT * FROM shopping_list;
GameStation|Hallmart
uPhone|RestBuy
wBook|RestBuy
create table shopping_list as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select l.name, store from products as p, lowest_prices as l
where l.name = p.name
group by category having min(MSRP/rating);
Hint: You should use the
lowest_prices
table you created in the previous question.
Use OK to test your code:
python3 ok -q cyber-monday-part3
Question 10: Driving the Cyber Highways
Using the Mb (megabits) column from the stores
table in
data.sql
, write a query to calculate the total amount of bandwidth needed to
get everything in your shopping list.
create table total_bandwidth as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select sum(s.mbs) from stores as s, shopping_list as sl where s.store = sl.store;
Hint: You should use the
shopping_list
table you created in the previous question.
Use OK to test your code:
python3 ok -q cyber-monday-part4