Starter Files

Download lab13.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.

A 61A Thanksgiving

This lab is optional, but highly recommended for practice.

In this lab, we will move past the basics of SQL and start practicing recursive select statements and aggregation functions. Keep in mind that questions towards the end will require extensive use of both!

Some helpful notes:

  • Check out Lab 12 for SQL setup and usage instructions.
  • The .tables command can be used to get a list of all the currently loaded tables.
  • Likewise, the .schema command can be used to get all the currently loaded tables and their column names.

Recursive SQL

A table defined within a with clause may have a single recursive case that defines output rows in terms of previous output rows. The table is only visible to the line immediately after and can be thought of as "helper" tables.

sqlite> with
   ...>   fib(previous, current) as (
   ...>     select 0, 1 union
   ...>     select current, previous + current from fib
   ...>       where current <= 20
   ...>   )
   ...> select previous from fib;
0
1
1
2
3
5
8
13

So what is really happening here?

We first build a table called fib that has two columns, previous and current. The first row consists of 0 and 1. The second row is then created from the first row, with the two values 1, 1 (current and previous + current). The third row is then built from the second row with the current and previous + current so 1, 2. This continues until the condition current <= 20 evaluates to False. Note 21 is NOT included in our result table, meaning our recursion stops at the first value where the condition is false, but does not include it.

Also recall that you can perform string concatenation with the || operator:

sqlite> select "hello" || " " || "world"
hello world

Question 1: Going Home

It's Thanksgiving! You want to book your flight home, but there's just one problem: everybody knows that as we get closer to Thanksgiving, the prices of airplane tickets shoot up for no good reason.

Assume that on November 1st, 2nd, and 3rd, your tickets home cost $20, $30, and $40 dollars respectively. What we want to do is find the prices you'll have to pay if you wait.

You have a friend in the airline business, so you know the following information about ticket prices in November: on an day after the 3rd, the price of a ticket is equal to the average of the previous two days, which is then added to 5 times the value of the numerical date (i.e 5th, 6th, etc) mod 7 (to account for the fact that demand rises through the week).

For example, if the price of a ticket on the 14th was $10, and on the 15th it was $20, the average of those two prior days is $15. We then add that to 5*(16%7), which is 10. Thus, on the 16th tickets should be $25 (Yikes!).

You may notice that this definition is recursive, which is why we're going to solve this problem with recursive selects in SQL!

Your end goal is to return a table flight_costs with the dates in November, from November 1st to 25th (the day after Thanksgiving), along with the ticket price on each day:

Day Price
1 20
2 30
3 40

And so on.

Note: You might notice that your output numbers are all whole numbers if you use the "/" operator to do your division (which is what we want you to do in this part). This is because SQL will only produce an integer data type when it divides two integer data types. Don't worry about that for this question: having only integer outputs is OK. Just make sure you get the recursive idea correct, and you should be fine.

create table flight_costs as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
with temp_costs(day, cur, prev) as ( select 1, 20, 0 union select 2, 30, 20 union select 3, 40, 30 union select day + 1, (cur + prev)/2 + ((day+1) % 7) * 5, cur from temp_costs where day < 25 and day > 2 ) select day as day, cur as price from temp_costs;

Hint You CAN use the % operator in SQL! :)

Use OK to test your code:

python3 ok -q flights

Question 2: A Friend in Need Requires Turkey Indeed

One of your friends has made the terrible mistake of booking Thanksgiving plane tickets late! However, upon seeing how well you did on your 61A SQL lab, your friend asked you to help find the best plane tickets for flying home in this darkest of hours.

Of course, you'd like to find the cheapest flight possible, but regardless of the savings, you would also like to make sure you don't send your friend on too many flight transfers.

Therefore, to help out your friend, find the cheapest set of flights from SFO to PDX but do not include options with more than two flights! You should generate a table with the following columns:

  • The set of airports that the flights pass through.
  • Total cost of a set of flights.

Be sure to order your table from the cheapest to most expensive option.

All of the available flights as well as their prices can be found in the flights table.

You should get the following output:

sqlite> SELECT * FROM schedule;
SFO, SLC, PDX|176
SFO, LAX, PDX|186
SFO, PDX|192
create table schedule as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
with trips(path, ending, flights, cost) as ( select departure || ", " || arrival, arrival, 1, price from flights where departure = "sfo" union select path || ", " || arrival, arrival, flights + 1, cost + price from trips, flights where ending = departure and flights < 2 ) select path, cost from trips where ending = "pdx" order by cost;

Hint: As with before, you may find it helpful to create a table using recursive select. What are all the things you need to keep track of? For example, it might be helpful to save the number of flights taken on the current path.

Hint: If your table is taking a long time to generate, it might be stuck in a loop somewhere. Notice that there are some flight paths that loop, e.g SLC to LAX to SLC again. To handle this, consider when you should stop adding rows to your table.

Use OK to test your code:

python3 ok -q schedule

Question 3: Shopping Cart

Seeing as how you are now a responsible college student, it has finally fallen to you to do the Thanksgiving grocery shopping! You have been given a $60 budget, but you can't make up your mind on what to buy, so you decide to consult the all-knowing SQL.1

You have access to all the possible things you could purchase as well as their costs in the supermarket table. Write a SQL query that creates a table shopping_cart that lists of all possible ways you could fill your budget with delicious Thanksgiving eats. The final table should have 2 columns:

  • Comma separated list of items, from least to most expensive.
  • Amount of your budget left over.

Finally, order your results in ascending order of leftover budget. For lists that have the same remaining budgets, order them alphabetically.

You should get the following output:

sqlite> SELECT * FROM shopping_cart LIMIT 15;
CAKE!|0
cornbread, cornbread, cornbread, cornbread, cornbread|0
cranberries, cranberries, cranberries, cornbread, cornbread, pumpkin pie|0
cranberries, cranberries, cranberries, cranberries, cornbread, tofurky|0
cranberries, cranberries, cranberries, cranberries, cranberries, potatoes, pumpkin pie|0
cranberries, cranberries, cranberries, cranberries, potatoes, potatoes, cornbread|0
cranberries, cranberries, potatoes, cornbread, cornbread, cornbread|0
potatoes, potatoes, potatoes, potatoes, potatoes, potatoes|0
potatoes, potatoes, potatoes, potatoes, tofurky|0
potatoes, potatoes, potatoes, pumpkin pie, pumpkin pie|0
potatoes, potatoes, potatoes, turkey|0
potatoes, potatoes, tofurky, tofurky|0
potatoes, pumpkin pie, pumpkin pie, tofurky|0
potatoes, tofurky, turkey|0
pumpkin pie, pumpkin pie, pumpkin pie, pumpkin pie|0
create table shopping_cart as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
with cart(list, last, budget) as ( SELECT item, price, 60 - price FROM supermarket WHERE price <= 60 UNION SELECT list || ", " || item, price, budget - price FROM cart, supermarket WHERE price <= budget AND price >= last ) SELECT list, budget FROM cart ORDER BY budget, list;

Hint: To order by more than one column, separate them with commas and put them after order by. Keep in mind that order by col_a, col_b will order by col_a and then col_b.

1 Of course, SQL is very good at enumerating ALL possible shopping sprees. However, as you will see, they don't always make sense! As bonus challenge, try modifying your table so that you never pick more than two of any item.

Use OK to test your code:

python3 ok -q shopping-cart

SQL Aggregation

Previously, we have been dealing with queries that process one row at a time. When we join, we make pairwise combinations of all of the rows. When we use where, we filter out certain rows based on the condition. Alternatively, applying an aggregate function such as max(column) combines the values in multiple rows.

By default, we combine the values of the entire table. For example, if we wanted to count the number of flights from our flights table, we could use:

sqlite> select count(*) from flights;
13

What if we wanted to group together the values in similar rows and perform the aggregation operations within those groups? We use a group by clause.

Here's another example. For each unique departure, collect all the rows having the same departure airport into a group. Then, select the price column and apply the min aggregation to recover the price of the cheapest departure from that group. The end result is a table of departure airports and the cheapest departing flight.

sqlite> select departure, min(price) from flights group by departure;
AUH|932
LAS|50
LAX|89
SEA|32
SFO|40
SLC|42

Just like how we can filter out rows with where, we can also filter out groups with having. Important: A having clause should use an aggregate function. Suppose we want to see all airports with at least two departures:

sqlite> select departure from flights group by departure having count(*) >= 2;
LAX
SFO
SLC

Note that the count(*) aggregate just counts the number of rows in each group. Say we want to count the number of distinct airports instead. Then, we could use the following query:

sqlite> select count(distinct departure) from flights;
6

This enumerates all the different departure airports available in our flights table (in this case: SFO, LAX, AUH, SLC, SEA, and LAS).

Self Restraint

Tragically, many people find themselves overeating during Thanksgiving. With your new knowledge from 61A, you resolve to use SQL and plan a healthier meal!

You are given a table main_course where each row corresponds to a possible Thanksgiving meal with two components: the meat and the side dish (in an amazing display of restraint, you are limiting yourself to just one side dish). You are also given a second table pies containing different types of pies as well as their caloric content. The idea is that you will pair the two items consisting of your main course (a row of the table main_course) with a pie that you will have for dessert. Use SQL's aggregation features to answer the following questions.

Question 4: Self Restraint, Part I

For this first part, we want to know how many selections of meats we have for our meal. Use a select statement to see how many different types of meats we have in our list of main courses. (We would like to point out that the 61A staff is inclusive, and we have included tofurky as the "meat" in some meals.)

Store this answer in a one column, one row table called number_of_options.

create table number_of_options as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select count(distinct meat) from main_course;

Use OK to test your code:

python3 ok -q meals-part1

Question 5: Self Restraint, Part II

Use aggregation in a select statement to count the number of "full" meals (i.e main course plus a pie) we can make with under 2500 calories total. For example, if you have turkey and cranberries along with pumpkin pie, you will have 2000 + 500 = 2500 calories total (2000 from the main course, 500 from the pie).

Store this answer in a one column, one row table called calories.

create table calories as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select count(*) from main_course as m, pies as p where m.calories + p.calories < 2500;

Use OK to test your code:

python3 ok -q meals-part2

Question 6: Self Restraint, Part III

We are mainly concerned with what meat is in our planned meal. For every type of meat, we want to see how healthy a meal exists with this meat (healthier means fewer calories for our purposes). Include this information for each meat in a table healthiest_meats.

Also, if it is possible to make ANY full meal of more than 3000 calories (even just one) using a certain type of meat, then temptation will take over. For this reason, exclude such types of meat from your table.

In summary:

  • The healthiest_meats table should have two columns: meat and total calories.
  • Each row should have a meat and the number of calories of the full meal with the least calories (full meals include pie) involving that type of meat. Finally, exclude exclude meats that are in any full meals of more than 3000 calories.
create table healthiest_meats as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select meat, min(m.calories + p.calories) as calories from main_course as m, pies as p group by meat having max(m.calories + p.calories) < 3000;

Use OK to test your code:

python3 ok -q meals-part3

Shopping Spree

Question 7: Price Check

After you are full from your Thanksgiving dinner, you realize that you still need to buy gifts for all your loved ones over the holidays. However, you also want to spend as little money as possible (you're not cheap, just looking for a great sale!).

Let's start off by surveying our options. Using the products table, write a query that creates a table average_prices that lists categories and the average price of items in the category.

You should get the following output:

sqlite> SELECT * FROM average_prices;
computer|109.09
games|349.99
phone|89.99
create table average_prices as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select category, avg(msrp) from products group by category; -- alternate solution -- select category, sum(msrp)/count(*) from products group by category;

Use OK to test your code:

python3 ok -q cyber-monday-part1

Question 8: The Price is Right

Now, you want to figure out with stores sell each item in products for the lowest price. Write a SQL query that uses the inventory table to create a table lowest_prices that lists items, the stores that sells that item for the lowest price, and the price that the store sells that item for.

You should expect the following output:

sqlite> SELECT * FROM lowest_prices;
GameStation|Hallmart|298.98
QBox|Targive|390.98
iBook|Targive|110.99
kBook|RestBuy|94.99
qPhone|Hallmart|85.99
rPhone|Hallmart|69.99
uPhone|RestBuy|89.99
wBook|RestBuy|114.29
create table lowest_prices as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select name, store, price from inventory, products where name = item group by name having min(price);

Use OK to test your code:

python3 ok -q cyber-monday-part2

Question 9: Bang for your Buck

You want to make a shopping list by choosing the item that is the best deal possible for every category. For example, for the "phone" category, the uPhone is the best deal because the MSRP price of a uPhone divided by its ratings yields the lowest cost. That means that uPhones cost the lowest money per rating point out of all of the phones.

Write a query to create a table shopping_list that lists the items that you want to buy from each category.

After you've figured out which item you want to buy for each category, add another column that lists the store that sells that item for the lowest price.

You should expect the following output:

sqlite> SELECT * FROM shopping_list;
GameStation|Hallmart
uPhone|RestBuy
wBook|RestBuy
create table shopping_list as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select l.name, store from products as p, lowest_prices as l where l.name = p.name group by category having min(MSRP/rating);

Hint: You should use the lowest_prices table you created in the previous question.

Use OK to test your code:

python3 ok -q cyber-monday-part3

Question 10: Driving the Cyber Highways

Using the Mb (megabits) column from the stores table in data.sql, write a query to calculate the total amount of bandwidth needed to get everything in your shopping list.

create table total_bandwidth as
select "REPLACE THIS LINE WITH YOUR SOLUTION";
select sum(s.mbs) from stores as s, shopping_list as sl where s.store = sl.store;

Hint: You should use the shopping_list table you created in the previous question.

Use OK to test your code:

python3 ok -q cyber-monday-part4