## Starter Files

Download lab13.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.

## A 61A Thanksgiving

This lab is optional, but highly recommended for practice.

In this lab, we will move past the basics of SQL and start practicing recursive select statements and aggregate functions. Keep in mind that the questions towards the end require extensive use of both!

## Recursive SQL

A table defined within a `with` clause may have a single recursive case that defines output rows in terms of other output rows.

```
sqlite> with
...>   fib(previous, current) as (
...>     SELECT 0, 1 UNION
...>     SELECT current, previous + current FROM fib
...>       WHERE current <= 20
...>   )
...> SELECT previous FROM fib;
0
1
1
2
3
5
8
13
```
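
To build intuition for why this recursion terminates, here is a simplified Python sketch of the queue-based evaluation strategy that SQLite's documentation describes for recursive `WITH` queries, hard-coded to the `fib` example. The helper name `fib_cte` is hypothetical, and this is a mental model rather than SQLite's actual implementation. Rows from the non-recursive `SELECT` are queued, each dequeued row is fed through the recursive `SELECT` on its own, and `UNION` prevents an identical row from being queued twice.

```python
from collections import deque

def fib_cte(limit=20):
    """Simulate the fib CTE: SELECT 0, 1 UNION
    SELECT current, previous + current FROM fib WHERE current <= limit."""
    queue = deque()
    queued = set()   # UNION (not UNION ALL): never queue an identical row twice
    table = []       # rows of the finished fib table, in the order produced

    def enqueue(row):
        if row not in queued:
            queued.add(row)
            queue.append(row)

    enqueue((0, 1))  # result of the non-recursive SELECT
    while queue:
        previous, current = queue.popleft()
        table.append((previous, current))
        # Run the recursive SELECT as if this one row were the whole fib table.
        if current <= limit:
            enqueue((current, previous + current))
    return table

print([previous for previous, _ in fib_cte()])  # [0, 1, 1, 2, 3, 5, 8, 13]
```

The recursion stops once the row `(13, 21)` fails the `current <= 20` test, because that row produces nothing new to queue.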

Also recall that you can concatenate strings with the `||` operator:

```
sqlite> select "hello" || " " || "world";
hello world
```

### Question 1: Going Home

It's Thanksgiving! You want to book your flight home, but there's just one problem: everybody knows that as we get closer to Thanksgiving, the prices of airplane tickets shoot up for no good reason.

Assume that on November 1st, 2nd, and 3rd, your tickets home cost \$20, \$30, and \$40, respectively. We want to find the prices you'll have to pay if you wait.

You have a friend in the airline business, so you know the following about ticket prices in November: on any day after the 3rd, the price of a ticket equals the average of the prices on the previous two days, plus `5 * (date % 7)`, where `date` is the numerical date (4 for the 4th, 5 for the 5th, and so on). This last term accounts for the fact that demand rises through the week.

For example, consider the price of tickets on November 16. If the price of a ticket on the 14th was \$10 (this is not true in our model), and on the 15th it was \$20, the average of those two prior days is \$15. We then add `5 * (16 % 7)`, which is 10. Thus, on the 16th tickets should be \$25 (Yikes!).
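
If you want to check your SQL output against this rule, the recurrence is easy to express directly in Python. This is a cross-checking sketch, not the SQL solution, and the helper names `next_price` and `ticket_prices` are hypothetical. It uses `//` to mimic SQL's integer division, which gives the same result as SQL's `/` here because all prices are positive.

```python
def next_price(two_days_ago, yesterday, day):
    """Price on `day`: average of the two previous prices plus 5 * (day % 7)."""
    return (two_days_ago + yesterday) // 2 + 5 * (day % 7)

def ticket_prices(last_day=25):
    """Map each day from 1 to last_day to its ticket price."""
    prices = {1: 20, 2: 30, 3: 40}  # given prices for November 1-3
    for day in range(4, last_day + 1):
        prices[day] = next_price(prices[day - 2], prices[day - 1], day)
    return prices

print(next_price(10, 20, 16))  # 25, matching the November 16 example
print(ticket_prices()[4])      # 55 = (40 + 30) // 2 + 5 * 4
```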

You may notice that this definition is recursive, which is why we're going to solve this problem with recursive selects in SQL!

Your end goal is to create a table `flight_costs` containing each date in November from the 1st to the 25th (the day before Thanksgiving), along with the ticket price on that day:

| Day | Price |
|-----|-------|
| 1   | 20    |
| 2   | 30    |
| 3   | 40    |

And so on.

Hint: You CAN use the % operator in SQL! :)

Hint: If you use the `/` operator for division (which is what we want you to do in this part), all of your output values will be whole numbers. This is because SQL produces an integer result when it divides one integer by another. Don't worry about that for this question: integer-only output is OK. Just make sure you get the recursive idea right, and you should be fine.
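
A quick way to see this behavior is to run both kinds of division through Python's built-in `sqlite3` module against a throwaway in-memory database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both operands are integers, so SQLite performs integer division.
int_result = conn.execute("SELECT 7 / 2").fetchone()[0]
# Making either operand a REAL (here 2.0) yields a floating-point result.
real_result = conn.execute("SELECT 7 / 2.0").fetchone()[0]
print(int_result, real_result)  # 3 3.5
conn.close()
```

If you ever do want fractional prices, making one operand a `REAL` (for example, dividing by `2.0`) is enough.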

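```sql
CREATE TABLE flight_costs as
-- REPLACE THIS LINE
-- Each row holds a day, that day's price (cur), and the prices on the two
-- preceding days (prev, prev2); prev2 is carried along but never read.
with flights(day, cur, prev, prev2) as (
  SELECT 1, 20, 0, 0 UNION
  SELECT 2, 30, 20, 0 UNION
  SELECT 3, 40, 30, 20 UNION
  -- Only the day-3 row and later rows extend the sequence, up to day 25.
  SELECT day + 1, (cur + prev)/2 + (5 * ((day+1) % 7)), cur, prev FROM flights
  WHERE day >= 3 and day < 25
)
SELECT day as Day, cur as Price FROM flights;
```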

Use OK to test your code:

``python3 ok -q flights``

### Question 2: A Friend in Need Requires Turkey Indeed

One of your friends has made the terrible mistake of booking Thanksgiving plane tickets late! However, upon seeing how well you did on your 61A SQL lab, your friend asked you to help find the best plane tickets for flying home in this darkest of hours.

Of course, you'd like to find the cheapest flight possible, but regardless of the savings, you would also like to make sure you don't send your friend on too many flight transfers.

Therefore, to help your friend out, find the sets of flights from SFO to PDX, but do not include options with more than two flights! You should generate a table with the following columns:

- The set of airports that the flights pass through.
- The total cost of the set of flights.

Be sure to order your table from the cheapest to most expensive option.

All of the available flights as well as their prices can be found in the `flights` table.

You should get the following output:

```
sqlite> SELECT * FROM schedule;
SFO, SLC, PDX|176
SFO, LAX, PDX|186
SFO, PDX|192
```

Hint: As before, you may find it helpful to create a table using a recursive select. What are all the things you need to keep track of? For example, it might be helpful to save the number of flights taken on the current path.

Hint: If your table is taking a long time to generate, it might be stuck in a loop somewhere. Notice that some flight paths loop, e.g. SLC to LAX and back to SLC. To handle this, consider when you should stop adding rows to your table.
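
The sketch below runs a query shaped like the solution below against a small, hypothetical `flights` table (not the lab's data) that contains an SLC-to-LAX-to-SLC cycle. The flight-count column is renamed `legs` here only for readability. Without the `legs < 2` bound, every lap around the cycle would produce a longer, never-before-seen `path` string, so `UNION`'s duplicate elimination alone could never stop the recursion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
# Hypothetical toy data (NOT the lab's flights table); note the SLC <-> LAX cycle.
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("SFO", "SLC", 100), ("SLC", "LAX", 50), ("LAX", "SLC", 50),
     ("SLC", "PDX", 70), ("SFO", "PDX", 200)],
)
rows = conn.execute("""
    WITH trips(path, ending, legs, cost) AS (
      SELECT departure || ', ' || arrival, arrival, 1, price
        FROM flights WHERE departure = 'SFO'
      UNION
      SELECT path || ', ' || arrival, arrival, legs + 1, cost + price
        FROM trips, flights
       WHERE ending = departure AND legs < 2   -- the bound that breaks the cycle
    )
    SELECT path, cost FROM trips WHERE ending = 'PDX' ORDER BY cost
""").fetchall()
print(rows)  # [('SFO, SLC, PDX', 170), ('SFO, PDX', 200)]
conn.close()
```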

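```sql
CREATE TABLE schedule as
-- REPLACE THIS LINE
-- path: airports visited so far; ending: current airport;
-- flights: number of flights taken so far; cost: total price so far.
with trips(path, ending, flights, cost) as (
  SELECT departure || ", " || arrival, arrival, 1, price FROM flights
  WHERE departure = "SFO" UNION
  -- Extend a trip by one flight leaving its current airport; the
  -- flights < 2 bound also stops cycles such as SLC -> LAX -> SLC.
  SELECT path || ", " || arrival, arrival, flights + 1, cost + price
  FROM trips, flights
  WHERE ending = departure AND flights < 2
)
SELECT path, cost FROM trips WHERE ending = "PDX" ORDER BY cost;
```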

Use OK to test your code:

``python3 ok -q schedule``

### Question 3: Shopping Cart

Seeing as how you are now a responsible college student, it has finally fallen to you to do the Thanksgiving grocery shopping! You have been given a \$60 budget, but you can't make up your mind on what to buy, so you decide to consult the all-knowing SQL.*

You have access to all the possible things you could purchase, along with their costs, in the `supermarket` table. Write a SQL query that creates a table `shopping_cart` listing all possible ways you could spend your budget on delicious Thanksgiving eats. The final table should have 2 columns:

- Comma-separated list of items, from least to most expensive.
- Amount of your budget left over.

Finally, order your results in ascending order of leftover budget. For lists with the same remaining budget, order them alphabetically.

You should get the following output:

```
sqlite> SELECT * FROM shopping_cart LIMIT 15;
CAKE!|0
cranberries, cranberries, cranberries, cranberries, cornbread, tofurky|0
cranberries, cranberries, cranberries, cranberries, cranberries, potatoes, pumpkin pie|0
cranberries, cranberries, cranberries, cranberries, potatoes, potatoes, cornbread|0
potatoes, potatoes, potatoes, potatoes, potatoes, potatoes|0
potatoes, potatoes, potatoes, potatoes, tofurky|0
potatoes, potatoes, potatoes, pumpkin pie, pumpkin pie|0
potatoes, potatoes, potatoes, turkey|0
potatoes, potatoes, tofurky, tofurky|0
potatoes, pumpkin pie, pumpkin pie, tofurky|0
potatoes, tofurky, turkey|0
pumpkin pie, pumpkin pie, pumpkin pie, pumpkin pie|0
```

Hint: To order by more than one column, separate them with commas and put them after `ORDER BY`. Keep in mind that `ORDER BY col_a, col_b` sorts by `col_a` first and uses `col_b` only to break ties.
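
Here is a small check of multi-column ordering, using Python's `sqlite3` module and a few hypothetical rows. It also shows why `CAKE!` sorts first in the expected output: SQLite's default `BINARY` collation compares raw character codes, so uppercase letters sort before lowercase ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carts (list TEXT, budget INTEGER)")
# Hypothetical rows, just to show the tie-breaking behavior.
conn.executemany("INSERT INTO carts VALUES (?, ?)",
                 [("pie", 5), ("turkey", 0), ("bread", 5), ("apple", 0), ("CAKE!", 0)])
# Sort by leftover budget first; rows with equal budgets are ordered by list.
ordered = conn.execute("SELECT list, budget FROM carts ORDER BY budget, list").fetchall()
print(ordered)
# [('CAKE!', 0), ('apple', 0), ('turkey', 0), ('bread', 5), ('pie', 5)]
conn.close()
```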

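```sql
CREATE TABLE shopping_cart as
-- REPLACE THIS LINE
-- list: items chosen so far; last: price of the most recently added item;
-- budget: money left over.
with cart(list, last, budget) as (
  SELECT item, price, 60 - price FROM supermarket WHERE price <= 60 UNION
  -- Only add items at least as expensive as the last one, so each cart is
  -- listed once, from least to most expensive item.
  SELECT list || ", " || item, price, budget - price FROM cart, supermarket
  WHERE price <= budget AND price >= last
)
SELECT list, budget FROM cart ORDER BY budget, list;
```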
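
The `price >= last` condition is what keeps the table from listing the same cart in every possible order. Because items are only ever appended in non-decreasing price order, each multiset of items is generated once. The Python sketch below mirrors that enumeration and checks it against a brute-force list of multisets. The helper `carts` and the items are hypothetical. Like the SQL, the sketch assumes positive prices, and two different items with the same price could still appear in both orders.

```python
from itertools import combinations_with_replacement

def carts(items, budget):
    """Yield (cart, leftover) for every affordable multiset of items, listing each
    cart's items in non-decreasing price order, as the `price >= last` rule does."""
    def extend(cart, last, left):
        for name, price in items:
            if last <= price <= left:  # same test as price >= last AND price <= budget
                new_cart = cart + [name]
                yield new_cart, left - price
                yield from extend(new_cart, price, left - price)
    # Starting with last = 0 plays the role of the base case (assumes positive prices).
    yield from extend([], 0, budget)

items = [("a", 10), ("b", 20), ("c", 30)]  # hypothetical (name, price) pairs, sorted by price
found = [tuple(cart) for cart, _ in carts(items, 40)]

# Brute force: every multiset of 1 to 4 items whose total fits in the budget.
expected = {
    tuple(name for name, _ in combo)
    for size in range(1, 5)
    for combo in combinations_with_replacement(items, size)
    if sum(price for _, price in combo) <= 40
}
print(len(found), len(set(found)) == len(found), set(found) == expected)  # 10 True True
```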

\* Of course, SQL is very good at enumerating ALL possible shopping sprees. However, as you will see, they don't always make sense! As a bonus challenge, try modifying your table so that you never pick more than two of any item.

Use OK to test your code:

``python3 ok -q shopping-cart``

## SQL Aggregation

So far, we have been dealing with queries that process one row at a time. When we join, we make pairwise combinations of all of the rows. When we use `WHERE`, we filter out the rows that fail the condition. In contrast, an aggregate function such as `MAX(column)` combines the values from multiple rows into a single result.

By default, an aggregate function combines all rows together. What if we wanted to group together rows that share a value and perform the aggregation within each group? For that, we use a `GROUP BY` clause.

Here's an example using our `flights` table. For each unique departure airport, collect all the rows with that departure into a group. Then, apply the `MIN` aggregate to the `price` column to find the price of the cheapest flight in each group. The end result is a table of departure airports and the price of the cheapest flight departing from each.

```
sqlite> SELECT departure, MIN(price) FROM flights GROUP BY departure;
AUH|932
LAS|50
LAX|89
SEA|32
SFO|40
SLC|42
```
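
Conceptually, `GROUP BY` followed by `MIN` behaves like bucketing rows by a key and then reducing each bucket. The Python sketch below assumes each row is a `(departure, arrival, price)` tuple. The helper name and the rows are hypothetical.

```python
def min_price_by_departure(flights):
    """Rough Python analogue of
    SELECT departure, MIN(price) FROM flights GROUP BY departure."""
    groups = {}
    for departure, _arrival, price in flights:
        groups.setdefault(departure, []).append(price)  # one group per departure
    # Apply the aggregate to each group independently.
    return {departure: min(prices) for departure, prices in groups.items()}

# Hypothetical rows, not the lab's data.
rows = [("SFO", "PDX", 200), ("SFO", "SLC", 100), ("SLC", "PDX", 70)]
print(min_price_by_departure(rows))  # {'SFO': 100, 'SLC': 70}
```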

Just as we can filter out rows with `WHERE`, we can filter out groups with `HAVING`. Important: because `HAVING` filters whole groups, its condition typically uses an aggregate function. Suppose we want to see all airports with at least two departures:

```
sqlite> SELECT departure FROM flights GROUP BY departure HAVING COUNT(*) >= 2;
LAX
SFO
SLC
```
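
To see the difference between the two filters, the following sketch runs both against a small hypothetical `flights` table in an in-memory SQLite database. `WHERE` removes individual rows before any grouping happens, while `HAVING` removes entire groups after their aggregates are computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)",  # hypothetical rows
                 [("SFO", "PDX", 200), ("SFO", "SLC", 100),
                  ("SLC", "PDX", 70), ("SLC", "LAX", 50), ("LAX", "SLC", 50)])

# WHERE drops individual rows *before* grouping ...
cheap_counts = conn.execute(
    "SELECT departure, COUNT(*) FROM flights WHERE price < 150 "
    "GROUP BY departure ORDER BY departure").fetchall()
# ... while HAVING drops whole groups *after* the aggregates are computed.
busy = conn.execute(
    "SELECT departure FROM flights GROUP BY departure "
    "HAVING COUNT(*) >= 2 ORDER BY departure").fetchall()
print(cheap_counts)  # [('LAX', 1), ('SFO', 1), ('SLC', 2)]
print(busy)          # [('SFO',), ('SLC',)]
conn.close()
```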

Note that the `COUNT(*)` aggregate simply counts the number of rows in each group. Say we want to count the number of distinct departure airports instead. Then, we could use the following query:

```
sqlite> SELECT COUNT(DISTINCT departure) FROM flights;
6
```

This counts the different departure airports available in our `flights` table (in this case: SFO, LAX, AUH, SLC, SEA, and LAS).

### Self Restraint

Tragically, many people find themselves overeating during Thanksgiving. With your new knowledge from 61A, you resolve to use SQL and plan a healthier meal!

You are given a table `main_course` in which each row is a possible Thanksgiving main course with two components: a meat and a side dish (in an amazing display of restraint, you are limiting yourself to just one side dish). You are also given a second table `pies` containing different types of pies along with their caloric content. The idea is to pair a main course (a row of `main_course`) with a pie that you will have for dessert. Use SQL's aggregation features to answer the following questions.

### Question 4: Self Restraint, Part I

For this first part, we want to know how many choices of meat we have for our meal. Use a select statement to count how many different types of meat appear in our list of main courses. (We would like to point out that the 61A staff is inclusive, and we have included tofurky as the "meat" in some meals.)

Store this answer in a one-column, one-row table called `number_of_options`.

```sql
CREATE TABLE number_of_options as
-- REPLACE THIS LINE
SELECT COUNT(DISTINCT meat) from main_course;
```

Use OK to test your code:

``python3 ok -q meals-part1``

### Question 5: Self Restraint, Part II

Use aggregation in a select statement to count the number of "full" meals (i.e., a main course plus a pie) we can make with under 2500 calories total. For example, if you have turkey and cranberries along with pumpkin pie, you will have 2000 + 500 = 2500 calories total (2000 from the main course, 500 from the pie), which is not under 2500, so that meal would not be counted.

Store this answer in a one-column, one-row table called `calories`.

```sql
CREATE TABLE calories as
-- REPLACE THIS LINE
SELECT COUNT(*) FROM main_course as m, pies as p
WHERE m.calories + p.calories < 2500;
```
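
The comma join in `FROM main_course as m, pies as p` pairs every main course with every pie, so the query counts the pairs that pass the `WHERE` test. The same counting logic in Python, with made-up calorie values (not the lab's data), looks like this:

```python
from itertools import product

# Hypothetical calorie counts; the lab's real tables will differ.
main_course_calories = [2000, 1500, 2200]
pie_calories = [500, 300]

# product() yields every (main course, pie) pairing, like the comma join;
# the condition plays the role of WHERE m.calories + p.calories < 2500.
under_limit = sum(1 for m, p in product(main_course_calories, pie_calories)
                  if m + p < 2500)
print(under_limit)  # 3
```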

Use OK to test your code:

``python3 ok -q meals-part2``

### Question 6: Self Restraint, Part III

We are mainly concerned with which meat is in our planned meal. For every type of meat, we want to know how healthy a meal we can make with it, measured by the lowest total calorie count of any full meal containing that meat. Include this information for each meat in a table `healthiest_meats`.

Also, if it is possible to make ANY full meal of more than 3000 calories (even just one) using a certain type of meat, then temptation will take over. For this reason, exclude such types of meat from your table.

The `healthiest_meats` table should have two columns: meat and total calories. Each row should correspond to the caloric content of the healthiest meal involving each type of meat (excluding meats that the above condition filters out).

Hint: You shouldn't need to do anything special to choose among several possible healthiest meals, but for completeness, choose the side with the cranberries.

```sql
CREATE TABLE healthiest_meats as
-- REPLACE THIS LINE
SELECT meat, MIN(m.calories + p.calories) as calories
FROM main_course as m, pies as p
GROUP BY meat HAVING MAX(m.calories + p.calories) < 3000;
```
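
Both aggregates in this query are computed over the same per-meat group of full meals: `MAX` decides whether the meat survives the `HAVING` filter, and `MIN` supplies the reported calories. Here is a Python sketch of that logic, using a hypothetical helper and made-up data:

```python
from collections import defaultdict

def healthiest(main_courses, pies, limit=3000):
    """main_courses: (meat, calories) pairs; pies: pie calorie counts.
    Mirrors GROUP BY meat HAVING MAX(total) < limit, reporting MIN(total)."""
    totals = defaultdict(list)
    for meat, main_calories in main_courses:
        for pie_calories in pies:
            totals[meat].append(main_calories + pie_calories)  # every full meal with this meat
    return {meat: min(meal_totals) for meat, meal_totals in totals.items()
            if max(meal_totals) < limit}

meals = [("turkey", 2000), ("turkey", 2600), ("tofurky", 1200)]  # hypothetical data
print(healthiest(meals, [300, 500]))  # {'tofurky': 1500}
```

Like the solution, the sketch keeps a meat only when its largest meal is strictly below the limit. Under a literal reading of "more than 3000 calories," a meat whose largest meal is exactly 3000 would still qualify, which a `<= 3000` test would capture.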

Use OK to test your code:

``python3 ok -q meals-part3``

### Question 7: Price Check

After you are full from your Thanksgiving dinner, you realize that you still need to buy gifts for all your loved ones over the holidays. However, you also want to spend as little money as possible.

Let's start off by surveying our options. Using the `products` table, write a query that creates a table `average_prices` that lists categories and the average price of items in the category.

You should get the following output:

```
sqlite> SELECT * FROM average_prices;
computer|109.09
games|349.99
phone|89.99
```

```sql
CREATE TABLE average_prices as
-- REPLACE THIS LINE
SELECT category, AVG(MSRP) FROM products GROUP BY category;
-- alternate solution
-- SELECT category, SUM(MSRP)/COUNT(*) FROM products GROUP BY category;
```
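
One caution about the alternate solution: in SQLite, `avg()` always returns a floating-point value, whereas an integer `SUM` divided by `COUNT(*)` uses integer division. As a result, `SUM(MSRP)/COUNT(*)` can silently truncate if `MSRP` values are stored as integers. The sketch below uses a hypothetical `products` table with integer prices to show the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, MSRP INTEGER)")
# Hypothetical rows with INTEGER prices, to expose the pitfall.
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [("a", "toys", 10), ("b", "toys", 15)])
avg_result, ratio_result = conn.execute(
    "SELECT AVG(MSRP), SUM(MSRP) / COUNT(*) FROM products GROUP BY category"
).fetchone()
print(avg_result, ratio_result)  # 12.5 12
conn.close()
```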

Use OK to test your code:

``python3 ok -q cyber-monday-part1``

### Question 8: The Price is Right

Now, you want to figure out which stores sell each item in `products` for the lowest price. Write a SQL query that uses the `inventory` table to create a table `lowest_prices` that lists each item, the store that sells that item for the lowest price, and the price that store charges for it.

You should expect the following output:

```
sqlite> SELECT * FROM lowest_prices;
GameStation|Hallmart|298.98
QBox|Targive|390.98
iBook|Targive|110.99
qPhone|Hallmart|85.99
rPhone|Hallmart|69.99
```

```sql
CREATE TABLE lowest_prices as
-- REPLACE THIS LINE
-- In SQLite, when MIN() is the only aggregate, the bare column `store`
-- comes from the row that has the minimum price.
SELECT item, store, MIN(price) FROM inventory GROUP BY item;
```
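
This solution relies on a SQLite-specific feature: when `min()` or `max()` is the only aggregate in a query, bare (non-aggregated) columns such as `store` take their values from the row that holds the minimum or maximum. Many other databases reject such a query or give no such guarantee. The sketch below, on a hypothetical `inventory` table, compares it with a portable alternative that joins each item back to its minimum price. Unlike the bare-column version, the portable query returns every store when there is a price tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (store TEXT, item TEXT, price REAL)")
conn.executemany("INSERT INTO inventory VALUES (?, ?, ?)",  # hypothetical rows
                 [("StoreA", "widget", 5.0), ("StoreB", "widget", 4.0),
                  ("StoreA", "gadget", 9.0), ("StoreB", "gadget", 12.0)])

# SQLite-specific: with a single MIN() aggregate, the bare column `store`
# is taken from the row that holds the minimum price.
sqlite_only = conn.execute(
    "SELECT item, store, MIN(price) FROM inventory GROUP BY item ORDER BY item").fetchall()

# Portable alternative: compute each item's minimum, then join back to find the store.
portable = conn.execute("""
    SELECT i.item, i.store, i.price
      FROM inventory AS i,
           (SELECT item, MIN(price) AS low FROM inventory GROUP BY item) AS m
     WHERE i.item = m.item AND i.price = m.low
     ORDER BY i.item""").fetchall()
print(sqlite_only == portable, portable)
# True [('gadget', 'StoreA', 9.0), ('widget', 'StoreB', 4.0)]
conn.close()
```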

Use OK to test your code:

``python3 ok -q cyber-monday-part2``

### Question 9: Bang for your Buck

You want to make a shopping list by choosing the best deal in every category. For example, in the "phone" category, the uPhone is the best deal because its MSRP divided by its rating is the lowest of all the phones. In other words, uPhones cost the least money per rating point of all the phones.
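
The selection rule can be sketched in Python as a running minimum per category. The tuple layout `(name, category, msrp, rating)`, the helper name, and the catalog entries are all hypothetical.

```python
def best_deals(products):
    """products: (name, category, msrp, rating) tuples.
    Returns {category: name} with the lowest MSRP / rating in each category."""
    best = {}
    for name, category, msrp, rating in products:
        cost_per_point = msrp / rating  # money per rating point
        if category not in best or cost_per_point < best[category][1]:
            best[category] = (name, cost_per_point)
    return {category: name for category, (name, _) in best.items()}

catalog = [("phoneA", "phone", 90.0, 3.0), ("phoneB", "phone", 80.0, 4.0),  # hypothetical
           ("laptop", "computer", 100.0, 5.0)]
print(best_deals(catalog))  # {'phone': 'phoneB', 'computer': 'laptop'}
```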

Write a query to create a table `shopping_list` that lists the items that you want to buy from each category.

After you've figured out which item you want to buy for each category, add another column that lists the store that sells that item for the lowest price.

Hint: You should use the `lowest_prices` table you created in the previous question.

You should expect the following output:

```
sqlite> SELECT * FROM shopping_list;
GameStation|Hallmart
```

```sql
CREATE TABLE shopping_list as
-- REPLACE THIS LINE
-- For each category, pick the product with the lowest MSRP per rating point
-- (again relying on SQLite's bare-column behavior with MIN()).
with shopping_list_helper (name, cost_per_point) as (
  SELECT name, min(MSRP/rating) FROM products GROUP BY category
)
SELECT s.name as item, l.store as store
FROM lowest_prices as l, shopping_list_helper as s
WHERE l.item = s.name;
```

Use OK to test your code:

``python3 ok -q cyber-monday-part3``

### Question 10: Driving the Cyber Highways

Using the MiBs (mebibytes) column from the `stores` table in `data.sql`, write a query to calculate the total amount of bandwidth needed to get everything on your shopping list.

Hint: You should use the `shopping_list` table you created in the previous question.

```sql
CREATE TABLE total_bandwidth as
-- REPLACE THIS LINE
SELECT SUM(s.MiBs) FROM stores as s, shopping_list as sl WHERE s.store = sl.store;
```

Use OK to test your code:

``python3 ok -q cyber-monday-part4``