Due at 11:59pm on 4/22/2016.

Starter Files

Download lab12.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.

Submission

By the end of this lab, you should have submitted the lab with python3 ok --submit. You may submit more than once before the deadline; only the final submission will be graded.

  • To receive credit for this lab, you must complete Questions 2-7 in lab12.sql, and submit through OK.
  • Question 8 is considered extra practice. It can be found in the lab12_extra.sql file. It is recommended that you complete it on your own time.

Setup

The simplest way to start using SQLite is to download a precompiled binary from the SQLite website. The latest version of SQLite at the time of writing is 3.12.1, but you can check for additional updates on the website.

Windows

  1. Visit the download page linked above and navigate to the section Precompiled Binaries for Windows. Click on the link sqlite-tools-win32-x86-*.zip to download the binary.
  2. Unzip the file. There should be a sqlite3.exe file in the directory after extraction.
  3. Navigate to the folder containing the sqlite3.exe file and check that the version is at least 3.8.3:

    $ cd path/to/sqlite
    $ ./sqlite3 --version
    3.12.1 2016-04-08 15:09:49 fe7d3b75fe1bde41511b323925af8ae1b910bc4d

Mac OS X Yosemite (10.10) or El Capitan (10.11)

SQLite comes pre-installed. Check that you have a version that's greater than 3.8.3:

    $ sqlite3
    SQLite version 3.8.10.2

Mac OS X Mavericks (10.9) or older

SQLite comes pre-installed, but it is the wrong version.

  1. Visit the download page linked above and navigate to the section Precompiled Binaries for Mac OS X (x86). Click on the link sqlite-tools-osx-x86-*.zip to download the binary.
  2. Unzip the file. There should be a sqlite3 file in the directory after extraction.
  3. Navigate to the folder containing the sqlite3 file and check that the version is at least 3.8.3:

    $ cd path/to/sqlite
    $ ./sqlite3 --version
    3.12.1 2016-04-08 15:09:49 fe7d3b75fe1bde41511b323925af8ae1b910bc4d

Ubuntu

The easiest way to use SQLite on Ubuntu is to install it straight from the native repositories (the version will be slightly behind the most recent release):

$ sudo apt-get install sqlite3
$ sqlite3 --version
3.8.6 2014-08-15 11:46:33 9491ba7d738528f168657adb43a198238abde19e

Usage

Note: If you downloaded a precompiled binary above, make sure that sqlite3.exe file is in the same directory as your .sql file. (Extract and move it out from the zip file you downloaded.)

After writing your code in the .sql file, you can test and verify your output in Terminal or Git Bash with one of the two following commands.

1.) Runs your code and then exits SQLite immediately afterwards.

  • Ubuntu / Mac OS X (Yosemite or newer)

    sqlite3 < lab12.sql
  • Windows / Mac OS X (Mavericks or older)

    ./sqlite3 < lab12.sql

2.) Runs your code and keeps SQLite open for further commands, which is similar to running Python code with the interactive -i flag. You can type .help to see some of the commands you can run.

  • Ubuntu / Mac OS X (Yosemite or newer)

    sqlite3 --init lab12.sql
  • Windows / Mac OS X (Mavericks or older)

    ./sqlite3 --init lab12.sql

To exit out of SQLite after using the second command, type .exit or .quit or you can hit Ctrl-C. Also if you see ...> you probably forgot a ;

SQL Basics

Creating Tables

You can create SQL tables either from scratch or from existing tables.

Below creates the table from scratch, without referencing any other existing tables.

CREATE TABLE [table_name] AS
  SELECT [val1] AS [column1], [val2] AS [column2], ... UNION
  SELECT [val3]             , [val4]             , ... UNION
  SELECT [val5]             , [val6]             , ...;

Note: You do not need to repeat the AS keyword in subsequent SELECT statements when creating the table.

Here is an example where we construct a table with the CREATE TABLE statement. UNION is used here to join rows and AS assigns a table column to a new name.

CREATE TABLE Football AS
  SELECT 30 AS Berkeley, 7 AS Stanford, 2002 AS Year UNION
  SELECT 28,             16,            2003         UNION
  SELECT 17,             38,            2014;

football

Here we have created a table called Football, which has three attributes (columns): Berkeley, Stanford, and Year. We can later access the values from this table by referencing the table's columns.

To create tables from existing tables, the SELECT command references another table.

Selecting From Tables

More commonly, we will create new tables by selecting specific columns that we want from existing tables. SELECT statements can include optional clauses such as:

  • FROM: tells SQL which tables to select values from
  • WHERE: filters by some condition
  • ORDER BY: enforces an ordering by some attribute or attributes (usually a column or columns)
  • LIMIT: limits the number of rows in the output table

    SELECT [columns] FROM [tables] WHERE [condition] ORDER BY [attributes] LIMIT [limit];

Notes about the arguments:

  • [columns]: a comma separated list of the columns to select, * can be used to select all of them
  • [tables]: a comma separated list of tables to select values from
  • [condition]: a Boolean expression
  • [attributes]: a comma separated list of attributes, which are usually columns, but could also be named aggregates (which we will learn later)
  • [limit]: an integer

We can select all the values of an attribute from a table with the SELECT statement. In addition, we can apply a filter using the WHERE clause. Here, we filter by Year > 2002, which makes the SELECT statement keep only the rows in the table whose Year value is greater than 2002.

sqlite> SELECT Berkeley FROM Football WHERE Year > 2002;
17
28

Here we selected Berkeley's score for all years after 2002.

Expressions in SQL

Here are some fundamental operations you can perform:

  • comparisons: =, >, <, <=, >=, <> ("not equal")
  • booleans: AND, OR
  • arithmatic: +, -, *, /

We can also perform string concatenation using ||, which behaves similarly to + on strings in Python.

sqlite> SELECT "hello" || " " || "world";
hello world

Note we capitalize SQL syntax purely for style. It makes it much easier to read, though will work if you don't capitalize it.

Getting to Know Your Fellow 61A Students

In the past week, we asked you and your fellow students to complete a brief online survey through Google Forms, which involved relatively random but fun questions. In this lab, we will interact with the results of the survey by using SQL queries to see if we can find interesting things in the data.

First, take a look at sp16data.sql and examine the table defined in it. Note its structure. You will be working with:

  • students: The main results of the survey. Each column represents a different question from the survey, except for the first column, which is the time of when the result was submitted. This time is a unique identifier for each of the rows in the table.

    Column Name Question
    time The unique timestamp that identifies the submission
    number What's your favorite number between 1 and 100?
    color What is your favorite color?
    seven Choose the number 7 below.
    Options:
    • 7
    • You are not the boss of me!
    • I do what I want.
    • I'm a rebel
    • Choose this option instead.
    • YOLO!
    song If you could listen to only one of these songs for the rest of your life, which would it be?
    Options:
    • "Hotline Bling" by Drake
    • "Work" by Rihanna
    • "Trololol" by Eduard Khil
    • "Baby" by Justin Bieber
    • "Sandstorm" by Darude
    • "Hello" by Adele
    • "Fur Elise" by Ludwig van Beethoven
    date Pick a day of the year!
    pet If you could have any animal in the world as a pet, what would it be?
    hilfinger Choose your favorite photo of Paul Hilfinger! (Options shown under Question 2)
    smallest Try to guess the smallest unique positive INTEGER that anyone will put!

Note: If you are looking for your personal response within the data, you may have noticed that some of your answers are slightly different from what you inputted. In order to make SQLite accept our data, and to optimize for as many matches as possible during our joins, we did the following things to clean up the data:

  • number and smallest: If you did not input a number, we put the number -1 in as a placeholder.
  • color and pet: We converted all the strings to be completely lowercase.

You will write all of your solutions in the starter file lab12.sql provided. As with other labs, you can test your solutions with OK. In addition, you can use either of the following commands. You may need to refer to the Usage section to find the appropriate command for your OS:

sqlite3 < lab12.sql
sqlite3 --init lab12.sql

Question 1: What Would SQL print?

First, load the tables into sqlite3. If you're on Windows or Mac OS X (Mavericks or older), use the following command:

$ ./sqlite3 --init lab12.sql

If you're on Ubuntu or Mac OS X (Yosemite or newer), use:

$ sqlite3 --init lab12.sql

Before we start, inspect the schema of the tables that we've created for you:

sqlite> .schema

This tells you the name of each of our tables and their attributes.

Let's also take a look at some of the entries in our table. There are a lot of entries though, so let's just output the first 20:

sqlite> SELECT * FROM students LIMIT 20;

If you're curious about some of the answers students put into the Google form, open up sp16data.sql in your favorite text editor and take a look!

For each of the SQL queries below, think about what the query is looking for, then try running the query yourself and see!

sqlite> SELECT * FROM students; -- This is a comment. * is shorthand for all columns!
______
selects all records from students;
sqlite> SELECT color FROM students WHERE number = 16;
______
selects the color from students who said their favorite number was 16;
sqlite> SELECT song, pet FROM students WHERE color = "blue" AND date = "12/25";
______
selects the song and pet from students who said their favorite color was blue and picked December 25th;

Remember, to exit out of SQLite after using the second command, type .exit or .quit or hit Ctrl-C. Also, if you see ...> you probably forgot a ;

Question 2: Obedience

To warm-up, let's ask a simple question related to our data: Is there a correlation between whether students do as they're told and their favorite image of Paul Hilfinger?

hilfinger

Write a SQL query to create a table that contains the columns seven (this column representing "obedience") and hilfinger (the image students selected) from the students table.

You should get the following output:

sqlite> SELECT * FROM obedience LIMIT 10;
7|Image 1
I do what I want.|Image 4
7|Image 3
I do what I want.|Image 1
Choose this option instead.|Image 1
7|Image 1
7|Image 1
7|Image 4
I do what I want.|Image 5
7|Image 1
CREATE TABLE obedience AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT seven, hilfinger FROM students;

Use OK to test your code:

python3 ok -q obedience

Question 3: The Smallest Unique Integer

Who successfully managed to guess the smallest unique integer value? Let's find out!

Unfortunately we have not learned how to do aggregations, which can help us count the number of times a specific value was selected, in SQL just yet. As such, we can only hand inspect our data to determine it. However, an anonymous elf has informed us that the smallest unique value is greater than 12!

Write a SQL query with the columns time and smallest to try to determine what the smallest integer value is. In order to make it easier for us to inspect these values, use WHERE to restrict the answers to numbers greater than 12, ORDER BY to sort the numerical values, and LIMIT your result to the first 20 values that are greater than the number 12.

The first 5 lines of your output should look like this:

sqlite> SELECT * FROM smallest_int LIMIT 5;
4/12/2016 12:19:56|13
4/12/2016 13:08:07|13
4/12/2016 14:26:38|13
4/12/2016 14:27:18|13
4/15/2016 0:40:14|13
CREATE TABLE smallest_int AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT time, smallest FROM students WHERE smallest > 12 ORDER BY smallest LIMIT 20;

Use OK to test your code:

python3 ok -q smallest-int

After you've successfully passed the OK test, actually take a look at the table smallest_int that you just created and find the smallest unique integer value! If you're curious how to do this with aggregations, check the Extra Question.

To do this, try the following:

$ sqlite3 --init lab12.sql
sqlite> SELECT * FROM smallest_int; -- No LIMIT this time!

Joins

We can use joins to include rows from another table that satisfy the WHERE predicate. Joins can either be on different tables, or the same table if we include an alias. Here we are referencing the football table twice, using AS to bind the football table once as the alias a and once as the alias b.

sqlite> SELECT b.Berkeley - a.Berkeley, b.Stanford - a.Stanford, a.Year, b.Year
...>        FROM Football AS a, Football AS b WHERE a.Year < b.Year;
-11|22|2003|2014
-13|21|2002|2014
-2|9|2002|2003

What is this query asking for?

You may notice that it does not seem like we actually performed any operations to do the join. However, the join is implicit in the fact that we listed more than one table after the FROM. In this example, we joined the table Football with itself and gave each instance of the table an alias, a and b so that we could distinctly refer to each table's attributes and perform selections and comparisons on them, such as a.Year < b.Year.

One way to think of a join is that it produces a cross-product between the two tables by matching each row from the first table with every other row in the second table, which creates a new, larger joined table.

Here's an illustration of what happened in the joining process during the above query.

joins

From here, the select statement examines the joined table and selects the values it desires: b.Berkeley - a.Berkeley and b.Stanford - a.Stanford but only from the rows WHERE a.Year < b.Year. This prevents duplicate results from appearing in our output and also removes rows where the years are the same.

Question 4: Great Students Think Alike

We administered a similar survey for Lab 12 last semester to 61A students (see the table fa15students in fa15data.sql). The only questions that have changed are song and hilfinger, so let's play around with this data using joins!

Great students think alike. We want to print out the information of students that have the same favorite date, number and pet! How do their favorite colors differ?

Hint: When joining table names where column names are the same, use dot notation to distinguish which columns are from which table. The structure is [table_name].[column name]. This sometimes may get verbose, so it’s stylistically better to give tables an alias using the AS keyword.

Write a SQL query to create a table that has 5 columns:

  • The shared favorite date between semesters
  • The shared favorite number between semesters
  • The shared favorite pet between semesters
  • The favorite color of this semester's student
  • The favorite color of last semester's student

You should get the following output:

sqlite> SELECT * FROM greatstudents;
1/1|1|dog|gray|black
2/29|13|tiger|black|#ff7700
CREATE TABLE greatstudents AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT a.date, a.number, a.pet, a.color, b.color FROM students AS a, fa15students AS b WHERE a.date = b.date AND a.number = b.number AND a.pet = b.pet;

Use OK to test your code:

python3 ok -q greatstudents

After you've successfully passed the OK test, play around some more with joins to see what cool connections you can draw between the data from both semesters!

Question 5: Matchmaker, Matchmaker

Did you take 61A with the hope of finding your soul mate? Well you're in luck! With all this data in hand, it's easy for us to find your perfect match. If two students want the same pet and have the same taste in music, they are clearly meant to be together! In order to provide some more information for the potential lovebirds to converse about, let's include the favorite colors of the two individuals as well!

In order to match up students, you will have to do a join on the students table with itself. When you do a join, SQLite will match every single row with every single other row, so make sure you do not match anyone with themselves, or match any given pair twice!

Hint: Remember you only want a pair of students to appear once. Refer back to the football example on how that was done.

Write a SQL query to create a table that has 4 columns:

  • The shared preferred pet of the couple
  • The shared favorite song of the couple
  • The favorite color of the first person
  • The favorite color of the second person

You should get the following output:

sqlite> SELECT * FROM matchmaker LIMIT 10;
dog|Hello|blue|blue
dog|Work|green|black
dragon|Sandstorm|black|black
|Work|citrine|green
tiger||purple|blue
dog|Fur Elise|gray|red
dog|Fur Elise|gray|light blue
doge|Sandstorm|brown|blue
paul hilfinger|Sandstorm|69|blue
paul hilfinger|Sandstorm|69|blurpul

The blank columns mark the places where students had no responses.

CREATE TABLE matchmaker AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT a.pet, a.song, a.color, b.color FROM students AS a, students AS b WHERE a.time < b.time AND a.pet = b.pet AND a.song = b.song;

Use OK to test your code:

python3 ok -q matchmaker

The COUNT Aggregator

How many people liked each pet? What is the biggest date chosen this semester? How many obedient people chose Image 1? Is there a difference between last semester's average favorite number and this semester's?

To answer these types of questions, we can bring in SQL aggregation, which allows us to accumulate values across rows in our SQL database!

In order to perform SQL aggregation, we can group rows in our table by one or more attributes. Once we have groups, we can aggregate over the groups in our table and find things like the maximum value (MAX), the minimum value (MIN), the number of rows in the group (COUNT), the average over all of the values (AVG), and more! SELECT statements that use aggregation are usually marked by two things: an aggregate function (MAX, MIN, COUNT, AVG, etc.) and a GROUP BY clause. GROUP BY [column(s)] groups together rows with the same value in each column(s). In this section we will only use COUNT, which will count the number of rows in each group, but feel free to check out http://www.sqlcourse2.com/agg_functions.html for more!

For example, the following query will print out the top 10 favorite numbers with their respective counts:

sqlite> SELECT number, COUNT(*) AS count FROM students GROUP BY number
  ORDER BY count DESC LIMIT 10;
69|33
7|19
42|15
1|13
3|12
13|11
2|10
4|9
9|9
17|8

This SELECT statement first groups all of the rows in our table students by number. Then, within each group, we perform aggregation by COUNTing all the rows. By selecting number and COUNT(*), we then can see the highest number and how many students picked that number. We have to order by our COUNT(*), which is saved in the alias count, by DESCending order, so our highest count starts at the top, and we limit our result to the top 10.

Question 6: Let's Count

Let's have some fun with this! For each statement we created a separate table in lab12.sql so fill in the corresponding table and run it using OK. Try working on this on your own or with a neighbor before toggling to see the solutions.

What was the favorite number from last semester?

sqlite> SELECT * from fa15favnum;
7|35
CREATE TABLE fa15favnum AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT number, COUNT(*) AS count FROM fa15students GROUP BY number ORDER BY count DESC LIMIT 1;

What were the top 10 pets last semester?

sqlite> SELECT * from fa15favpets;
dog|49
cat|19
tiger|17
|15
dragon|12
panda|11
monkey|8
wolf|8
cheetah|6
lion|6
CREATE TABLE fa15favpets AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT pet, COUNT(*) AS count FROM fa15students GROUP BY pet ORDER BY count DESC LIMIT 10;

What are the top 10 pets this semester?

sqlite> SELECT * from sp16favpets;
|17
dragon|15
dog|13
paul hilfinger|10
pikachu|7
cat|6
unicorn|6
chinchilla|5
phoenix|5
puppy|5
CREATE TABLE sp16favpets AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT pet, COUNT(*) AS count FROM students GROUP BY pet ORDER BY count DESC LIMIT 10;

How many people marked exactly the word 'dragon' as their ideal pet this semester?

sqlite> SELECT * from sp16dragon;
dragon|15
CREATE TABLE sp16dragon AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT pet, COUNT(*) FROM students WHERE pet = 'dragon';

Although close, our query doesn't give us an entirely accurate picture of what people's favorite pets are. For example, a dragon would not be counted the same as dragon. Let's see how many people actually want a dragon this semester by using LIKE, which compares substrings. We can use it inside WHERE, as in WHERE [column_name] LIKE '%[word]%' to find how many people would like some type of dragon.

sqlite> SELECT * from sp16alldragons;
dragon|28
CREATE TABLE sp16alldragons AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT pet, COUNT(*) FROM students WHERE pet LIKE '%dragon%';

We can find the student's favorite for any column (try it yourself in the interpreter), but let's go back to our Obedience question. Let's see how many obedient students this semester picked each image. We can do this by selecting only the rows that have seven = '7' then GROUP BY hilfinger, and finally we can COUNT them.

sqlite> SELECT * from obedienceimage;
7||1
7|Image 1|42
7|Image 2|18
7|Image 3|14
7|Image 4|27
7|Image 5|24
CREATE TABLE obedienceimage AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT seven, hilfinger, COUNT(*) FROM students WHERE seven = '7' GROUP BY hilfinger;

The possibilities are endless, so have fun experimenting!

Use OK to test your code:

python3 ok -q lets-count

Recursive SQL

A table defined within a with clause may have a single recursive case that defines output rows in terms of other output rows. The table is only visible to the line immediately after and can be thought of as "helper" tables.

sqlite> with
   ...>   fib(previous, current) as (
   ...>     SELECT 0, 1 UNION
   ...>     SELECT current, previous + current FROM fib
   ...>       WHERE current <= 20
   ...>   )
   ...> SELECT previous FROM fib;
0
1
1
2
3
5
8
13

So what is really happening here?

We first build a table called fib that has two columns, previous and current. The first row consists of 0 and 1. The second row is then created from the first row, with the two values 1, 1 (current and previous + current). The third row is then built from the second row with the current and previous + current so 1, 2. This continues until the condition current <= 20 evaluates to False. Note 21 is NOT included in our result table, meaning our recursion stops at the first value where the condition is false, but does not include it.

Question 7: Pairs

Let’s figure out all possible pairs of numbers between 0 and 42 that sum to 42 (the answer to Life, the Universe, and Everything)!

To do this we can build a pairs table that contains all such pairs without duplicates. This means 2,3 appears but 3,2 doesn't.

Hint: You may want to first create a helper table which has every value from 0 to 42. Then try to use a join, and finally make sure you don't have any duplicates.

Try this on your own or with a neighbor in lab12.sql, but if you're short on time, scroll down to check the solution and an explanation.

The first 10 columns should look something like this:

sqlite> SELECT * FROM pairs LIMIT 10;
0|0
0|1
0|2
0|3
0|4
0|5
0|6
0|7
0|8
0|9

Then we can find all the possible combinations of adding pairs to get to 42 with the following query. Your final pairs table columns should be x and y.

sqlite> SELECT x, y FROM pairs WHERE x + y = 42 LIMIT 10;
0|42
1|41
2|40
3|39
4|38
5|37
6|36
7|35
8|34
9|33
CREATE TABLE pairs AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
WITH nums(n) as ( SELECT 0 UNION SELECT n + 1 FROM nums WHERE n < 42 ) SELECT a.n AS x, b.n AS y FROM nums AS a, nums AS b WHERE a.n <= b.n;

The main crux of the problem is to create the pairs table. We first build a helper table nums which will have only one column n. The values will start at 0 and recursively build off of 0 by adding one. The final table will have all the values from 0 to 42, inclusive. We then can use a join to combine the two nums tables, one as a and one as b. By nature using a join will get every possible pair of values, so the final thing we need to do is get rid of duplicates. a.n <= b.n will make sure the first tables number is always less than or equal to the second table and therefore remove duplicates.

Use OK to test your code:

python3 ok -q pairs

Extra Question

The following question is for extra practice — it can be found in the lab12_extra.sql file. It is recommended that you complete this problem as well, but you do not need to turn it in for credit.

Question 8: The Smallest Unique Integer (Part 2)

Now, let's revisit the previous problem of finding the smallest integer that anyone chose, and take a closer look at the COUNT aggregate.

Write a SQL query that uses the COUNT aggregate to create a table that pairs the attribute smallest with the number of times it was chosen by a student (this is the aggregation part). In order to cut out the people who chose not to respond, and the sneaky cheaters that tried to put small non-integer values, limit your results to the number 1 and greater!

Hint: You may find that there isn't a particular attribute you should have to perform the COUNT aggregation over. If you are only interested in counting the number of rows in a group, you can just say COUNT(*).

Hint: Think about what attribute you need to GROUP BY.

After you've defined your table, you should get something like:

sqlite> SELECT * FROM smallest_int_count LIMIT 20;
1|39
2|12
3|8
4|11
5|4
6|5
7|5
8|8
9|6
10|2
11|11
12|9
13|6
14|9
15|1
16|5
17|10
18|4
19|9
20|1    
CREATE TABLE smallest_int_count AS
select "REPLACE THIS LINE WITH YOUR SOLUTION";
SELECT smallest, COUNT(*) FROM students WHERE smallest >= 1 GROUP BY smallest;

Use OK to test your code:

python3 ok -q smallest-int-count

It looks like the number 15 only had one person choose it! Were you the lucky student that put it down?