Homework 7: SQL

Due by 11:59pm on Wednesday, August 11

Instructions

Download hw07.zip. Inside the archive, you will find a file called hw07.sql, along with a copy of the ok autograder.

Submission: When you are done, submit with python3 ok --submit. You may submit more than once before the deadline; only the final submission will be scored. Check that you have successfully submitted your code on okpy.org. See Lab 0 for more instructions on submitting assignments.

Using Ok: If you have any questions about using Ok, please refer to this guide.

Readings: You might find the following references useful:

Grading: Homework is graded based on correctness. Each incorrect problem will decrease the total score by one point. There is a homework recovery policy as stated in the syllabus. This homework is out of 3 points.

Usage

First, check that a file named sqlite_shell.py exists alongside the assignment files. If you don't see it, or if you encounter problems with it, scroll down to the Troubleshooting section to see how to download an official precompiled SQLite binary before proceeding.

You can start an interactive SQLite session in your Terminal or Git Bash with the following command:

python3 sqlite_shell.py

While the interpreter is running, you can type .help to see some of the commands you can run.

To exit out of the SQLite interpreter, type .exit or .quit or press Ctrl-C. Remember that if you see ...> after pressing enter, you probably forgot a ;.

You can also run all the statements in a .sql file by doing the following:

  1. Runs your code and then exits SQLite immediately afterwards.

    python3 sqlite_shell.py < hw07.sql
  2. Runs your code and then opens an interactive SQLite session, which is similar to running Python code with the interactive -i flag.

    python3 sqlite_shell.py --init hw07.sql

Questions

Dog Data

In each question below, you will define a new table based on the following tables.

CREATE TABLE parents AS
  SELECT "abraham" AS parent, "barack" AS child UNION
  SELECT "abraham"          , "clinton"         UNION
  SELECT "delano"           , "herbert"         UNION
  SELECT "fillmore"         , "abraham"         UNION
  SELECT "fillmore"         , "delano"          UNION
  SELECT "fillmore"         , "grover"          UNION
  SELECT "eisenhower"       , "fillmore";

CREATE TABLE dogs AS
  SELECT "abraham" AS name, "long" AS fur, 26 AS height UNION
  SELECT "barack"         , "short"      , 52           UNION
  SELECT "clinton"        , "long"       , 47           UNION
  SELECT "delano"         , "long"       , 46           UNION
  SELECT "eisenhower"     , "short"      , 35           UNION
  SELECT "fillmore"       , "curly"      , 32           UNION
  SELECT "grover"         , "short"      , 28           UNION
  SELECT "herbert"        , "curly"      , 31;

CREATE TABLE sizes AS
  SELECT "toy" AS size, 24 AS min, 28 AS max UNION
  SELECT "mini"       , 28       , 35        UNION
  SELECT "medium"     , 35       , 45        UNION
  SELECT "standard"   , 45       , 60;

Your tables should still perform correctly even if the values in these tables change. For example, if you are asked to list all dogs with a name that starts with h, you should write:

SELECT name FROM dogs WHERE "h" <= name AND name < "i";

Instead of assuming that the dogs table has only the data above and writing

SELECT "herbert";

The former query would still be correct if the name grover were changed to hoover or a row was added with the name harry.

Q1: Size of Dogs

The Fédération Cynologique Internationale classifies a standard poodle as over 45 cm and up to 60 cm. The sizes table describes this and other such classifications, where a dog must be over the min and less than or equal to the max in height to qualify as a size.

Create a size_of_dogs table with two columns, one for each dog's name and another for its size.

-- The size of each dog
CREATE TABLE size_of_dogs AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

The output should look like the following:

sqlite> select * from size_of_dogs;
abraham|toy
barack|standard
clinton|standard
delano|standard
eisenhower|mini
fillmore|mini
grover|toy
herbert|mini

Use Ok to test your code:

python3 ok -q size_of_dogs

Q2: By Parent Height

Create a table by_parent_height that has a column of the names of all dogs that have a parent, ordered by the height of the parent from tallest parent to shortest parent.
-- All dogs with parents ordered by decreasing height of their parent
CREATE TABLE by_parent_height AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

For example, fillmore has a parent (eisenhower) with height 35, and so should appear before grover who has a parent (fillmore) with height 32. The names of dogs with parents of the same height should appear together in any order. For example, barack and clinton should both appear at the end, but either one can come before the other.

sqlite> select * from by_parent_height;
herbert
fillmore
abraham
delano
grover
barack
clinton

Use Ok to test your code:

python3 ok -q by_parent_height

Q3: Sentences

There are two pairs of siblings that have the same size. Create a table that contains a row with a string for each of these pairs. Each string should be a sentence describing the siblings by their size.
-- Filling out this helper table is optional
CREATE TABLE siblings AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

-- Sentences about siblings that are the same size
CREATE TABLE sentences AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

Each sibling pair should appear only once in the output, and siblings should be listed in alphabetical order (e.g. "barack plus clinton..." instead of "clinton plus barack..."), as follows:

sqlite> select * from sentences;
The two siblings, barack plus clinton have the same size: standard
The two siblings, abraham plus grover have the same size: toy

Hint: First, create a helper table containing each pair of siblings. This will make comparing the sizes of siblings when constructing the main table easier.

Hint: If you join a table with itself, use AS within the FROM clause to give each table an alias.

Hint: In order to concatenate two strings into one, use the || operator.


Use Ok to test your code:

python3 ok -q sentences

Q4: Stacks

Sufficiently sure-footed dogs can stand on either other's backs to form a stack (up to a point). We'll say that the total height of such a stack is the sum of the heights of the dogs.

Create a two-column table describing all stacks of four dogs at least 175 cm high. The first column should contain a comma-separated list of dogs in the stack, and the second column should contain the total height of the stack. Order the stacks in increasing order of total height.

Note: there are no stacks of less than 4 dogs that reach 175cm in height.

-- Ways to stack 4 dogs to a height of at least 175, ordered by total height
CREATE TABLE stacks_helper(dogs, stack_height, last_height, n);

-- Add your INSERT INTOs here

CREATE TABLE stacks AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

A valid stack of dogs includes each dog only once, and the dogs should be listed in increasing order of height within the stack. You may assume that no two dogs have the same height.

sqlite> select * from stacks;
herbert, delano, clinton, barack|176
fillmore, delano, clinton, barack|177
eisenhower, delano, clinton, barack|180

You should use the provided helper table stacks_helper. It has 4 columns: (1) dogs - a stack of dogs as a comma separated list of dog names, (2) stack_height - the height of the stack, (3) last_height - the height of the last dog added to the stack (in order to ensure we have the right order in the stack), and (4) n - the number of dogs in the current stack.

First, fill this table up by doing the following:

  1. Use an INSERT INTO to add stacks of just one dog into stacks_helper. You can use this syntax to insert rows from a table called t1 into a table called t2:

    INSERT INTO t2 SELECT [expression] FROM t1 ...;

    For example:

    sqlite> CREATE TABLE t1 AS
    ...>        SELECT 1 as a, 2 as b;
    sqlite> CREATE TABLE t2(c, d);
    sqlite> INSERT INTO t2 SELECT a, b FROM t1;
    sqlite> SELECT * FROM t2;
    1|2
  2. Now, use the stacks of one dog to insert stacks of two dogs. It's possible to INSERT INTO a table rows selected from that same table. For example,

    sqlite> CREATE TABLE ints AS
       ...>   SELECT 1 AS n UNION
       ...>   SELECT 2      UNION
       ...>   SELECT 3;
    sqlite> INSERT INTO ints(n) SELECT n+3 FROM ints;
    sqlite> SELECT * FROM ints;
    1
    2
    3
    4
    5
    6
  3. Repeat step 3 to create stacks of three dogs, then of four dogs.

Once you've built up to stacks of four dogs in your stacks_helper table, use it to fill in the stacks table!


Use Ok to test your code:

python3 ok -q stacks

Extra Questions

Q5: Low Variance

We want to create a table which contains the total height of all dogs that share a fur type, but only for the fur types where the heights are not spread out too much. To approximate this, we'll only consider the fur types where each dog with that fur is within 30% of the average height of all dogs with that fur.

For example, if the average height for short-haired dogs is 10, then in order to be included in our output, all dogs with short hair must have a height of at most 13 and at least 7.

To achieve this, we can use MIN, MAX, and AVG. For this problem, we'll want to find the average height and make sure that:

  • There are no heights smaller than .7 of the average.
  • There are no heights greater than 1.3 of the average.

Your output should first include the fur type and then the total height for the fur types that meet this criteria

-- Total size for each fur type where all of the heights differ by no more than 30% from the average height
CREATE TABLE low_variance AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

-- Example:
SELECT * FROM low_variance;
-- Expected output:
-- curly|63

Explanation: There is at least one long haired dog with a height that is not within 30% of the average height of long haired dogs, and at least one short haired dog with a height not within 30% of the average height of short haired dogs, so neither short nor long haired dogs are included in the output. There are two curly haired dogs: one with height 32, and one with height 31, for a total height of 63.

Use Ok to test your code:

python3 ok -q low_variance

Q6: Above Average

Create a table above_average that includes the height and name of every dog that shares the 10's digit of its height with at least one other dog and has a height that is at least the average of all dogs that have the same 10's digit.

-- Heights and names of dogs that are above average in height among
-- dogs whose height has the same first digit.
CREATE TABLE above_average AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

-- Example:
SELECT * FROM above_average ORDER BY height;
-- Expected output:
-- 47|clinton
-- 35|eisenhower
-- 28|grover

Explanation: The average height of 20-something dogs is 27, of 30-something dogs is 32.67, of 40-something dogs is 46.5, and of 50-something dogs is 52. barack is not included because there is only one 50-something dog.

Hint: You'll find it convenient to use a WITH clause to get a table of average heights for each first digit.

Hint: A GROUP BY clause can contain arithmetic expressions as well as simple column names.

Use Ok to test your code:

python3 ok -q above_average

Q7: Non-Parents

This question is optional but recommended for practice. You can receive full credit for the homework without attempting this problem.

A non-parent relation is either an ancestor that is not a parent (such as a grandparent or great-grandparent) or a descendant that is not a child (such as a grandchild or great-grandchild). Siblings are not relations under this definition.

Select all pairs that form non-parent relations ordered by the difference in height between one dog and the other.

-- non_parents is an optional, but recommended question
-- All non-parent relations ordered by height difference
CREATE TABLE non_parents as
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

The shortest paired with the tallest should appear first, and the tallest paired with the shortest should appear last. If two pairs have the same height difference, they may appear together in any order.

-- Example:
SELECT * FROM non_parents;
-- Expected output:
--   fillmore|barack
--   eisenhower|barack
--   fillmore|clinton
--   eisenhower|clinton
--   eisenhower|delano
--   abraham|eisenhower
--   grover|eisenhower
--   herbert|eisenhower
--   herbert|fillmore
--   fillmore|herbert
--   eisenhower|herbert
--   eisenhower|grover
--   eisenhower|abraham
--   delano|eisenhower
--   clinton|eisenhower
--   clinton|fillmore
--   barack|eisenhower
--   barack|fillmore

Hint: Start by generating a table of grandparents and their grandchildren. How can we use this to generate all ancestors?

Use Ok to test your code:

python3 ok -q relations

Troubleshooting

Python already comes with a built-in SQLite database engine to process SQL. However, it doesn't come with a "shell" to let you interact with it from the terminal. Because of this, until now, you have been using a simplified SQLite shell written by us. However, you may find the shell is old, buggy, or lacking in features. In that case, you may want to download and use the official SQLite executable.

If running python3 sqlite_shell.py didn't work, you can download a precompiled sqlite directly by following the following instructions and then use sqlite3 and ./sqlite3 instead of python3 sqlite_shell.py based on which is specified for your platform.

Another way to start using SQLite is to download a precompiled binary from the SQLite website. The latest version of SQLite at the time of writing is 3.28.0, but you can check for additional updates on the website.

However, before proceeding, please remove (or rename) any SQLite executables (sqlite3, sqlite_shell.py, and the like) from the current folder, or they may conflict with the official one you download below. Similarly, if you wish to switch back later, please remove or rename the one you downloaded and restore the files you removed.

Windows

  1. Visit the download page linked above and navigate to the section Precompiled Binaries for Windows. Click on the link sqlite-tools-win32-x86-*.zip to download the binary.
  2. Unzip the file. There should be a sqlite3.exe file in the directory after extraction.
  3. Navigate to the folder containing the sqlite3.exe file and check that the version is at least 3.8.3:

    $ cd path/to/sqlite
    $ ./sqlite3 --version
    3.12.1 2016-04-08 15:09:49 fe7d3b75fe1bde41511b323925af8ae1b910bc4d

macOS Yosemite (10.10) or newer

SQLite comes pre-installed. Check that you have a version that's greater than 3.8.3:

    $ sqlite3
    SQLite version 3.8.10.2

Mac OS X Mavericks (10.9) or older

SQLite comes pre-installed, but it is the wrong version.

  1. Visit the download page linked above and navigate to the section Precompiled Binaries for Mac OS X (x86). Click on the link sqlite-tools-osx-x86-*.zip to download the binary.
  2. Unzip the file. There should be a sqlite3 file in the directory after extraction.
  3. Navigate to the folder containing the sqlite3 file and check that the version is at least 3.8.3:

    $ cd path/to/sqlite
    $ ./sqlite3 --version
    3.12.1 2016-04-08 15:09:49 fe7d3b75fe1bde41511b323925af8ae1b910bc4d

Ubuntu

The easiest way to use SQLite on Ubuntu is to install it straight from the native repositories (the version will be slightly behind the most recent release):

$ sudo apt install sqlite3
$ sqlite3 --version
3.8.6 2014-08-15 11:46:33 9491ba7d738528f168657adb43a198238abde19e