Homework 8: Streams, SQL

Due by 11:59pm on Tuesday, August 11

Instructions

Download hw08.zip.

Submission: When you are done, submit with python3 ok --submit. You may submit more than once before the deadline; only the final submission will be scored. Check that you have successfully submitted your code on okpy.org. See Lab 0 for more instructions on submitting assignments.

Using Ok: If you have any questions about using Ok, please refer to this guide.

Readings: You might find the following references useful:

Grading: Homework is graded based on correctness. Each incorrect problem will decrease the total score by one point. There is a homework recovery policy as stated in the syllabus. This homework is out of 3 points.

Usage

First, check that a file named sqlite_shell.py exists alongside the assignment files. If you don't see it, or if you encounter problems with it, scroll down to the Troubleshooting section to see how to download an official precompiled SQLite binary before proceeding.

You can start an interactive SQLite session in your Terminal or Git Bash with the following command:

python3 sqlite_shell.py

While the interpreter is running, you can type .help to see some of the commands you can run.

To exit out of the SQLite interpreter, type .exit or .quit or press Ctrl-C. Remember that if you see ...> after pressing enter, you probably forgot a ;.

You can also run all the statements in a .sql file by doing the following:

  1. Runs your code and then exits SQLite immediately afterwards.

    python3 sqlite_shell.py < lab13.sql
  2. Runs your code and then opens an interactive SQLite session, which is similar to running Python code with the interactive -i flag.

    python3 sqlite_shell.py --init lab13.sql

To complete this homework assignment, you will need to use SQLite version 3.8.3 or greater.

To check your progress, you can run sqlite3 directly by running:

python3 sqlite_shell.py --init hw08.scm

You should also check your work using ok:

python3 ok

Questions

Streams

Q1: Run-Length Encoding

Run-length encoding is a very simple data compression technique, whereby runs of data are compressed and stored as a single value. A run is defined to be a contiguous sequence of the same number. For example, in the (finite) sequence

1, 1, 1, 1, 1, 6, 6, 6, 6, 2, 5, 5, 5

there are four runs: one each of 1, 6, 2, and 5. We can represent the same sequence as a sequence of two-element lists:

(1 5), (6 4), (2 1), (5 3)

Notice that the first element of each list is the number in a run, and the second element is the number of times that number appears in the run.

We will extend this idea to streams. Write a function called rle that takes in a stream of data, and returns a corresponding stream of two-element lists, which represents the run-length encoded version of the stream. You do not have to consider compressing infinite streams - the stream passed in will eventually terminate with nil.

scm> (define s (cons-stream 1 (cons-stream 1 (cons-stream 2 nil))))
s
scm> (define encoding (rle s))
encoding
scm> (car encoding)  ; Run of number 1 of length 2
(1 2)
scm> (car (cdr-stream encoding))  ; Run of number 2 of length 1
(2 1)
scm> (define s (list-to-stream '(1 1 2 2 2 3))) ; Makes a stream with the same elements as the list passed in
scm> (stream-to-list (rle s))
((1 2) (2 3) (3 1))
(define (rle s)
  'YOUR-CODE-HERE
)

Use Ok to test your code:

python3 ok -q rle

Q2: Group By Nondecreasing

Define a function group-by-nondecreasing, which takes in a stream of numbers and outputs a stream of lists, which overall has the same numbers in the same order, but grouped into segments that are non-decreasing.

For example, if the input is a stream containing elements

1 2 3 4 1 2 3 4 1 1 1 2 1 1 0 4 3 2 1 ...

the output should contain elements

(1 2 3 4) (1 2 3 4) (1 1 1 2) (1 1) (0 4) (3) (2) (1) ...

If the input is an infinite stream, the output should be an infinite stream and if the input is finite, the output should also be finite. Even if the input is an infinite stream, you can assume that every non-decreasing segment is finite.

Hint: avoid any direct recursive calls outside the context of a second part of a call to cons-stream, otherwise your solution won't work for infinite streams!


(define (group-by-nondecreasing s)
    'YOUR-CODE-HERE)

Use Ok to test your code:

python3 ok -q group-by-nondecreasing

SQL

Dog Data

In each question below, you will define a new table based on the following tables.

CREATE TABLE parents AS
  SELECT "abraham" AS parent, "barack" AS child UNION
  SELECT "abraham"          , "clinton"         UNION
  SELECT "delano"           , "herbert"         UNION
  SELECT "fillmore"         , "abraham"         UNION
  SELECT "fillmore"         , "delano"          UNION
  SELECT "fillmore"         , "grover"          UNION
  SELECT "eisenhower"       , "fillmore";

CREATE TABLE dogs AS
  SELECT "abraham" AS name, "long" AS fur, 26 AS height UNION
  SELECT "barack"         , "short"      , 52           UNION
  SELECT "clinton"        , "long"       , 47           UNION
  SELECT "delano"         , "long"       , 46           UNION
  SELECT "eisenhower"     , "short"      , 35           UNION
  SELECT "fillmore"       , "curly"      , 32           UNION
  SELECT "grover"         , "short"      , 28           UNION
  SELECT "herbert"        , "curly"      , 31;

CREATE TABLE sizes AS
  SELECT "toy" AS size, 24 AS min, 28 AS max UNION
  SELECT "mini"       , 28       , 35        UNION
  SELECT "medium"     , 35       , 45        UNION
  SELECT "standard"   , 45       , 60;

Your tables should still perform correctly even if the values in these tables change. For example, if you are asked to list all dogs with a name that starts with h, you should write:

SELECT name FROM dogs WHERE "h" <= name AND name < "i";

Instead of assuming that the dogs table has only the data above and writing

SELECT "herbert";

The former query would still be correct if the name grover were changed to hoover or a row was added with the name harry.

Q3: Size of Dogs

The Fédération Cynologique Internationale classifies a standard poodle as over 45 cm and up to 60 cm. The sizes table describes this and other such classifications, where a dog must be over the min and less than or equal to the max in height to qualify as a size.

Create a size_of_dogs table with two columns, one for each dog's name and another for its size.

-- The size of each dog
CREATE TABLE size_of_dogs AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

The output should look like the following:

sqlite> select * from size_of_dogs;
abraham|toy
barack|standard
clinton|standard
delano|standard
eisenhower|mini
fillmore|mini
grover|toy
herbert|mini

Use Ok to test your code:

python3 ok -q size_of_dogs

Q4: By Parent Height

Create a table by_parent_height that has a column of the names of all dogs that have a parent, ordered by the height of the parent from tallest parent to shortest parent.
-- All dogs with parents ordered by decreasing height of their parent
CREATE TABLE by_parent_height AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

For example, fillmore has a parent (eisenhower) with height 35, and so should appear before grover who has a parent (fillmore) with height 32. The names of dogs with parents of the same height should appear together in any order. For example, barack and clinton should both appear at the end, but either one can come before the other.

sqlite> select * from by_parent_height;
herbert
fillmore
abraham
delano
grover
barack
clinton

Use Ok to test your code:

python3 ok -q by_parent_height

Q5: Sentences

There are two pairs of siblings that have the same size. Create a table that contains a row with a string for each of these pairs. Each string should be a sentence describing the siblings by their size.
-- Filling out this helper table is optional
CREATE TABLE siblings AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

-- Sentences about siblings that are the same size
CREATE TABLE sentences AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

Each sibling pair should appear only once in the output, and siblings should be listed in alphabetical order (e.g. "barack and clinton..." instead of "clinton and barack..."), as follows:

sqlite> select * from sentences;
abraham and grover are toy siblings
barack and clinton are standard siblings

Hint: First, create a helper table containing each pair of siblings. This will make comparing the sizes of siblings when constructing the main table easier.

Hint: If you join a table with itself, use AS within the FROM clause to give each table an alias.

Hint: In order to concatenate two strings into one, use the || operator.

Use Ok to test your code:

python3 ok -q sentences

Q6: Stacks

Sufficiently sure-footed dogs can stand on either other's backs to form a stack (up to a point). We'll say that the total height of such a stack is the sum of the heights of the dogs.

Create a two-column table describing all stacks of four dogs at least 170 cm high. The first column should contain a comma-separated list of dogs in the stack, and the second column should contain the total height of the stack. Order the stacks in increasing order of total height.

Note: there are no stacks of less than 4 dogs that reach 170cm in height.

-- Ways to stack 4 dogs to a height of at least 170, ordered by total height
CREATE TABLE stacks_helper(dogs, stack_height, last_height, n);

-- Add your INSERT INTOs here

CREATE TABLE stacks AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

A valid stack of dogs includes each dog only once, and the dogs should be listed in increasing order of height within the stack. You may assume that no two dogs have the same height.

sqlite> select * from stacks;
abraham, delano, clinton, barack|171
grover, delano, clinton, barack|173
herbert, delano, clinton, barack|176
fillmore, delano, clinton, barack|177
eisenhower, delano, clinton, barack|180

You should use the provided helper table stacks_helper. It has 4 columns: (1) dogs - a stack of dogs as a comma separated list of dog names, (2) stack_height - the height of the stack, (3) last_height - the height of the last dog added to the stack (in order to ensure we have the right order in the stack), and (4) n - the number of dogs in the current stack.

First, fill this table up by doing the following:

  1. Use an INSERT INTO to add stacks of just one dog into stacks_helper. You can use this syntax to insert rows from a table called t1 into a table called t2:

    INSERT INTO t2 SELECT [expression] FROM t1 ...;

    For example:

    sqlite> CREATE TABLE t1 AS
    ...>        SELECT 1 as a, 2 as b;
    sqlite> CREATE TABLE t2(c, d);
    sqlite> INSERT INTO t2 SELECT a, b FROM t1;
    sqlite> SELECT * FROM t2;
    1|2
  2. Now, use the stacks of one dog to insert stacks of two dogs. It's possible to INSERT INTO a table rows selected from that same table. For example,

    sqlite> CREATE TABLE ints AS
       ...>   SELECT 1 AS n UNION
       ...>   SELECT 2      UNION
       ...>   SELECT 3;
    sqlite> INSERT INTO ints(n) SELECT n+3 FROM ints;
    sqlite> SELECT * FROM ints;
    1
    2
    3
    4
    5
    6
  3. Repeat step 3 to create stacks of three dogs, then of four dogs.

Once you've built up to stacks of four dogs in your stacks_helper table, use it to fill in the stacks table!

Use Ok to test your code:

python3 ok -q stacks

Submit

Make sure to submit this assignment by running:

python3 ok --submit

Troubleshooting/Advanced SQLite

Troubleshooting

Python already comes with a built-in SQLite database engine to process SQL. However, it doesn't come with a "shell" to let you interact with it from the terminal. Because of this, until now, you have been using a simplified SQLite shell written by us. However, you may find the shell is old, buggy, or lacking in features. In that case, you may want to download and use the official SQLite executable.

If running python3 sqlite_shell.py didn't work, you can download a precompiled sqlite directly by following the following instructions and then use sqlite3 and ./sqlite3 instead of python3 sqlite_shell.py based on which is specified for your platform.

Another way to start using SQLite is to download a precompiled binary from the SQLite website. The latest version of SQLite at the time of writing is 3.28.0, but you can check for additional updates on the website.

However, before proceeding, please remove (or rename) any SQLite executables (sqlite3, sqlite_shell.py, and the like) from the current folder, or they may conflict with the official one you download below. Similarly, if you wish to switch back later, please remove or rename the one you downloaded and restore the files you removed.

Windows

  1. Visit the download page linked above and navigate to the section Precompiled Binaries for Windows. Click on the link sqlite-tools-win32-x86-*.zip to download the binary.
  2. Unzip the file. There should be a sqlite3.exe file in the directory after extraction.
  3. Navigate to the folder containing the sqlite3.exe file and check that the version is at least 3.8.3:

    $ cd path/to/sqlite
    $ ./sqlite3 --version
    3.12.1 2016-04-08 15:09:49 fe7d3b75fe1bde41511b323925af8ae1b910bc4d

macOS Yosemite (10.10) or newer

SQLite comes pre-installed. Check that you have a version that's greater than 3.8.3:

    $ sqlite3
    SQLite version 3.8.10.2

Mac OS X Mavericks (10.9) or older

SQLite comes pre-installed, but it is the wrong version.

  1. Visit the download page linked above and navigate to the section Precompiled Binaries for Mac OS X (x86). Click on the link sqlite-tools-osx-x86-*.zip to download the binary.
  2. Unzip the file. There should be a sqlite3 file in the directory after extraction.
  3. Navigate to the folder containing the sqlite3 file and check that the version is at least 3.8.3:

    $ cd path/to/sqlite
    $ ./sqlite3 --version
    3.12.1 2016-04-08 15:09:49 fe7d3b75fe1bde41511b323925af8ae1b910bc4d

Ubuntu

The easiest way to use SQLite on Ubuntu is to install it straight from the native repositories (the version will be slightly behind the most recent release):

$ sudo apt install sqlite3
$ sqlite3 --version
3.8.6 2014-08-15 11:46:33 9491ba7d738528f168657adb43a198238abde19e