Project 1: DB61B

Due: Friday, 13 October 2017

Introduction

This project involves writing a miniature relational database management system (DBMS) that stores tables of data, where a table consists of some number of labeled columns of information. Our system will include a very simple query language for extracting information from these tables. For the purposes of this project, we will deal only with very small databases, and therefore will not consider speed and efficiency at all. For that sort of stuff, you might consider taking CS186 at some point.

As an example, consider the following set of three tables containing information about students, about course numbers and locations, and about grades in these courses. Each table has a name (given above the upper-left corner) and each column of each table has a name (given above the double line).

students
SID Lastname Firstname SemEnter YearEnter Major
101 Knowles Jason F 2003 EECS
102 Chan Valerie S 2003 Math
103 Xavier Jonathan S 2004 LSUnd
104 Armstrong Thomas F 2003 EECS
105 Brown Shana S 2004 EECS
106 Chan Yangfan F 2003 LSUnd

schedule
CCN Num Dept Time Room Sem Year
21228 61A EECS 2-3MWF 1 Pimentel F 2003
21231 61A EECS 1-2MWF 1 Pimentel S 2004
21229 61B EECS 11-12MWF 155 Dwinelle F 2003
21232 61B EECS 1-2MWF 2050 VLSB S 2004
21103 54 Math 1-2MWF 2050 VLSB F 2003
21105 54 Math 1-2MWF 1 Pimentel S 2004
21001 1A English 9-10MWF 2301 Tolman F 2003
21005 1A English 230-5TuTh 130 Wheeler S 2004

enrolled
SID CCN Grade
101 21228 B
101 21105 B+
101 21232 A-
101 21001 B
102 21231 A
102 21105 A-
102 21229 A
102 21001 B+
103 21105 B+
103 21005 B+
104 21228 A-
104 21229 B+
104 21105 A-
104 21005 A-
105 21228 A
105 21001 B+
106 21103 A
106 21001 B
106 21231 A

Describing Command Syntax

You communicate with the database system using an artificial notation usually known as a language, although it is much simpler than any human language. The definition and processing of such languages is an important skill for any computer scientists. We normally think of programming languages such as Java, but there are many other contexts where small, domain-specific languages (DSLs) are appropriate engineering solutions to a design or implementation problem.

We typically describe the syntax (grammatical structure) of a language using a specialized metalanguage---a language for describing languages. Here, we'll use a version of a common metalanguage: BNF (Backus-Naur Form). A BNF grammar consists of a set of rules, For example:


<create statement> ::= "create" "table" <name> <table definition>
<table definition> ::= "(" <name>,+ ")"
    | "as" <select clause>

means that a create statement consists of the (literal) words "create" and "table" followed by a name, followed by a table definition. A table definition, in turn, consists of either

The labels bracketed by <> are metavariables and stand for sets of possible strings, as described by the rules.

We have extended BNF with four concise definitions:

We define a few of the metavariables in English (sort of as base cases):

As is traditional, we ignore whitespace (spaces, tabs, newlines) in the descriptions that follow. Whitespace or other punctuation must separate words from each other. For example, createtable is just a name, and not the two words create and table. Comments, which are the same as /* */ comments in Java, are treated as whitespace.

Commands

Our database system uses a very restricted dialect of SQL (Structured Query Language), a widely used notation for communicating with relational databases. When you run the database system, it will accept a sequence of commands from the standard input (i.e., normally the terminal), according to the following syntax:

<program> ::= <statement>*
<statement> ::=
     <create> <statement>
    | <exit> <statement>
    | <insert> <statement>
    | <load> <statement>
    | <print> <statement>
    | <select> <statement>
    | <store> <statement>

<create statement> :: "create" "table" <name> <table definition> ";"
<table definition> ::=
      "(" <column name>,+ ")"
    | "as" <select clause>
<print statement> ::= "print" <table name> ";"

<insert statement> ::=
      "insert" "into" <table name> "values" ( "(" <literal>,+ ")" ),+ ";"
<load statement> ::= "load" <name> ";"
<store statement> ::= "store" <table name> ";"
<exit statement> ::= "quit" ";" | "exit" ";"

<select statement> ::= <select clause> ";"
<select clause> ::= "select" <column name>+, "from" <tables> <condition clause>
<condition clause> ::=
      "where" <condition> ("and" <condition>)*
    | <empty>
<tables> ::= <table name> | <table name> "," <table name>
<condition> ::=
      <column name> <relation> <column name>
    | <column name> <relation> <literal>

<relation> ::= "<" | ">" | "=" | "!=" | "<=" | ">="
<table name> ::= <name>
<column name> ::= <name>

The above only defines the syntax, but doesn't say what these statements mean or do (known as the semantics). For that, we just use English:

Select clauses. Select clauses are used in select statements and in create statements. They denote tables whose information is selected from other tables.

Condition clauses. An empty condition clause does not restrict the rows that appear in a select. Otherwise, it contains one or more conditions separated by and, all of which must be satisfied. The tests compare column values against either other columns or literal values. The relation symbols mean what they do in Java, except that all values are treated as strings (use the compareTo method on Strings to compare them). Thus you can write things like

  Lastname >= 'Chan'

to get all rows in which the value of the Lastname column comes after "Chan" in lexicographic order (i.e, by their first character, then their second, etc., ordering characters according to Java's collating sequence). The result is roughly like alphabetical order, with the major difference being that all digits come before all capital letters, which come before all lower-case letters, with punctuation and other characters inserted in various places.

Format of .db Files

A .db file starts with a line containing all column names (at least one) separated by commas, with any leading or trailing whitespace removed (see the .split and .trim methods of the String class). Column names must be valid identifiers and must be distinct. This is followed by any number of lines (zero of more), one for each row, containing the values for that row, separated by commas (again, leading or trailing whitespace is removed). For example, the students table shown previously would look like this in a file:

  SID,Lastname,Firstname,SemEnter,YearEnter,Major
  101,Knowles,Jason,F,2003,EECS
  102,Chan,Valerie,S,2003,Math
  103,Xavier,Jonathan,S,2004,LSUnd
  104,Armstrong,Thomas,F,2003,EECS
  105,Brown,Shana,S,2004,EECS
  106,Chan,Yangfan,F,2003,LSUnd

Example

If the information in these tables exists in three files—students.db, schedule.db, and enrolled.db—then a session with our DBMS might look like the following transcript. Lines that start with prompts (> or ...) are input from the user; the rest are output from the program

  DB61B System.  Version 2.0
  > load students ;
  Loaded students.db
  > load enrolled ;
  Loaded enrolled.db
  > load schedule ;
  Loaded schedule.db
  > /* What are the names and SIDS of all students whose last name
  ...  is 'Chan'? */
  > select SID, Firstname from students
  ...  where Lastname = 'Chan';
  Search results:
    102 Valerie
    106 Yangfan
  > /* Who took the course with CCN 21001, and what were their grades? */
  > select Firstname, Lastname, Grade
  ...      from students, enrolled where CCN = '21001';
  Search results:
    Jason Knowles B
    Shana Brown B+
    Valerie Chan B+
    Yangfan Chan B
  > /* Who has taken the course named 61A from EECS? */
  > /* First, create a table that contains SIDs */
  > create table enrolled2 as select SID
  ...  from enrolled, schedule
  ...  where Dept = 'EECS' and Num = '61A';
  > /* Print these SIDs */
  > print enrolled2;
  Contents of enrolled2:
    101
    102
    104
    105
    106
  > /* Now print the names of the students in this list */
  > select Firstname, Lastname from students, enrolled2;
  Search results:
    Jason Knowles
    Shana Brown
    Thomas Armstrong
    Valerie Chan
    Yangfan Chan
  > quit ;

Your Task

First, make sure that everything in your repository is properly updated and checked in. Before you start, the command

$ cd  ~/repo
$ git status

should report that the directory is clean and that there are no untracked files that should be added and committed. Never start a new project without doing this.

To obtain the skeleton files (and set up an initial entry for your project in the repository), you can use the command sequence

git fetch shared
git merge shared/proj1 -m "Get proj1 skeleton"
git push

from your Git working directory. Should we update the skeleton, you can use the same sequence (with an appropriate change in the -m parameter) to update your project with the same changes.

You are free to modify anything in the skeleton, as long as the final program conforms to this specification and the requirements below. You needn't ask permission to do so. Filling in parts we've marked with "FIXME" or similar comments is not part of the assignment; rather, it's a suggestion. Do be sure, however, that your documentation comments reflect any changes or additions you make.

Be sure to include tests of your program (yes, that is part of the grade). The makefile we provide has a convenient target for running such tests. Our skeleton directory contains a couple of trivial tests, but these do not constitute an adequate set of tests! Make up your tests ahead of time and update your makefile to run them.

The input to your program will come from fallible humans. Therefore, part of the problem is dealing gracefully with errors. When the user makes a syntax error, or mentions a non-existent column, your program should not simply halt and catch fire, but should give some sort of message and then try to get back to a usable state. For syntactic errors (errors in the format of commands) you should skip to the next semicolon (if it hasn't occurred yet) or the end-of-file, whichever comes first. For all errors, you should print a single, separate line starting with the word "error" (in upper or lower case) and followed by any message, and the erroneous command should have no effect on any table. Your program should not exit with a cryptic stack trace.

Our testing of your projects (but not our grading!) will be automated. The testing program will be finicky, so be sure that:

Submission and Version Control

It is important that you commit work to your repository at frequent intervals. Version control is a powerful tool for saving yourself when you mess something up or your dog eats your project, but you must use it regularly if it is to be of any use. Feel free to commit every 15 minutes; Git only saves what has changed, even though it acts as if it takes a snapshot of your entire project.

The command git status will tell you what you have modified, removed, or possibly added since the last commit. It will also tell you how much you have not yet sent to your central repository. You needn't just assume that things are as you expect; git status will tell you whether you've committed and pushed everything.

If you are switching between using a clone of your central repository on the instructional computers and another at home (or on your laptop), be careful to synchronize your work. When you are done working on one system, be sure push your work to the central repository:

$ git status       # To see what needs to be added or committed.
$ git commit -a    # If needed to get everything committed.
$ git push

When you start working on the other system, you then do

$ git status          # To make sure you didn't accidentally leave
                      # stuff uncommitted or untracked.
$ git pull --rebase   # Get changes from your central repo.

Submit your project by committing and tagging it:

$ git tag proj1-0     # Or proj1-1, etc.
$ git push
$ git push --tags

Be sure to respond to all prompts and to make sure the messages you get indicate that the submission was successful. Don't just "say the magic words" and assume that everything's OK.

Advice

You will find this project challenging enough without helping to make it harder. Much of what might look complicated to you is actually pretty easy, once you take the time to look to see what has already been written for you—specifically what is in the skeleton and the standard Java library of classes. So, before doing any hard work on anything that looks tricky, browse the skeleton and the documentation.

If you're not using the skeleton, you'll have a couple of extra things worth mentioning to figure out at some point during the project, namely reading input files and choosing a way to store the rows of a table. For reading from files (the load command), you'll probably want to use java.io.FileReader together with java.util.Scanner, which is also good for reading from the standard input. In fact, we've tried to design the syntax to make it particularly easy to parse using Scanners. If you find yourself writing lots of complicated junk just to read in and interpret the input, you might back off and ask one of us to look over your approach. The standard Java string type, java.lang.String, also contains useful member functions for doing comparisons (including comparisons that ignore case). For choosing a way to store rows in a table, you can use arrays or devise a kind of linked list to do so, but you might instead want to take a look at such interfaces as java.util.List, java.util.Set, and java.util.Map and at the library classes that implement them (such as java.util.ArrayList, java.util.HashSet, and java.util.HashMap. Read the on-line documentation for all of these.

It's important to have something working as soon as possible. You'll prevent really serious trouble by doing so. I suggest the following order to getting things working:

You may throw all our skeleton code away if you prefer. You are not required to implement Table, Column, etc. However, we strongly recommend that you don't do this if your only reason is that you can't figure out how to do it our way. In that case, pester us with questions until you do understand.