CS 3 project choice 1
Data base program

Problem

A data base consists of a collection of records, each of which is a collection of fields. For this project, you will implement a program to query a data base.

The program framework is here, and also online in the file ~cs3/public_html/programs/db.fw.scm .) When the procedure startup is called, it asks you to choose a data base file. A legal data base file consists of two lists:

Figure 1 displays the contents of a sample data base file. You should create your own files for the purposes of testing.

    ((name string) (favorite-musician string)
     (birthday date) (years-in-CS3 number) )
    (("mike" "britney spears" (august 17 1950) 7)
     ("tim" "frank sinatra" (december 1 1969) 3)
     ("jieae" "barry manilow" (february 29 1972) 2)
     ("brendan" "lawrence welk" (march 28 1970) 4)
     ("robert" "placido domingo" (july 31 1970) 3)
     ("clint" "tammy wynette" (april 15 1965) 1)
     ("paul" "frank sinatra" (may 19 1977) 2))

Figure 1 — A sample data base file

Startup then calls the process procedure. Process takes three inputs: the "working data base", a sublist of the data base from the file; the complete data base from the file; and the field descriptor list. Process repeatedly prints the contents of the working data base, prompts for a command, then reads and processes the command; the processing may modify the working data base. The following are legal commands:

  1. (quit)
    asks for a file into which to write the descriptor list and the working data base, updates these files, then quits.

  2. (reset)
    reinitializes the working data base to its original contents, namely, the list of records that was read from the file prior to the user giving any commands. This command essentially undoes the effect of earlier commands.

  3. (sort field-name1 field-name2 ...)
    rearranges the records in the working data base so that the contents of the first field are in ascending order. Records containing identical values in the first field are ordered by the contents of the second field, and so on. The field names must all be different, so the maximum number of arguments to the sort command is the number of fields. There must be at least one field given with a sort command.

    If the working data base is as given in Figure 1, (sort name) will change it to

        (("brendan" "lawrence welk" (march 28 1970) 4)
         ("clint" "tammy wynette" (april 15 1965) 1)
         ("jieae" "barry manilow" (february 29 1972) 2)
         ("mike" "britney spears" (august 17 1950) 7)
         ("paul" "frank sinatra" (may 19 1977) 2)
         ("robert" "placido domingo" (july 31 1970) 3)
         ("tim" "frank sinatra" (december 1 1969) 3))

    and (sort favorite-musician years-in-CS3) will change it to

        (("jieae" "barry manilow" (february 29 1972) 2)
         ("mike" "britney spears" (august 17 1950) 7)
         ("paul" "frank sinatra" (may 19 1977) 2)
         ("tim" "frank sinatra" (december 1 1969) 3)
         ("brendan" "lawrence welk" (march 28 1970) 4)
         ("robert" "placido domingo" (july 31 1970) 3)
         ("clint" "tammy wynette" (april 15 1965) 1))

    The command (sort favorite-musician birthday) would have switched the order of the records for Paul and Tim.

  4. ( field-name operator value )
    filters the working data base, keeping only those records for which the given field has the given relationship to the given value. A command of this kind is called a query. Legal operators are <, =, and >. For numeric fields, these have the standard meanings. For string fields, < and > mean "precedes in alphabetical order" and "follows in alphabetical order", and = means "is the same string as". For dates, the order is chronological; later dates are "greater than" earlier dates. Some examples, again using the data base from Figure 1:

    command

    new working data base

    (birthday > (may 1 1970))
    (("jieae" "barry manilow" (february 29 1972) 2)
     ("robert" "placido domingo" (july 31 1970) 3)
     ("paul" "frank sinatra" (may 19 1977) 2))
    (name < "master")
    (("jieae" "barry manilow" (february 29 1972) 2)
     ("brendan" "lawrence welk" (march 28 1970) 4)
     ("clint" "tammy wynette" (april 15 1965) 1))
    (favorite-musician = "frank sinatra")
    (("tim" "frank sinatra" (december 1 1969) 3)
     ("paul" "frank sinatra" (may 19 1977) 2))

    (or query1 query2 ...)
    filters the working data base, keeping only those records that satisfy at least one of the given queries. For example, again using the data base from Figure 1, the command (or (name < "master") (years-in-CS3 > 2)) would change the working data base to

    (("mike" "britney spears" (august 17 1950) 7)
     ("tim" "frank sinatra" (december 1 1969) 3)
     ("jieae" "barry manilow" (february 29 1972) 2)
     ("brendan" "lawrence welk" (march 28 1970) 4)
     ("robert" "placido domingo" (july 31 1970) 3)
     ("clint" "tammy wynette" (april 15 1965) 1))

    Note that records appear only once even if they satisfy more than one of the or queries. There must be at least one query given with an or command, but there is no upper limit on the number of queries given with an or . (One can get the effect of and by giving multiple queries in sequence.)

  5. (total field )
    given a numeric field, prints the sum of the values of the fields in records in the working data base. If the working data base is as given in Figure 1, the command (total years-in-CS3) prints 22. Specifying a string or date field with the total command is illegal.

Problem

You are to fill in the blanks in the framework and supply missing procedures to complete the program. Procedures you are to provide include the following:

legal-descr-list? , which returns true if its input is a correctly formatted field descriptor list and false otherwise. Correct format means

legal-db? , which returns true exactly when its first input is a correctly formatted data base according to its second input, a legal field descriptor list. Correct format means

legal-cmd? , which takes two inputs, a command provided by the user and a descriptor list, and returns true exactly when the command has the form of one of the commands just described.

These procedures will resemble procedures you wrote for lab activities early in CS 3 to test for various kinds of legal sentences (e.g. dates).

You will also need to write the code (and probably change the framework code a little bit) to make queries, sort, or, and total work. The framework already includes a call to a procedure called field-total, but you have to define it. The framework doesn't contain any calls to sort, or, or query code.

You will, of course, need helper procedures. Three that we found very useful are

Miscellaneous requirements

Put your completed program in a file named db.scm in a directory named finalproject, along with a file named db.tests that contains your tests. (Also provide the file partners.readme if you're working in partnership.) As in previous assignments, test your program in such a way as to exercise it completely. Provide example runs in which all the commands are used, all error cases are tested, and all types of query (<, =, and > both with strings, numbers, and dates) are included. Also provide examples of queries that filter some, none, and all of the working data base. Test your procedures in isolation as well. Don't change the code we've provided other than to fill in the blanks in the process procedure.

This project involves the use of Scheme procedures that you have not yet used, for instance, predicates for handling strings:

Procedures for reading from and writing to files are described in chapter 22 of Simply Scheme. If you wish, you may just take this code on faith, since you shouldn't include reading or printing procedures in any of the code that you write.

Suggested approaches

One possible approach to this project would be to design and test the legal-...? procedures first. They will be easy to test in isolation since they don't interact with any other part of the program. Another approach would be to assume that all input will be correctly formatted, and to plunge into the design of the query-handling code.

Much of this project should involve the reuse of code from previous assignments. In your first checkoff, let your t.a. know what code you plan to reuse and how you plan to use it.