University of California, Berkeley

CS 186 Spring 2007

Homework 0 - Are you in the course?

Due Tuesday January 23, 2007 10:00pm

[Homeworks] [Syllabus and Lecture Notes] [Resources]

Introduction

This assignment has two goals:

To determine who is really taking part in the course.
To help you get started with the development environment that you'll be using for Homeworks 1, 2, and 4.

To satisfy the first goal, people who do not turn in Homework 0 will be removed from the course. This assignment must be done individually by each student.

This shouldn't be a problem for anybody who's actually in the course, since the assignment is quite simple:

Get a cs186 instructional account.
Download some files, compile them, and submit the output.

Details

As you will learn in Chapter 9 of the Cow Book, database systems do not trust the Operating System to optimally perform memory and file management. Databases explicitly organize records in memory pages, pages into "heap files", and keep track of which pages are in memory at any point in time. In homework 1 you will become more aquainted with buffer management, and in homework 2 you'll work with the structure of heap files in great detail.

The homework projects for this class are based on Minibase, a fully functional instructional database created at the University of Wisconsin. The version we are using was written in Java. It will be possible to work on the assignments using command-line tools on the instructional machines. It will also be possible to use the Eclipse IDE on either the instructional machines or your own computer. (Eclipse is available for free at http://www.eclipse.org.) Eclipse is a fabulous environment to program and debug in, but it is fairly resource intensive, and will perform poorly on a heavily loaded machine.

The file you will need to download is: Homework0.zip. This is a complete eclipse project zipped into a single file, but you can also run everything from the command line. No matter whether you want to use the command line or eclipse, start by copying "Homework0.zip" to the main directory of your cs186 account, and unzip it with the command: "unzip Homework0.zip".

Command Line Instructions

Change to the "Homework_0" subdirectory: "cd Homework_0"
Compile the java source file, which is either:

(unix) javac -classpath "minibase-exceptions.jar:minibase-heap.jar:minibase-diskmgr.jar:minibase-global.jar" HeapFileScan.java
(windows) javac -classpath "minibase-exceptions.jar;minibase-heap.jar;minibase-diskmgr.jar;minibase-global.jar" HeapFileScan.java

Then run the program with one of the following commands:

(unix) java -classpath "minibase-exceptions.jar:minibase-heap.jar:minibase-diskmgr.jar:minibase-global.jar:." HeapFileScan
(windows) java -classpath "minibase-exceptions.jar;minibase-heap.jar;minibase-diskmgr.jar;minibase-global.jar;." HeapFileScan

Submit the output that appears on the command line.

Eclipse Instructions

Start up eclipse. It may ask you for a workspace, choose any subdirectory in your account.
You'll need to create a project, so select "File" -> "New" -> "Project"

In the wizard that appears, select "Java Project" as the type of project, and click "Next"
In the next panel, specify "Homework_0" as the project name, and click on the "Create project from existing source" button and specify the unzipped Homework_0 directory.
Click Finish.

If you haven't used eclipse before, you may not see anything but a "Welcome" screen at this point. If so, select "Window" -> "Open Perspective" -> "Java". If you still don't see anything but the welcome screen, you may need to click the button in the upper right of the welcome screen.
There should now be a navigation tree in the left side of the screen, containing "Homework_0".
If "Project" -> "Build Automatically" is not selected, you will need to right-click on the "Homework_0" project in the package explorer, and select "Build Project" from the popup menu.
Right-click on the "Homework_0" project in the package explorer, and select "Run As..." -> "Java Application". A selection box will appear, from which you must choose "HeapFileScan" as the class to run.
The output from the program will appear in the console window, submit this text.

Submission Instructions

Copy the output you wish to submit into a file named hw0.output.
Create a new directory named hw0.
Move hw0.output into the hw0 directory.
From within the hw0 directory, type submit hw0.

The HeapFileScan program uses the database's "heapfile" package to:

create a database on disk.
create two files in that database containing different numbers of records,
then read through the files using a sequential scan to make sure they contain the right number of records.

Take a look at the source code for HeapFileScan.java, and see how files are written and read. This will be good background for future assignments.

Extra "Credit"

For the future assignments, it will help to know some more about how files work in the database. If you're inclined, spend some time using the debugger to see how files are created, records inserted, and files scanned. This is optional, but you might learn something.

Some background: first, a Minibase database is a fixed collection of of pages. All information stored in the database must fit within these pages. Initially these pages are allocated on disk, and a buffer manager migrates these pages back and forth to memory whenever they need to be read or written. In most applications, the number of pages that fit in memory is smaller than the size of the database on disk, so the buffer manager needs to be selective about what it keeps in memory (you'll be implementing a buffer manager in Homework 1). In the "HeapFileScan.java" file, the following initialization takes place:

        Minibase.initBufMgr(new DummyBufMgr());

This creates a buffer manager for the database to use. In this case, we're using the DummyBufferManager which is not smart, and just allocates main memory for every page, leading to lots of swapping on any large database. After the buffer manager is created, the database is created on disk with a given size. This needs to happen after the buffer manager is created, since the database needs to use the buffer manager to make sure certain pages with database catalog information get into memory, and then are written to disk.

        Minibase.initDiskMgr(dbpath, 100);

Minibase keeps files, called heapfiles, as a set of pages that can be either on disk or in memory, and movement between disk and memory is performed by the buffer manager. Some of the pages contain only tuples, i.e., fixed length data records. Other pages contain directory information, indicating which database pages are part of the file. Some of the classes used to implement heapfiles are:

heap.Tuple - the class that encapsulates a fixed length record with a specified number of fields, each field is either an integer, float, or fixed length string. This class can be serialized as a fixed length array of bytes, to put into a page. Each tuple starts with an array of data types and offsets, so that the structure of the tuple can be determined from the bytes.
diskmgr.Page - this class represents a fixed size page that can moved between the database on disk and the buffer pool in memory. The contents of the page are raw data and not interpreted.
heap.HFPage - this is a subclass of diskmgr.Page used for holding records (arrays of bytes). A record could be a tuple, or anything else that can be serialized as an array of bytes. Each record put into a slot in the page. Space is reserved at the beginning of each page for holding the number of slots used, amount of free space, and page IDs for the current, previous, and next page in the file. After this is an array of number pairs, one for each slot, listing the size and offset of each record slot. Each record can be a different size, though of course records may not be larger than the size of a page. In this implementation, the size/offset array grows up from the beginning of the page, and the record-occupied space grows down from the end up the page, with the middle always a contiguous area of free space. The HFPage class provides a way to iterate through the records stored in a page.
heap.Heapfile - this class describes a database file. The file has a 2-level structure, with directory pages containing pointers to data pages, which contain the actual records. Both types of pages are implement as HFPage objects - the data pages contain whatever Tuples are added to the file, but the directory pages instead contain DataPageInfo objects (see below). When a heapfile is created, it has only a single directory page. Each time a record is added, the directory pages are searched for a data page with enough space to hold the new record. If none is found, a new data page is created, and an entry for that data page is added to the directory page. When directory pages are filled up, new ones are added as needed.
heap.DataPageInfo - this class is similar to Tuple, in that it can be serialized into an array of bytes for storage in a page. While a Tuple is a general data structure, the DataPageInfo only holds 3 integers describing a page: the pageID, number of records, and available space in the page. DataPageInfo is more compact that Tuple, however, lacking the array of size/offsets at the beginning.
heap.Scan - this class provides methods for doing a scan of a heapfile, iterating record-by-record over the file.

The source code for the heap package can be found here. If you're using eclipse, you can point it to this jar file when stepping through heap classes in the debugger.