University of California, Berkeley

CS 186 Spring 2007

Homework 0 - Are you in the course?

Due Tuesday January 23, 2007 10:00pm



Introduction

This assignment has two goals:
To satisfy the first goal, people who do not turn in Homework 0 will be removed from the course.  This assignment must be done individually by each student.

This shouldn't be a problem for anybody who's actually in the course, since the assignment is quite simple:

Details

As you will learn in Chapter 9 of the Cow Book, database systems do not trust the operating system to perform memory and file management optimally.  Databases explicitly organize records into memory pages and pages into "heap files", and keep track of which pages are in memory at any point in time.  In Homework 1 you will become more acquainted with buffer management, and in Homework 2 you'll work with the structure of heap files in great detail.

The homework projects for this class are based on Minibase, a fully functional instructional database created at the University of Wisconsin.  The version we are using was written in Java.  It will be possible to work on the assignments using command-line tools on the instructional machines.  It will also be possible to use the Eclipse IDE on either the instructional machines or your own computer.  (Eclipse is available for free at http://www.eclipse.org.)  Eclipse is a fabulous environment to program and debug in, but it is fairly resource intensive, and will perform poorly on a heavily loaded machine.

The file you will need to download is: Homework0.zip.  This is a complete Eclipse project zipped into a single file, but you can also run everything from the command line.  Whether you want to use the command line or Eclipse, start by copying "Homework0.zip" to the main directory of your cs186 account, and unzip it with the command: "unzip Homework0.zip".
Command Line Instructions
Eclipse Instructions
Submission Instructions


The HeapFileScan program uses the database's "heapfile" package to:
Take a look at the source code for HeapFileScan.java, and see how files are written and read.  This will be good background for future assignments.

Extra "Credit"

For future assignments, it will help to know more about how files work in the database.  If you're inclined, spend some time using the debugger to see how files are created, records inserted, and files scanned.  This is optional, but you might learn something.

Some background: first, a Minibase database is a fixed collection of pages.  All information stored in the database must fit within these pages.  Initially these pages are allocated on disk, and a buffer manager migrates them back and forth between disk and memory whenever they need to be read or written.  In most applications, the number of pages that fit in memory is smaller than the number of pages in the database on disk, so the buffer manager needs to be selective about what it keeps in memory (you'll be implementing a buffer manager in Homework 1).  In the "HeapFileScan.java" file, the following initialization takes place:

        Minibase.initBufMgr(new DummyBufMgr());
      
This creates a buffer manager for the database to use.  In this case, we're using DummyBufMgr, which is not smart: it just allocates main memory for every page, leading to lots of swapping on any large database.  After the buffer manager is created, the database is created on disk with a given size.  This needs to happen after the buffer manager is created, since the database uses the buffer manager to make sure certain pages with database catalog information get into memory and are then written to disk.

        Minibase.initDiskMgr(dbpath, 100);
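
To make the buffer manager's job more concrete, here is a minimal sketch of a pool that can hold only a few pages in memory at once.  It is not Minibase code: ToyBufferPool, getPage, isResident, and the tiny page size are all hypothetical names and values chosen for illustration, and the least-recently-used eviction policy is just one possible choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy class for illustration only; this is NOT a Minibase class.
class ToyBufferPool {
    static final int PAGE_SIZE = 16;   // tiny pages keep the demo readable

    private final byte[][] disk;       // the "database": a fixed collection of pages
    private final int capacity;        // how many pages fit in memory at once
    // Access-ordered map: iteration starts at the least recently used page.
    private final LinkedHashMap<Integer, byte[]> frames =
            new LinkedHashMap<>(16, 0.75f, true);
    int diskReads = 0;
    int diskWrites = 0;

    ToyBufferPool(int numDiskPages, int capacity) {
        this.disk = new byte[numDiskPages][PAGE_SIZE];
        this.capacity = capacity;
    }

    // Return the in-memory copy of a page, reading it from disk if needed.
    byte[] getPage(int pageNo) {
        byte[] frame = frames.get(pageNo);   // a hit also marks the page as recently used
        if (frame != null) {
            return frame;
        }
        if (frames.size() >= capacity) {
            evictLeastRecentlyUsed();
        }
        frame = disk[pageNo].clone();        // simulate reading the page from disk
        diskReads++;
        frames.put(pageNo, frame);
        return frame;
    }

    // Simplification: every evicted page is written back, modified or not.
    private void evictLeastRecentlyUsed() {
        Map.Entry<Integer, byte[]> victim = frames.entrySet().iterator().next();
        int victimNo = victim.getKey();
        disk[victimNo] = victim.getValue().clone();
        diskWrites++;
        frames.remove(victimNo);
    }

    boolean isResident(int pageNo) {
        return frames.containsKey(pageNo);
    }
}

class ToyBufferPoolDemo {
    public static void main(String[] args) {
        ToyBufferPool pool = new ToyBufferPool(100, 2);  // 100 disk pages, 2 in memory
        pool.getPage(0)[0] = 42;   // modify page 0 while it is in memory
        pool.getPage(1);
        pool.getPage(2);           // pool is full, so page 0 is written back and dropped
        System.out.println("page 0 resident: " + pool.isResident(0));
        System.out.println("page 0, byte 0:  " + pool.getPage(0)[0]);
    }
}
```

In this sketch, a capacity equal to the number of disk pages means nothing is ever evicted, which loosely resembles the "memory for every page" behavior described above.  A small capacity forces the pool to choose which pages to keep.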

Minibase keeps files, called heapfiles, as a set of pages that can be either on disk or in memory; movement between disk and memory is performed by the buffer manager.  Some of the pages contain only tuples, i.e., fixed-length data records.  Other pages contain directory information, indicating which database pages are part of the file.  Some of the classes used to implement heapfiles are:
The source code for the heap package can be found here.  If you're using Eclipse, you can point it to this jar file when stepping through heap classes in the debugger.
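
Before stepping through the real classes, it may help to see the data-page and directory idea in miniature.  The sketch below is a toy model, not the Minibase heap package: ToyHeapFile, DataPage, Rid, insertRecord, and scan are hypothetical names, the page and record sizes are arbitrary, and the directory is kept as a plain in-memory list rather than being stored on pages of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a heap file; this is NOT the Minibase heap package.
class ToyHeapFile {
    static final int RECORD_SIZE = 8;        // fixed-length records, in bytes
    static final int RECORDS_PER_PAGE = 4;   // deliberately tiny data pages

    // A data page holds only records.
    static class DataPage {
        final byte[][] slots = new byte[RECORDS_PER_PAGE][];
        int used = 0;
    }

    // A record id names the page and the slot where a record lives.
    static class Rid {
        final int pageNo;
        final int slotNo;
        Rid(int pageNo, int slotNo) {
            this.pageNo = pageNo;
            this.slotNo = slotNo;
        }
    }

    private final List<DataPage> allPages;                      // every page in the database
    private final List<Integer> directory = new ArrayList<>();  // pages owned by this file

    ToyHeapFile(List<DataPage> allPages) {
        this.allPages = allPages;
    }

    // Store a copy of the record in the first page with a free slot.
    Rid insertRecord(byte[] record) {
        if (record.length != RECORD_SIZE) {
            throw new IllegalArgumentException("record must be " + RECORD_SIZE + " bytes");
        }
        for (int pageNo : directory) {
            DataPage page = allPages.get(pageNo);
            if (page.used < RECORDS_PER_PAGE) {
                int slotNo = page.used++;
                page.slots[slotNo] = record.clone();
                return new Rid(pageNo, slotNo);
            }
        }
        // Every page of this file is full: claim a new page and list it in the directory.
        DataPage fresh = new DataPage();
        allPages.add(fresh);
        int newPageNo = allPages.size() - 1;
        directory.add(newPageNo);
        fresh.slots[0] = record.clone();
        fresh.used = 1;
        return new Rid(newPageNo, 0);
    }

    // Visit every record by following the directory, page by page.
    List<byte[]> scan() {
        List<byte[]> records = new ArrayList<>();
        for (int pageNo : directory) {
            DataPage page = allPages.get(pageNo);
            for (int i = 0; i < page.used; i++) {
                records.add(page.slots[i].clone());
            }
        }
        return records;
    }

    int pageCount() {
        return directory.size();
    }
}

class ToyHeapFileDemo {
    public static void main(String[] args) {
        List<ToyHeapFile.DataPage> database = new ArrayList<>();
        ToyHeapFile fileA = new ToyHeapFile(database);
        ToyHeapFile fileB = new ToyHeapFile(database);
        for (int i = 0; i < 4; i++) {
            fileA.insertRecord(new byte[ToyHeapFile.RECORD_SIZE]);
        }
        fileB.insertRecord(new byte[ToyHeapFile.RECORD_SIZE]);
        ToyHeapFile.Rid rid = fileA.insertRecord(new byte[ToyHeapFile.RECORD_SIZE]);
        System.out.println("fifth record of A is on page " + rid.pageNo);
        System.out.println("A scans " + fileA.scan().size() + " records");
    }
}
```

Because the two toy files share one pool of pages, their pages end up interleaved, and each file's directory is the only way to tell which pages belong to it.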