To determine who is really taking part in the course.
To help you get started with the development environment that
you'll be using for Homeworks 1, 2, and 4.
To satisfy the first goal, people who
do not turn in Homework 0 will be removed from the course.
This assignment must be done individually by each student.
This shouldn't be a problem for anybody who's actually in the course,
since the assignment is quite simple:
Get a cs186 instructional account.
Download some files, compile them, and submit the output.
Details
As you will learn in Chapter 9 of the Cow Book, database systems do not
trust the Operating System to optimally perform memory and file
management. Databases explicitly organize records in memory
pages, pages into "heap files", and keep track of which pages are in
memory at any point in time. In homework 1 you will become more
aquainted with buffer management, and in homework 2 you'll work with
the structure of heap files in great detail.
The homework projects for this class are based on Minibase, a fully
functional instructional database created at the University of
Wisconsin. The version we are using was written in Java. It
will be possible to work on the assignments using command-line tools on
the instructional machines. It will also be possible to use the
Eclipse IDE on either the instructional machines or your own
computer. (Eclipse is available for free at http://www.eclipse.org.)
Eclipse is a fabulous environment to program and debug in, but it is
fairly resource intensive, and will perform poorly on a heavily loaded
machine.
The file you will need to download is: Homework0.zip.
This is a
complete eclipse project zipped into a single file, but you can also
run everything from the command line. No matter whether you want
to use the command line or eclipse, start by copying "Homework0.zip"
to the main directory of your cs186 account, and unzip it with the
command: "unzip Homework0.zip".
Command Line Instructions
Change to the "Homework_0" subdirectory: "cd Homework_0"
Submit the output that appears on the command line.
Eclipse Instructions
Start up eclipse. It may ask you for a workspace, choose
any subdirectory in your account.
You'll need to create a project, so select "File" -> "New"
-> "Project"
In the wizard that appears, select "Java Project" as the type
of project, and click "Next"
In the next panel, specify "Homework_0" as the project name,
and click on the "Create project from existing source" button and
specify the unzipped Homework_0 directory.
Click Finish.
If you haven't used eclipse before, you may not see anything but
a "Welcome" screen at this point. If so, select "Window" ->
"Open Perspective" -> "Java". If you still don't see anything
but the welcome screen, you may need to click the button in the upper
right of the welcome screen.
There should now be a navigation tree in the left side of the
screen, containing "Homework_0".
If "Project" -> "Build Automatically" is not selected, you
will need to right-click on the "Homework_0" project in the package
explorer,
and select "Build Project" from the popup menu.
Right-click on the "Homework_0" project in the package explorer,
and select "Run As..." -> "Java Application". A selection box
will appear, from which you must choose "HeapFileScan" as the class to
run.
The output from the program will appear in the console window,
submit this text.
Submission Instructions
Copy the output you wish to submit into a file named hw0.output.
Create a new directory named hw0.
Move hw0.output into the hw0 directory.
From within the hw0 directory, type submit hw0.
The HeapFileScan program uses the database's "heapfile" package to:
create a database on disk.
create two files in that database containing different numbers of
records,
then read through the files using a sequential scan to make sure
they contain the right number of records.
Take a look at the source code for HeapFileScan.java, and see how files
are written and read. This will be good background for future
assignments.
Extra "Credit"
For the future assignments, it will help to know some more about how
files work in the database. If you're inclined, spend some time
using the debugger to see how files are created, records inserted, and
files scanned. This is optional, but you might learn something.
Some background: first, a Minibase database
is a fixed collection of of pages. All information stored
in the database must fit within these pages. Initially these
pages are allocated on disk, and a buffer
manager migrates these pages back and forth to
memory whenever they need to be read or written. In most
applications, the number of pages that fit in memory is smaller than
the size of the database on disk, so the buffer manager needs to be
selective about what it keeps in memory (you'll be implementing a
buffer manager in Homework 1). In the "HeapFileScan.java" file,
the following initialization takes place:
Minibase.initBufMgr(new
DummyBufMgr());
This creates a buffer manager for the database to use. In this
case, we're using the DummyBufferManager which is not smart, and just
allocates main memory for every page, leading to lots of swapping on
any large database. After the buffer manager is created, the
database is created on disk with a given size. This needs to
happen after the buffer manager is created, since the database needs to
use the buffer manager to make sure certain pages with database catalog
information get into memory, and then are written to disk.
Minibase.initDiskMgr(dbpath, 100);
Minibase keeps files, called heapfiles,
as a set of pages that can be
either on disk or in memory, and movement between disk and memory is
performed by the buffer manager. Some of the pages contain only
tuples, i.e., fixed
length data records. Other pages contain directory information,
indicating which database pages are part of the file. Some of the
classes used to implement heapfiles are:
heap.Tuple - the
class that encapsulates a fixed length record with a
specified number of fields, each field is either an integer, float, or
fixed length string. This class can be serialized as a fixed
length array of bytes, to put into a page. Each tuple starts with
an array of data types and offsets, so that the structure of the tuple
can be determined from the bytes.
diskmgr.Page - this
class represents a fixed size page that can moved between the database
on disk and the buffer pool in memory. The contents of the page
are raw data and not interpreted.
heap.HFPage - this
is a subclass of diskmgr.Page used for holding records (arrays of
bytes). A record could be a tuple, or anything else that can be
serialized as an array of bytes. Each record put into a slot in the page. Space is
reserved at the beginning of each page for holding the number of slots
used, amount of free space, and page IDs for the current, previous, and
next page in the file. After this is an array of number pairs,
one for each slot, listing the size and offset of each record
slot.
Each record can be a different size, though of course records may not
be larger than the size of a page. In this implementation, the
size/offset array grows up from the beginning of the page, and the
record-occupied space grows down from the end up the page, with the
middle always a contiguous area of free space. The HFPage class
provides a way to iterate through the records stored in a page.
heap.Heapfile - this
class describes a database file. The file has a 2-level
structure, with directory pages containing pointers to data pages,
which contain the actual records. Both types of pages are
implement as HFPage objects - the data pages contain whatever Tuples
are added to the file, but the directory pages instead contain
DataPageInfo objects (see below). When a heapfile is created, it
has only a single directory page. Each time a record is added,
the directory pages are searched for a data page with enough space to
hold the new record. If none is found, a new data page is
created, and an entry for that data page is added to the directory
page. When directory pages are filled up, new ones are added as
needed.
heap.DataPageInfo -
this class is similar to Tuple, in that it can be serialized into an
array of bytes for storage in a page. While a Tuple is a general
data structure, the DataPageInfo only holds 3 integers describing a
page: the pageID, number of records, and available space in the
page. DataPageInfo is more compact that Tuple, however, lacking
the array of size/offsets at the beginning.
heap.Scan - this
class provides methods for doing a scan of a heapfile, iterating
record-by-record over the file.
The source code for the heap package can be found here. If you're using eclipse, you
can point it to this jar file when stepping through heap classes in the
debugger.