UNIVERSITY OF CALIFORNIA
|
|||||||||||||||||||||||||||||||||||||||||
Assignment 1 |
Prof. Joe Hellerstein Dr. Minos Garofalakis |
||||||||||||||||||||||||||||||||||||||||
Assignment 1: PostgreSQL 8.0.3 Buffer Manager |
|||||||||||||||||||||||||||||||||||||||||
In the previous assignment, we learned how to set up a database cluster
(using initdb), how to start the postgres master process (using pg_ctl), how
to create a specific database (using createdb), and how to query this
database (using a front end tool like pgaccess or psql). Now we start with the actual C code hacking. The aim is to modify
functionality in the backend server of PostgreSQL. For this first assignment,
we focus on just one core module - the buffer manager. The actual coding for this assignment is minimal,
given the highly organized and modular PostgreSQL code (the code for the
buffer manager policy is cleanly separated from the rest of the code base).
However, you have to figure out what code to modify, which is non-trivial!
There are three major parts to this project:
Partners?There are two parts to this assignment, the first of which is to be done
individually and the second in teams of 2 people. For this, you will work in
groups of 2. Please form a your team of 2 and register at the group
registration webpage by Friday Sept. 9. DeadlinesYou have to submit this assignment in TWO PHASES:
The second part
is expected to take much more time than the first. So manage your time
accordingly! Tasks and Grade Percentage
The rest of this document describes these steps in detail. Note that in the code and in this document a "buffer" or "slot" is what the text and course notes refer as a "buffer frame". Similarly, the terms "refcount" and "pincount" are used interchangeably. 1. Understanding Different Buffer Replacement Strategies. (20%)The textbook describes LRU, MRU and CLOCK strategies in section 9.4.1.
However, you may need to read the entire section (9.4) to get a full image of
a buffer manager in the DBMS. A fourth (more recent) strategy is called 2Q,
and is used in the latest version of Postgres (8.0.3). LRU, MRU, and CLOCK
strategies work well enough for most systems so we are going to replace 2Q
with a CLOCK strategy. But first you'll need to exercise your understanding
of the LRU, MRU, and CLOCK buffer replacement strategies. This part of the assignment requires no code. We will give you the number
of page slots that the buffer manager must manage and a sequence of data
blocks accessed by the different DBMS layers above the buffer manager layer.
You will have to track the behavior of the buffer manager (which pages it has
to replace, when, etc.) using your, and yours alone, knowledge of the three
aforementioned buffer replacement strategies. Solution format for this step: You
must record your answers in three plain text files called strategy.lru,
strategy.mru, strategy.clock. These files will contain the buffer/slot state
and buffer manager replacement decisions in accordance to the LRU, MRU, and
CLOCK policies respectively. The format of each file is as follows:
For example, assume there are four page slots your buffer manager must
manage: P1, P2, P3, P4. All four slots are initially empty. When the buffer
pool has unused slots (e.g., in the beginning when all four slots are empty),
it will put the newly read data in the first unused slot. The blocks to be
read from disk are labeled A through F. For simplicity here, we assume that
after each access the page is pinned, and then immediately unpinned (you
cannot make this same assumption when writing the actual code. A page may be
pinned for any length of time in a DBMS!) Given this information and the following workload:
Now we come to the problem that you will have to solve. Suppose the buffer
manager has five page slots to manage: P1, P2, P3, P4, P5. All slots are
initially empty. Suppose seven disk blocks (A, B, ..., G) are accessed by the
layers above the buffer manager layer in the DBMS. How to submit: You will need to
use the unix submit program to
hand in your assignment: 1. Save your
three files in a directory called "hw1p1" within your cs186 home
directory. 3. Implementing the CLOCK Buffer Replacement Strategy. (80%)3.1. Setting up PostgreSQL and Other Files/ScriptsEverything needed for the second part of the
assignment is available at: ~cs186/Hw1/ on x86 solaris instructional
machines. Because the source tree of Postgres is quite large, you'll need to
delete the database you have setup for homework 0 before you begin. So start
by: /usr/bin/rm -rf ~/pgdata
cp -r ~cs186/Hw1 ~/. This creates a Hw1 directory in your top level home directory. Inside that directory, you will see two files: setup.py and config.in. The python script setup.py does precisely what its name suggests, so go ahead and execute it (./setup.py) on any instructional machines (basically, if setup.py runs then you can use that machine). After the script completes, you should see the following directory structure setup for you.
Here are instructions for compiling PostgreSQL and incorporating freelist.skel.c into PostgreSQL. Note that compiling/updating may take a few minutes.
cd
Hw1/postgresql-8.0.3/ gmake install # Default: install to location $HOME/pgsql
cp
freelist.skel.c ~/Hw1/postgresql-8.0.3/src/backend/storage/buffer/freelist.c NOTE: If you're working on the instructional machines and need to kill you postmaster instance (due to it not shutting down properly) you can use the kill_postmaster command. In most cases you should be able to cleanly shutdown using pg_ctl -D <data directory> stop. Also, we STRONGLY recommend that you do not modify any other files in the postgresql-8.0.3 directory! Moreover, we suggest you work from your /MyCode directory and replace freelist.c in src/backend/storage/buffer/ only when you need to test your freelist.skel.c changes. 3.1.1 Working from a local machineA tar gzipped file containing the PostgreSQL 8.0.3 code base can be found here.
This should provide you with the necessary files to complete this assignment
using whatever machine you'd like (assuming PostgreSQL 8.0.3 compiles). We
will not support platform issues concerning this code base (I have
successfully tested it using Linux and OS X). 3.2 Implementing CLOCK
Each page in the buffer pool is in one of three states:
To find a page for replacement you start from the current page (initialized with the current clock hand), and repeat the following steps until an available page is found.
3.2.1. Files of interestYou can add and manage any new data
structures that you need. The existing code is not extremely clear, but it is
understandable. It may take you a few hours (or more) to digest it. Since
understanding the existing code is a significant part of the assignment, the
TAs and Professors will not assist you in your understanding of the code base
(beyond what we discuss here). ·
src/backend/storage/buffer/freelist.c - has functions that implement the replacement
strategy. This is the only file you need to replace using your filled in
freelist.skel.c. ·
src/include/storage/buf_internals.h - contains the definition of each buffer frame
(called BufferDesc). Most of the fields in BufferDesc are managed by other
parts of the code. The fields that you'll be interested in are the
BufferDesc.refcount and BufferDesc.reference. The reference field is set to true by the buffer manager when the refcount (a.k.a. pin count) transitions from 1 to 0. You should convince yourself that
this yields the proper semantics needed to successfully execute the CLOCK
algorithm. ·
src/backend/storage/buffer/buf_init.c - Some initialization of the buffer frames occur
in this file. However, you should do all your initialization in freelist.c
(see StrategyInitialize routine in freelist.c). 3.2.2. How to debug?We suggest that
you debug using gdb or ddd with the src/backend/postgres standalone
executable. Make sure you run your version of postgres and not the one in the
class account directory! To start the backend server in stand-alone mode, run
src/backend/postgres -s -D <DATADIR> test. It will use the data directory (and data) you prepared using initdb
-D <DATADIR> and database test you
created using createdb test.
The interface is directly to the backend, so psql features will not work. If
your binary does not work correctly, it may corrupt the data, so be warned. Through the backend, you can directly type SQL statements, although the output is somewhat painful to read. To exit, type CTRL-D. In addition, the -s option will tell the server to print statistics at the end of each query, including the number of buffer hits! Finally, you can restrict the number of buffers PostgreSQL will use by adding the -B <nbuffers> option (<nbuffers> should not be smaller than 16.) so that even small queries generate buffer misses. More information on using the backend in
stand-alone executable can be found on the PostgreSQL web site: http://www.postgresql.org/. We will
provide a C Programming help session (see course blog for this announcement)
that will cover debugging in PostgreSQL.
elog(DEBUG,<format>,<arg>) Please see bufmgr.c for example uses of this routine. 4. Test harness code. (0%)A small test harness stub program can be found in
Hw1/MyStuff/MyCode/buftest.c. This stub file tests the correctness of your
buffer policy (freelist.c) by bypassing the PostgreSQL code and directly
testing your freelist.c implementation.
5. Final submission.How to submit: You will need to use the unix submit program to hand in your assignment: 1. Save your freelist.c
and buftest.c files in a directory called "hw1p2" in your cs186
home directory. Warning: Compilation errors in freelist.c or buftest.c will be harmful to your credit on the respective parts. If your code files depend on changes that you made to other files (not freelist.c or buftest.c), then you should redesign your solution without such modifications. |
|||||||||||||||||||||||||||||||||||||||||