The aim of this assignment is to
add functionality to the backend server of PostgreSQL. For this
first assignment, we focus on just one core module - the buffer
manager. The actual coding for this assignment is minimal, given the
highly organized and modular postgreSQL code (the code for buffer
manager is cleanly separated from the rest of the code base). However,
you have to figure out what code to add and where, which is non-trivial!
There are three major parts to this project:
-
Understanding different buffer management strategies
-
Implement a new buffer replacement policy -- simplified 2Q
-
Compare performances of LRU, MRU, and 2Q
Partners?
There are two parts in this assignment. The first one has to be done
individually and the second one is a team project.
For this, you will work in groups of two. You must form a
group and register at the
group registration webpage by Wednesday, February 1st, 2006.
Deadlines
You have to submit this assignment in TWO PHASES:
-
The first part tests your understanding of different buffer
management strategies. It is due at 11:59 pm on Wednesday, February 1st, 2006.
This part has to be done individually.
-
The second part requires modifying the actual code and compare different
buffer management strategies. It is due at 11:59 pm on Tuesday, February 14, 2006.
The second part is expected to take much more time than the first. So,
manage your time accordingly!
The rest of this document describes what you have to submit and how
within the relevant sections.
Tasks and Steps
-
Understanding different buffer management strategies (30%)
-
Set up PostgreSQL and related files/scripts
-
Implement the simplified 2Q buffer replacement policy (40%)
-
Compare performances of LRU, MRU, and 2Q (30%)
1. Understanding Different Buffer Replacement Strategies (30%)
The textbook describes LRU, MRU and CLOCK strategies in section 9.4.1.
However, you may need to read the whole section of 9.4 to get a more complete view
of buffer management in DBMS. A fourth (more recent) strategy is called 2Q.
This is the replacement strategy you will implement. LRU, MRU, and CLOCK strategies
work well enough for most systems. However, for some workloads,
'second chance' algorithms or those with more 'history' can work
better. A problem with the LRU and CLOCK strategies is buffer pages that gets
accessed only once must sit through the entire queue cycle before being replaced.
2Q and LRU2 are algorithms that try to address this problem. In this
assignment, you will examine 2Q in depth. Ther are
different versions of 2Q: simplified and full. You will be
implementing only the simple versions in this assignment.
The original research paper introducing 2Q can be downloaded from:
http://www.vldb.org/conf/1994/P439.PDF OR
http://www.acm.org/sigmod/vldb/conf/1994/P439.PDF
You may want to read the description of simplied 2Q in the 2Q paper
(from the beginning of section 2 until "2Q, Full Version")
to better understand the 2Q algorithm before you begin your implementation.
However, this handout contains all the information about 2Q that
you will need to do the assignment.
Simplified 2Q Algorithm
The idea behind the 2Q algorithm is to manage two separate queues in the buffer
manager. The first queue is a FIFO queue (A1) and
the second queue is LRU (Am). The pseudo code of the simplified
version of 2Q for unpin(buf) and find a free slot for a page (p) is the following:
unpin(buf){
if buf was on the Am queue then
put buf on the front of the Am queue
else if buf was on the A1 queue then
remove buf from the A1 queue
put buf on the front of the Am queue
else
put buf on the front of the A1 queue
end if
}
The pseudo code for find_free_slot(p) is:
find_free_slot(p){
if there are free slots available then
put p in a free page slot
else if A1's size is above threshold or Am is empty
delete from the tail of A1
put p in the freed page slot
else
delete from the tail of Am
put p in the freed page slot
end if
}
For this assignment, you can set the threshold to half of
the available buffer pages. Note, that find_free_slot(p) is called
during the access of a new page.
Now we will test your understanding of
the four buffer replacement strategies we've learned so far.
There is no PostgreSQL for this part.
We will give you the number of page slots that
the buffer manager must manage and then the sequence of data blocks accessed by
the different DBMS layers above the buffer manager layer. You will have to
track the behaviour of the buffer manager (which pages it has to replace, when, etc.) based
on your understanding of the four buffer management strategies.
Solution format for this step: You must record your answers in four plain
text files called strategy.lru, strategy.mru, strategy.clock and strategy.s2q.
These files will contain buffer manager behavior information for LRU, MRU, CLOCK and simplified 2Q respectively.
The format of each file is as follows:
- Three columns of text, separated by spaces
- One line of text each time a page miss occurs
- Each row lists the time of a page miss, a space, the buffer pool
page (P1, P2, P3, P4), a space, and if an eviction occurs, the page
evicted from memory
For example, assume there are four page slots your buffer manager must manage: P1,
P2, P3, P4. All four slots are empty at the begining. When the buffer pool has unused slots (such
as at the beginning, when all four slots are empty), it will put newly read
data in the first unused slot. The blocks to be read from disk
are labelled A through F. For simplicity here, we assume that after each access the page is pinned, and
then immediately unpinned (you cannot make this same assumption when
writing the actual code. A page may be pinned for any length of time in a DBMS!)
Given this information and the following workload:
Time
|
T1
|
T2
|
T3
|
T4
|
T5
|
T6
|
T7
|
T8
|
T9
|
T10
|
Block Accessed
|
A
|
B
|
C
|
D
|
E
|
F
|
A
|
B
|
C
|
D
|
the files should be as follows:
strategy.lru
T1 P1
T2 P2
T3 P3
T4 P4
T5 P1 A
T6 P2 B
T7 P3 C
T8 P4 D
T9 P1 E
T10 P2 F
|
strategy.mru
T1 P1
T2 P2
T3 P3
T4 P4
T5 P4 D
T6 P4 E
T10 P3 C
|
strategy.clock
T1 P1
T2 P2
T3 P3
T4 P4
T5 P1 A
T6 P2 B
T7 P3 C
T8 P4 D
T9 P1 E
T10 P2 F
|
strategy.s2q
T1 P1
T2 P2
T3 P3
T4 P4
T5 P1 A
T6 P2 B
T7 P3 C
T8 P4 D
T9 P1 E
T10 P2 F
|
Note that not each time is listed for each strategy, since not each
time has a buffer miss. Note also that the third column will
be blank when the buffer miss does not result in an eviction.
Finally, observe that in this case, for the given number of
pages and sequence of block accesses, MRU emerges the winner (least number of misses).
Now we come to the problem you have to solve. Suppose the buffer manager has
five page slots to manage: P1, P2, P3, P4, P5. All slots are empty to begin with.
Suppose seven disk blocks
(A, B, ..., G) are
accessed by the layers above the buffer manager layer in the DBMS.
To access your access pattern, login to your cs186 account and type
mypattern at the prompt. Remember that you must use your cs186
class account to get your access patter. Based on this pattern, you
should perform a similar analysis as the one above.
Generate the FOUR files strategy.lru, strategy.mru, strategy.clock and strategy.s2q for this problem.
Stick to the specified format (e.g., no extra line at the end of a file),
as we will use file comparison scripts to evaluate your answers.
How to submit:
You will need to use the unix submit program
to hand in your assignment:
1. Save your four files in a
directory called "hw1p1" in your cs186 home directory.
2. cd into hw1p1.
3. Run: submit hw1p1
2. Setting up PostgreSQL and Other Files/Scripts
Everything needed for the second part of the assignment
is available at:
~cs186/Hw1/Hw1.tgz on x86 solaris instructional machines.
You can un-tarzip the file:
gtar -zxvf ~cs186/Hw1/Hw1.tgz
This creates an Hw1 directory in your current
directory. Inside the Hw1 directory, you will see two
subdirectories:
- postgresql-7.2.2/: Source code of PostgreSQL version 7.2.2.
Little code was changed between this version and the actual version.
Note that you DO NOT have to change any file inside the postgreSQL code
while doing this assignment
- MyStuff/:
This is the directory in which you will spend most of your time.
It has the following subdirectories:
- MyCode/:
In this directory you will find freelist.c.mru and buf_internals.h.mru,
which have
our implementation of MRU.
PostgreSQL ships with LRU, so our
implementation of MRU is intended to provide you with an example of how
to change the buffer manager’s replacement policy. You will want to
compare this code to
postgresql-7.2.2/src/backend/storage/buffer/freelist.c
and
postgresql-7.2.2/src/include/storage/buf_internals.h
to study the changes we made. This directory also has the same two files
for the LRU policy, called freelist.c.lru and buf_internals.h.lru
(to give you quick reference to the original LRU files).
You will also find TestStub/ in this
directory, which is the test harness code we will discuss later.
- CompileAndUpdateScripts/:
This has scripts that you will use to compile PostgreSQL, and to
put the files that you create/modify into the right places inside the
PostgreSQL code.
- PerformanceEvaluationScripts/:
This contains scripts for carry out performance evaluations between
different buffer management strategies after
you have written the code and compiled it
successfully with the above scripts.
Here are instructions for compiling PostgreSQL and incorporating your
files into PostgreSQL. Note that compiling/updating may take a few
minutes.
- To completely re-compile and install PostgreSQL,
cd
Hw1/MyStuff/CompileAndUpdateScripts
./compile.sh
- After you have prepared your versions of the two files
freelist.c and buf_internals.h, and tested them using the
test harness code, you will have to put them into the right places
in the PostgreSQL code. This should be done using the update script,
which will also compile PostgreSQL with your files.
cd Hw1/MyStuff/CompileAndUpdateScripts
./update.sh <path to your freelist.c file> <path to your
buf_internals.h
file>
NOTE: You should edit your own versions
of freelist.c and buf_internals.h in the MyCode/ directory, and use the
update.sh script to install them in the right place.
We STRONGLY recommend that you do not
modify the files in the postgresql-7.2.2 directory directly! update.sh
will use your files to replace related files in
src/backend/storage/buffer/ and src/include/storage/, and compile your
code with PostgreSQL.
In this process, the two original files (which existed there before
you ran update.sh)
will be
copied for your
reference as *.c.bak and *.h.bak in their original directories. Of
course, you need to pay attention to any errors produced by the compiler
while running the compile.sh script.
3. Implementing the Simplified 2Q Buffer Replacement Strategy (40%)
3.1 How to implement?
The description of simplified 2Q algorithm is available in Section 1.
The actual implementation of 2Q is rather straightforward once you
understand the existing code. One hint we’d like to give here is
that
you can modify the LRU freelist queue to fit the Am queue. You can add
and
manage any new data structure you need. The existing code is not extremely clear, but it is
understandable. It may take you a few hours (or more) to understand it.
Since understanding the existing code is a significant part of the
assignment, the TAs and Professor will not assist you in understanding
the existing code (beyond what we discuss here).
The buffer manager code is neatly separated from the rest of the
codebase. It is located in src/backend/storage/buffer and
src/include/storage. For 2Q, we are interested in modifying only two
files.
src/backend/storage/buffer/freelist.c
- has functions that implement the replacement strategy.
src/include/storage/buf
internals.h
- contains the definition for each buffer description (called
BufferDesc). Most of the fields in BufferDesc are managed by other
parts of the code, but for efficiency some fields are used to manage the
buffer space while it is eligible for replacement (when it’s on the
existing LRU freelist).
Some initialization of the queue occurs in
src/backend/storage/buffer/buf_init.c. However, you must do all your
initialization in freelist.c.
You will modify these two files to change the implementation from the
existing LRU queue to the simplified 2Q discipline only. You may (depending on your implementation) find that you do
not need to edit both files or change many of the functions.
3.2 How to test and debug?
A small test harness stub program is available in
Hw1/MyStuff/MyCode/TestStub/. This stub tests the correctness of your
two files by bypassing the PostgreSQL code and
directly testing your files
freelist.c and buf_internals.h.
Copy your versions of freelist.c and
buf_internals.h
to override the original ones in TestStub/ and TestStub/storage
respectively, and run make to
compile the
test harness stub. Full details about using the test stub are available
as a README file in the directory.
You may want to modify the stub to add additional
test cases (will need very simple changes to the file buftest.c). In fact,
we recommend you test your two files by using a variety of test cases
through this test stub. Also, check out carefully what your code is
suppose
to do when there are no free buffers.
The actual test stub we use for grading will be different;
the one provided is only for your convenience (and to make sure your
code will compile with our test stub)!
4. Performance Comparison of LRU, MRU, and 2Q (30%)
[IMPORTANT]
-
There are changes to the 2Q algorithm described in the previous section, the changes in short are:
-
the algorithm we previous called as access(p) should be unpin(buf), the old algorithm works for the first part of this assignment because we assume immediate unpining of the page once access/pin it
-
find_free_slot(p) is called in the procedure of get page process (please refer to getPage function in TestStub)
-
if the buffer pool is full, Am is empty, A1 is not empty but does not exceed the threshold, we delete buffers from the tail of A1
-
For Part4, You have to run the performance evaluation script and submit the results from the same account, because different account get different datafile, and thus different results.
Here you will compare 2Q with LRU and MRU. There are three scripts
we
prepared for you, all in Hw1/MyStuff/PerformanceEvaluationScripts:
- ./DoInitDB.sh <DATADIR>: the script for you to init the
data directory.
- ./PrepareTables.sh <DATADIR> [index]:
To create the test database and tables. This will also insert
data into the tables.
[index] is an optional
flag
that you will use in some experiments to create a table with an index
on its primary key.
- ./Query.sh <logfile> <DATADIR> <2q|lru|mru>
<bufnum>:
Will run the test query on the tables created above.
Make sure you have compiled (Refer to Section 2 about
how to compile) and installed your 2Q
version
before you run it with the 2q flag. The statistics of the query will
be written in <logfile>. <DATADIR> is the same directory as
(1) and (2). Use 2q, lru or mru to assign which replacement policy you
want to use. If the flag is set to lru or mru, the script will use our
implementation of LRU and MRU policies that we have stored in a
special directory. If the flag is set to 2q, the script will use your
implementation - which is based on 2Q. You will need to recompile your
code when changing from simplified 2Q implementation to the full 2Q
implementation.
<bufnum> is the number of buffers you want to use.
The <logfile> you get is like:
DEBUG: EXECUTOR STATISTICS
! system usage stats:
! ... ... ... ...
DEBUG: EXECUTOR STATISTICS
! system usage stats:
! ... ... ... ...
! postgres usage stats:
! Shared blocks: 842 read, 0 written, buffer hit rate = 86.31%
! ... ... ... ...
This is interpreted as follows: 842 is the actual read requests that
had to be passed onto the disk (= total page requests for reads -
buffer hits for reads), and 86.31% is the hit rate of the query (=
total buffer hits for reads / total page requests for reads).
The script PrepareTables.sh loads two tables R and S into the
database. It can be passed a flag to make it create an index for S
on its primary key. The script Query.sh executes the following query:
SELECT <some attributes> FROM R, S WHERE
R.ForeignKey = S.PrimaryKey
You are asked to do two experiments.
- Prepare your data without index on S:
cd Hw1/MyStuff/PerformanceEvaluationScripts
./DoInitDB.sh <DATADIR>
./PrepareTables.sh <DATADIR>
Run ./Query.sh with lru or mru or 2q and different <bufnum>,
fill in the table below:
<bufnum> |
280 |
450 |
580 |
620 |
750 |
900 |
LRU_hitrate |
|
|
|
|
|
|
MRU_hitrate |
|
|
|
|
|
|
Simplified 2Q_hitrate |
|
|
|
|
|
|
- Prepare your data with index on primary key of S:
cd Hw1/MyStuff/PerformanceEvaluationScripts
./DoInitDB.sh <DATADIR>
./PrepareTables.sh <DATADIR> index
Run ./Query.sh with lru or mru or 2q and different <bufnum>,
fill in the table below:
<bufnum> |
280 |
450 |
580 |
620 |
750 |
900 |
LRU_hitrate |
|
|
|
|
|
|
MRU_hitrate |
|
|
|
|
|
|
Simplified 2Q_hitrate |
|
|
|
|
|
|
Please answer some questions based on experiment results you get:
- For experiment 1, are the hit rates for LRU and 2Q as expected
from the
discussion in class or section? Explain your answer.
-
For experiment 1, when blocks are accessed as in sequential
flooding, we expect to
get
a non-zero hit rate for the MRU policy for any number of buffers. Is
this happening here? If not, why?
- For experiment 2, what are the main differences compared with
experiment 1? Explain the cause of these differences.
- Tables R and S are the same size. Can you tell the approximate
<bufnum> used by R (or S)? How do you know that?
- What kind of access patterns will 2Q have better performance than LRU in
terms of hitrates?
A Hint: In answering the questions above, you may be interested to
know that:
-
PostgreSQL uses 278 out of the <bufnum> buffers to
initialize
your query (e.g. load
system tables), and then unpins these buffers before the query
processing begins.
-
When there is no index on S, PostgreSQL executes this particular query
using the Nested
Loop Join algorithm. For every tuple from the R relation, the
system scans the complete S relation to find all S tuples that match
this R tuple. Then the next R tuple is read, and so on...
Note that the R relation is scanned only once.
5. Submitting Code and Results from Sections 3 and 4
You must submit at least 5 files
(even if they are incomplete or
unchanged):
- README - Your team name, the members of your group (names and class
accounts),
what works, and what doesn’t.
- freelist.c.s2q - Simplified 2Q implementation.
- buf_internals.h.s2q - Simplified 2Q implementation.
- perf.txt - TEXT ONLY! Performance comparison tables
six columns six rows of numbers separated by space, which represent the two tables we asked you to fill.
template
Submit from the account that you do all the experiments.
- answers.txt - answers to the questions.
How to submit:
You will need to use the unix submit program
to hand in your assignment:
1. Save your five files in a
directory called "hw1p2" in your cs186 home directory.
2. cd into hw1p2.
3. Run: submit hw1p2