As with most modern database systems,
Postgres provides a number of ways to extend it, both from within
Postgres and from SQL. The purpose of this exercise is to familiarize
you with C language extensions to Postgres, as well as to give you some
practice with external sorting. You will write some simple C
functions that are accessible from within Postgres, and also
some more complicated aggregation functions that will be added
to Postgres.
2 Background: C language extensions to Postgres
Postgres permits a programmer to define a C language function as
accessible from SQL. I gave a demonstration of this in
class. The Postgres online documentation contains the PostgreSQL
Programmer's Guide. Of relevance to this assignment is the
section on Server Programming, especially chapter 12 (Extending
SQL: Functions) and chapter 15 (Extending SQL: Aggregates).
Read and understand these sections.
3 Sample Code Provided With This Assignment
Makefile
For postgres to access a function
written in C, the source code must be compiled into a shared library
that postgres loads at runtime. I have provided a simple makefile
that will take any .c file and compile it into a shared library with
the .so suffix. For example, if you have a file called 'test.c',
you can make a shared library 'test.so' by typing 'make test.so'.
The compiler arguments are correct for running on Pentagon or
Rhombus. If you want to run on some other architecture, you'll
have to figure out the correct method for compiling on that
architecture.
loadDB.sh
This script creates a database called
'hw5' and loads some sample data into a table called numtest.
schema.sql
This script is used by loadDB.sh to
create and load the table of test data.
testData
This file contains data for the numtest
table.
addOne.c
This file contains two sample functions
that you can add to postgres. These functions add one to either
an integer or a floating-point number.
arrayFunc.c
This file contains a stub for Step 1, a
function for working with array parameters.
simpleAggr.c
This file contains a stub for Step 2,
which involves writing a simple aggregate operator.
medianAggr.c
This file contains a stub for Step 3,
which involves writing a median aggregate operator.
medianExtAggr.c
This file contains a stub for Step 4,
which involves writing a median aggregate operator that uses external
sorting in files.
Step 0: Set up your environment, load the schema, try adding some functions
For this assignment you will be using the standard installed version of
postgres. You will not need to compile postgres. The only
thing you will be compiling are shared libraries that postgres will
load at runtime. The steps to set up your environment are as
follows:
- Log in to pentagon or rhombus.
- Extract the files for the assignment by typing:
gtar -xvzf ~cs186/fa03/Hw5/Hw5.tar.gz
- The previous command creates a Hw5 directory. Go to that
directory:
cd Hw5
- Start postgres by typing:
pg_ctl start
- create a database with sample data by typing:
./loadDB.sh
- compile the sample functions in addOne.c:
gmake addOne.so
- start up postgres, and try adding the functions that were defined
in addOne.c. Be sure to change cs186-xx in the path to your cs186
login.
psql hw5
cs186-xx=# CREATE FUNCTION add_one(int4) RETURNS int4
cs186-xx-# AS '/home/cc/cs186/fa03/class/cs186-xx/Hw5/addOne' LANGUAGE C
cs186-xx-# WITH (isStrict);
cs186-xx=# CREATE FUNCTION add_one(float8) RETURNS float8
cs186-xx-# AS '/home/cc/cs186/fa03/class/cs186-xx/Hw5/addOne','add_one_float8'
cs186-xx-# LANGUAGE C WITH (isStrict);
cs186-xx=# select add_one(19);
 add_one
---------
      20
(1 row)
cs186-xx=# select add_one(1.9);
 add_one
---------
     2.9
(1 row)
cs186-xx=#
- Now try changing the addOne functions. The shared library is
reloaded each time you quit and restart psql. Thus, if you want
postgres to pick up a new version of addOne.so, just quit from
psql, recompile addOne.so, and restart psql. Don't change the
names of the functions; just change the amount they add, e.g., have
them add two or three instead of just one to the numbers they were
passed. Then recompile, start up psql, and repeat the above steps
to make sure that your changes were picked up.
If you make it through these steps, you are ready to try writing your
own functions to extend postgres.
Step 1: Write a Simple Function Using Arrays
Postgres supports the use of arrays as data types. This data type
will prove useful in steps 2, 3, and 4. This step lets you see
how parameters of type array work.
The file arrayFunc.c contains a stub of a function that accepts a
single parameter that is an array of integers, and returns a single
integer result. To see how parameters of type array work:
gmake arrayFunc.so
- start up postgres, and add the 'arrayFunc' function to postgres.
psql hw5
cs186-ab=# create function arrayFunc(int4[]) returns int4
cs186-ab-# as '/home/cc/cs186/fa03/class/cs186-ab/Hw5/arrayFunc','arrayFunc'
cs186-ab-# language C with (isStrict);
CREATE
cs186-ab=# select arrayFunc('{1,2,3}');
 arrayfunc
-----------
         0
(1 row)
cs186-ab=#
- Note the syntax that postgres uses for specifying a constant
array: '{1,2,3}'
For this part of the assignment, you need to change the code in
arrayFunc.c so that it looks through the array that was passed to it
and returns the highest value. The code shows how to find out from
the array parameter the number of dimensions (postgres supports
multi-dimensional arrays), the size of each dimension, and the actual
data. Given the definition of the function in postgres, you may
assume the array has no more than one dimension. You must make
sure to handle the case where the array is empty, '{}', in which case
the number of dimensions is zero. Hint: the macro
PG_RETURN_NULL() returns null, which is what you should return as the
largest value in an empty array.
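Stripped of the Postgres calling conventions (the fmgr macros such as PG_RETURN_NULL() and the array-header accessors that the stub demonstrates), the logic needed here is just a scan for the maximum. A minimal standalone sketch, where the function name and the boolean/out-parameter convention are mine rather than part of the stub:

```c
#include <stdbool.h>

/* Return true and store the largest element in *out; return false for an
 * empty array -- the Postgres version would call PG_RETURN_NULL() instead. */
static bool array_max(const int *elems, int count, int *out)
{
    if (count == 0)
        return false;           /* corresponds to the '{}' zero-dimension case */
    int best = elems[0];
    for (int i = 1; i < count; i++)
        if (elems[i] > best)
            best = elems[i];
    *out = best;
    return true;
}
```

In the real function, count comes from the array header's dimension information rather than a parameter.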
Step 2: Write a Simple Aggregation Function
Once you have read the postgres documentation on aggregate functions,
you will know that an aggregate is defined in terms of two functions:
- the transition function, which is called once for each value being
aggregated. It returns a data object used to keep track of the
state of the aggregation, which is passed to the next call of the
transition function and, eventually, to the final function. For
example, an aggregation performing AVG might use an array of two real
numbers to keep track of the count and sum of all the elements seen so
far. (Or, to reduce the chance of overflow, the running average and
count so far.)
- the final function, which is called at the end of the
aggregation, and which returns the result of the aggregation.
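To make the two roles concrete, here is the AVG example above written as plain C, with the state carried in a two-field struct rather than the Postgres array type (the struct and function names are illustrative, not part of any stub):

```c
/* State threaded through the aggregation: how many values seen, and their sum. */
typedef struct { double count; double sum; } AvgState;

/* Transition function: fold one new value into the running state. */
static AvgState avg_transition(AvgState st, double next)
{
    st.count += 1.0;
    st.sum   += next;
    return st;
}

/* Final function: turn the accumulated state into the aggregate's result. */
static double avg_final(AvgState st)
{
    return st.count > 0.0 ? st.sum / st.count : 0.0;
}
```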
The file simpleAggr.c contains stubs of a transition and final function
for an aggregation operator. Your task is to turn these stubs
into an aggregation operator that finds the 5 greatest values in the
set of aggregated values. As such, the transition function will
need to pass along an array giving the 5 greatest values seen so
far. If fewer than 5 values have been seen so far, it should pass
along a shorter array. The values should be listed from greatest to
smallest, and you should ignore duplicate values. Thus, if the data
set contains {1,2,3,3,4,5,6}, your result should be {6,5,4,3,2}. To
define the aggregate function, you will do the following:
gmake simpleAggr.so
- start psql, and define the individual functions, and also the
aggregate that uses those functions. You can test the aggregate
on the numtest table.
psql hw5
cs186-ab=# create function simple_aggr_transition(int4[], int4) returns int4[]
cs186-ab-# as '/home/cc/cs186/fa03/class/cs186-ab/Hw5/simpleAggr',
cs186-ab-# 'simple_aggr_transition_i4' language C with (isStrict);
cs186-ab=# create function simple_aggr_final(int4[]) returns int4[]
cs186-ab-# as '/home/cc/cs186/fa03/class/cs186-ab/Hw5/simpleAggr',
cs186-ab-# 'simple_aggr_final_i4' language C with (isStrict);
cs186-ab=# create aggregate simpleAggr(basetype = int4,
cs186-ab-# sfunc = simple_aggr_transition,
cs186-ab-# stype = int4[],
cs186-ab-# finalfunc = simple_aggr_final,
cs186-ab-# initcond = '{}');
cs186-ab=# select max(num1), avg(num1), simpleAggr(num1) from numtest;
For your convenience, I have included a function called
'makeOneDInt4Array(int count, int *elements)', which takes a regular
C array of integers and a count of the number of integers in it, and
returns the more complicated data structure that postgres requires
for an array. In this example, the final function is actually not
necessary: all it needs to do is return the array of the 5 largest
values seen so far. But I've left it in, since you'll need to
understand the final function in the next step.
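The heart of the transition function is inserting one new value into a sorted, duplicate-free top-5 list. Setting aside the Postgres array plumbing, that logic might look like the following sketch (TOP_N and top_insert are my names, not the stub's):

```c
#define TOP_N 5

/* Insert v into tops (sorted descending, len <= TOP_N, no duplicates).
 * Returns the possibly updated length. */
static int top_insert(int *tops, int len, int v)
{
    int i, j;
    for (i = 0; i < len; i++) {
        if (tops[i] == v)
            return len;                 /* ignore duplicates */
        if (tops[i] < v)
            break;                      /* v belongs at position i */
    }
    if (i == TOP_N)
        return len;                     /* smaller than all TOP_N kept values */
    if (len < TOP_N)
        len++;                          /* room to grow */
    for (j = len - 1; j > i; j--)       /* shift smaller values down */
        tops[j] = tops[j - 1];
    tops[i] = v;
    return len;
}
```

The transition function would apply this to the incoming array, then build the new result with makeOneDInt4Array().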
Step 3: Write an In-Memory Median Function
One aggregate operator that is not part of postgres is median.
The median value of a set of numbers is defined as the value for which
there are an equal number of values above and below. (If the
number of values is even, the median is defined as the average of the
two values on either side of the middle.) With the median
function, you must include duplicate values (unlike step 2). The
easiest way to
compute median is to sort all the values, and find the middle one (or
average the two surrounding the middle, if the number of values is
even). Your task in step 3 is to use the stubs in medianAggr.c to
create an aggregate function that computes median. You are to do
this by creating an in-memory array containing all the values, sorting
it, and finding the middle value (or values). The transition
function's job is to take each new value and to create a new array
containing all previous values, plus the new value. The final
function is passed the array of all values, and must use it to compute
the median. Note that even though the aggregate applies to
integers, the return value from the final function should be a float8,
since the median might be the average of two numbers.
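Ignoring the Postgres array plumbing, the final function's job reduces to sorting and picking the middle; a standalone sketch of that step using the C library's qsort (function names are mine):

```c
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);       /* avoids the overflow risk of x - y */
}

/* Median of vals[0..count-1]; count must be > 0.  Returns a double
 * because an even count averages the two middle values. */
static double median_of(int *vals, int count)
{
    qsort(vals, count, sizeof(int), cmp_int);
    if (count % 2 == 1)
        return (double)vals[count / 2];
    return (vals[count / 2 - 1] + vals[count / 2]) / 2.0;
}
```

Inside Postgres the result would be handed back with the float8 return conventions rather than a plain double return.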
Step 4: Write a Median Function with External Sorting
One problem with the approach in step 3 is that it keeps all the
values in memory at once. For a very large
data set this could cause thrashing and bring the computer to its
knees. A more general solution is to write the values out
to temporary files whenever the array gets too large, then to use
merge sort to put the values in order and compute the median that
way. Your task in step 4 is to fill in medianExtAggr.c with code
to compute the median using external sorting.
The way to add an external sort is to start accumulating values in an
array in memory, as in step 3. When the array gets too big, you
will create a temp file using the Postgres temporary file functions
described below, and write out the contents of the in-memory
array. How can you know, when you get to the final function, whether
you've written values out to a file? When you create a temporary
file, the BufFileCreateTemp() function returns a BufFile pointer.
This pointer, luckily, is the same size as an int4, so it is
possible to store the BufFile pointer in the same array that the
transition function uses for keeping integers. I recommend using
the first element of the array to store the BufFile pointer,
initialized to 0 to indicate that you haven't written anything to a
file yet. You might also want to use the second element of the array
to record how many values have been written to the file so far.
Note that the "CREATE AGGREGATE" command allows you to specify the
initial value to be passed to the transition function the first
time. You could specify '{0,0}' as the initial value if you want
to store a file pointer and value count in the first two slots of the
array.
(Note: later versions of Postgres support complex data types, so with
Postgres 7.5 we could keep a file pointer as a separate field of the
data structure passed from one transition function to the next, but
with Postgres 7.2 we're stuck with this hack.)
You must also read below about Postgres memory contexts. Any
memory allocated in the default context in the transition function will
only last until the subsequent call of the transition function.
As a result, you do not want to allocate a temporary file in the
default context, because it will be gone two transition functions
later. The context that will last until the end of the
transaction is "TransactionCommandContext", so you should switch to
this context before creating the temp file, then switch back.
How many values should you accumulate before creating a temporary
file? For now, use 1024, but make this a #define in your code so
you can easily change it to a different value.
Similarly, when you are sorting runs of integers in the first pass,
sort 1024 at a time. When you are merging temp files together,
postgres takes care of buffering the file I/O, so just read one int4 at
a time from each temp file, and postgres will be smart about reads from
disk. Given this buffering, to make sure that the merges don't
require too much memory, don't merge more than 64 files at once.
(And again, the value 64 should be set with a #define so you can change
it if needed.)
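As a plain-C analogue of this plan, the run-then-merge structure might look like the sketch below. The stdio calls (tmpfile, fwrite, rewind, fread) stand in for their BufFile counterparts described later, the constant is deliberately tiny for illustration, and the function names are mine:

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_IN_MEMORY 4     /* the real code would #define this as 1024 */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort vals (count > 0) in runs of MAX_IN_MEMORY, write each sorted run to
 * its own temp file, then merge the runs, stopping at the middle value(s). */
static double external_median(int *vals, int count)
{
    int nruns = (count + MAX_IN_MEMORY - 1) / MAX_IN_MEMORY;
    FILE **runs = malloc(nruns * sizeof(FILE *));
    int *head = malloc(nruns * sizeof(int));    /* current front of each run */
    int r, i;

    for (r = 0; r < nruns; r++) {               /* pass 1: build sorted runs */
        int off = r * MAX_IN_MEMORY;
        int n = (count - off < MAX_IN_MEMORY) ? count - off : MAX_IN_MEMORY;
        qsort(vals + off, n, sizeof(int), cmp_int);
        runs[r] = tmpfile();                    /* cf. BufFileCreateTemp() */
        fwrite(vals + off, sizeof(int), n, runs[r]);
        rewind(runs[r]);                        /* cf. BufFileSeek(fp,0,0L,SEEK_SET) */
        fread(&head[r], sizeof(int), 1, runs[r]);
    }

    double result = 0.0;
    int prev = 0;
    for (i = 0; i <= count / 2; i++) {          /* pass 2: merge up to the middle */
        int best = -1;
        for (r = 0; r < nruns; r++)             /* pick the smallest live head */
            if (runs[r] && (best < 0 || head[r] < head[best]))
                best = r;
        if (i == count / 2)
            result = (count % 2) ? (double)head[best]
                                 : (prev + head[best]) / 2.0;
        prev = head[best];
        if (fread(&head[best], sizeof(int), 1, runs[best]) != 1) {
            fclose(runs[best]);                 /* run exhausted */
            runs[best] = NULL;
        }
    }
    for (r = 0; r < nruns; r++)
        if (runs[r]) fclose(runs[r]);
    free(runs);
    free(head);
    return result;
}
```

The real assignment differs in the plumbing (values arrive one at a time through the transition function, the file pointer hides in the state array, and merges are capped at 64 files), but the two-pass shape is the same.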
Step 5: Performance Testing
You will want to test your code on large numbers of values. I
have created a number of sample data files containing a single column
of random integers, though with some skew added to ensure that the
median and mean are not the same. These files are in
~cs186/fa03/Hw5/sampleData-* where "*" is the number of values in the
file. I recommend working with about 50,000 values, though you
probably have enough disk space for up to 500,000 values. You
will want to do the following:
- NEW: I have been informed that local storage is available only on
rhombus. If you look at /home/tmp2, you will find a directory for
each class account. You should use this directory for all your
performance testing, since the NFS-mounted file systems greatly hurt
performance.
- Run initdb to use the temp directory. E.g.:
initdb -D /home/tmp2/cs186-xx
- Alternately, you can set the PGDATA environment variable to
/home/tmp2/cs186-xx, and then you don't need to specify the -D option.
- Start postgres using the temp directory:
pg_ctl start -D /home/tmp2/cs186-xx
createdb hw5
- Start psql, create a table with one integer column, and copy
values into it from a file:
psql hw5
hw5=# create table numtest(num1 integer);
CREATE
hw5=# copy numtest from '/home/ff/cs186/fa03/Hw5/sampleData-50001'
using DELIMITERS '|';
COPY
hw5=#
- Now make sure you have defined the functions and aggregates for
median, both in-memory and with external sorting.
- To do performance testing, you will need to run postgres in
backend mode. You must stop regular postgres before starting the
backend in standalone mode:
pg_ctl stop -D /home/tmp2/cs186-xx
~cs186/pgsql-7.2.2/src/backend/postgres -s -D /home/tmp2/cs186-xx hw5
- Find the median using the in-memory or external sorting versions
of the median function, e.g.
backend> select median(num1) from numtest;
 1: median (typeid = 701, len = 8, typmod = -1, byval = f)
        ----
Got #values: 10001
 1: median = "548380742" (typeid = 701, len = 8, typmod = -1, byval = f)
        ----
DEBUG:  QUERY STATISTICS
! system usage stats:
!       1.629695 elapsed 1.000000 user 0.030000 system sec
!       [1.000000 user 0.400000 sys total]
!       29/0 [65/66] filesystem blocks in/out
!       29/0 [65/0] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       75/142 [295/158] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:  75 read,  0 written, buffer hit rate = 81.97%
!       Local  blocks:   0 read,  0 written, buffer hit rate = 0.00%
!       Direct blocks:   0 read,  0 written
I have found that with 50,000 values, the external sort version of
median takes about 2-3 seconds of user & system time, whereas the
in-memory version takes about 60-70 seconds. Your mileage will
vary, of course. Also note how the external version has more
voluntary context switches (because the code is waiting on I/O
requests), but fewer involuntary ones. Yet with only 2001
values, the in-memory one is about four times faster than the external
sorting one. This is not at all surprising: with a small
number of values, it's better not to incur the overhead of writing to
files, but with many values the swapping caused by allocating lots of
memory slows things down a lot. It is possible to find a
compromise between in-memory and external sorting by changing the
threshold you #defined for the maximum number of in-memory values
(1024 above); the larger that value, the longer the code waits before
writing values out to a file.
Your mission for step 5 is as follows:
- Load the database with the 50,000 value sample file.
- With the max number of values in the array set to 1024, compile the
external sorting median code.
- Start up the postgres backend (as described above), and record
the average elapsed, user, and system time to execute the median
function across three runs.
- Stop the postgres backend, recompile the median code with max
values set to 4096, and repeat.
- Repeat for max values of 8192, 16384, 24576, and 32768.
- Produce a text file called RESULTS.TXT containing
tab-delimited columns giving the average times for each max value, in
the following format. (The numbers below are made up; yours will
vary.)
1024	3.685	0.940	0.969
4096	5.533	4.334	0.999
...etc
- At the bottom of the file, on a separate line, say which value
you think was best.
Notes on Postgres Temporary Files
To write the integers to disk, you will need to be able to open, read,
write and close files. Rather than using the standard C file I/O
library, we require you to use functions within PostgreSQL.
Specifically, the BufFile interfaces that are declared in
postgresql-7.2.2/src/include/storage/buffile.h. These functions are
attractive because you don’t have to deal with file names and you get
buffered I/O for free.
The interface is almost identical to the standard C library functions
(i.e., fopen() and tmpfile(), fclose(), fseek()). Instead of FILE * you
will work with BufFile * pointers, but everything else remains
essentially the same. Note: you have to be careful about the “memory
context” in which BufFile structures are allocated; more details in
the "Managing memory with contexts" section below. You should only
need to use the following functions:
- BufFileCreateTemp(void): This is the equivalent of tmpfile(void)
of the C standard library. It will return a pointer to the associated
file structure, which is allocated in the current memory context.
- BufFileSeek(BufFile *file, int fileno, long offset, int whence):
This is the equivalent of fseek(). The only extra argument is fileno.
The operating system limits the maximum size of a single file
(typically to 2^31 bytes, i.e. 2 Gbytes on 32-bit architectures).
However, a database file can grow larger than the maximum physical file
size, so PostgreSQL implements BufFiles as a collection of physical
disk files, and file offsets are specified by a (fileno, offset) pair,
where offset is the offset within the fileno-th physical file. In any
case, you will only have to seek to the beginning of the file, which
you can do with BufFileSeek(fp, 0, 0L, SEEK_SET).
- BufFileClose(BufFile *file): The equivalent of fclose(), will
close the file and de-allocate the space occupied by the file
structure. As soon as you close
a file, it is deallocated and its contents disappear. Do not
close a file until you want it to go away.
- BufFileRead(BufFile *file, void *ptr, size_t size): This is the
equivalent of fread() and will read size bytes from file into
ptr. As with the C standard library, you have to make sure that you do
not overrun the space pointed to by ptr.
- BufFileWrite(BufFile *file, void *ptr, size_t size): This is the
equivalent of fwrite() and will write size bytes from ptr into
file.
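Because the BufFile calls mirror stdio so closely, a write/rewind/read round trip can be prototyped and tested against the standard library first, then converted by swapping in the BufFile names. A sketch (the function name is mine; the mapping in the comments follows the list above):

```c
#include <stdio.h>

/* Round-trip count ints through a temp file and return the number read
 * back.  Each stdio call maps onto its BufFile counterpart:
 *   tmpfile()            -> BufFileCreateTemp()
 *   fwrite(p, 1, sz, f)  -> BufFileWrite(f, p, sz)
 *   rewind(f)            -> BufFileSeek(f, 0, 0L, SEEK_SET)
 *   fread(p, 1, sz, f)   -> BufFileRead(f, p, sz)
 *   fclose(f)            -> BufFileClose(f)                         */
static int roundtrip(const int *in, int *out, int count)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return 0;
    fwrite(in, sizeof(int), count, f);
    rewind(f);
    int nread = (int)fread(out, sizeof(int), count, f);
    fclose(f);              /* the temp file and its contents vanish here */
    return nread;
}
```

Remember the memory-context caveat: the BufFile version must be created in a context that survives across transition-function calls.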
Managing memory with contexts
PostgreSQL uses its own memory manager, which performs functions
similar to malloc and free that you are familiar with. However, instead
of allocating memory from a single global heap, the PostgreSQL
allocators work off an abstraction called memory contexts for
convenience and performance. Essentially, each allocated chunk of
memory belongs to a particular context and all chunks can be
deallocated in one shot. Imagine that at some point you allocate a
relatively complex data structure (e.g., a linked list, or a tree, or a
hashtable!). Normally, in order to free the memory it uses, you would
have to traverse the structure and free each node individually. This
traversal is both tedious to write and expensive to perform. Instead,
PostgreSQL allows you to do the following:
treeCxt = AllocSetContextCreate(...);  /* Create and assign a new context to treeCxt */
while (...) {
    /* Allocate sizeof(TreeNode) bytes from treeCxt */
    newNode = MemoryContextAlloc(treeCxt, sizeof(TreeNode));
    ...
}
MemoryContextDelete(treeCxt);  /* Free the entire treeCxt context */
For convenience, PostgreSQL provides the functions palloc and pfree,
which operate on a default memory context (called the current context
and pointed to by the CurrentMemoryContext global variable). Thus
ptr = MemoryContextAlloc(cxt, n);     /* Alloc n bytes from the cxt memory context */
is exactly equivalent to
oldCxt = MemoryContextSwitchTo(cxt);  /* Save current context in oldCxt, switch to cxt */
ptr = palloc(n);                      /* Allocate n bytes from current (cxt) context */
MemoryContextSwitchTo(oldCxt);        /* Restore current context to oldCxt */
If you are performing several allocations, this can save you some
typing. You can change the current context using
MemoryContextSwitchTo(cxt). It is your code’s responsibility to
properly manage (i.e., set and restore) the current memory context. Be
careful to avoid insidious bugs, in which you forget to reset the
memory context to what it was, and confuse another piece of the code!
Finally, memory contexts are organized in a hierarchy. When creating a
new context with AllocSetContextCreate(), the first argument is the
parent context. Deleting a context also deletes all its child contexts
at the same time. The file
postgresql-7.2.2/src/backend/utils/mmgr/README contains a detailed
description of the PostgreSQL memory manager, should you need it,
although the information above should suffice.
Notes on debugging postgres
One problem with extending postgres with your own code is that
mistakes in your code often simply cause the postgres backend to
crash. The way to debug this is to run postgres in backend mode:
pg_ctl stop
- launch the backend in the debugger:
gdb ~cs186/pgsql-7.2.2/src/backend/postgres
run -s hw5
- then type psql commands, and when the backend crashes you can
look at the stack trace, and look at the value of variables.
Also, if you want to do printf-style debugging, my experience is that
'fprintf(stderr,"Hello World");' works when running either in backend
mode or in regular mode.
What to turn in.
You should submit your versions of arrayFunc.c, simpleAggr.c,
medianAggr.c, and medianExtAggr.c. You should *not* have changed
the names of any functions in those files, otherwise the autograder may
not work. You should also submit RESULTS.TXT, and a README file
containing the names of the people in your group.