Homework 5: Extending Postgres


CS186: Introduction to Database Systems
UC Berkeley

November 18, 2003
Due: Monday, December 8, 2003, at 8PM PST

Note: this will be a group project!

1 Introduction

As with most modern database systems, Postgres provides a number of ways to extend it, from within SQL as well as with external C code.  The purpose of this exercise is to familiarize you with C language extensions to Postgres, as well as to give you some practice with external sorting.  You will be writing some simple C functions that will be accessible from within Postgres, and also writing some more complicated aggregation functions that will be added to Postgres.

2 Background: C language extensions to Postgres

Postgres permits a programmer to define a C language function as accessible from SQL.  I gave a demonstration of this in class.  The Postgres online documentation contains the PostgreSQL Programmer's Guide.  Of relevance to this assignment is the section on Server Programming, especially chapter 12 (Extending SQL: Functions) and chapter 15 (Extending SQL: Aggregates).  Read and understand these sections.

3 Sample Code Provided With This Assignment

Makefile
For postgres to access a function written in C, the source code must be compiled into a shared library that postgres loads at runtime.  I have provided a simple makefile that will take any .c file and compile it into a shared library with the .so suffix.  For example, if you have a file called 'test.c', you can make a shared library 'test.so' by typing 'make test.so'.  The compiler arguments are correct for running on Pentagon or Rhombus.  If you want to run on some other architecture, you'll have to figure out the correct method for compiling on that architecture.

loadDB.sh
This script creates a database called 'hw5' and loads some sample data into a table called numtest.

schema.sql
This script is used by loadDB.sh to create and load the table of test data.

testData
This file contains data for the numtest table.

addOne.c
This file contains two sample functions that you can add to postgres.  These functions add one to either an integer or a floating-point number.

arrayFunc.c
This file contains a stub for Step 1, a function for working with array parameters.

simpleAggr.c
This file contains a stub for Step 2, which involves writing a simple aggregate operator.

medianAggr.c
This file contains a stub for Step 3, which involves writing a median aggregate operator.

medianExtAggr.c
This file contains a stub for Step 4, which involves writing a median aggregate operator that uses external sorting in files.

Step 0: Set up your environment, load the schema, try adding some functions

For this assignment you will be using the standard installed version of postgres.  You will not need to compile postgres itself.  The only things you will be compiling are shared libraries that postgres will load at runtime.  The steps to set up your environment are as follows:
gtar -xvzf ~cs186/fa03/Hw5/Hw5.tar.gz
cd Hw5
pg_ctl start
./loadDB.sh
gmake addOne.so
psql hw5
cs186-xx=# CREATE FUNCTION add_one(int4) RETURNS int4
cs186-xx-#  AS '/home/cc/cs186/fa03/class/cs186-xx/Hw5/addOne' LANGUAGE C
cs186-xx-#  WITH (isStrict);
cs186-xx=#
cs186-xx=# CREATE FUNCTION add_one(float8) RETURNS float8
cs186-xx-#  AS '/home/cc/cs186/fa03/class/cs186-xx/Hw5/addOne','add_one_float8'
cs186-xx-#  LANGUAGE C WITH (isStrict);
cs186-xx=#
cs186-xx=# select add_one(19);
 add_one
---------
      20
(1 row)
cs186-xx=#
cs186-xx=# select add_one(1.9);
 add_one
---------
     2.9
(1 row)

cs186-xx=#
If you make it through these steps, you are ready to try writing your own functions to extend postgres.
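
For reference, the functions in addOne.c follow the version-1 calling convention described in the Programmer's Guide.  A minimal sketch of the integer version (the actual file may differ in its details) looks like this:

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
    int32   arg = PG_GETARG_INT32(0);   /* fetch the int4 argument */

    PG_RETURN_INT32(arg + 1);
}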

Step 1: Write a Simple Function Using Arrays

Postgres supports the use of arrays as data types.  This data type will prove useful in steps 2, 3, and 4.  This step lets you see how parameters of type array work.

The file arrayFunc.c contains a stub of a function that accepts a single parameter that is an array of integers, and returns a single integer result.  To see how parameters of type array work:
gmake arrayFunc.so
psql hw5
cs186-ab=#   create function arrayFunc(int4[]) returns int4
cs186-ab-#     as '/home/cc/cs186/fa03/class/cs186-ab/Hw5/arrayFunc','arrayFunc'
cs186-ab-#    language C with (isStrict);
CREATE
cs186-ab=# select arrayFunc('{1,2,3}');
 arrayfunc
-----------
         0
(1 row)

cs186-ab=#
For this part of the assignment, you need to change the code in arrayFunc.c so that it looks through the array that was passed to it and returns the highest value.  The code shows how to find out from the array parameter the number of dimensions (postgres supports multi-dimensional arrays), the size of each dimension, and the actual data.  Given the definition of the function in postgres, you may assume the array has no more than one dimension.  You must make sure to handle the case where the array is empty, '{}', in which case the number of dimensions is zero.  Hint: the macro PG_RETURN_NULL() returns null, which is what you should return as the largest value in an empty array.
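
To make the pieces concrete, here is a minimal sketch of what such a function might look like, using the same macros the stub demonstrates (this is illustrative, not the required implementation):

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

PG_FUNCTION_INFO_V1(arrayFunc);

Datum
arrayFunc(PG_FUNCTION_ARGS)
{
    ArrayType  *arr = PG_GETARG_ARRAYTYPE_P(0);
    int         ndims = ARR_NDIM(arr);              /* 0 for '{}' */
    int        *dims = ARR_DIMS(arr);               /* size of each dimension */
    int4       *data = (int4 *) ARR_DATA_PTR(arr);  /* the element values */
    int4        maxval;
    int         i;

    if (ndims == 0)
        PG_RETURN_NULL();           /* empty array: no largest value */

    maxval = data[0];
    for (i = 1; i < dims[0]; i++)   /* at most one dimension, per the spec */
        if (data[i] > maxval)
            maxval = data[i];

    PG_RETURN_INT32(maxval);
}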

Step 2: Write a Simple Aggregation Function

Once you have read the postgres documentation on aggregate functions, you will know that an aggregate is defined in terms of two functions: a state-transition function, which is called once for each input value and carries the running state along, and a final function, which turns the final state into the result.

The file simpleAggr.c contains stubs of a transition and final function for an aggregation operator.  Your task is to turn these stubs into an aggregation operator that finds the 5 greatest values in the set of aggregated values.  As such, the transition function will need to pass along an array giving the 5 greatest values seen so far.  If fewer than 5 values have been seen so far, it should pass along a shorter array.  The values should be listed from greatest to smallest, and you should ignore duplicate values.  Thus, if the data set contains {1,2,3,3,4,5,6}, your result should be {6,5,4,3,2}.  To define the aggregate function, you will do the following:
gmake simpleAggr.so
psql hw5
cs186-ab=#  create function simple_aggr_transition(int4[], int4) returns int4[]
cs186-ab-#    as '/home/cc/cs186/fa03/class/cs186-xx/Hw5/simpleAggr',
cs186-ab-#    'simple_aggr_transition_i4' language C with (isStrict);
cs186-ab=#
cs186-ab=#  create function simple_aggr_final(int4[]) returns int4[]
cs186-ab-#    as '/home/cc/cs186/fa03/class/cs186-xx/Hw5/simpleAggr',
cs186-ab-#    'simple_aggr_final_i4' language C with (isStrict);
cs186-ab=#
cs186-ab=#  create aggregate simpleAggr(basetype = int4,
cs186-ab-#                              sfunc = simple_aggr_transition,
cs186-ab-#                              stype = int4[],
cs186-ab-#                              finalfunc = simple_aggr_final,
cs186-ab-#                              initcond = '{}');
cs186-ab=#
cs186-ab=# select max(num1), avg(num1), simpleAggr(num1) from numtest;

For your convenience, I have included a function called 'makeOneDInt4Array(int count, int *elements)', which takes a count and a regular C array of integers, and returns the complicated data structure that postgres requires for an array.

In this example, the final function is actually not necessary.  All it needs to do is return the array of the 5 largest values seen so far.  But I've left it in, since you'll need to understand the final function in the next step.
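
To make the shape of a transition function concrete, here is one possible sketch (makeOneDInt4Array() is the helper provided with the assignment; the body is illustrative, not the required implementation):

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

Datum
simple_aggr_transition_i4(PG_FUNCTION_ARGS)
{
    ArrayType  *state = PG_GETARG_ARRAYTYPE_P(0);
    int4        newval = PG_GETARG_INT32(1);
    int         n = (ARR_NDIM(state) == 0) ? 0 : ARR_DIMS(state)[0];
    int4       *old = (int4 *) ARR_DATA_PTR(state);
    int4        vals[6];        /* at most 5 old values plus the new one */
    int         i, count = 0;

    for (i = 0; i < n; i++)
    {
        if (old[i] == newval)
            PG_RETURN_ARRAYTYPE_P(state);   /* duplicate: state unchanged */
        vals[count++] = old[i];
    }
    vals[count++] = newval;

    /* insertion sort, descending */
    for (i = 1; i < count; i++)
    {
        int4    v = vals[i];
        int     j = i - 1;

        while (j >= 0 && vals[j] < v)
        {
            vals[j + 1] = vals[j];
            j--;
        }
        vals[j + 1] = v;
    }

    if (count > 5)
        count = 5;              /* keep only the 5 greatest */

    PG_RETURN_ARRAYTYPE_P(makeOneDInt4Array(count, vals));
}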

Step 3: Write an In-Memory Median Function

One aggregate operator that is not part of postgres is median.  The median value of a set of numbers is defined as the value for which there are an equal number of values above and below.  (If the number of values is even, the median is defined as the average of the two values on either side of the middle.)  With the median function, you must include duplicate values (unlike step 2).  The easiest way to compute the median is to sort all the values and find the middle one (or average the two surrounding the middle, if the number of values is even).  Your task in step 3 is to use the stubs in medianAggr.c to create an aggregate function that computes the median.  You are to do this by creating an in-memory array containing all the values, sorting it, and finding the middle value (or values).  The transition function's job is to take each new value and create a new array containing all previous values plus the new value.  The final function is passed the array of all values and must use it to compute the median.  Note that even though the aggregate applies to integers, the return value from the final function should be a float8, since the median might be the average of two numbers.
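
A sketch of the final function's logic, assuming the values have been accumulated into the state array (the function name here is illustrative; do not rename the stubs in medianAggr.c):

#include <stdlib.h>

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

/* qsort() comparator: ascending int4 order */
static int
cmp_int4(const void *a, const void *b)
{
    int4    x = *(const int4 *) a;
    int4    y = *(const int4 *) b;

    return (x > y) - (x < y);
}

Datum
median_final_sketch(PG_FUNCTION_ARGS)
{
    ArrayType  *state = PG_GETARG_ARRAYTYPE_P(0);
    int         n = (ARR_NDIM(state) == 0) ? 0 : ARR_DIMS(state)[0];
    int4       *vals = (int4 *) ARR_DATA_PTR(state);
    float8      median;

    if (n == 0)
        PG_RETURN_NULL();       /* no values: median undefined */

    qsort(vals, n, sizeof(int4), cmp_int4);

    if (n % 2 == 1)
        median = (float8) vals[n / 2];
    else
        median = ((float8) vals[n / 2 - 1] + (float8) vals[n / 2]) / 2.0;

    PG_RETURN_FLOAT8(median);
}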

Step 4: Write a Median Function with External Sorting

One problem with the approach in step 3 is that it keeps all the members of the array in memory at once.  For a very large data set this could cause thrashing and bring the computer to its knees.  A more general solution is to write the values out to temporary files whenever the array gets too large, then use merge sort to put the values in order and compute the median that way.  Your task in step 4 is to fill in medianExtAggr.c with code that computes the median using external sorting.

The way to add an external sort is to start accumulating values in an array in memory, as in step 3.  When the array gets too big, you will create a temp file using the Postgres temporary file functions described below and write out the contents of the in-memory array.  How can you know, when you get to the final function, whether you've written out values to a file?  When you create a temporary file, the BufFileCreateTemp() function returns a BufFile pointer.  This pointer, luckily, is the same size as an int4, so it is possible to store the BufFile pointer in the same array that the transition function uses for keeping integers.  I recommend using the first element of the array to store the BufFile pointer, initialized to 0 to indicate that you haven't written anything to a file yet.  You might also want to use the second element of the array to record how many values have been written to the file so far.  Note that the "CREATE AGGREGATE" command allows you to specify the initial value to be passed to the transition function the first time.  You could specify '{0,0}' as the initial value if you want to store a file pointer and value count in the first two slots of the array.

(Note: later versions of Postgres support complex data types, so with Postgres 7.5 we could keep a file pointer as a separate field of the data structure passed from one transition function to the next, but with Postgres 7.2 we're stuck with this hack.)

You must also read the notes below about Postgres memory contexts.  Any memory allocated in the default context in the transition function will only last until the subsequent call of the transition function.  As a result, you do not want to allocate a temporary file in the default context, because it will be gone two transition calls later.  The context that lasts until the end of the transaction is "TransactionCommandContext", so you should switch to this context before creating the temp file, then switch back.
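
In code, the switch around the file creation looks something like this (TransactionCommandContext and the BufFile calls are standard 7.2 interfaces; the variable names are illustrative):

MemoryContext   oldCxt;
BufFile        *file;

oldCxt = MemoryContextSwitchTo(TransactionCommandContext);
file = BufFileCreateTemp();         /* now survives until end of transaction */
MemoryContextSwitchTo(oldCxt);      /* restore the previous context */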

How many values should you accumulate before creating a temporary file?  For now, use the value 1024, but make it a #define in your code so you can change it to a different value.  Similarly, when you are sorting runs of integers in the first pass, sort 1024 at a time.  When you are merging temp files together, postgres takes care of buffering the file I/O, so just read one int4 at a time from each temp file, and postgres will be smart about reads from disk.  Given this buffering, to make sure that the merges don't require too much memory, don't merge more than 64 files at once.  (Again, the value 64 should be set with a #define so you can change it if needed.)
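
As a sketch of the merge phase, the loop below repeatedly emits the smallest value among the open runs, reading one int4 at a time (merge_runs, MAX_MERGE_FILES, and the runs[] bookkeeping are illustrative names, not part of the stub):

#define MAX_MERGE_FILES 64      /* maximum runs merged in one pass */

/* Merge up to MAX_MERGE_FILES sorted runs of int4s into one output run. */
static void
merge_runs(BufFile **runs, int nruns, BufFile *out)
{
    int4    head[MAX_MERGE_FILES];  /* current head value of each run */
    bool    live[MAX_MERGE_FILES];  /* whether each run still has data */
    int     i, best;

    for (i = 0; i < nruns; i++)
        live[i] = (BufFileRead(runs[i], &head[i], sizeof(int4)) == sizeof(int4));

    for (;;)
    {
        best = -1;
        for (i = 0; i < nruns; i++)
            if (live[i] && (best < 0 || head[i] < head[best]))
                best = i;
        if (best < 0)
            break;                  /* all runs exhausted */

        BufFileWrite(out, &head[best], sizeof(int4));
        live[best] = (BufFileRead(runs[best], &head[best],
                                  sizeof(int4)) == sizeof(int4));
    }
}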

Step 5: Performance Testing

You will want to test your code on large numbers of values.  I have created a number of sample data files containing a single column of random integers, though with some skew added to ensure that the median and mean are not the same.   These files are in ~cs186/fa03/Hw5/sampleData-* where "*" is the number of values in the file.  I recommend working with about 50,000 values, though you probably have enough disk space for up to 500,000 values.  You will want to do the following:
initdb -D /home/tmp2/cs186-xx
pg_ctl start -D /home/tmp2/cs186-xx
createdb hw5
psql hw5
hw5=# create table numtest(num1 integer);
CREATE
hw5=# copy numtest from '/home/ff/cs186/fa03/Hw5/sampleData-50001' using DELIMITERS '|';
COPY
hw5=#
pg_ctl stop -D /home/tmp2/cs186-xx
~cs186/pgsql-7.2.2/src/backend/postgres -s -D /home/tmp2/cs186-xx hw5
backend> select median(num1) from numtest;
blank
         1: median      (typeid = 701, len = 8, typmod = -1, byval = f)
        ----
Got #values: 10001
         1: median = "548380742"        (typeid = 701, len = 8, typmod = -1, byval = f)
        ----
DEBUG:  QUERY STATISTICS
! system usage stats:
!       1.629695 elapsed 1.000000 user 0.030000 system sec
!       [1.000000 user 0.400000 sys total]
!       29/0 [65/66] filesystem blocks in/out
!       29/0 [65/0] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       75/142 [295/158] voluntary/involuntary context switches
! postgres usage stats:
!       Shared blocks:         75 read,          0 written, buffer hit rate = 81.97%
!       Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written

I have found that with 50,000 values, the external-sort version of median takes about 2-3 seconds of user and system time, whereas the in-memory version takes about 60-70 seconds.  Your mileage will vary, of course.  Also note how the external version has more voluntary context switches (because the code is waiting on I/O requests), but fewer involuntary ones.  Yet with only 2001 values, the in-memory version is about four times faster than the external-sorting one.  This is not at all surprising: with a small number of values, it's better not to incur the overhead of writing to files, but with many values the swapping caused by allocating lots of memory slows things down a lot.  It is possible to find a compromise between in-memory and external sorting by changing the threshold you #defined for the maximum number of in-memory values (1024 above): the larger that value, the longer the code waits before writing values out to a file.

Your mission for step 5 is to measure how performance varies as you change that threshold: for each value of the #defined maximum (1024, 4096, and so on), time the median query and record a line of results in RESULTS.TXT, like this:
1024   3.685   0.940   0.969
4096   5.533   4.334   0.999
...etc


Notes on Postgres Temporary Files

To write the integers to disk, you will need to be able to open, read, write, and close files.  Rather than using the standard C file I/O library, we require you to use functions within PostgreSQL: specifically, the BufFile interfaces declared in postgresql-7.2.2/src/include/storage/buffile.h.  These functions are attractive because you don't have to deal with file names and you get buffered I/O for free.

The interface is almost identical to the standard C library functions (i.e., tmpfile(), fclose(), fseek()).  Instead of FILE * you will work with BufFile * pointers, but everything else remains essentially the same.  Note: you have to be careful about the "memory context" in which BufFile structures are allocated; see the notes on memory contexts below.  You should only need to use a handful of BufFile functions.
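The relevant declarations, reproduced here from buffile.h in PostgreSQL 7.2.2 (consult the header itself for the authoritative versions):

extern BufFile *BufFileCreateTemp(void);
extern void     BufFileClose(BufFile *file);
extern size_t   BufFileRead(BufFile *file, void *ptr, size_t size);
extern size_t   BufFileWrite(BufFile *file, void *ptr, size_t size);
extern int      BufFileSeek(BufFile *file, int fileno, long offset, int whence);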

Managing memory with contexts

PostgreSQL uses its own memory manager, which performs functions similar to the malloc and free you are familiar with.  However, instead of allocating memory from a single global heap, the PostgreSQL allocators work off an abstraction called memory contexts, for convenience and performance.  Essentially, each allocated chunk of memory belongs to a particular context, and all the chunks in a context can be deallocated in one shot.  Imagine that at some point you allocate a relatively complex data structure (e.g., a linked list, a tree, or a hashtable!).  Normally, in order to free the memory it uses, you would have to traverse the structure and free each node individually.  This traversal is both tedious to write and expensive to perform.  Instead, PostgreSQL allows you to do the following:

treeCxt = AllocSetContextCreate(...);   /* create a new context, assign it to treeCxt */
while (...)
{
    /* allocate sizeof(TreeNode) bytes from treeCxt */
    newNode = MemoryContextAlloc(treeCxt, sizeof(TreeNode));
    ...
}
MemoryContextDelete(treeCxt);           /* free the entire treeCxt context */


For convenience, PostgreSQL provides the functions palloc and pfree, which operate on a default memory context (called the current context and pointed to by the CurrentMemoryContext global variable).  Thus

ptr = MemoryContextAlloc(cxt, n); /* Alloc n bytes from the cxt memory context */

is exactly equivalent to

oldCxt = MemoryContextSwitchTo(cxt);   /* save the current context in oldCxt, switch to cxt */
ptr = palloc(n);                       /* allocate n bytes from the current (cxt) context */
MemoryContextSwitchTo(oldCxt);         /* restore the current context to oldCxt */

If you are performing several allocations, this can save you some typing.  You can change the current context using MemoryContextSwitchTo(cxt).  It is your code's responsibility to properly manage (i.e., set and restore) the current memory context.  Be careful to avoid insidious bugs in which you forget to reset the memory context to what it was and thereby confuse another piece of the code!

Finally, memory contexts are organized in a hierarchy.  When creating a new context with AllocSetContextCreate(), the first argument is the parent context.  Deleting a context also deletes all of its child contexts.  The file postgresql-7.2.2/src/backend/utils/mmgr/README contains a detailed description of the PostgreSQL memory manager, should you need it, although the information above should suffice.
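
For instance, a child of the current context can be created like this (the name string is just a debugging label; the ALLOCSET_DEFAULT_* constants come from utils/memutils.h, and "median scratch space" is an arbitrary example):

MemoryContext   myCxt;

myCxt = AllocSetContextCreate(CurrentMemoryContext,
                              "median scratch space",
                              ALLOCSET_DEFAULT_MINSIZE,
                              ALLOCSET_DEFAULT_INITSIZE,
                              ALLOCSET_DEFAULT_MAXSIZE);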


Notes on debugging postgres

One problem with extending postgres with your own code is that oftentimes mistakes in your code simply cause the postgres backend to crash.  The way to debug this is to run postgres in backend mode:
pg_ctl stop
gdb ~cs186/pgsql-7.2.2/src/backend/postgres
run -s hw5
Also, you can do printf-style debugging: my experience is that 'fprintf(stderr, "Hello World");' works when running either in backend mode or in regular mode.

What to turn in

You should submit your versions of arrayFunc.c, simpleAggr.c, medianAggr.c, and medianExtAggr.c.  You should *not* have changed the names of any functions in those files; otherwise the autograder may not work.  You should also submit RESULTS.TXT and a README file containing the names of the people in your group.