Your perf.txt should be very similar to one of the following files. The only
difference is the size of R and S.

size: 1200
===========
 0.00 99.83 99.83 99.83 99.83 99.83
 0.00 34.71 62.55 69.21 90.62 99.83
 0.00  0.00 99.50 99.83 99.83 99.83
85.55 88.77 89.42 89.43 89.43 89.43
 0.49 34.21 54.02 58.85 75.66 88.20
85.14 88.46 89.35 89.43 89.43 89.43

size: 1800
===========
 0.00  0.00 99.89 99.89 99.89 99.89
 0.00 33.11 58.76 66.67 88.77 92.94
 0.00  0.00  0.00  0.00 97.75 99.86
81.83 87.46 89.59 89.88 90.75 90.75
 0.36 29.89 45.17 49.67 62.72 77.17
81.76 86.64 88.48 88.91 90.09 90.75

size: 2400
===========
 0.00  0.00  0.00 99.92 99.92 99.92
 0.00 17.07 30.48 34.67 48.48 64.19
 0.00  0.00  0.00  0.00  0.00 95.99
82.14 87.95 90.34 90.68 91.90 91.90
 0.34 28.29 40.91 44.64 56.09 69.17
82.13 87.11 89.05 89.46 90.83 91.90

key to answer.txt
================

1. For experiment 1, are the hit rates for LRU and 2Q as expected from the
discussion in class? Explain your answer.

The performance of LRU matched the class discussion. Because we are doing a
join where S, the inner relation, has no index on the joining field, we
sequentially scan S for every tuple of R. We therefore suffer from sequential
flooding: buffer management is useless until the number of buffers exceeds the
size of S.

S2Q did perform worse than LRU. The main reason is the simple access pattern
of the join query: as long as the number of buffers is smaller than the size
of S, S2Q behaves like a FIFO queue, so it performs worse than LRU.

2. For experiment 1, when blocks are accessed as in sequential flooding, we
expect to get a non-zero hit rate for the MRU policy for any number of
buffers. Is this happening here? If not, why?

No. At 280 buffers, MRU gives a 0% hit rate. PostgreSQL uses 278 buffers to
initialize the query, unpins these buffers, and never refers to them again.
This explains the unexpectedly low performance of MRU.

3. For experiment 2, what are the main differences compared with experiment 1?
Explain the cause of these differences.

For the non-indexed queries under LRU there is a jump in hit rate between
<280/450/580> and <450/580/620> bufnums, whereas the hit rate is more or less
consistent across all bufnums for the indexed queries. Because S has an index
on the joining field (which happens to be the primary key of S), we can look
up a matching tuple directly, so we only read one tuple of S for each tuple of
R with a matching key field. We can therefore expect the number of page misses
to be fairly consistent for LRU, S2Q, and F2Q. Another reason for the high hit
rates across all bufnums for LRU, S2Q, and F2Q is that the index is small and
fits in memory.

When there is no index on S, we must sequentially scan S for each tuple of R
and therefore do many more reads. Once S fits into memory, we can expect the
hit rate to be very high for LRU. For S2Q and F2Q, the reasons for the
experiment 1 hit rates are outlined in the answers to Questions 1 and 2.

4. Tables R and S are the same size. Can you tell the approximate number of
pages used by R (or S)? How do you know that?

Approximately <300/500/600>. With bufnum = <280/450/580> we see a 0% hit rate
with LRU, and with bufnum = <450/580/620> we see a hit rate of ~99%, so we can
tell that at approximately <300/500/600> buffers the entire S relation fits
into memory. A small simulation of this threshold effect is sketched below.
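To make the sequential-flooding behaviour from questions 1 and 4 concrete,
here is a minimal sketch (not part of the assignment code) that replays a
nested-loop scan of S against a plain LRU buffer pool. The page counts and the
LRU model are illustrative assumptions, not PostgreSQL's actual buffer
manager.

    # Minimal sketch: LRU hit rate when S is scanned once per R tuple.
    # The page counts and the plain-LRU model below are illustrative
    # assumptions, not PostgreSQL's actual buffer manager.
    from collections import OrderedDict

    def lru_hit_rate(num_buffers, s_pages, r_scans):
        pool = OrderedDict()              # page id -> None, ordered by recency
        hits = misses = 0
        for _ in range(r_scans):          # one full scan of S per R tuple
            for page in range(s_pages):
                if page in pool:
                    hits += 1
                    pool.move_to_end(page)
                else:
                    misses += 1
                    if len(pool) >= num_buffers:
                        pool.popitem(last=False)   # evict least recently used
                    pool[page] = None
        return 100.0 * hits / (hits + misses)

    # With fewer buffers than pages in S, every access misses (sequential
    # flooding); once S fits, nearly every access after the first scan hits.
    for bufs in (100, 200, 299, 300, 400):
        print(bufs, round(lru_hit_rate(bufs, s_pages=300, r_scans=50), 2))

Running this prints a 0.00 hit rate for every pool smaller than S and roughly
98 once S fits, mirroring the jump seen in perf.txt.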
5. What kind of access patterns will 2Q have better performance than LRU in
terms of hit rates?

The way in which 2Q improves upon LRU is that, instead of evicting cold pages
from the main buffer, 2Q admits only hot pages to the main buffer. For an
access pattern that mixes "hot" pages and "cold" pages, we can potentially set
the threshold so that S2Q performs better than LRU.

Example: # of frames = 5; threshold = 2
         access pattern: A B A B C D E F G H I A B J K L M N

S2Q performs very well when the number of hot pages is smaller than
(frames - threshold). A small simulation of this example is sketched below,
after the hints.

Hints given:

1. PostgreSQL uses 278 of the buffers to initialize your query (e.g. to load
system tables), and then unpins these buffers before the query processing
begins.

2. When there is no index on S, PostgreSQL executes this particular query
using the Nested Loop Join algorithm. For every tuple from the R relation, the
system scans the complete S relation to find all S tuples that match this R
tuple. Then the next R tuple is read, and so on... Note that the R relation is
scanned only once.
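As a rough illustration of the example in question 5, the sketch below replays
the A B A B C D E F G H I A B J K L M N pattern against both a plain LRU pool
and a simplified two-queue policy. The two-queue model used here (a FIFO
admission queue of threshold frames in front of an LRU main buffer, with
promotion on a hit while still queued) is an assumption for illustration; the
assignment's S2Q may differ in its details.

    # Sketch: LRU vs. a simplified 2Q on the question 5 access pattern.
    # Assumption: 2Q is modelled as a FIFO admission queue of `threshold`
    # frames plus an LRU main buffer of (frames - threshold) frames; a page
    # is promoted to the main buffer when it is hit while still queued.
    from collections import OrderedDict, deque

    PATTERN = list("ABABCDEFGHIABJKLMN")

    def lru_hits(frames):
        pool, hits = OrderedDict(), 0
        for p in PATTERN:
            if p in pool:
                hits += 1
                pool.move_to_end(p)
            else:
                if len(pool) >= frames:
                    pool.popitem(last=False)     # evict least recently used
                pool[p] = None
        return hits

    def two_queue_hits(frames, threshold):
        a1, am, hits = deque(), OrderedDict(), 0   # admission FIFO, main LRU
        for p in PATTERN:
            if p in am:                  # hot page already in the main buffer
                hits += 1
                am.move_to_end(p)
            elif p in a1:                # hit while queued: promote to main
                hits += 1
                a1.remove(p)
                if len(am) >= frames - threshold:
                    am.popitem(last=False)
                am[p] = None
            else:                        # cold miss: admit into the FIFO queue
                if len(a1) >= threshold:
                    a1.popleft()
                a1.append(p)
        return hits

    print("LRU hits:", lru_hits(5), " 2Q hits:", two_queue_hits(5, 2))

With frames = 5 and threshold = 2, the hot pages A and B survive in the main
buffer under 2Q (4 hits) while the stream of cold pages C..N flushes them out
under LRU (2 hits); this is the case where the number of hot pages is below
(frames - threshold).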