Project 4: Build a Distributed Key-Value Store

Summary

In Project 4, you will implement a distributed key-value store that runs across multiple nodes.

Multiple clients will communicate with a single coordinating server in a given messaging format (KVMessage) using a client library (KVClient). The coordinator contains a write-through set-associative cache (KVCache), which it uses to serve GET requests without going to the (slave) key-value servers it coordinates; the slave key-value servers are contacted for a GET only upon a cache miss on the coordinator. The coordinator will use the TPCMaster library to forward client PUT and DEL requests to multiple key-value servers (KVServer) using TPCMessage, following the 2PC protocol so that PUT and DEL operations are atomic across multiple key-value servers. Note that TPCMessage uses the same KVMessage class used in Project 3, with extra fields and message types. KVServers remain the same as in Project 3.
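
To make the GET path above concrete, here is a minimal sketch of how the coordinator might consult its write-through cache before falling back to the slave servers. The interfaces and names below (Cache, SlaveLookup, handleGet, getFromReplicas) are illustrative stand-ins, not the skeleton's actual API.

    // Illustrative sketch only: the interfaces below stand in for the skeleton's
    // KVCache and TPCMaster, whose real signatures may differ.
    interface Cache {
        String get(String key);              // returns null on a cache miss
        void put(String key, String value);
    }

    interface SlaveLookup {
        String getFromReplicas(String key);  // contact the slave servers holding this key
    }

    public class GetPathSketch {
        private final Cache cache;
        private final SlaveLookup slaves;

        public GetPathSketch(Cache cache, SlaveLookup slaves) {
            this.cache = cache;
            this.slaves = slaves;
        }

        public String handleGet(String key) {
            String value = cache.get(key);        // 1. try the write-through cache first
            if (value != null) {
                return value;
            }
            value = slaves.getFromReplicas(key);  // 2. cache miss: go to the slave servers
            if (value != null) {
                cache.put(key, value);            // 3. populate the cache for future GETs
            }
            return value;
        }
    }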

Architecture Diagram

Figure: A distributed key-value store with replication factor of two.

Figure: Functional diagram of a distributed key-value store. Components in colors other than black are the key ones you will be developing for this project. Blue depicts the GET execution path, green depicts PUT and DEL, and purple components are shared between both. Note that components in black might also require minor modifications to suit your purposes. Not shown in the figure: PUT and DEL will also involve updating the write-through cache in the coordinator.

Skeleton Code

The project skeleton you should build on top of is posted at https://bitbucket.org/bruckner/project4skeleton. If you have git installed on your system, you can run git clone https://bitbucket.org/bruckner/project4skeleton.git. Project 4 builds on top of the single-server key-value store developed in Project 3; however, several interfaces have been extended to support the functionality required for Project 4. You can reuse the code developed for Project 3 and define additional classes and methods as you see fit.

You can also download the skeleton from the above-mentioned URL without using git (look for the download link on that page).

Requirements

Tasks (Weights)

  1. (40%) Implement the TPCMaster class that implements 2PC coordination logic in the coordinator server. TPCMaster must select replica locations using consistent hashing. Only a single 2PC operation will be active at a time. You only have to handle parallel 2PC operations for extra credit (see #6 below).
  2. (40%) Implement the TPCMasterHandler class that implements logic for 2PC participants in key-value servers.
  3. (10%) Implement registration logic in SlaveServer (aka key-value Server) and RegistrationHandler in TPCMaster. Each SlaveServer has a unique ID that it uses to register with the TPCMaster in the coordinator.
  4. (10%) Implement the TPCLog class that key-value servers will use to log their states during 2PC operations and for rebuilding during recovery.
  5. Extra Credit (5%) Upon receiving a request, the coordinator server needs to contact the slave servers that the key is replicated to (in the case of GET requests, only upon a cache miss). For the previous tasks, it is fine to contact the slave servers sequentially. For extra credit, improve the performance of the system by contacting the slave servers in parallel; one possible approach is sketched after this list.
  6. Extra Credit (5%) To improve performance even further, the 2PC operations themselves should be done in parallel. If the keys in two requests hash to different sets in the coordinator cache, those requests should be executed in parallel; otherwise, the operations should block and execute one after the other.
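
For the extra-credit parallelism in task 5, one possible approach is to fan the per-slave requests out over a thread pool and then collect the results. The sketch below does this for the first phase of 2PC; SlaveClient and sendPhase1Vote are hypothetical stand-ins for whatever per-slave connection code your TPCMaster uses.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical per-slave connection interface; the skeleton's actual API differs.
    interface SlaveClient {
        boolean sendPhase1Vote(String key, String value);  // true == slave voted "ready"
    }

    public class ParallelPhase1Sketch {
        private final ExecutorService pool = Executors.newCachedThreadPool();

        // Send the phase-1 request to every replica in parallel and combine the votes.
        public boolean allVoteCommit(List<SlaveClient> replicas, String key, String value)
                throws Exception {
            List<Future<Boolean>> votes = new ArrayList<>();
            for (SlaveClient slave : replicas) {
                Callable<Boolean> task = () -> slave.sendPhase1Vote(key, value);
                votes.add(pool.submit(task));
            }
            boolean commit = true;
            for (Future<Boolean> vote : votes) {
                commit &= vote.get();  // a single abort vote aborts the whole operation
            }
            return commit;
        }
    }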

Deliverables

As in previous projects, you will have to submit initial and final design documents as well as the code itself.

  1. Initial Design Document (Due: April 29th, 2013)
  2. Code (Due: May 9th, 2013)
  3. Final Design Document (Due: May 10th, 2013)

Additionally, you will have to submit JUnit test cases for each of the classes you will implement. Following is an outline of the expected progress in terms of test cases in each of the project checkpoints.

Consistent Hashing

As mentioned earlier, key-value servers have unique 64-bit IDs. The coordinator will hash keys into the same 64-bit address space, and each key-value server will store the first copies of the keys whose hash values are greater than the ID of its immediate predecessor and less than or equal to its own ID. Note that each key-value server will also store, as the second copy, the keys whose first copies are stored in its predecessor.
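
A minimal sketch of the replica lookup under this scheme is shown below, assuming slave IDs are kept in a structure sorted by unsigned 64-bit order; the class and method names (findFirstReplica, findSuccessor) are illustrative, not the skeleton's actual interface.

    import java.util.TreeMap;

    // Illustrative sketch: slave IDs and key hashes are unsigned 64-bit values,
    // so the ring is ordered with Long.compareUnsigned.
    public class ConsistentHashSketch {
        // slave ID -> some handle (host:port, socket wrapper, ...) for that slave
        private final TreeMap<Long, String> ring = new TreeMap<>(Long::compareUnsigned);

        public void registerSlave(long slaveId, String slaveInfo) {
            ring.put(slaveId, slaveInfo);
        }

        // First copy: the slave whose ID is the smallest value >= hash(key),
        // wrapping around to the smallest ID when the hash exceeds every ID.
        public String findFirstReplica(long keyHash) {
            Long id = ring.ceilingKey(keyHash);
            if (id == null) {
                id = ring.firstKey();   // wrap around the ring
            }
            return ring.get(id);
        }

        // Second copy: the immediate successor of the first replica on the ring.
        public String findSuccessor(long firstReplicaId) {
            Long id = ring.higherKey(firstReplicaId);
            if (id == null) {
                id = ring.firstKey();   // wrap around the ring
            }
            return ring.get(id);
        }
    }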

Figure: Consistent hashing. Four key-value servers and three hashed keys, along with where they are placed in the 64-bit address space. In this figure, for example, the servers split the key space into S1: [2^16 + 1, 2^23], S2: [2^23 + 1, 2^35], S3: [2^35 + 1, 2^55], and finally the last server owns the key space S4: [2^55 + 1, 2^64 - 1] and [0, 2^16]. Now, when a key hashes to, say, the value 2^55 + 1, it will be stored in the server that owns that key space, i.e., S4, as well as in the immediately next server on the ring, S1.

Sequence of Operations During Concurrent GET and PUT/DEL Requests

Figure: Sequence diagram of concurrent read/write operations using the 2PC protocol in Project 4. "Project 3" blocks in the diagram refer to the activities performed when clients write to a single-node key-value server; here the coordinator acts as the client to the individual key-value servers. The GET request from Client 1 hits the cache in the diagram above; had it missed, the GET request would have been forwarded to each of the SlaveServers until the key is found. The coordinator (TPCMaster) expects each individual key-value server to respond with an ACK at the end of the second phase; if it does not receive an ACK from a slave server, it will keep retrying. To handle failures during the second phase (i.e., to send back the ACK properly), slave servers must log the received global decision before sending back the ACK. Note that a slave does not resend a missing ACK until it receives the global decision from the master again.
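
On the slave side, the log-before-ACK ordering described above might look roughly like the sketch below; DecisionLog and MasterConnection are illustrative stand-ins for the skeleton's TPCLog and connection-handling code.

    // Illustrative stand-ins; the skeleton's TPCLog and socket handling differ in detail.
    interface DecisionLog {
        void appendAndFlush(String record);  // must be durable before returning
    }

    interface MasterConnection {
        void sendAck();
    }

    public class Phase2HandlerSketch {
        private final DecisionLog log;

        public Phase2HandlerSketch(DecisionLog log) {
            this.log = log;
        }

        // Called when the global commit/abort decision arrives from the coordinator.
        public void onGlobalDecision(String decision, MasterConnection master) {
            // 1. Log the decision first, so a crash after this point still lets the
            //    rebuilt slave know how the operation ended.
            log.appendAndFlush(decision);
            // 2. Apply the decision to local storage (omitted in this sketch).
            // 3. Only now acknowledge. If the ACK is lost, the coordinator resends
            //    the decision and this method simply runs again (it must be idempotent).
            master.sendAck();
        }
    }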

Extended KVMessage Format (TPCMessage Piggybacked on KVMessage) for Coordinator and SlaveServer Communication

Registration Messages

KVMessage with Multiple Error Messages

The ignoreNext Message

Testing distributed systems is hard, more so in the presence of failures. To make it slightly easier, we are introducing the ignoreNext message, which will be issued directly to a slave server by the autograder. Upon receiving this message, the slave server will ignore the next 2PC operation (either a PUT or a DEL) and respond with a failure message during the first phase of 2PC. This will cause the 2PC operation to fail and trigger the failure code path. You need not implement ignoreNext forwarding logic in the TPCMaster (we will not send the message to the TPCMaster); the ignoreNext message will be sent to the appropriate slave servers directly by our testing code. You do not need to maintain ignoreNext status across reboots: if a slave server receives an ignoreNext message and then restarts before the next 2PC operation, that operation can be processed instead of ignored.
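
One simple way to honor ignoreNext is an in-memory flag that the next phase-1 vote consumes, as in the sketch below; the method names and vote strings are illustrative only.

    // Illustrative sketch of ignoreNext handling in a slave server.
    public class IgnoreNextSketch {
        private boolean ignoreNext = false;  // in-memory only: deliberately not persisted

        public synchronized void onIgnoreNextMessage() {
            ignoreNext = true;               // the very next 2PC operation must fail
        }

        // Called when a phase-1 vote request (PUT or DEL) arrives.
        public synchronized String votePhase1(String key, String value) {
            if (ignoreNext) {
                ignoreNext = false;          // the flag applies to exactly one operation
                return "abort";              // vote failure so the 2PC operation aborts
            }
            // ... normal validation and a "ready" vote would go here ...
            return "ready";
        }
    }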

Error Messages in Addition to Project 3

Concepts You are Expected to Learn