Project 4 - Distributed Key-Value Store

Overview

In project 4, you will implement a distributed key-value store that runs across multiple nodes.

Multiple clients will communicate with a single master server in a given messaging format (KVMessage) using a client library (KVClient). The master contains a set-associative cache (KVCache), which it uses to serve GET requests without going to the key-value (slave) servers it coordinates; the slave servers are contacted for a GET only upon a cache miss on the master. The master uses the TPCMaster library to forward client PUT and DEL requests to multiple slave servers, following the two-phase commit (2PC) protocol so that PUT and DEL operations are atomic across those servers. KVServers remain the same as in project 3.
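As a rough illustration of the GET path just described, the master-side lookup might take the following shape. This is only a sketch: names such as masterCache, findFirstReplica, findSuccessor, and SlaveInfo.get are assumptions for illustration, not the skeleton's exact API.

// Hedged sketch of the master's GET path: serve from the write-through
// cache when possible, otherwise ask the key's replicas in ring order.
public String handleGet(KVMessage msg) throws KVException {
    String key = msg.getKey();
    String value = masterCache.get(key);         // KVCache lookup first
    if (value == null) {                         // cache miss
        SlaveInfo replica = findFirstReplica(key);
        try {
            value = replica.get(key);            // primary replica
        } catch (KVException e) {
            value = findSuccessor(replica).get(key); // fall back to successor
        }
        masterCache.put(key, value);             // populate the cache
    }
    return value;
}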

Concepts to learn: two-phase commit, logging and recovery, consistent hashing, failure detection using timeouts

Figure: A distributed key-value store with replication factor of two.


Figure: Functional diagram of a distributed key-value store. Components in colors other than black are the key ones you will be developing for this project. Blue depicts the GET execution path, green depicts PUT and DEL, and purple components are shared between operations. Note that components in black might also require minor modifications to suit your purposes. Not shown in the figure: PUT and DEL will involve updating the write-through cache in the master.


Task Outline

  1. [10%] Implement two-phase commit messages. You must either modify KVMessage directly or extend the class with your own TPCMessage. Complete all methods in KVMessage, including a new constructor. Also implement the ignoreNext method in KVClient. See the message specs below.
  2. [10%] Implement the TPCLog class that slaves will use to log transactions and rebuild the server during recovery.
  3. [20%] Implement registration logic in the RegistrationHandler and SlaveInfo classes in TPCMaster. Also implement the methods findFirstReplica and findSuccessor which will be used to find the slaves involved in a transaction. Complete the modified SocketServer.
  4. [30%] Implement the rest of TPCMaster and the modified KVClientHandler to handle all of the two-phase commit logic on the master node. In TPCMaster, this includes performTPCOperation and handleGet.
  5. [30%] Implement the TPCMasterHandler class that handles 2PC logic on the slaves. Be sure to handle ignoreNext requests. You will also need to fill in hasKey in KVServer.

Requirements

Unless otherwise specified below, you will have to satisfy the requirements described in project 3. Again, you should bulletproof your code such that the nodes do not crash under normal operation.

You are required to keep up with this Piazza thread for ongoing clarifications.

Two-phase Commit

Figure: Sequence diagram of concurrent read/write operations using the 2PC protocol in project 4. The "project 3" blocks in the diagram refer to the activities performed when clients write to a single-node slave server, with the master acting as the client to individual slaves. The GET request from Client 1 hits the cache in the diagram above; had it missed, the GET request would have been forwarded to the SlaveServers until the key was found.

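In code, the master's side of the protocol might follow the rough shape below. This is a sketch only: sendAndReceive, findSuccessor, buildAbortResponse, the TIMEOUT constant, and the KVMessage/KVException constructors used here are assumptions, not the skeleton's signatures.

// Hedged sketch of TPCMaster.performTPCOperation. Phase 1 collects
// votes under a timeout; phase 2 resends the global decision until
// every slave acks, since a failed slave must learn the outcome
// when it comes back up.
public void performTPCOperation(KVMessage req, String tpcOpId) throws KVException {
    req.setTpcOpId(tpcOpId);
    SlaveInfo first = findFirstReplica(req.getKey());
    SlaveInfo[] replicas = { first, findSuccessor(first) };

    // Phase 1: send putreq/delreq to both replicas and collect votes.
    boolean commit = true;
    for (SlaveInfo slave : replicas) {
        KVMessage vote = slave.sendAndReceive(req, TIMEOUT);
        if (vote == null || !"ready".equals(vote.getMsgType())) {
            commit = false;                  // a timeout counts as an abort vote
        }
    }

    // Phase 2: send the decision and keep resending until it is acked.
    KVMessage decision = new KVMessage(commit ? "commit" : "abort");
    decision.setTpcOpId(tpcOpId);
    for (SlaveInfo slave : replicas) {
        KVMessage ack;
        do {
            ack = slave.sendAndReceive(decision, TIMEOUT);
        } while (ack == null || !"ack".equals(ack.getMsgType()));
    }
    if (!commit) {
        throw new KVException(buildAbortResponse(replicas)); // aggregated errors
    }
}

The unconditional retry loop in phase 2 is what lets a revived slave learn the decision it missed; it pairs with the slave-side log described in the next section.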

Failures, Timeouts, and Recovery

At any moment there will be at most one slave server down. Upon revival, slave servers always come back with the same ID. For this particular project, you can assume that the master will never go down, meaning there is no need to log its state. Individual slave servers, however, must log the information necessary to survive failures.

Your code should be able to handle a slave server failing at any point in the 2PC transaction (there are multiple cases to handle here), handle ignoreNext messages, and handle pathological inputs to the system.
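One common structure is for a slave to append every 2PC message it sends or receives (putreq/delreq, commit, abort) to its TPCLog as it arrives, and to replay the log on startup. The sketch below assumes hypothetical loadEntriesFromDisk and apply helpers; it is not the skeleton's exact TPCLog interface.

// Hedged sketch of slave-side log replay during recovery.
public void rebuildKeyServer() throws KVException {
    KVMessage pending = null;                // phase-1 request awaiting a decision
    for (KVMessage entry : loadEntriesFromDisk()) {
        String type = entry.getMsgType();
        if (type.equals("putreq") || type.equals("delreq")) {
            pending = entry;                 // voted, decision not yet seen
        } else if (type.equals("commit") && pending != null) {
            apply(pending);                  // redo the committed PUT/DEL
            pending = null;
        } else if (type.equals("abort")) {
            pending = null;                  // aborted, nothing to redo
        }
    }
    // If pending is still non-null, the slave crashed between voting and
    // learning the outcome; it must wait for the master to resend the
    // decision before completing this operation.
}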

Registration

These are the message formats for registration.

register
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="register">
<Message>SlaveServerID@HostName:Port</Message>
</KVMessage>
resp
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="resp">
<Message>Successfully registered SlaveServerID@HostName:Port</Message>
</KVMessage>
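Parsing the SlaveServerID@HostName:Port string is mostly string surgery. A sketch of a SlaveInfo constructor follows; the field names and the KVException/KVMessage constructors used in the catch block are assumptions for illustration.

// Hedged sketch of SlaveInfo's registration-string parsing. Slave IDs
// cover the full unsigned 64-bit space, so the decimal form produced by
// Java's Long.toString may be negative; Long.parseLong round-trips it.
public SlaveInfo(String info) throws KVException {
    try {
        int at = info.indexOf('@');
        int colon = info.lastIndexOf(':');   // last ':' separates the port
        this.slaveId = Long.parseLong(info.substring(0, at));
        this.hostName = info.substring(at + 1, colon);
        this.port = Integer.parseInt(info.substring(colon + 1));
    } catch (RuntimeException e) {           // bad index or number format
        throw new KVException(new KVMessage("resp",
                "Registration Error: Received unparseable slave information"));
    }
}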

Consistent Hashing

Figure: Consistent Hashing. Four slave servers and three hashed keys, along with where they are placed in the 64-bit address space. In this figure, for example, the servers split the key space into S1: [2^16 + 1, 2^23], S2: [2^23 + 1, 2^35], S3: [2^35 + 1, 2^55], and finally note that the last server owns the key space S4: [2^55 + 1, 2^64 - 1] and [0, 2^16]. Now when a key hashes to, say, the value 2^55 + 1, it will be stored on the server that owns that key space, i.e., S4, as well as on the immediately next server in the ring, S1.

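Because slave IDs occupy the full 64-bit space and Java's long is signed, ring comparisons must treat longs as unsigned. The following self-contained demo shows the standard sign-bit trick together with the successor lookup it enables (the ring values mirror the figure above; on Java 8+, Long.compareUnsigned does the same job):

public class ConsistentHashDemo {
    // True iff n1 < n2 when both are interpreted as unsigned 64-bit values.
    // If the sign bits differ, the negative (high-bit-set) value is the
    // unsigned-larger one, so the signed comparison must be flipped.
    static boolean isLessThanUnsigned(long n1, long n2) {
        return (n1 < n2) ^ ((n1 < 0) != (n2 < 0));
    }

    // First replica for a key: the first slave (in unsigned ring order)
    // whose ID is >= the key's hash, wrapping to the smallest ID.
    static long findFirstReplica(long[] sortedSlaveIds, long keyHash) {
        for (long id : sortedSlaveIds) {
            if (!isLessThanUnsigned(id, keyHash)) {
                return id;
            }
        }
        return sortedSlaveIds[0];            // wrap around the ring
    }

    public static void main(String[] args) {
        long[] ring = { 1L << 16, 1L << 23, 1L << 35, 1L << 55 };
        long keyHash = (1L << 55) + 1;       // just past the last slave ID
        // Wraps around: the key belongs to the slave with the smallest ID.
        System.out.println(findFirstReplica(ring, keyHash) == (1L << 16));
    }
}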

Testing with ignoreNext

Testing distributed systems is hard, even more so in the presence of failures. We are introducing the ignoreNext message, which will be issued directly to a slave server by the autograder (ignoreNext is purely for our testing). Upon receiving this message, the slave server will ignore the next 2PC operation (either a PUT or a DEL) and respond with a failure message during phase 1. This will cause the 2PC operation to fail and trigger the failure code path. You do not need to implement any ignoreNext forwarding logic in TPCMaster (we will not send the message to TPCMaster); our testing code will send ignoreNext directly to the appropriate slave servers. You also do not need to preserve ignoreNext status across reboots: if a slave server receives an ignoreNext message and then restarts before the next 2PC operation, that operation can be processed instead of ignored.

request
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="ignoreNext"/>
resp
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="resp">
<Message>Success</Message>
</KVMessage>

This error should be sent back from the slave if the previous request was an ignoreNext. Note: there are no newlines in the message.

ignoreNext resp
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="resp">
<Message>IgnoreNext Error: SlaveServer SlaveServerID has ignored this 2PC request during the first phase</Message>
</KVMessage>
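On the slave, ignoreNext can be a single in-memory flag consulted during phase 1; since the flag need not survive a restart, it is deliberately not written to the TPCLog. A sketch with assumed names (slaveId, and the KVMessage constructors and setters shown):

// Hedged sketch of ignoreNext handling in TPCMasterHandler.
private volatile boolean ignoreNext = false; // in-memory only, lost on reboot

public KVMessage handleIgnoreNext() {
    ignoreNext = true;
    return new KVMessage("resp", "Success");
}

// Called during phase 1, before normally validating a putreq/delreq.
public KVMessage voteOnRequest(KVMessage req) {
    if (ignoreNext) {
        ignoreNext = false;                  // only the next 2PC op is ignored
        KVMessage abort = new KVMessage("abort");
        abort.setMessage("IgnoreNext Error: SlaveServer " + slaveId
                + " has ignored this 2PC request during the first phase");
        abort.setTpcOpId(req.getTpcOpId());
        return abort;
    }
    KVMessage ready = new KVMessage("ready"); // normal case: vote ready
    ready.setTpcOpId(req.getTpcOpId());
    return ready;
}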

2PC message formats

These are the new message formats for 2PC communication between the master and slave nodes. get requests should remain unchanged from project 3 since they have no 2PC semantics. There are no newlines in these messages. Placeholder fields (key, value, 2PC Operation ID, Error Message) should be replaced with the actual values.

putreq
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="putreq">
<Key>key</Key>
<Value>value</Value>
<TPCOpId>2PC Operation ID</TPCOpId>
</KVMessage>
delreq
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="delreq">
<Key>key</Key>
<TPCOpId>2PC Operation ID</TPCOpId>
</KVMessage>
ready vote
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="ready">
<TPCOpId>2PC Operation ID</TPCOpId>
</KVMessage>
abort vote
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="abort">
<Message>Error Message</Message>
<TPCOpId>2PC Operation ID</TPCOpId>
</KVMessage>
commit decision
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="commit">
<TPCOpId>2PC Operation ID</TPCOpId>
</KVMessage>

abort decision

<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="abort">
<TPCOpId>2PC Operation ID</TPCOpId>
</KVMessage>
ack
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="ack">
<TPCOpId>2PC Operation ID</TPCOpId>
</KVMessage>
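Your KVMessage class will have its own serialization, but purely as a standalone illustration of producing one of these messages with the JDK's built-in DOM APIs (one line, no embedded newlines), the ready vote could be generated like this:

import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Standalone demo, not the skeleton's KVMessage.toXML.
public class ReadyVoteDemo {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("KVMessage");
        root.setAttribute("type", "ready");
        Element opId = doc.createElement("TPCOpId");
        opId.setTextContent("2PC Operation ID");
        root.appendChild(opId);
        doc.appendChild(root);

        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        // Prints the XML declaration followed by the message on one line.
        System.out.println(out);
    }
}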

Extra error formats

Multiple error messages in the case of an abort should be placed in the same Message field of a "resp" message, each prefixed by @SlaveServerID:= and separated by the newline character '\n'. These will be created by TPCMaster and returned to the client if multiple slaves return error messages. Use this as you deem necessary.

multiple errors
<?xml version="1.0" encoding="UTF-8"?>
<KVMessage type="resp">
<Message>@SlaveServerID1:=ErrorMessage1\n@SlaveServerID2:=ErrorMessage2</Message>
</KVMessage>
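An illustrative helper (not part of the skeleton) that builds this combined Message string from a map of slave IDs to error texts might look like:

import java.util.LinkedHashMap;
import java.util.Map;

public class ErrorAggregationDemo {
    // Joins per-slave abort errors into the single Message field shown above.
    static String combineErrors(Map<Long, String> slaveErrors) {
        StringBuilder combined = new StringBuilder();
        for (Map.Entry<Long, String> e : slaveErrors.entrySet()) {
            if (combined.length() > 0) {
                combined.append('\n');       // literal newline between entries
            }
            combined.append('@').append(e.getKey())
                    .append(":=").append(e.getValue());
        }
        return combined.toString();
    }

    public static void main(String[] args) {
        Map<Long, String> errs = new LinkedHashMap<Long, String>();
        errs.put(1L, "ErrorMessage1");
        errs.put(2L, "ErrorMessage2");
        System.out.println(combineErrors(errs)); // @1:=ErrorMessage1 (newline) @2:=ErrorMessage2
    }
}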

Testing

Testing for this project is complicated. We have two main suggestions for testing failures. Neither is required, but you will likely find that you need to do something similar. Do not expect a public autograder for this project. Expect to spend a significant portion of your time testing.

Skeleton

The skeleton you must build on top of is linked below. Project 4 builds on top of the single-server key-value store developed in project 3; however, several interfaces have been modified/extended to support the required functionalities for this project. You should reuse the code developed for project 3 and define additional classes and methods as you see fit.

http://www-inst.eecs.berkeley.edu/~cs162/fa13/kvstore/proj4-skeleton.tar.gz

The following (hopefully) describes all of the new/modified files included in the project 4 tar and the changes associated with each:

You will need to merge your project 3 logic into the project 4 skeleton for classes that have been modified. Files from project 3 that are not mentioned in the list above will also be necessary for project 4 but will remain unchanged (Client, KVCache, KVStore, ThreadPool, etc).

Deliverables

  1. Mon 12/02: Initial Design Document (10 points)
  2. Thu 12/12: Code (60 points)
  3. Fri 12/13: Final Design Document (20 points)

If your last proj4-initial-design submission is before Wed, 11/27 at 11:59PM, you will receive 2 points of extra credit in your project grade.

You will have to submit JUnit test cases. The following are the expectations regarding testing:

Initial Design Document: A one- to two-sentence description of each test case you plan to implement for each class, and the reason for including it (what it tests). Please follow the general outline of our design doc template (not updated for project 4). 15-page maximum.

Code: Submit code along with your set of test cases. You are free to use any library for testing; just include the jar file(s) in your submission.

Final Design Document: A description of each test case you have implemented for each class. Our evaluation of your test cases will be worth 10 of the 20 points on your final design document. We are looking for unit tests with good code coverage and integration tests that verify the behavior of your system. 18-page maximum (an extra 3 pages for tests).