Part 1: Introduction

This lab is about a distributed in-memory key-value database. We created a Python-based framework which provides a simple dictionary-based database server instance. The framework allows reading and writing key-value pairs from/to a database, and it provides the infrastructure to launch and stop database server instances on several nodes so that the database can run in a distributed manner. You will learn about the following concepts:

- exchanging messages with a database server
- running a database across multiple nodes
- sharding and load balancing
- fault tolerance through replication

This lab is mostly performed in a terminal. Open your terminal application to begin.

Part 2: Running experiments

Preliminaries

Non-DICE

Firstly, if you are doing this tutorial on a machine that isn't DICE, you'll need to ssh into a DICE machine from either a UNIX terminal or a Windows ssh application such as PuTTY, using the command below, where sXXXXXXX is your matriculation number. If you're already using a DICE computer, you can skip this step. For more information, see computing support.

ssh sXXXXXXX@student.ssh.inf.ed.ac.uk

You can now continue with the DICE steps.

DICE

We're going to connect to one of the nodes of the cluster. This command will randomly pick one of the 12 servers and connect to it:

ssh scutter$(seq -w 1 12 | shuf -n 1)

Framework installation

Copy the directory with the framework to your DICE home:

cp -r /disk/scratch/EXC_lab3 $HOME/

Prepare the shell environment to work with the framework:

source $HOME/EXC_lab3/shrc

Note: preparing the shell is required every time you open a new terminal.

Exchanging messages

In order to run a database server instance locally, use the following command:

run_db

In the output, you should see that the server started successfully and is waiting for incoming messages:

Starting up a server. Listening on port <your_port_number>
Waiting for a message to arrive...

If you want to stop the server, press Ctrl+C.

To send a message to a database server, you should use the tool send_message. This tool supports the following options:

- -a <hostname>  the address of the node running the database server
- -t <type>      the message type, e.g. UPDATE or READ
- -m <message>   the message body
- -l             enable client-side load balancing (used later in this lab)

Note: a colon is used as the delimiter between the key and the value in the message body of an UPDATE request.

Let's try to send an update message to a local database server. When a database receives an update message, it stores the key-value pair in its internal dictionary. If the insertion was successful, the database sends an acknowledgement message back to the requester. With a database server running in the first terminal, open another terminal on the same node and use the following commands:

source $HOME/EXC_lab3/shrc
send_message -a $LOCAL_HOSTNAME -t UPDATE -m "12345:Hello World"

The last command requests that the value Hello World be added, under the key 12345, to the database running on host $LOCAL_HOSTNAME. You can check the value of $LOCAL_HOSTNAME with:

echo $LOCAL_HOSTNAME

After executing the second command, send_message should report that the key-value pair was successfully added. You should see the following output (note that node addresses and ports may differ):

Sending Message: Type: UPDATE, Addr: 129.215.18.53, MessageBody: 12345:Hello World
Response Message: Type: UPDATE_ACK, Addr: 129.215.18.53, MessageBody: Success

The database server (in the first terminal) should also report that the key-value pair was added:

Add an entry to the database: key: 12345, value: Hello World
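
Conceptually, the server side of this exchange can be pictured with a short Python sketch. This is a hypothetical illustration only, assuming a plain dictionary store and a colon-delimited message body; the framework's real handler (see $HOME/EXC_lab3/src/lib/message_handlers.py) may differ in names and details:

database = {}  # the server's in-memory key-value store

def handle_update(message_body):
    # The body has the form "key:value"; split on the first colon only,
    # so the value itself may contain colons.
    key, value = message_body.split(":", 1)
    database[key] = value
    print("Add an entry to the database: key: %s, value: %s" % (key, value))
    return "UPDATE_ACK", "Success"  # acknowledgement returned to the requester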

Now, let's retrieve data from the database. When a database receives a read message and the key is present, it sends back a read-reply message containing the value for the requested key. To observe this behaviour, type the following command in the second terminal:

send_message -a $LOCAL_HOSTNAME -t READ -m "12345"

This command tries to read the value with key 12345 on node $LOCAL_HOSTNAME. As we have already added a value for this key, send_message should report receiving a READ_REPLY message with the value:

Response Message: Type: READ_REPLY, Addr: 129.215.18.53, MessageBody: Hello World
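
The read path can be sketched in the same hypothetical style (again, not the framework's actual code; what the server returns for a missing key is something you can discover in the exercises below):

def handle_read(message_body, database):
    # The body is just the requested key.
    key = message_body
    if key in database:
        return "READ_REPLY", database[key]
    # Miss behaviour is an assumption here; check the real handler to see
    # what the framework actually returns for an absent key.
    return "READ_REPLY", "Key not found"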

Running a database on multiple nodes

In the previous section, you were running a database instance on one node only. This section explains how to launch database server instances on multiple nodes.

The list of nodes to be used by the database is specified in $HOME/EXC_lab3/settings.cfg in the Node_hostnames section. By default, the database runs on 12 nodes, scutter{01..12}. You can see this list with the following command:

cat $HOME/EXC_lab3/settings.cfg

In order to launch database instances on all the nodes, type the following command:

start_distributed_db

The output of all database instances is redirected to log files located in $HOME/EXC_lab3/logs. Each instance creates a log file named after its host (the node the instance is running on). You can track the changes in a particular instance's log file with the following command:

tail -f $HOME/EXC_lab3/logs/<hostname>

or, in order to track all files at the same time:

tail -f $HOME/EXC_lab3/logs/*

Press Ctrl+C to stop tracking changes on log files.

Exercise: Try sending requests to database server instances running on different nodes. For example, you can use the following commands:

send_message -a scutter03 -t UPDATE -m "12345:Hello World"
send_message -a scutter05 -t UPDATE -m "77777:This is a test line"
send_message -a scutter03 -t READ   -m "12345"
send_message -a scutter05 -t READ   -m "77777"
send_message -a scutter05 -t READ   -m "77776"
send_message -a scutter04 -t READ   -m "77777"

You should see that the server receives the READ message, searches for the key in its internal database and sends the response to the client. What is the result of the last two READ commands? In order to stop the database instances on all nodes, you can type the following command:

stop_distributed_db

Distributed database and load balancing

In a modern datacenter environment, a single node usually cannot sustain a high rate of incoming requests and/or is not capable of storing the whole dataset. Therefore, the load should be spread across multiple nodes by making each node responsible for only a part of the key space. A part of the key space is usually referred to as a shard. Consequently, sharding is a method of partitioning the data across the nodes.

One way to categorize sharding is deterministic versus dynamic. With deterministic sharding, a client determines which node should process a message. Deterministically sharded databases use a sharding function key -> partition_key -> node_id to locate data. With dynamic sharding, a separate locator service tracks how partitions are assigned to nodes.

In this lab, deterministic sharding is used. We will consider two sharding functions: a simple sharding function (the default) and an md5-based one.

The hashing function is specified in $HOME/EXC_lab3/settings.cfg in the section General_parameters:

hashing_function = None

By default, the simple sharding function is used. You can enable md5-based hashing by changing this setting in $HOME/EXC_lab3/settings.cfg:

hashing_function = md5
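
To make the two options concrete, here is a plausible Python sketch of what such sharding functions could look like. The names, the node count, and the use of a plain modulo for the simple function are assumptions for illustration, not the framework's actual implementation:

import hashlib

num_nodes = 12  # assumed to match the number of hosts in Node_hostnames

def simple_shard(key):
    # hashing_function = None: use the numeric key directly.
    return int(key) % num_nodes

def md5_shard(key):
    # hashing_function = md5: hash the key first, then take the modulo.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_nodes

print(simple_shard("12345"))  # 12345 % 12 = 9
print(md5_shard("12345"))     # pseudo-random but deterministic node id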

To enable the load balancer on the client side, use send_message with the -l option. For example, you can use the following command:

send_message -l -t UPDATE -m "12345:Hello World"

Different sharding methods affect load balancing differently. For example, the simple sharding function may balance load poorly when the database experiences a strided key access pattern. A strided key access pattern with stride M is a stream of accesses in which only every Mth key is touched.
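
One way to study this is to count how many keys of a strided stream land on each node. The sketch below reuses the hypothetical simple_shard and md5_shard functions from the previous sketch:

from collections import Counter

def keys_per_node(shard_fn, stride, max_key=1000):
    # Count how many of the strided keys map to each node id.
    counts = Counter(shard_fn(str(k)) for k in range(1, max_key + 1, stride))
    return dict(sorted(counts.items()))

print(keys_per_node(simple_shard, 3))  # uneven when the stride shares a factor with num_nodes
print(keys_per_node(md5_shard, 3))     # roughly uniform regardless of the stride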

Exercise: Study how different sharding methods affect load balancing under a strided key access pattern. First, launch the distributed database if you have not already done so. To generate a stream of accesses with stride 3, you can use the following command:

STRIDE=3; for key in $(seq $(seq -w 1 $STRIDE | shuf -n 1) $STRIDE 1000); do send_message -l -t UPDATE -m "${key}:$(gen_random_string 20)"; done

Note: gen_random_string <length> generates a random string of the given length (as used in the command above). Is load balancing fair with other strides, e.g. 10 and 12? Do you observe the same behaviour with the md5-based sharding method? Explain your results.

Fault tolerance

To ensure that a client does not experience an interruption in service when hardware or networks fail, fault-tolerant databases are required. One way to achieve fault tolerance is to rely on replication. With replication, every shard resides on several nodes. Thus, in case of a node failure, the data can be requested from other nodes.

In this lab, a simple replication policy is used: the data is copied to the nodes adjacent to its primary node. For example, if the replication degree is 3 and 5 nodes node{00..04} are used, the data on node node00 is also replicated to node01 and node02. Similarly, node00 and node01 contain a copy of the data which primarily resides on node04.
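
The adjacent-node placement can be expressed in a couple of lines. This is a sketch of the policy as described above; the framework may compute it differently:

def replica_nodes(primary, replication_degree, num_nodes):
    # The primary node plus the next nodes in sequence, wrapping around.
    return [(primary + i) % num_nodes for i in range(replication_degree)]

print(replica_nodes(0, 3, 5))  # [0, 1, 2]: node00's data also lives on node01 and node02
print(replica_nodes(4, 3, 5))  # [4, 0, 1]: node04's data also lives on node00 and node01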

To observe the need for fault tolerance, you can see what happens if one database server instance fails. You can shut down the database server instance on a particular node with the following commands:

stop_distributed_db -a <hostname>
send_message -a <hostname> -t UPDATE -m "12345:Hello World" # to see a problem

To enable support for replication in the distributed database, you should change the message-processing handler and the degree of replication in $HOME/EXC_lab3/settings.cfg in the General_parameters section, as shown below:

number_replicas = 3
message_handler = fault_tolerant_message_handler
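
Conceptually, a fault-tolerant handler forwards each update to the whole replica set and acknowledges once enough replicas respond. The sketch below illustrates that idea using the hypothetical simple_shard and replica_nodes helpers from the earlier sketches; send_update is likewise a hypothetical per-node send, and the real fault_tolerant_message_handler in src/lib/message_handlers.py may use a different policy:

def fault_tolerant_update(key, value, replication_degree, num_nodes):
    # Write to the primary node and its adjacent replicas.
    primary = simple_shard(key)
    acks = 0
    for node in replica_nodes(primary, replication_degree, num_nodes):
        if send_update(node, key, value):  # hypothetical helper: returns True on ack
            acks += 1
    # One possible policy: succeed only if every replica acknowledged.
    # The exercise below asks you to find out what the framework actually requires.
    return acks == replication_degree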

Exercise: Emulate the failure of one or more nodes. How many nodes are required to be online for updates to succeed? Note: you can see how the database processes messages in $HOME/EXC_lab3/src/lib/message_handlers.py

Troubleshooting

In case you get the following error when you try to launch a database instance

ERROR: [Errno 98] Address already in use. Seems like a database is already running on this node. Run stop_distributed_db to stop a database on all nodes.

and stopping the database does not help, you can force a database server instance to shut down with the following commands:

kill_local_db # on the local node only
kill_db_on_all_nodes # on all nodes listed in settings.cfg