Experiments

Kinan Bab edited this page May 30, 2023 · 19 revisions

The following is a step-by-step process for replicating our experimental results from our OSDI paper.

Setup

Google Cloud

We ran our experiments on Google Cloud, and we recommend using it to replicate our results as closely as possible. We provide scripts, tailored to Google Cloud, for setting up and running the different experiments. The scripts install all the required dependencies, configure the baselines, and build K9db.

We used n1-standard-16 instances to run our experiments. We configure each instance with an attached local NVMe SSD, which we use to store the database for K9db and for all the baselines. Our Lobsters experiments (e.g., figures 8 and 9) require two instances: one for the database server and one for the load generator. The remaining experiments require only a single machine, which is used for both the database and load generation. When creating the server and load generator instances, make sure they are located in the same zone to keep network latency between them low.

After creating the instances, follow these instructions to set them up.

  1. SSH into the Google Cloud instance(s).
  2. Clone the repo into some directory <K9DB_DIR>.
  3. Execute the setup script: cd <K9DB_DIR> && ./experiments/scripts/gcloud-setup.sh.

Note: whenever you stop and start, or restart, your Google Cloud instance, re-run the SSD configuration script to ensure the SSD is mounted properly: cd <K9DB_DIR> && ./experiments/scripts/setup/ssd.sh. You should also rebuild K9db so that bazel has all the needed binaries cached and ready and the experiment scripts are not delayed: cd <K9DB_DIR> && ./experiments/scripts/setup/build.sh.
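The two post-restart commands in the note above can be wrapped in one helper. This is only a sketch: the function name k9db_after_restart is our own invention and is not part of the repository, and it assumes both scripts are executable and that you pass your actual <K9DB_DIR> as the argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the K9db repo): re-run the SSD and build
# scripts named in the note above after an instance stop/start or restart.
# Usage: k9db_after_restart <K9DB_DIR>
k9db_after_restart() {
  local k9db_dir="$1"
  local script
  # Fail early if either script is missing, e.g. because the path is wrong.
  for script in experiments/scripts/setup/ssd.sh experiments/scripts/setup/build.sh; do
    if [ ! -f "$k9db_dir/$script" ]; then
      echo "missing: $k9db_dir/$script" >&2
      return 1
    fi
  done
  # Run both scripts from <K9DB_DIR>, in the order the note lists them.
  # The subshell keeps the caller's working directory unchanged.
  (
    cd "$k9db_dir" || exit 1
    ./experiments/scripts/setup/ssd.sh && ./experiments/scripts/setup/build.sh
  )
}
```

After sourcing the file, call it as k9db_after_restart <K9DB_DIR>. A non-zero exit status means a script was missing or one of the two steps failed.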

Using our docker container

You can run our experiments using our Docker container on a local machine or in a cloud setup. The container includes all the dependencies and configuration required to run the experiments locally. However, you may need to change two settings so that the experiments run as well as possible on your machine:

  1. If you have both a regular HDD and an SSD, make sure that the Docker container is stored on the SSD. Alternatively, mount a directory on the SSD into the container and configure MariaDB and K9db to store their data in that directory: change the datadir parameter in /etc/mysql/mariadb.cnf, and change the K9db database path inside each experiment script. This does not apply if your system has only an SSD or only an HDD.

  2. If you are running the experiments in a cloud environment with the load generation instance separate from the server instance, make sure that ports 3306 and 10001 are exposed properly from the Docker container, and that K9db and MariaDB both listen for incoming connections on the correct network interface. You can change the K9db interface by modifying each experiment script, and you can change the MariaDB network interface by changing the bind-address setting in /etc/mysql/mariadb.conf.d/50-server.cnf. This does not apply if you are running the load generation and the database server on the same machine.
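To confirm what MariaDB is currently configured to use before or after editing the two files mentioned above, a small read-only helper along these lines may be useful. It is a sketch: cnf_get is a hypothetical name of ours, and it only handles plain "key = value" lines, ignoring option-file sections and the underscore spelling (bind_address) that MariaDB also accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the K9db repo): print the last value
# assigned to a key in a MariaDB option file. Commented-out lines are skipped
# because the pattern only matches lines that start with the key itself.
# Usage: cnf_get <file> <key>
cnf_get() {
  local file="$1" key="$2"
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Example invocations, using the file paths named in the list above:
#   cnf_get /etc/mysql/mariadb.conf.d/50-server.cnf bind-address
#   cnf_get /etc/mysql/mariadb.cnf datadir
```

If the first call prints 127.0.0.1, MariaDB accepts only local connections, which is fine for single-machine runs but not for a separate load generator.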

Other Setups

It is possible to run our experiments on a different setup, either using a different cloud provider, or using a local machine.

However, you will need to install the dependencies and configure everything yourself by manually following the applicable steps in our Google Cloud setup script at <K9DB_DIR>/experiments/scripts/gcloud-setup.sh.

Make sure MariaDB is configured correctly (similar to <K9DB_DIR>/experiments/scripts/setup/mariadb.sh):

  1. MariaDB is installed correctly with the MyRocks plugin. This is the baseline we use in most of our experiments.
  2. MariaDB has a user k9db with password password and full permissions to create and modify databases.
  3. MariaDB is configured to store the database on the SSD (if you are using any).
  4. MariaDB is configured to listen on a proper external network interface, with its port properly exposed, in case you are running the Lobsters load generation harness on a different machine. This is not applicable if you are running everything on one machine.
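As an illustration of item 2, the user could be created with SQL along the following lines. This is a sketch and not a copy of mariadb.sh: the function name k9db_user_sql, the default '%' host pattern, and granting on *.* are our assumptions, so consult <K9DB_DIR>/experiments/scripts/setup/mariadb.sh for what the authors actually do. The trailing SHOW ENGINES statement is there so you can check that ROCKSDB appears in the engine list (item 1).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the K9db repo): emit SQL that creates the
# k9db user with password "password" and broad privileges, then lists the
# available storage engines.
# Usage: k9db_user_sql [host-pattern]   (default: %, meaning any client host)
k9db_user_sql() {
  local host="${1:-%}"
  cat <<EOF
CREATE USER IF NOT EXISTS 'k9db'@'${host}' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON *.* TO 'k9db'@'${host}';
SHOW ENGINES;
EOF
}
```

The output can be piped into the MariaDB command-line client running as an administrative user, for example k9db_user_sql | sudo mariadb (the client binary is named mysql on older installations). Pass localhost as the argument if everything runs on one machine.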

In addition, you will need to install the dependencies for our plotting scripts yourself. Please consult ./experiments/scripts/setup/plotting.sh for the details.

Interpreting Results

Take the following factors into consideration when interpreting the results of experiments run on a setup different from ours.

K9db and MariaDB are both highly sensitive to disk speed. If your setup does not use an NVMe SSD, the absolute numbers in the experiment results will be worse than ours. Baselines and components that do in-memory caching will not be affected by this, so the gap between, say, the MariaDB and MariaDB+Memcached baselines will be larger. The effect of a slower disk on K9db will be mixed, since K9db uses in-memory caching to accelerate expensive queries, but not simple queries or writes.

Similarly, the exact proportions shown in figure 12 also depend on the setup. In particular, a slower disk may amplify the speedup from the different K9db optimizations we measure; e.g., the in-memory cache is relatively more effective when the disk behind it is slower.

Our Lobsters experiment uses an open-loop, multi-threaded harness. An important configuration parameter is the request load, which controls how many parallel requests this harness issues per second (as a multiplier of Lobsters' actual load). When running on slower machines, slower disks, or a slower network, our configuration may be too aggressive and exceed the available resources. Thus, you may need to experiment a bit to find the optimal request load. We provide more instructions about this in the Lobsters experiment page.
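One way to organize that experimentation is to try request loads in increasing order and keep the last one your setup sustains. The sketch below is generic and makes several assumptions: max_sustainable_load is a hypothetical name, and the check command is something you supply yourself that runs the harness at the given load and exits non-zero when the load is too high. How to invoke the harness and how to judge overload are described in the Lobsters experiment page, not here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the K9db repo): print the largest load for
# which a user-supplied check command succeeds.
# Usage: max_sustainable_load <check-command> <load>...
# Assumes loads are listed in increasing order and that <check-command> takes
# the load as its only argument and exits 0 if the setup kept up.
max_sustainable_load() {
  local check="$1" load best=""
  shift
  for load in "$@"; do
    if "$check" "$load"; then
      best="$load"
    else
      break  # first load the setup could not sustain; stop searching
    fi
  done
  # Print the result; return non-zero if even the smallest load failed.
  [ -n "$best" ] && echo "$best"
}
```

The check command can be a shell function or a script; the search stops at the first failure, so each candidate load is tested at most once.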

Overall, the general trends shown in our results should remain when run on different setups with reasonable configurations, but the exact numbers and proportions may differ. Please exercise your judgment while interpreting these differences on different setups.

Running Experiments

We provide end-to-end scripts and instructions for running each experiment below. The scripts are compatible with our Google Cloud setup and our Docker container, as well as with any other setup that follows the MariaDB configuration listed above and has the proper dependencies installed for our libraries.

You should only run one experiment at a time on any given machine/instance.

You can find detailed instructions on how to use these scripts to run the different experiments below: