K9db Drill Down

Kinan Bab edited this page May 30, 2023 · 5 revisions

This wiki lists the steps required to run our K9db drill down experiment from the paper (figure 12).

Drill down

The drill down experiment uses our ownCloud Harness. This experiment runs the harness against different versions of K9db with key features or optimizations disabled, to demonstrate their effect.

The experiment has 4 modes:

  1. The full unadulterated K9db: we do not run this as part of the experiment; instead, we use the K9db run from the ownCloud comparison experiment, as it has identical parameters.
  2. K9db with no views: we turn off the ability to have views/dataflows in K9db. This disables in-memory data-ownership indexes as well as views that cache complex queries. In this mode, whenever we encounter a query that would have otherwise been cached, or a lookup into an otherwise in-memory index, K9db computes the corresponding results using a reasonable hardcoded sequence of physical DB operations, similar to how MySQL/MariaDB would plan and execute such queries physically.
  3. K9db with no views + no accessors: this run additionally marks the users that a file is shared with (directly or via a group) as owners, rather than accessors. This creates additional copies of rows in the database, as they are stored in more user shards.
  4. K9db with no views, no accessors, and physical separation: we use an alternative storage layout for K9db that enforces strict physical separation of users' shards. In this mode, every user shard (or microdatabase) is stored in a completely separate RocksDB column family, such that no single physical storage unit stores or indexes data from multiple users.

Setups (2) to (4) are strictly worse than K9db, as they correspond to features or optimizations being turned off.
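
To make the cost of mode (3) concrete, here is a toy model of why treating shared users as owners stores more rows. It is only an illustrative sketch and not K9db code: the `shard_copies` and `total_rows` helpers and the sample data are hypothetical, and the model assumes one stored copy of a file row per shard that owns it.

```python
from collections import defaultdict

def shard_copies(files, shares_as_owners):
    """Return {user: set of file names stored in that user's shard}.

    files maps a file name to (owner, [users the file is shared with]).
    Toy model only: accessors get no copy, owners get one copy each.
    """
    shards = defaultdict(set)
    for name, (owner, shared_with) in files.items():
        shards[owner].add(name)
        if shares_as_owners:
            # Mode (3): every user the file is shared with also owns a copy.
            for user in shared_with:
                shards[user].add(name)
    return shards

def total_rows(shards):
    """Count the stored row copies across all user shards."""
    return sum(len(rows) for rows in shards.values())

files = {
    "report.pdf": ("alice", ["bob", "carol"]),
    "notes.txt": ("bob", []),
}
with_accessors = total_rows(shard_copies(files, shares_as_owners=False))
no_accessors = total_rows(shard_copies(files, shares_as_owners=True))
print(with_accessors, no_accessors)
```

In this example the file shared with two users is stored once when they are accessors and three times when they are owners, so the total grows from 2 rows to 4.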

Running The Experiment

To avoid cluttering our actual K9db implementation with experimental code whose purpose is only to disable useful features and optimizations, this experiment is located on a dedicated branch: [drill-down-experiment](https://github.com/brownsys/K9db/tree/drill-down-experiment).

Before running this experiment, please make sure you have run our ownCloud comparison experiment on the same machine, so that the unadulterated K9db measurements are available for plotting.

Then, switch to the experiment branch and run the experiment:

git fetch origin
git checkout 0a2111a1fe0035c316e623e6249520af54c0c81e
# Or alternatively, checkout the 'drill-down-experiment' branch.
./experiments/scripts/drilldown.sh

On our setup, this experiment takes close to two and a half hours to finish. Most of this time is spent priming the database for the physical separation experiment. If this takes an unreasonable amount of time on your setup, consider lowering the number of users in the physical separation experiment. This number is controlled by the phys_user variable in our script, which is set to 1000 by default.

After the experiment is done, you will find a plot corresponding to figure 12 from our paper in experiments/scripts/outputs/drilldown/. The measurement files are available in experiments/scripts/logs/drilldown.

The physical separation mode is a lot slower than the other setups, so it will likely dominate the plot and make the other bars almost invisible. You can use our plotting script to change the y-axis and introduce cuts that zoom in on interesting regions of the plot.

cd experiments/scripts/plotting
. venv/bin/activate
python drilldown.py --paper --cuts="(0, 1), (1, 7), (7, 39)"
# Produces a new plot in experiments/scripts/outputs/drilldown/
# The y-axis in this new plot is split into 3 sections, each spanning
# 1/3 of the display space. Each section starts and ends at the
# coordinates given in the --cuts parameter.

Feel free to change the cuts so that you can distinguish the different bars for each query.
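
When choosing cuts, it can help to see where a given measurement would land on the split axis. The sketch below models the behavior described above, where each section gets an equal share of the display space. It is not taken from drilldown.py: the `parse_cuts` and `display_position` names are hypothetical, and the actual script may parse and draw the cuts differently.

```python
import ast

def parse_cuts(text):
    """Parse a string such as "(0, 1), (1, 7), (7, 39)" into (low, high) pairs."""
    cuts = list(ast.literal_eval("[" + text + "]"))
    for low, high in cuts:
        if low >= high:
            raise ValueError(f"invalid cut ({low}, {high}): low must be below high")
    return cuts

def display_position(value, cuts):
    """Map a y value to a relative position in [0, 1] on the split axis.

    Assumes every section spans an equal 1/len(cuts) share of the display
    space, and that values are placed linearly within their section.
    """
    share = 1 / len(cuts)
    for index, (low, high) in enumerate(cuts):
        if low <= value <= high:
            return (index + (value - low) / (high - low)) * share
    raise ValueError(f"{value} is not covered by any cut")

cuts = parse_cuts("(0, 1), (1, 7), (7, 39)")
for value in (0.5, 4, 39):
    print(value, display_position(value, cuts))
```

With these cuts, a value of 0.5 lands one sixth of the way up the axis and a value of 4 lands at the midpoint, even though 4 is only about a tenth of the maximum of 39. Narrowing the first cut gives small bars more vertical room.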