Adds framework for running DNS performance benchmarks
- See dns/README.md for details
- Python performance runner
- Build for dnsperf docker image
bowei committed Dec 6, 2016
1 parent 0b47944 commit 72abdc9
Showing 40 changed files with 2,331 additions and 103 deletions.
24 changes: 24 additions & 0 deletions dns/Makefile
@@ -0,0 +1,24 @@
# Copyright 2016 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
PYLINTRC := pylintrc

all: pylint pyunit

pylint:
	cd py && pylint --rcfile=../$(PYLINTRC) *.py

pyunit:
	nosetests py/ -v

.PHONY: pylint pyunit
217 changes: 217 additions & 0 deletions dns/README.md
@@ -0,0 +1,217 @@
# Overview

This directory contains scripts used to run a DNS performance test in a
Kubernetes cluster. The performance script `run` benchmarks a single DNS server
instance with a synthetic query workload.

# Quickstart

## Prerequisites

This assumes you have a working `kubectl` command and a Kubernetes cluster. The
Python code depends on the `numpy` package, which is available as `python-numpy`
on Debian-based systems or with `pip install numpy`.

## Running a performance test

``` sh
$ mkdir out/ # output directory
$ ./run --params params/default.yaml --out-dir out # run the perf test
```

`run` runs a performance benchmark ranging over the parameters given in
`--params`. The included `default.yaml` takes several hours to run through all
combinations. Each run creates a `run-<timestamp>` directory under the output
directory. A `latest` symlink points to the most recently created run
directory.

### Benchmarking the cluster DNS

You can benchmark the existing cluster DNS, as opposed to the server referenced
by `--deployment-yaml`, by specifying the `--use-cluster-dns` flag. Note: some
noise may be introduced if the client runs on the same node as a DNS server.

Note: test parameters such as resource limits do not apply when testing the
cluster DNS as they cannot be changed. The run script will skip these
parameters when running in this mode. (See `params.Param.is_relevant()` for
details).

## Analyzing results

Use the `ingest` script to parse the results of the runs into a sqlite3
database.

```sh
$ ./ingest --db out/db out/latest/*.out
```

The resulting metrics can then be queried using sqlite3. The schema of the
database can be shown using `sqlite3 out/db ".schema"`. To run SQL queries, you
can use `sqlite3 out/db < my-query.sql` or `sqlite3 out/db "select * from runs"`
directly.

### Example queries

Maximum 99th percentile latency with dnsmasq caching disabled:

```sql
SELECT
  max(latency_99_percentile)
FROM
  results NATURAL JOIN runs -- equijoin on run_id, run_subid
WHERE
  dnsmasq_cache = 0;
```

Runs that have 95th percentile latency less than 20 ms:

```sql
SELECT
  run_id, run_subid, dnsmasq_cpu, kubedns_cpu, max_qps, query_file,
  '--',
  qps, latency_95_percentile
FROM
  results NATURAL JOIN runs -- already joins on run_id, run_subid
WHERE
  latency_95_percentile < 20 -- milliseconds
ORDER BY
  qps ASC;
```

Additional SQL queries can be found in `sql/`.
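
The same kind of query can also be run from Python with the standard `sqlite3`
module. The sketch below is illustrative only: it builds an in-memory database
with a trimmed-down subset of the schema and a couple of made-up rows (not
measured results), and the helper name `runs_under_latency` is hypothetical. To
query real data, connect to the file produced by `ingest` (e.g. `out/db`)
instead of `:memory:`.

```python
import sqlite3

# In-memory database with a trimmed-down subset of the README's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (
  run_id, run_subid, dnsmasq_cache, max_qps, query_file,
  primary key (run_id, run_subid)
);
CREATE TABLE results (
  run_id, run_subid, qps, latency_95_percentile,
  primary key (run_id, run_subid)
);
""")

# Made-up rows for illustration only; these are NOT measured results.
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?)", [
    ("run-a", 0, 0, 500, "service.txt"),
    ("run-a", 1, 10000, 1000, "service.txt"),
])
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", [
    ("run-a", 0, 480.0, 35.0),
    ("run-a", 1, 990.0, 12.0),
])


def runs_under_latency(conn, limit_ms):
    """Return (run_id, run_subid, qps, p95) rows with p95 latency below limit_ms."""
    query = """
        SELECT run_id, run_subid, qps, latency_95_percentile
        FROM results NATURAL JOIN runs
        WHERE latency_95_percentile < ?
        ORDER BY qps ASC
    """
    # The limit is passed as a bound parameter rather than formatted into the SQL.
    return conn.execute(query, (limit_ms,)).fetchall()


rows = runs_under_latency(conn, 20)
for row in rows:
    print(row)
```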

# Monitoring

Kubernetes kube-dns v1.5+ (image `gcr.io/google_containers/kubedns-amd64:1.9`)
now exports [Prometheus](http://prometheus.io) metrics by default. A sample
Prometheus pod that scrapes kube-dns metrics is defined in
`cluster/prometheus.yaml` and can be created using kubectl:

```sh
$ kubectl create -f cluster/prometheus.yaml
```

Key metrics to look at are:

* dnsmasq\_cache\_hits, dnsmasq\_cache\_misses - number of DNS requests to the
  caching layer. Note: dnsmasq\_cache\_hits + dnsmasq\_cache\_misses = total DNS
  QPS.
* skydns\_skydns\_request\_duration\_seconds\_count - total number of requests
  served by the kube-dns component.

# Details

## Methodology

The questions we want to answer:

* What is the maximum queries per second (QPS) we can get from the Kubernetes
  DNS service given no limits?
* If we restrict CPU resources (i.e. resource limits in the pod yaml), what
  performance can we expect?
* What are the SLOs (e.g. query latency) for a given setting that the
  user can expect? Alternate phrasing: what can we expect in realistic
  workloads that do not saturate the service?

The inclusion of `max_qps` vs attained `qps` is to answer the third question.
For example, if a user does not hit the maximum QPS possible from a given DNS
server pod, then what are the latencies that they should expect? Latency
increases with load and if a user's applications do not saturate the service,
they will attain better latencies.

## Parameters

The performance test harness tests all combinations of the parameters given in
the `--params` file. For example, the yaml file below will test all
combinations of `run_length_seconds`, `kubedns_cpu`, `dnsmasq_cpu`, ...,
`query_file`, resulting in `1 * 4 * 5 * 2 * 5 * 4 = 800` combinations.

``` yaml
# Number of seconds to run with a particular setting.
run_length_seconds: [60]
# cpu limit for kubedns, null means unlimited.
kubedns_cpu: [200, 250, 300, null]
# cpu limit for dnsmasq, null means unlimited.
dnsmasq_cpu: [100, 150, 200, 250, null]
# size of dnsmasq cache. Note: 10000 is the maximum. 0 to disable caching.
dnsmasq_cache: [0, 10000]
# Maximum QPS for dnsperf. dnsperf is self-pacing and will ramp request rate
# until requests are dropped. null means no limit.
max_qps: [500, 1000, 2000, 3000, null]
# File to take queries from. This is in dnsperf format.
query_file: ["nx-domain.txt", "outside.txt", "pod-ip.txt", "service.txt"]
```
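
As a quick check of the combination count, the cross product of the parameter
lists above can be enumerated with `itertools.product`. This is a minimal
sketch, not the runner's actual code; the `params` dict simply mirrors the yaml
(with `null` written as `None`), and the helper name `combinations` is
hypothetical.

```python
import itertools

# Mirrors the yaml above; yaml `null` becomes Python `None`.
params = {
    "run_length_seconds": [60],
    "kubedns_cpu": [200, 250, 300, None],
    "dnsmasq_cpu": [100, 150, 200, 250, None],
    "dnsmasq_cache": [0, 10000],
    "max_qps": [500, 1000, 2000, 3000, None],
    "query_file": ["nx-domain.txt", "outside.txt", "pod-ip.txt", "service.txt"],
}


def combinations(params):
    """Yield one dict per combination of parameter values."""
    names = list(params)
    for values in itertools.product(*(params[name] for name in names)):
        yield dict(zip(names, values))


all_runs = list(combinations(params))
print(len(all_runs))  # 1 * 4 * 5 * 2 * 5 * 4 = 800
```

Adding another value to any list multiplies the total, so long parameter lists
quickly lengthen the overall run time.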

## Results schema

``` sql
CREATE TABLE runs (
run_id,
run_subid,
run_length_seconds,
dnsmasq_cpu,
dnsmasq_cache,
kubedns_cpu,
max_qps,
query_file,
primary key (run_id, run_subid)
);

CREATE TABLE results (
run_id,
run_subid,
queries_sent,
queries_completed,
queries_lost,
run_time,
qps,
avg_latency,
min_latency,
max_latency,
stddev_latency,
latency_50_percentile, -- in milliseconds
latency_95_percentile,
latency_99_percentile,
latency_99_5_percentile,
primary key (run_id, run_subid)
);

CREATE TABLE histograms (
run_id,
run_subid,
rtt_ms,
rtt_ms_count
);
```
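
The `histograms` table stores a count per round-trip time, so a latency
percentile can be recomputed from its rows. The sketch below uses the
nearest-rank method and assumes each row is an `(rtt_ms, rtt_ms_count)` pair
for a single run; it is not necessarily how `ingest` derives the
`latency_*_percentile` columns, and the bucket counts shown are made up.

```python
import math


def percentile_from_histogram(buckets, pct):
    """Nearest-rank percentile from (rtt_ms, rtt_ms_count) pairs."""
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("empty histogram")
    # 1-based rank of the sample that sits at the requested percentile.
    rank = max(1, math.ceil(pct * total / 100))
    seen = 0
    for rtt_ms, count in sorted(buckets):
        seen += count
        if seen >= rank:
            return rtt_ms
    return max(rtt_ms for rtt_ms, _ in buckets)


# Made-up bucket counts for illustration only (100 samples in total).
buckets = [(1, 50), (2, 30), (5, 15), (20, 5)]
for pct in (50, 95, 99):
    print(pct, percentile_from_histogram(buckets, pct))
```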

# Customizing and extending

## Using the cluster DNS server configuration

`kube-dns` is installed by default using
[addon-manager](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons).
The deployment configuration is located in `/etc/kubernetes/addons/dns`. You can
use the deployment yaml from this directory as the argument to
`--deployment-yaml` above. However, you will need to replace the
`k8s-app: kube-dns` label with `app: kube-dns-perf-server` to avoid clashing
with the system DNS.

## Using a different DNS server

You can pass a different DNS server yaml to the runner via the
`--deployment-yaml` flag. Note: test parameters such as `kubedns_cpu` may no
longer make sense, so they should be removed from the `--params` file when the
test is run.

## Adding new test parameters

To add a new test parameter to be explored, edit `py/params.py`: subclass the
appropriate `*Param` class and add the parameter to the module variable
`PARAMETERS`. Each parameter instance implements the modification to the test
inputs (e.g. Kubernetes deployment yaml) necessary to set the value.
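
The general shape of such a parameter might look like the sketch below. It is a
self-contained illustration, not the code in `py/params.py`: the `Param` base
class here is a stand-in, and the method signatures, the `inputs` layout, the
`ContainerMemoryParam` class, and the `kubedns_memory` name are all assumptions
made for the example. Check the real base classes before copying anything.

```python
class Param:
    """Hypothetical stand-in for the base class in py/params.py."""

    def __init__(self, name):
        self.name = name

    def is_relevant(self, use_cluster_dns):
        # Assumed signature: whether this parameter applies in the current mode.
        return True

    def set(self, inputs, value):
        # Assumed signature: modify the test inputs to apply `value`.
        raise NotImplementedError


class ContainerMemoryParam(Param):
    """Hypothetical parameter that sets a container memory limit in MiB."""

    def __init__(self, name, container_name):
        super().__init__(name)
        self.container_name = container_name

    def is_relevant(self, use_cluster_dns):
        # Resource limits of the cluster DNS cannot be changed, so skip them.
        return not use_cluster_dns

    def set(self, inputs, value):
        if value is None:  # None (yaml null) means unlimited: leave unset.
            return
        pod_spec = inputs["deployment"]["spec"]["template"]["spec"]
        for container in pod_spec["containers"]:
            if container["name"] == self.container_name:
                limits = container.setdefault("resources", {}).setdefault("limits", {})
                limits["memory"] = "%dMi" % value


# Module-level registry, analogous to the PARAMETERS variable mentioned above.
PARAMETERS = [ContainerMemoryParam("kubedns_memory", "kubedns")]

# Usage: apply the parameter to a minimal deployment-like dict.
inputs = {"deployment": {"spec": {"template": {"spec": {
    "containers": [{"name": "kubedns"}, {"name": "dnsmasq"}]}}}}}
PARAMETERS[0].set(inputs, 170)
print(inputs["deployment"]["spec"]["template"]["spec"]["containers"][0])
```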

# Building the dnsperf image

See [image/README.md](image/README.md).
29 changes: 29 additions & 0 deletions dns/cluster/dnsperf.yaml
@@ -0,0 +1,29 @@
# Copyright 2016 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: v1
kind: Pod
metadata:
  name: kube-dns-perf-client
spec:
  terminationGracePeriodSeconds: 1
  containers:
    - command: ["sh", "-c", "while true; do echo `date`: MARK; sleep 10; done"]
      image: gcr.io/bowei-gke-dev/dnsperf:1.0
      imagePullPolicy: Always
      name: dnsperf
      resources:
        requests:
          cpu: 250m
  dnsPolicy: ClusterFirst
  restartPolicy: Always