Commit

docs: Fixed getting started docs after feedback. (#738)
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
bwplotka authored Jan 21, 2019
1 parent be76156 commit 2f08fec
Showing 1 changed file with 50 additions and 38 deletions.
88 changes: 50 additions & 38 deletions docs/getting_started.md
@@ -44,7 +44,7 @@ NOTE: If you want to use `reload.*` flags for sidecar, make sure you enable `rel

### Backups

The following configures the Sidecar to only backup Prometheus data into a Google Cloud Storage bucket.
The following configures the sidecar to only back up Prometheus data into the object storage of your choice:

```
thanos sidecar \
@@ -53,6 +53,8 @@ thanos sidecar \
--objstore.config-file bucket_config.yaml \ # Bucket to upload data to
```

The format of the YAML file depends on the provider you choose. Example configurations and an up-to-date list of storage types Thanos supports are available [here](storage.md).
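
For illustration, a minimal GCS configuration might look like the snippet below (the bucket name is a placeholder, and the exact fields differ per provider, so treat this as a sketch rather than a reference):

```
type: GCS
config:
  bucket: "example-prometheus-blocks"
```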

Rolling this out has little to zero impact on the running Prometheus instance. It is a good start to ensure you are backing up your data while figuring out the other pieces of Thanos.

If you are not interested in backing up any data, the `--objstore.config-file` flag can simply be omitted.
@@ -61,9 +63,9 @@ If you are not interested in backing up any data, the `--objstore.config-file` f
* _[Example Kubernetes manifest with GCS upload](../kube/manifests/prometheus-gcs.yaml)_
* _[Details & Config for other object stores](./storage.md)_

### [Store API](components/store.md)
### [Store API](/pkg/store/storepb/rpc.proto#L19)

The Sidecar component comes with a _[Store API](components/store.md)_. The Store API allows you to query metric data in Prometheus and data backed up into the Object Store bucket.
The Sidecar component implements and exposes the gRPC _[Store API](/pkg/store/storepb/rpc.proto#L19)_. The Store API allows you to query metric data in Prometheus and data backed up into the Object Store bucket.

Let's extend the Sidecar in the previous section to connect to a Prometheus server, and expose the Store API.

@@ -73,13 +75,14 @@ thanos sidecar \
--objstore.config-file bucket_config.yaml \ # Bucket config file to send data to
--prometheus.url http://localhost:9090 \ # Location of the Prometheus HTTP server
--http-address 0.0.0.0:19191 \ # HTTP endpoint for collecting metrics on the Sidecar
--grpc-address 0.0.0.0:19090 \ # GRPC endpoint for StoreAPI
--grpc-address 0.0.0.0:19090 # GRPC endpoint for StoreAPI
```

* _[Example Kubernetes manifest](../kube/manifests/prometheus.yaml)_
* _[Example Kubernetes manifest with GCS upload](../kube/manifests/prometheus-gcs.yaml)_

### External Labels

Prometheus allows the configuration of "external labels" of a given Prometheus instance. These are meant to globally identify the role of that instance. As Thanos aims to aggregate data across all instances, providing a consistent set of external labels becomes crucial!

Every Prometheus instance must have a globally unique set of identifying labels. For example, in Prometheus's configuration file:
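
An illustrative set of external labels (the names and values below are only examples; pick ones that uniquely identify each instance and its replica) could look like:

```
global:
  external_labels:
    region: eu-west
    monitor: infrastructure
    replica: A
```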
@@ -106,8 +109,8 @@ Below, we will set up a Query to connect to our Sidecars, and expose its HTTP UI
```
thanos query \
--http-address 0.0.0.0:19192 \ # HTTP Endpoint for Query UI
--store 0.0.0.0:18091 \ # Static gRPC Store API Address for the query node to query
--store 0.0.0.0:18092 \ # Also repeatable
--store 1.2.3.4:19090 \ # Static gRPC Store API Address for the query node to query
--store 1.2.3.5:19090 # Also repeatable
```

Go to the configured HTTP address, which should now show a UI similar to that of Prometheus. If the cluster formed correctly, you can now query across all Prometheus instances within the cluster.
@@ -132,11 +135,9 @@ Reload your Prometheus instances, and then, in Query, we will enable `replica` a
```
thanos query \
--http-address 0.0.0.0:19192 \
--cluster.peers 127.0.0.1:19391 \
--cluster.peers 127.0.0.1:19392 \
--store 0.0.0.0:18091 \
--store 0.0.0.0:18092 \
--query.replica-label replica \ # Replica label for de-duplication
--store 1.2.3.4:19090 \
--store 1.2.3.5:19090 \
--query.replica-label replica # Replica label for de-duplication
```

Go to the configured HTTP address, and you should now be able to query across all Prometheus instances and receive de-duplicated data.
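
If you prefer the command line, the querier also serves the Prometheus-compatible HTTP API on the same address; a quick sanity check could look like this (`up` is just a sample expression, and the port matches the `--http-address` above):

```
curl 'http://localhost:19192/api/v1/query?query=up'
```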
@@ -145,9 +146,24 @@ Go to the configured HTTP address, and you should now be able to query across al

## Communication Between Components

Components in a Thanos cluster can be connected through a gossip protocol to advertise membership and propagate metadata about other known nodes or by setting static store flags of known components. We added gossip to efficiently and dynamically discover other nodes in the cluster and the metrics information they can access.
The only required communication between nodes is for the Thanos Querier to be able to reach the gRPC StoreAPIs you provide. The Thanos Querier periodically calls the Info endpoint to collect up-to-date metadata and check the health of each StoreAPI.
The metadata includes information about the time windows and external labels for each node.

There are various ways to tell the query component about the StoreAPIs it should query data from. The simplest way is to use a static list of well-known addresses to query.
These are repeatable, so you can add as many endpoints as needed. You can also specify a DNS domain prefixed by `dns://` or `dns+srv://` to have Thanos Query do an `A` or `SRV` lookup to get all required IPs to communicate with.

```
thanos query \
--http-address 0.0.0.0:19192 \ # Endpoint for Query UI
--grpc-address 0.0.0.0:19092 \ # gRPC endpoint for Store API
--store 1.2.3.4:19090 \ # Static gRPC Store API Address for the query node to query
--store 1.2.3.5:19090 \ # Also repeatable
--store dns://rest.thanos.peers:19092 # Use DNS lookup for getting all registered IPs as separate StoreAPIs
```

Read more details [here](/docs/service_discovery.md).

This is especially useful for the Query node to know all endpoints to query, time windows and external labels for each node, thus reducing the overhead of querying all nodes in the cluster.
### Deprecated: Gossip configuration

Given a sidecar we can have it join a gossip cluster by advertising itself to other peers within the network.

@@ -162,7 +178,7 @@ thanos sidecar \
--http-address 0.0.0.0:19191 \ # HTTP endpoint for collecting metrics on Thanos sidecar
--cluster.address 0.0.0.0:19391 \ # Endpoint used to gossip metadata about the current node
--cluster.advertise-address 127.0.0.1:19391 \ # Location at which the node advertises itself to other members of the cluster
--cluster.peers 127.0.0.1:19391 \ # Static cluster peer where the node will get info about the cluster (repeatable)
--cluster.peers 127.0.0.1:19391 # Static cluster peer where the node will get info about the cluster (repeatable)
```

With the above configuration a single node will advertise itself in the cluster and query for other members of the cluster (from itself). When you add more sidecars / components, you will probably want to set `cluster.peers` to a well-known peer that will allow you to discover other peers within the cluster.
@@ -177,31 +193,22 @@ thanos query \
--grpc-address 0.0.0.0:19092 \ # gRPC endpoint for Store API
--cluster.address 0.0.0.0:19591 \
--cluster.advertise-address 127.0.0.1:19591 \
--cluster.peers 127.0.0.1:19391 \ # Static cluster peer where the node will get info about the cluster
```

The Query component, however, does not have to utilize gossip to discover other nodes and can instead be set up to use a static list of well-known addresses to query. These are repeatable, so you can add as many endpoints as needed. However, if you only use `store` you will automatically discover nodes added to the cluster.

```
thanos query \
--http-address 0.0.0.0:19192 \ # Endpoint for Query UI
--grpc-address 0.0.0.0:19092 \ # gRPC endpoint for Store API
--store 0.0.0.0:19091 \ # Static gRPC Store API Address for the query node to query
--store 0.0.0.0:19092 \ # Also repeatable
--cluster.peers 127.0.0.1:19391 # Static cluster peer where the node will get info about the cluster
```

You can mix both static `store` and `cluster`-based approaches:

```
thanos query \
--http-address 0.0.0.0:19192 \ # Endpoint for Query UI
--grpc-address 0.0.0.0:19092 \ # gRPC endpoint for Store API
--http-address 0.0.0.0:19192 \ # Endpoint for Query UI
--grpc-address 0.0.0.0:19092 \ # gRPC endpoint for Store API
--cluster.address 0.0.0.0:19591 \
--cluster.advertise-address 127.0.0.1:19591 \
--cluster.peers 127.0.0.1:19391 \ # Static cluster peer where the node will get info about the cluster
--cluster.peers 127.0.0.1:19392 \ # Another cluster peer (many can be added to discover nodes)
--store 0.0.0.0:18091 \ # Static gRPC Store API Address for the query node to query
--store 0.0.0.0:18092 \ # Also repeatable
--cluster.peers 127.0.0.1:19391 \ # Static cluster peer where the node will get info about the cluster
--cluster.peers 127.0.0.1:19392 \ # Another cluster peer (many can be added to discover nodes)
--store 1.2.3.4:19090 \ # Static gRPC Store API Address for the query node to query
--store 1.2.3.5:19090 \ # Also repeatable
--store dns://rest.thanos.peers:19092 # Use DNS lookup for getting all registered IPs as separate StoreAPIs
```

When to use gossip vs store flags?
@@ -214,26 +221,25 @@ Additional flags for cluster configuration exist but are typically not needed. C
* _[Example Kubernetes manifest](../kube/manifests/prometheus.yaml)_
* _[Example Kubernetes manifest with GCS upload](../kube/manifests/prometheus-gcs.yaml)_

## Store Gateway
## [Store Gateway](components/store.md)

As the sidecar backs up data into the object storage of your choice, you can decrease Prometheus retention and store less locally. However, we need a way to query all that historical data again.
The store gateway does just that by implementing the same gRPC data API as the sidecars but backing it with data it can find in your object storage bucket.
Just like sidecars and query nodes, the store gateway joins the gossip cluster and is automatically picked up by running query nodes as yet another data provider.
Just like sidecars and query nodes, the store gateway exposes the StoreAPI and needs to be discovered by the Thanos Querier.

```
thanos store \
--data-dir /var/thanos/store \ # Disk space for local caches
--objstore.config-file bucket_config.yaml \ # Bucket to fetch data from
--cluster.address 0.0.0.0:19891 \
--cluster.advertise-address 127.0.0.1:19891 \
--cluster.peers 127.0.0.1:19391 \
--http-address 0.0.0.0:19191 \ # HTTP endpoint for collecting metrics on the Store Gateway
--grpc-address 0.0.0.0:19090 # GRPC endpoint for StoreAPI
```

The store gateway occupies small amounts of disk space for caching basic information about data in the object storage. This will rarely exceed more than a few gigabytes and is used to improve restart times. It is useful, but not required, to preserve it across restarts.

* _[Example Kubernetes manifest](../kube/manifests/thanos-store.yaml)_

## Compactor
## [Compactor](components/compact.md)

A local Prometheus installation periodically compacts older data to improve query efficiency. Since the sidecar backs up data as soon as possible, we need a way to apply the same process to data in the object storage.

@@ -242,13 +248,19 @@ The compactor component simply scans the object storage and processes compactio
```
thanos compact \
--data-dir /var/thanos/compact \ # Temporary workspace for data processing
--objstore.config-file bucket_config.yaml # Bucket where to apply the compacting
--objstore.config-file bucket_config.yaml \ # Bucket where to apply the compacting
--http-address 0.0.0.0:19191 # HTTP endpoint for collecting metrics on the Compactor
```

The compactor is not in the critical path of querying or data backup. It can either be run as a periodic batch job or be left running to always compact data as soon as possible. It is recommended to provide 100-300GB of local disk space for data processing.

_NOTE: The compactor must be run as a **singleton** and must not run when manually modifying data in the bucket._
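
By default the command above exits once all available compaction work is done, which suits a periodic batch job. To keep a single instance running continuously instead, a sketch using the `--wait` flag could look like this (flag availability may vary between versions):

```
thanos compact \
--data-dir /var/thanos/compact \ # Temporary workspace for data processing
--objstore.config-file bucket_config.yaml \ # Bucket where to apply the compacting
--http-address 0.0.0.0:19191 \ # HTTP endpoint for collecting metrics on the Compactor
--wait # Keep running and watch for new blocks instead of exiting
```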

## [Ruler](components/rule.md)

In case Prometheus with the Thanos sidecar does not have enough retention, or if you want alerting or recording rules that require a global view, Thanos offers [Ruler](components/rule.md),
which does rule and alert evaluation on top of a given Thanos Querier.
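
A minimal sketch of running the Ruler might look like the following (the rule file path, querier address and Alertmanager URL are placeholders; check `thanos rule --help` for the exact flags in your version):

```
thanos rule \
--data-dir /var/thanos/rule \ # Local workspace for rule evaluation data
--rule-file /etc/thanos/rules/*.yaml \ # Recording and alerting rules to evaluate
--query 1.2.3.6:19192 \ # Thanos Querier to evaluate rules against (repeatable)
--alertmanagers.url http://alertmanager.mycompany.com:9093 \ # Where to send alerts
--objstore.config-file bucket_config.yaml \ # Bucket to upload resulting blocks to
--grpc-address 0.0.0.0:19090 \ # GRPC endpoint for StoreAPI
--http-address 0.0.0.0:19191 # HTTP endpoint for collecting metrics on the Ruler
```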

# All-in-one example

You can find a one-box example with minikube [here](../kube/README.md).
