docs: Document Thanos Sharding

Signed-off-by: Xiang Dai <764524258@qq.com>
thanos-io · Dec 23, 2019 · commit 60457e2
1 parent ed6087f
Showing 1 changed file with 95 additions and 0 deletions.
diff --git a/docs/sharing.md b/docs/sharing.md
@@ -0,0 +1,95 @@
+---
+title: Sharding
+type: docs
+menu: thanos
+slug: /Sharing.md
+---
+
+# Sharding
+
+Sharding applies to long-term retention storage, that is, to blocks kept in object storage.
+
+## Background
+
+Currently, all components that read from object storage assume that they should operate on
+**all** the blocks present in the given bucket's root directory.
+
+This is fine in most cases. However, over time, and because blocks from multiple `Sources` can be stored in the same bucket,
+the number of objects in a bucket can grow drastically.
+
+This means that over time you might want to scale out certain components, e.g.:
+
+* Compactor: A larger number of objects does not matter much, but the compactor has to scale (CPU, network) with the number of Sources pushing blocks to object storage.
+If multiple Sources are handled by the same compactor with a slow network and CPU, it might not compact/downsample quickly enough to keep up with incoming blocks.
+    * This happens often when no compactor has been deployed for a long period and it then has to catch up quickly with a large number of blocks (e.g. a couple of months' worth).
+* Store Gateway: Queries against the store gateway that touch a large number of Sources might be expensive, so it has to scale with the number of Sources if we expect such queries.
+    * Orthogonally, we did not advertise any labels in the Store Gateway's Info. This means that the querier was not able to do any pre-filtering, so all store gateways in the system are always touched for each query.
+
+### Reminder: What is a Source
+
+A `Source` is any component that creates new metrics in the form of Thanos TSDB blocks uploaded to object storage. We differentiate Sources by `external labels`.
+Having unique sources has several benefits:
+
+ * Sources do not need to inject "global" source labels into all metrics (like `cluster, env, replica`). Those are the same for all metrics produced by a source, so we can assume that the whole block has them.
+ * We can track which blocks are "duplicates", e.g. in HA groups, where 2 Prometheus replicas scrape the same targets.
+ * We can track which source produced the metrics if problems occur.
+
+Example Sources are: Sidecar, Rule, Thanos Receive.
+
+## Solution
+
+We can now specify the `--selector.relabel-config` flag (or the corresponding `--selector.relabel-config-file`), which selects the blocks that the store and compact components operate on.
+
+For example:
+
+* We want to run a Compactor only for blocks whose `external_labels` contain `cluster=A`. We will run a second Compactor for blocks with `cluster=B` external labels.
+* We want to browse only blocks whose `external_labels` contain `cluster=A` from object storage. We will run a Store Gateway with a selector of `cluster=A` on the external labels of blocks.
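+
+As a sketch of the first scenario, each Compactor could be given its own relabel config via `--selector.relabel-config-file`. The file names below are hypothetical, and the snippets assume that the external label is called `cluster`.
+
+`compactor-a.yaml` (hypothetical):
+
+```yaml
+# Keep only blocks whose external label cluster is exactly "A".
+- action: keep
+  regex: "A"
+  source_labels:
+  - cluster
+```
+
+`compactor-b.yaml` (hypothetical):
+
+```yaml
+# Keep only blocks whose external label cluster is exactly "B".
+- action: keep
+  regex: "B"
+  source_labels:
+  - cluster
+```
+
+Prometheus relabel regexes are fully anchored, so `"A"` matches only the value `A` and not, for example, `AB`. Blocks with any other `cluster` value are picked up by neither Compactor and would need their own instance.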
+
+### Relabelling
+
+Similar to [promtail](https://github.com/grafana/loki/blob/master/docs/promtail.md#scrape-configs), this config follows the native
+[Prometheus relabel-config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) syntax.
+
+The relabel config defines the filtering applied on **every** synchronization with object storage.
+
+We will potentially allow manipulating several inputs:
+
+* `__block_id`
+* External labels:
+  * `<name>`
+* `__block_objstore_bucket_endpoint`
+* `__block_objstore_bucket_name`
+* `__block_objstore_bucket_path`
+
+Output:
+
+* If the output is empty, the block is dropped.
+
+By default, with an empty relabel-config, all external labels are assumed.
+Intuitively, blocks without any external labels will be ignored.
+
+All blocks together should compose the set of labels to advertise. The input should be based on the original meta files, NOT on relabelling.
+The reasoning is covered in the [`Next Steps`](#Future-Work) section.
+
+Example usages would be:
+
+* Drop blocks that contain the external label `cluster=A`:
+
+```yaml
+- action: drop
+  regex: "A"
+  source_labels:
+  - cluster
+```
+
+* Keep only blocks that contain the external label `cluster=A`:
+
+```yaml
+- action: keep
+  regex: "A"
+  source_labels:
+  - cluster
+```
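+
+* Split blocks between two instances by hashing the block ID. This is a sketch that assumes `__block_id` is exposed as a relabel input as listed above; `shard` is an illustrative temporary label name.
+
+```yaml
+# Hash the block ID into one of two buckets (0 or 1) and store it in a temporary label.
+- action: hashmod
+  source_labels:
+  - __block_id
+  target_label: shard
+  modulus: 2
+# This instance keeps only bucket 0; the other instance would use regex "1".
+- action: keep
+  regex: "0"
+  source_labels:
+  - shard
+```
+
+Because the assignment depends only on the block ID, every block lands in exactly one of the two buckets, and the two instances together cover the whole bucket.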
+
+## Plan
+
+What we plan to do, and what we do not, is documented in the [sharding proposal](./proposals/201909_thanos_sharding.md).