Added thanos bucket replicate #2113

Merged
merged 14 commits into from
Feb 26, 2020
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -29,6 +29,7 @@ We use *breaking* word for marking changes that are not backward compatible (rel
- [#2049](https://github.com/thanos-io/thanos/pull/2049) Tracing: Support sampling on Elastic APM with new sample_rate setting.
- [#2008](https://github.com/thanos-io/thanos/pull/2008) Querier, Receiver, Sidecar, Store: Add gRPC [health check](https://github.com/grpc/grpc/blob/master/doc/health-checking.md) endpoints.
- [#2145](https://github.com/thanos-io/thanos/pull/2145) Tracing: track query sent to prometheus via remote read api.
- [#2113](https://github.com/thanos-io/thanos/pull/2113) Bucket: Added `thanos bucket replicate` (mixin todo).

### Changed

Expand Down
47 changes: 47 additions & 0 deletions cmd/thanos/bucket.go
Expand Up @@ -9,6 +9,7 @@ import (
"fmt"
"os"
"sort"
"strconv"
"strings"
"text/template"
"time"
Expand All @@ -26,13 +27,15 @@ import (
"github.com/thanos-io/thanos/pkg/block"
"github.com/thanos-io/thanos/pkg/block/metadata"
"github.com/thanos-io/thanos/pkg/compact"
"github.com/thanos-io/thanos/pkg/compact/downsample"
"github.com/thanos-io/thanos/pkg/component"
"github.com/thanos-io/thanos/pkg/extflag"
"github.com/thanos-io/thanos/pkg/extprom"
extpromhttp "github.com/thanos-io/thanos/pkg/extprom/http"
"github.com/thanos-io/thanos/pkg/objstore"
"github.com/thanos-io/thanos/pkg/objstore/client"
"github.com/thanos-io/thanos/pkg/prober"
"github.com/thanos-io/thanos/pkg/replicate"
"github.com/thanos-io/thanos/pkg/runutil"
httpserver "github.com/thanos-io/thanos/pkg/server/http"
"github.com/thanos-io/thanos/pkg/ui"
Expand Down Expand Up @@ -69,6 +72,7 @@ func registerBucket(m map[string]setupFunc, app *kingpin.Application, name strin
registerBucketLs(m, cmd, name, objStoreConfig)
registerBucketInspect(m, cmd, name, objStoreConfig)
registerBucketWeb(m, cmd, name, objStoreConfig)
registerBucketReplicate(m, cmd, name, objStoreConfig)
}

func registerBucketVerify(m map[string]setupFunc, root *kingpin.CmdClause, name string, objStoreConfig *extflag.PathOrContent) {
Expand Down Expand Up @@ -377,6 +381,49 @@ func registerBucketWeb(m map[string]setupFunc, root *kingpin.CmdClause, name str
}
}

// listResLevel provides the supported resolution levels as strings for flag hints.
// Enum cannot be used directly here, since it yields a string and the flag is parsed as int64.
func listResLevel() []string {
return []string{
strconv.FormatInt(downsample.ResLevel0, 10),
strconv.FormatInt(downsample.ResLevel1, 10),
strconv.FormatInt(downsample.ResLevel2, 10)}
}

func registerBucketReplicate(m map[string]setupFunc, root *kingpin.CmdClause, name string, objStoreConfig *extflag.PathOrContent) {
cmd := root.Command("replicate", fmt.Sprintf("Replicate data from one object storage to another. NOTE: Currently it works only with Thanos blocks (%v has to have Thanos metadata).", block.MetaFilename))
httpBindAddr, httpGracePeriod := regHTTPFlags(cmd)
toObjStoreConfig := regCommonObjStoreFlags(cmd, "-to", false, "The object storage which replicate data to.")
// TODO(bwplotka): Allow to replicate many resolution levels.
resolution := cmd.Flag("resolution", "Only blocks with this resolution will be replicated.").Default(strconv.FormatInt(downsample.ResLevel0, 10)).HintAction(listResLevel).Int64()
Member

Hm, I think we might actually need Int64s here, as we might want to replicate more than one resolution. Let's add a TODO for now.

Also, how about naming it `resolution-level`?

Suggested change
resolution := cmd.Flag("resolution", "Only blocks with this resolution will be replicated.").Default(strconv.FormatInt(downsample.ResLevel0, 10)).HintAction(listResLevel).Int64()
// TODO(bwplotka): Allow to replicate many compaction levels.
resolution := cmd.Flag("resolution-level", "Only blocks with this resolution will be replicated.").Default(strconv.FormatInt(downsample.ResLevel0, 10)).HintAction(listResLevel).Int64()

Member

Not resolved

Member Author

In fact it does.

Member

How? I proposed `resolution-level` (: I might be wrong, but then you should comment that I am wrong because of X, not ignore it. But I am pretty sure you just did not notice (:

Member Author

I thought you meant adding support for more levels for both resolution and compaction, so I added two TODOs here.

Member Author

Kingpin supports Int but not Int64, so we still need to use String here.
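
To make the TODO discussed in this thread concrete, here is a minimal sketch of how several resolution levels could be validated once the flag accepts more than one value. It is not part of this PR: `parseResolutions` is a hypothetical helper, and the three constants are assumed to mirror `downsample.ResLevel0`, `ResLevel1` and `ResLevel2` (in milliseconds), so check them against the `downsample` package before relying on them.

```go
package main

import (
	"fmt"
	"strconv"
)

// Assumed to mirror downsample.ResLevel0/1/2 (milliseconds); verify against the real package.
const (
	resLevel0 = int64(0)
	resLevel1 = int64(5 * 60 * 1000)
	resLevel2 = int64(60 * 60 * 1000)
)

// parseResolutions is a hypothetical helper: it converts raw flag values to int64
// and rejects anything that is not one of the known resolution levels.
func parseResolutions(values []string) ([]int64, error) {
	allowed := map[int64]struct{}{resLevel0: {}, resLevel1: {}, resLevel2: {}}
	out := make([]int64, 0, len(values))
	for _, v := range values {
		n, err := strconv.ParseInt(v, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parse resolution %q: %w", v, err)
		}
		if _, ok := allowed[n]; !ok {
			return nil, fmt.Errorf("unsupported resolution %d", n)
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	res, err := parseResolutions([]string{"0", "300000"})
	fmt.Println(res, err)
}
```

Validating after parsing keeps the flag definition simple and gives a clear error for values outside the known levels.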

// TODO(bwplotka): Allow to replicate many compaction levels.
compaction := cmd.Flag("compaction", "Only blocks with this compaction level will be replicated.").Default("1").Int()
matcherStrs := cmd.Flag("matcher", "Only blocks whose external labels exactly match this matcher will be replicated.").PlaceHolder("key=\"value\"").Strings()
singleRun := cmd.Flag("single-run", "Run replication only one time, then exit.").Default("false").Bool()

m[name+" replicate"] = func(g *run.Group, logger log.Logger, reg *prometheus.Registry, tracer opentracing.Tracer, _ <-chan struct{}, _ bool) error {
matchers, err := replicate.ParseFlagMatchers(*matcherStrs)
if err != nil {
return errors.Wrap(err, "parse block label matchers")
}

return replicate.RunReplicate(
g,
logger,
reg,
tracer,
*httpBindAddr,
time.Duration(*httpGracePeriod),
matchers,
compact.ResolutionLevel(*resolution),
*compaction,
objStoreConfig,
toObjStoreConfig,
*singleRun,
)
}

}

// refresh metadata from remote storage periodically and update UI.
func refresh(ctx context.Context, logger log.Logger, bucketUI *ui.Bucket, duration time.Duration, timeout time.Duration, name string, reg *prometheus.Registry, objStoreConfig *extflag.PathOrContent) error {
confContentYaml, err := objStoreConfig.Content()
Expand Down
74 changes: 74 additions & 0 deletions docs/components/bucket.md
Expand Up @@ -74,6 +74,10 @@ Subcommands:
bucket web [<flags>]
Web interface for remote storage bucket
bucket replicate [<flags>]
Replicate data from one object storage to another. NOTE: Currently it works
only with Thanos blocks (meta.json has to have Thanos metadata).
```

Expand Down Expand Up @@ -315,3 +319,73 @@ Flags:
--timeout=5m Timeout to download metadata from remote storage
```

### replicate

`bucket replicate` is used to replicate buckets from one object storage to another.

NOTE: Currently it works only with Thanos blocks (meta.json has to have Thanos metadata).

Example:
```
$ thanos bucket replicate --objstore.config-file="..." --objstore-to.config="..."
```

[embedmd]:# (flags/bucket_replicate.txt)
```txt
usage: thanos bucket replicate [<flags>]
Replicate data from one object storage to another. NOTE: Currently it works only
with Thanos blocks (meta.json has to have Thanos metadata).
Flags:
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--version Show application version.
--log.level=info Log filtering level.
--log.format=logfmt Log format to use. Possible options: logfmt or
json.
--tracing.config-file=<file-path>
Path to YAML file with tracing configuration.
See format details:
https://thanos.io/tracing.md/#configuration
--tracing.config=<content>
Alternative to 'tracing.config-file' flag
(lower priority). Content of YAML file with
tracing configuration. See format details:
https://thanos.io/tracing.md/#configuration
--objstore.config-file=<file-path>
Path to YAML file that contains object store
configuration. See format details:
https://thanos.io/storage.md/#configuration
--objstore.config=<content>
Alternative to 'objstore.config-file' flag
(lower priority). Content of YAML file that
contains object store configuration. See format
details:
https://thanos.io/storage.md/#configuration
--http-address="0.0.0.0:10902"
Listen host:port for HTTP endpoints.
--http-grace-period=2m Time to wait after an interrupt received for
HTTP Server.
--objstore-to.config-file=<file-path>
Path to YAML file that contains object store-to
configuration. See format details:
https://thanos.io/storage.md/#configuration The
object storage which replicate data to.
--objstore-to.config=<content>
Alternative to 'objstore-to.config-file' flag
(lower priority). Content of YAML file that
contains object store-to configuration. See
format details:
https://thanos.io/storage.md/#configuration The
object storage which replicate data to.
--resolution=0 Only blocks with this resolution will be
replicated.
--compaction=1 Only blocks with this compaction level will be
replicated.
--matcher=key="value" ... Only blocks whose external labels exactly match
this matcher will be replicated.
--single-run Run replication only one time, then exit.
```
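
The `--matcher` flag takes values of the form `key="value"` and selects blocks whose external labels match exactly. A minimal sketch of that behaviour, assuming only the format documented above: `parseMatcher` and `matchesAll` are hypothetical helpers and are not the `replicate.ParseFlagMatchers` implementation, which this diff does not show.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// matcher is a hypothetical exact-match label matcher.
type matcher struct{ name, value string }

// parseMatcher splits `key="value"` at the first '=' and unquotes the value.
func parseMatcher(s string) (matcher, error) {
	name, quoted, ok := strings.Cut(s, "=")
	if !ok || name == "" {
		return matcher{}, fmt.Errorf("matcher %q: expected key=\"value\"", s)
	}
	value, err := strconv.Unquote(quoted)
	if err != nil {
		return matcher{}, fmt.Errorf("matcher %q: value must be a quoted string: %w", s, err)
	}
	return matcher{name: name, value: value}, nil
}

// matchesAll reports whether every matcher finds an identical label value.
func matchesAll(labels map[string]string, ms []matcher) bool {
	for _, m := range ms {
		if v, ok := labels[m.name]; !ok || v != m.value {
			return false
		}
	}
	return true
}

func main() {
	m, err := parseMatcher(`cluster="eu"`)
	if err != nil {
		panic(err)
	}
	fmt.Println(matchesAll(map[string]string{"cluster": "eu"}, []matcher{m}))
}
```

On a shell command line the inner quotes need protecting, for example `--matcher='cluster="eu"'`.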
63 changes: 63 additions & 0 deletions mixin/thanos/alerts/replicate.libsonnet
@@ -0,0 +1,63 @@
{
local thanos = self,
replicator+:: {
jobPrefix: error 'must provide job prefix for Thanos Replicate alerts',
selector: error 'must provide selector for Thanos Replicate alerts',
},
prometheusAlerts+:: {
groups+: [
{
name: 'thanos-replicate.rules',
rules: [
{
alert: 'ThanosReplicateIsDown',
expr: |||
absent(up{%(selector)s})
||| % thanos.replicator,
'for': '5m',
labels: {
severity: 'critical',
},
annotations: {
message: 'Thanos Replicate has disappeared from Prometheus target discovery.',
},
},
{
alert: 'ThanosReplicateErrorRate',
annotations: {
message: 'Thanos Replicate is failing to run, {{ $value | humanize }}% of attempts failed.',
},
expr: |||
(
sum(rate(thanos_replicate_replication_runs_total{result="error", %(selector)s}[5m]))
/ on (namespace) group_left
sum(rate(thanos_replicate_replication_runs_total{%(selector)s}[5m]))
) * 100 >= 10
||| % thanos.replicator,
'for': '5m',
labels: {
severity: 'critical',
},
},
{
alert: 'ThanosReplicateRunLatency',
annotations: {
message: 'Thanos Replicate {{$labels.job}} has a 99th percentile latency of {{ $value }} seconds for the replicate operations.',
},
expr: |||
(
histogram_quantile(0.99, sum by (job, le) (rate(thanos_replicate_replication_run_duration_seconds_bucket{%(selector)s}[5m]))) > 120
and
sum by (job) (rate(thanos_replicate_replication_run_duration_seconds_bucket{%(selector)s}[5m])) > 0
)
||| % thanos.replicator,
'for': '5m',
labels: {
severity: 'critical',
},
},
],
},
],
},
}
58 changes: 58 additions & 0 deletions mixin/thanos/dashboards/replicate.libsonnet
@@ -0,0 +1,58 @@
local g = import '../lib/thanos-grafana-builder/builder.libsonnet';

{
local thanos = self,
replicator+:: {
jobPrefix: error 'must provide job prefix for Thanos Replicate dashboard',
selector: error 'must provide selector for Thanos Replicate dashboard',
title: error 'must provide title for Thanos Replicate dashboard',
},
grafanaDashboards+:: {
'replicate.json':
g.dashboard(thanos.replicator.title)
.addRow(
g.row('Replicate Runs')
.addPanel(
g.panel('Rate') +
g.qpsErrTotalPanel(
'thanos_replicate_replication_runs_total{result="error", namespace="$namespace",%(selector)s}' % thanos.replicator,
'thanos_replicate_replication_runs_total{namespace="$namespace",%(selector)s}' % thanos.replicator,
)
)
.addPanel(
g.panel('Errors', 'Shows rate of errors.') +
g.queryPanel(
'sum(rate(thanos_replicate_replication_runs_total{result="error", namespace="$namespace",%(selector)s}[$interval])) by (result)' % thanos.replicator,
'{{result}}'
) +
{ yaxes: g.yaxes('percentunit') } +
g.stack
)
.addPanel(
g.panel('Duration', 'Shows how long it has taken to run a replication cycle.') +
g.latencyPanel('thanos_replicate_replication_run_duration_seconds', 'result="success", namespace="$namespace",%(selector)s' % thanos.replicator)
)
)
.addRow(
g.row('Replication')
.addPanel(
g.panel('Metrics') +
g.queryPanel(
[
'sum(rate(thanos_replicate_origin_iterations_total{namespace="$namespace",%(selector)s}[$interval]))' % thanos.replicator,
'sum(rate(thanos_replicate_origin_meta_loads_total{namespace="$namespace",%(selector)s}[$interval]))' % thanos.replicator,
'sum(rate(thanos_replicate_origin_partial_meta_reads_total{namespace="$namespace",%(selector)s}[$interval]))' % thanos.replicator,
'sum(rate(thanos_replicate_blocks_already_replicated_total{namespace="$namespace",%(selector)s}[$interval]))' % thanos.replicator,
'sum(rate(thanos_replicate_blocks_replicated_total{namespace="$namespace",%(selector)s}[$interval]))' % thanos.replicator,
'sum(rate(thanos_replicate_objects_replicated_total{namespace="$namespace",%(selector)s}[$interval]))' % thanos.replicator,
],
['iterations', 'meta loads', 'partial meta reads', 'already replicated blocks', 'replicated blocks', 'replicated objects']
)
)
)
+
g.template('namespace', 'kube_pod_info') +
g.template('job', 'up', 'namespace="$namespace",%(selector)s' % thanos.replicator, true, '%(jobPrefix)s.*' % thanos.replicator),
},
} +
(import 'defaults.libsonnet')
15 changes: 15 additions & 0 deletions mixin/thanos/rules/replicate.libsonnet
@@ -0,0 +1,15 @@
{
local thanos = self,
replicator+:: {
selector: error 'must provide selector for Thanos Replicate rules',
},
prometheusRules+:: {
groups+: [
{
name: 'thanos-replicate.rules',
rules: [
],
},
],
},
}
1 change: 1 addition & 0 deletions pkg/component/component.go
Expand Up @@ -95,4 +95,5 @@ var (
Sidecar = sourceStoreAPI{component: component{name: "sidecar"}}
Store = sourceStoreAPI{component: component{name: "store"}}
Receive = sourceStoreAPI{component: component{name: "receive"}}
Replicate = sourceStoreAPI{component: component{name: "replicate"}}
)