Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set defaults to query ingesters, not store, for recent data #1909

Merged
merged 2 commits into from
May 23, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,9 @@
### Grafana Mimir

* [CHANGE] Increased default configuration for `-server.grpc-max-recv-msg-size-bytes` and `-server.grpc-max-send-msg-size-bytes` from 4MB to 100MB. #1883
* [CHANGE] Default values have changed for the following settings. This improves query performance for recent data (within 12h) by only reading from ingesters: #1909
- `-blocks-storage.bucket-store.ignore-blocks-within` now defaults to `10h` (previously `0`)
- `-querier.query-store-after` now defaults to `12h` (previously `0`)
* [ENHANCEMENT] Store-gateway: Add the experimental ability to run requests in a dedicated OS thread pool. This feature can be configured using `-store-gateway.thread-pool-size` and is disabled by default. Replaces the ability to run index header operations in a dedicated thread pool. #1660 #1812
* [BUGFIX] Fix regexp parsing panic for regexp label matchers with start/end quantifiers. #1883
* [BUGFIX] Ingester: fixed deceiving error log "failed to update cached shipped blocks after shipper initialisation", occurring for each new tenant in the ingester. #1893
Expand Down
4 changes: 2 additions & 2 deletions cmd/mimir/config-descriptor.json
Original file line number Diff line number Diff line change
Expand Up @@ -1385,7 +1385,7 @@
"required": false,
"desc": "The time after which a metric should be queried from storage and not just ingesters. 0 means all queries are sent to store. If this option is enabled, the time range of the query sent to the store-gateway will be manipulated to ensure the query end is not more recent than 'now - query-store-after'.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldDefaultValue": 43200000000000,
"fieldFlag": "querier.query-store-after",
"fieldType": "duration"
},
Expand Down Expand Up @@ -5004,7 +5004,7 @@
"required": false,
"desc": "Blocks with minimum time within this duration are ignored, and not loaded by store-gateway. Useful when used together with -querier.query-store-after to prevent loading young blocks, because there are usually many of them (depending on number of ingesters) and they are not yet compacted. Negative values or 0 disable the filter.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldDefaultValue": 36000000000000,
"fieldFlag": "blocks-storage.bucket-store.ignore-blocks-within",
"fieldType": "duration",
"fieldCategory": "advanced"
Expand Down
4 changes: 2 additions & 2 deletions cmd/mimir/help-all.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -305,7 +305,7 @@ Usage of ./cmd/mimir/mimir:
-blocks-storage.bucket-store.consistency-delay duration
Minimum age of a block before it's being read. Set it to safe value (e.g 30m) if your object storage is eventually consistent. GCS and S3 are (roughly) strongly consistent.
-blocks-storage.bucket-store.ignore-blocks-within duration
Blocks with minimum time within this duration are ignored, and not loaded by store-gateway. Useful when used together with -querier.query-store-after to prevent loading young blocks, because there are usually many of them (depending on number of ingesters) and they are not yet compacted. Negative values or 0 disable the filter.
Blocks with minimum time within this duration are ignored, and not loaded by store-gateway. Useful when used together with -querier.query-store-after to prevent loading young blocks, because there are usually many of them (depending on number of ingesters) and they are not yet compacted. Negative values or 0 disable the filter. (default 10h0m0s)
-blocks-storage.bucket-store.ignore-deletion-marks-delay duration
Duration after which the blocks marked for deletion will be filtered out while fetching blocks. The idea of ignore-deletion-marks-delay is to ignore blocks that are marked for deletion with some delay. This ensures store can still serve blocks that are meant to be deleted but do not have a replacement yet. (default 1h0m0s)
-blocks-storage.bucket-store.index-cache.backend string
Expand Down Expand Up @@ -1082,7 +1082,7 @@ Usage of ./cmd/mimir/mimir:
-querier.query-ingesters-within duration
Maximum lookback beyond which queries are not sent to ingester. 0 means all queries are sent to ingester. (default 13h0m0s)
-querier.query-store-after duration
The time after which a metric should be queried from storage and not just ingesters. 0 means all queries are sent to store. If this option is enabled, the time range of the query sent to the store-gateway will be manipulated to ensure the query end is not more recent than 'now - query-store-after'.
The time after which a metric should be queried from storage and not just ingesters. 0 means all queries are sent to store. If this option is enabled, the time range of the query sent to the store-gateway will be manipulated to ensure the query end is not more recent than 'now - query-store-after'. (default 12h0m0s)
-querier.scheduler-address string
Address of the query-scheduler component, in host:port format. Only one of -querier.frontend-address or -querier.scheduler-address can be set. If neither is set, queries are only received via HTTP endpoint.
-querier.shuffle-sharding-ingesters-lookback-period duration
Expand Down
2 changes: 1 addition & 1 deletion cmd/mimir/help.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -340,7 +340,7 @@ Usage of ./cmd/mimir/mimir:
-querier.query-ingesters-within duration
Maximum lookback beyond which queries are not sent to ingester. 0 means all queries are sent to ingester. (default 13h0m0s)
-querier.query-store-after duration
The time after which a metric should be queried from storage and not just ingesters. 0 means all queries are sent to store. If this option is enabled, the time range of the query sent to the store-gateway will be manipulated to ensure the query end is not more recent than 'now - query-store-after'.
The time after which a metric should be queried from storage and not just ingesters. 0 means all queries are sent to store. If this option is enabled, the time range of the query sent to the store-gateway will be manipulated to ensure the query end is not more recent than 'now - query-store-after'. (default 12h0m0s)
-querier.scheduler-address string
Address of the query-scheduler component, in host:port format. Only one of -querier.frontend-address or -querier.scheduler-address can be set. If neither is set, queries are only received via HTTP endpoint.
-querier.timeout duration
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -801,7 +801,7 @@ The `querier` block configures the querier.
# the time range of the query sent to the store-gateway will be manipulated to
# ensure the query end is not more recent than 'now - query-store-after'.
# CLI flag: -querier.query-store-after
[query_store_after: <duration> | default = 0s]
[query_store_after: <duration> | default = 12h]

# (advanced) Maximum duration into the future you can query. 0 to disable.
# CLI flag: -querier.max-query-into-future
Expand Down Expand Up @@ -3331,7 +3331,7 @@ bucket_store:
# are usually many of them (depending on number of ingesters) and they are not
# yet compacted. Negative values or 0 disable the filter.
# CLI flag: -blocks-storage.bucket-store.ignore-blocks-within
[ignore_blocks_within: <duration> | default = 0s]
[ignore_blocks_within: <duration> | default = 10h]

# (advanced) Max size - in bytes - of a chunks pool, used to reduce memory
# allocations. The pool is shared across all tenants. 0 to disable the limit.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,14 +46,15 @@ When running Grafana Mimir at scale, querying non-compacted blocks might be inef
- Non compacted blocks contain duplicated samples, as a result of the ingesters replication.
- Querying many small TSDB indexes is slower than querying a few compacted TSDB indexes.

Configure Grafana Mimir to ensure only compacted blocks are queried:
The default values for `-querier.query-store-after`, `-querier.query-ingesters-within`, and `-blocks-storage.bucket-store.ignore-blocks-within` are set such that only compacted blocks are queried. In most cases, no additional configuration is required.

Configure Grafana Mimir so large tenants are parallelized by the compactor:

1. Configure compactor's `-compactor.split-and-merge-shards` and `-compactor.split-groups` for every tenant with more than 20 million active series. For more information about configuring the compactor's split and merge shards, refer to [compactor]({{< relref "../../architecture/components/compactor/index.md" >}}).
1. Configure querier's `-querier.query-store-after` equal to `-querier.query-ingesters-within` minus five minutes. The five-minute delta is recommended to ensure the time range on the boundary is queried both from ingesters and queriers.

#### How to estimate `-querier.query-store-after`

Set the `-querier.query-store-after` to a duration that is large enough to give compactor enough time to compact newly uploaded blocks, and queriers and store-gateways to discover and synchronize newly compacted blocks.
If not using the defaults, set the `-querier.query-store-after` to a duration that is large enough to give compactor enough time to compact newly uploaded blocks, and queriers and store-gateways to discover and synchronize newly compacted blocks.
Copy link
Contributor

@osg-grafana osg-grafana May 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add "you are", @56quarters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorporated into a new followup PR: #1915


The following diagram shows all of the timings involved in the estimation. This diagram should be used only as a template and you can modify the assumptions based on real measurements in your Mimir cluster. The example makes the following assumptions:

Expand Down
1 change: 1 addition & 0 deletions integration/getting_started_with_gossiped_ring_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ func TestGettingStartedWithGossipedRing(t *testing.T) {
"-ingester.ring.observe-period": "5s", // to avoid conflicts in tokens
"-blocks-storage.bucket-store.bucket-index.enabled": "false",
"-blocks-storage.bucket-store.sync-interval": "1s", // sync continuously
"-blocks-storage.bucket-store.ignore-blocks-within": "0",
"-blocks-storage.backend": "s3",
"-blocks-storage.s3.bucket-name": blocksBucketName,
"-blocks-storage.s3.access-key-id": e2edb.MinioAccessKey,
Expand Down
1 change: 1 addition & 0 deletions integration/ingester_sharding_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ func TestIngesterSharding(t *testing.T) {

// Enable shuffle sharding on read path but not lookback, otherwise all ingesters would be
// queried being just registered.
flags["-querier.query-store-after"] = "0"
flags["-querier.shuffle-sharding-ingesters-lookback-period"] = "1ns"

// Start dependencies.
Expand Down
1 change: 0 additions & 1 deletion integration/querier_sharding_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,6 @@ func runQuerierShardingTest(t *testing.T, cfg querierShardingTestConfig) {

flags := mergeFlags(BlocksStorageFlags(), map[string]string{
"-query-frontend.cache-results": "true",
"-querier.query-ingesters-within": "12h", // Required by the test on query /series out of ingesters time range
"-query-frontend.results-cache.backend": "memcached",
"-query-frontend.results-cache.memcached.addresses": "dns+" + memcached.NetworkEndpoint(e2ecache.MemcachedPort),
"-query-frontend.results-cache.compression": "snappy",
Expand Down
1 change: 0 additions & 1 deletion integration/querier_tenant_federation_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,6 @@ func runQuerierTenantFederationTest(t *testing.T, cfg querierTenantFederationCon

flags := mergeFlags(BlocksStorageFlags(), map[string]string{
"-query-frontend.cache-results": "true",
"-querier.query-ingesters-within": "12h", // Required by the test on query /series out of ingesters time range
"-query-frontend.results-cache.backend": "memcached",
"-query-frontend.results-cache.memcached.addresses": "dns+" + memcached.NetworkEndpoint(e2ecache.MemcachedPort),
"-tenant-federation.enabled": "true",
Expand Down
10 changes: 9 additions & 1 deletion integration/querier_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -148,8 +148,10 @@ func TestQuerierWithBlocksStorageRunningInMicroservicesMode(t *testing.T) {
"-blocks-storage.bucket-store.index-cache.memcached.addresses": "dns+" + memcached.NetworkEndpoint(e2ecache.MemcachedPort),
"-blocks-storage.bucket-store.sync-interval": "1s",
"-blocks-storage.bucket-store.index-cache.backend": testCfg.indexCacheBackend,
"-blocks-storage.bucket-store.ignore-blocks-within": "0",
"-blocks-storage.bucket-store.bucket-index.enabled": strconv.FormatBool(testCfg.bucketIndexEnabled),
"-store-gateway.tenant-shard-size": fmt.Sprintf("%d", testCfg.tenantShardSize),
"-querier.query-store-after": "0",
"-query-frontend.query-stats-enabled": "true",
"-query-frontend.parallelize-shardable-queries": strconv.FormatBool(testCfg.queryShardingEnabled),
})
Expand Down Expand Up @@ -348,10 +350,12 @@ func TestQuerierWithBlocksStorageRunningInSingleBinaryMode(t *testing.T) {
"-blocks-storage.tsdb.block-ranges-period": blockRangePeriod.String(),
"-blocks-storage.tsdb.ship-interval": "1s",
"-blocks-storage.bucket-store.sync-interval": "1s",
"-blocks-storage.bucket-store.ignore-blocks-within": "0",
"-blocks-storage.tsdb.retention-period": ((blockRangePeriod * 2) - 1).String(),
"-blocks-storage.bucket-store.index-cache.backend": testCfg.indexCacheBackend,
"-blocks-storage.bucket-store.index-cache.memcached.addresses": "dns+" + memcached.NetworkEndpoint(e2ecache.MemcachedPort),
"-blocks-storage.bucket-store.bucket-index.enabled": strconv.FormatBool(testCfg.bucketIndexEnabled),

// Ingester.
"-ingester.ring.store": "consul",
"-ingester.ring.consul.hostname": consul.NetworkHTTPEndpoint(),
Expand All @@ -366,6 +370,8 @@ func TestQuerierWithBlocksStorageRunningInSingleBinaryMode(t *testing.T) {
"-compactor.ring.store": "consul",
"-compactor.ring.consul.hostname": consul.NetworkHTTPEndpoint(),
"-compactor.cleanup-interval": "2s", // Update bucket index often.
// Querier.
"-querier.query-store-after": "0",
})

// Start Mimir replicas.
Expand Down Expand Up @@ -765,10 +771,12 @@ func TestQuerierWithBlocksStorageOnMissingBlocksFromStorage(t *testing.T) {
// blocks (less than 3*sync-interval age) as they could be unnoticed by the store-gateway and ingesters
// have them anyway. We turn down the sync-interval to speed up the test.
storeGateway := e2emimir.NewStoreGateway("store-gateway", consul.NetworkHTTPEndpoint(), mergeFlags(flags, map[string]string{
"-blocks-storage.bucket-store.sync-interval": "1s",
"-blocks-storage.bucket-store.sync-interval": "1s",
"-blocks-storage.bucket-store.ignore-blocks-within": "0",
}))
querier := e2emimir.NewQuerier("querier", consul.NetworkHTTPEndpoint(), mergeFlags(flags, map[string]string{
"-blocks-storage.bucket-store.sync-interval": "1s",
"-querier.query-store-after": "0",
}))
require.NoError(t, s.StartAndWaitReady(querier, storeGateway))

Expand Down
1 change: 0 additions & 1 deletion integration/query_frontend_cache_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,6 @@ func TestQueryFrontendUnalignedQuery(t *testing.T) {
flags = mergeFlags(flags, map[string]string{
"-query-frontend.cache-results": "true",
"-query-frontend.split-queries-by-interval": "2m",
"-querier.query-ingesters-within": "12h", // Required by the test on query /series out of ingesters time range
"-query-frontend.align-querier-with-step": "true",
"-query-frontend.max-cache-freshness": "0", // Cache everything.
"-query-frontend.results-cache.backend": "memcached",
Expand Down
1 change: 0 additions & 1 deletion integration/query_frontend_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -169,7 +169,6 @@ func runQueryFrontendTest(t *testing.T, cfg queryFrontendTestConfig) {

flags = mergeFlags(flags, map[string]string{
"-query-frontend.cache-results": "true",
"-querier.query-ingesters-within": "12h", // Required by the test on query /series out of ingesters time range
"-query-frontend.results-cache.backend": "memcached",
"-query-frontend.results-cache.memcached.addresses": "dns+" + memcached.NetworkEndpoint(e2ecache.MemcachedPort),
"-query-frontend.query-stats-enabled": strconv.FormatBool(cfg.queryStatsEnabled),
Expand Down
2 changes: 1 addition & 1 deletion pkg/querier/querier.go
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.BoolVar(&cfg.BatchIterators, "querier.batch-iterators", true, "Use batch iterators to execute query, as opposed to fully materialising the series in memory. Takes precedent over the -querier.iterators flag.")
f.DurationVar(&cfg.QueryIngestersWithin, "querier.query-ingesters-within", 13*time.Hour, "Maximum lookback beyond which queries are not sent to ingester. 0 means all queries are sent to ingester.")
f.DurationVar(&cfg.MaxQueryIntoFuture, "querier.max-query-into-future", 10*time.Minute, "Maximum duration into the future you can query. 0 to disable.")
f.DurationVar(&cfg.QueryStoreAfter, "querier.query-store-after", 0, "The time after which a metric should be queried from storage and not just ingesters. 0 means all queries are sent to store. If this option is enabled, the time range of the query sent to the store-gateway will be manipulated to ensure the query end is not more recent than 'now - query-store-after'.")
f.DurationVar(&cfg.QueryStoreAfter, "querier.query-store-after", 12*time.Hour, "The time after which a metric should be queried from storage and not just ingesters. 0 means all queries are sent to store. If this option is enabled, the time range of the query sent to the store-gateway will be manipulated to ensure the query end is not more recent than 'now - query-store-after'.")
f.DurationVar(&cfg.ShuffleShardingIngestersLookbackPeriod, "querier.shuffle-sharding-ingesters-lookback-period", 0, "When distributor's sharding strategy is shuffle-sharding and this setting is > 0, queriers fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since 'now - lookback period'. The lookback period should be greater or equal than the configured -querier.query-store-after and -querier.query-ingesters-within. If this setting is 0, queriers always query all ingesters (ingesters shuffle sharding on read path is disabled).")

cfg.EngineConfig.RegisterFlags(f)
Expand Down
2 changes: 1 addition & 1 deletion pkg/storage/tsdb/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -310,7 +310,7 @@ func (cfg *BucketStoreConfig) RegisterFlags(f *flag.FlagSet) {
f.DurationVar(&cfg.ConsistencyDelay, "blocks-storage.bucket-store.consistency-delay", 0, "Minimum age of a block before it's being read. Set it to safe value (e.g 30m) if your object storage is eventually consistent. GCS and S3 are (roughly) strongly consistent.")
f.DurationVar(&cfg.IgnoreDeletionMarksDelay, "blocks-storage.bucket-store.ignore-deletion-marks-delay", time.Hour*1, "Duration after which the blocks marked for deletion will be filtered out while fetching blocks. "+
"The idea of ignore-deletion-marks-delay is to ignore blocks that are marked for deletion with some delay. This ensures store can still serve blocks that are meant to be deleted but do not have a replacement yet.")
f.DurationVar(&cfg.IgnoreBlocksWithin, "blocks-storage.bucket-store.ignore-blocks-within", 0, "Blocks with minimum time within this duration are ignored, and not loaded by store-gateway. Useful when used together with -querier.query-store-after to prevent loading young blocks, because there are usually many of them (depending on number of ingesters) and they are not yet compacted. Negative values or 0 disable the filter.")
f.DurationVar(&cfg.IgnoreBlocksWithin, "blocks-storage.bucket-store.ignore-blocks-within", 10*time.Hour, "Blocks with minimum time within this duration are ignored, and not loaded by store-gateway. Useful when used together with -querier.query-store-after to prevent loading young blocks, because there are usually many of them (depending on number of ingesters) and they are not yet compacted. Negative values or 0 disable the filter.")
f.IntVar(&cfg.PostingOffsetsInMemSampling, "blocks-storage.bucket-store.posting-offsets-in-mem-sampling", DefaultPostingOffsetInMemorySampling, "Controls what is the ratio of postings offsets that the store will hold in memory.")
f.BoolVar(&cfg.IndexHeaderLazyLoadingEnabled, "blocks-storage.bucket-store.index-header-lazy-loading-enabled", true, "If enabled, store-gateway will lazy load an index-header only once required by a query.")
f.DurationVar(&cfg.IndexHeaderLazyLoadingIdleTimeout, "blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout", 60*time.Minute, "If index-header lazy loading is enabled and this setting is > 0, the store-gateway will offload unused index-headers after 'idle timeout' inactivity.")
Expand Down
Loading