Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruler: validation and logging with disabled tenant federation #4555

Merged
merged 10 commits into from
Mar 27, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
* [CHANGE] Query-frontend: Cached results now contain timestamp which allows Mimir to check if cached results are still valid based on current TTL configured for tenant. Results cached by previous Mimir version are used until they expire from cache, which can take up to 7 days. If you need to use per-tenant TTL sooner, please flush results cache manually. #4439
* [CHANGE] Ingester: the `cortex_ingester_tsdb_wal_replay_duration_seconds` metrics has been removed. #4465
* [CHANGE] Query-frontend: use protobuf internal query result payload format by default. This feature is no longer considered experimental. #4557
* [CHANGE] Ruler: reject creating federated rule groups while tenant federation is disabled. Previously the rule groups would be silently dropped during bucket sync. #4555
* [FEATURE] Cache: Introduce experimental support for using Redis for results, chunks, index, and metadata caches. #4371
* [FEATURE] Vault: Introduce experimental integration with Vault to fetch secrets used to configure TLS for clients. Server TLS secrets will still be read from a file. `tls-ca-path`, `tls-cert-path` and `tls-key-path` will denote the path in Vault for the following CLI flags when `-vault.enabled` is true: #4446.
* `-distributor.ha-tracker.etcd.*`
Expand Down
2 changes: 1 addition & 1 deletion cmd/mimir/config-descriptor.json
Original file line number Diff line number Diff line change
Expand Up @@ -10063,7 +10063,7 @@
"kind": "field",
"name": "enabled",
"required": false,
"desc": "Enable running rule groups against multiple tenants. The tenant IDs involved need to be in the rule group's 'source_tenants' field. If this flag is set to 'false' when there are already created federated rule groups, then these rules groups will be skipped during evaluations.",
"desc": "Enable rule groups to query against multiple tenants. The tenant IDs involved need to be in the rule group's 'source_tenants' field. If this flag is set to 'false' when there are federated rule groups that already exist, then these rules groups will be skipped during evaluations.",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "ruler.tenant-federation.enabled",
Expand Down
2 changes: 1 addition & 1 deletion cmd/mimir/help-all.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -2176,7 +2176,7 @@ Usage of ./cmd/mimir/mimir:
-ruler.rule-path string
Directory to store temporary rule files loaded by the Prometheus rule managers. This directory is not required to be persisted between restarts. (default "./data-ruler/")
-ruler.tenant-federation.enabled
Enable running rule groups against multiple tenants. The tenant IDs involved need to be in the rule group's 'source_tenants' field. If this flag is set to 'false' when there are already created federated rule groups, then these rules groups will be skipped during evaluations.
Enable rule groups to query against multiple tenants. The tenant IDs involved need to be in the rule group's 'source_tenants' field. If this flag is set to 'false' when there are federated rule groups that already exist, then these rules groups will be skipped during evaluations.
-ruler.tenant-shard-size int
The tenant's shard size when sharding is used by ruler. Value of 0 disables shuffle sharding for the tenant, and tenant rules will be sharded across all ruler replicas.
-runtime-config.file comma-separated-list-of-strings
Expand Down
2 changes: 1 addition & 1 deletion cmd/mimir/help.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -594,7 +594,7 @@ Usage of ./cmd/mimir/mimir:
-ruler.rule-path string
Directory to store temporary rule files loaded by the Prometheus rule managers. This directory is not required to be persisted between restarts. (default "./data-ruler/")
-ruler.tenant-federation.enabled
Enable running rule groups against multiple tenants. The tenant IDs involved need to be in the rule group's 'source_tenants' field. If this flag is set to 'false' when there are already created federated rule groups, then these rules groups will be skipped during evaluations.
Enable rule groups to query against multiple tenants. The tenant IDs involved need to be in the rule group's 'source_tenants' field. If this flag is set to 'false' when there are federated rule groups that already exist, then these rules groups will be skipped during evaluations.
-ruler.tenant-shard-size int
The tenant's shard size when sharding is used by ruler. Value of 0 disables shuffle sharding for the tenant, and tenant rules will be sharded across all ruler replicas.
-runtime-config.file comma-separated-list-of-strings
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1596,10 +1596,10 @@ query_frontend:
[query_result_response_format: <string> | default = "json"]

tenant_federation:
# Enable running rule groups against multiple tenants. The tenant IDs involved
# need to be in the rule group's 'source_tenants' field. If this flag is set
# to 'false' when there are already created federated rule groups, then these
# rules groups will be skipped during evaluations.
# Enable rule groups to query against multiple tenants. The tenant IDs
# involved need to be in the rule group's 'source_tenants' field. If this flag
# is set to 'false' when there are federated rule groups that already exist,
# then these rules groups will be skipped during evaluations.
# CLI flag: -ruler.tenant-federation.enabled
[enabled: <boolean> | default = false]
```
Expand Down
2 changes: 1 addition & 1 deletion pkg/mimir/modules.go
Original file line number Diff line number Diff line change
Expand Up @@ -700,7 +700,7 @@ func (t *Mimir) initRuler() (serv services.Service, err error) {

if t.Cfg.Ruler.TenantFederation.Enabled {
if !t.Cfg.TenantFederation.Enabled {
return nil, errors.New("-ruler.tenant-federation.enabled=true requires -tenant-federation.enabled=true")
return nil, errors.New("-" + ruler.TenantFederationFlag + "=true requires -tenant-federation.enabled=true")
}
// Setting bypassForSingleQuerier=false forces `tenantfederation.NewQueryable` to add
// the `__tenant_id__` label on all metrics regardless if they're for a single tenant or multiple tenants.
Expand Down
49 changes: 40 additions & 9 deletions pkg/ruler/api_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -376,50 +376,78 @@ func TestRuler_PrometheusAlerts(t *testing.T) {
}

func TestRuler_Create(t *testing.T) {
cfg := defaultRulerConfig(t)
defaultCfg := defaultRulerConfig(t)

r := prepareRuler(t, cfg, newMockRuleStore(make(map[string]rulespb.RuleGroupList)), withStart())
a := NewAPI(r, r.store, log.NewNopLogger())
cfgWithTenantFederation := defaultRulerConfig(t)
cfgWithTenantFederation.TenantFederation.Enabled = true

tc := []struct {
name string
cfg Config
input string
output string
err error
status int
}{
{
name: "with an empty payload",
cfg: defaultCfg,
input: "",
status: 400,
err: errors.New("invalid rules config: rule group name must not be empty"),
err: errors.New("invalid rules configuration: rule group name must not be empty"),
},
{
name: "with no rule group name",
cfg: defaultCfg,
input: `
interval: 15s
rules:
- record: up_rule
expr: up
`,
status: 400,
err: errors.New("invalid rules config: rule group name must not be empty"),
err: errors.New("invalid rules configuration: rule group name must not be empty"),
},
{
name: "with no rules",
cfg: defaultCfg,
input: `
name: rg_name
interval: 15s
`,
status: 400,
err: errors.New("invalid rules config: rule group 'rg_name' has no rules"),
err: errors.New("invalid rules configuration: rule group 'rg_name' has no rules"),
},
{

name: "with federated rules without enabled federation",
cfg: defaultCfg,
status: 400,
input: `
name: test
interval: 15s
source_tenants: [t1, t2]
rules:
- record: up_rule
expr: up{}
- alert: up_alert
expr: sum(up{}) > 1
for: 30s
annotations:
test: test
labels:
test: test
`,
err: errors.New("invalid rules configuration: rule group 'test' is a federated rule group, but rules federation is disabled; please contact your service administrator to have it enabled"),
},
{
name: "with a a valid rules file",
name: "with valid rules with enabled federation",
cfg: cfgWithTenantFederation,
status: 202,
input: `
name: test
interval: 15s
source_tenants: [t1, t2]
rules:
- record: up_rule
expr: up{}
Expand All @@ -431,12 +459,15 @@ rules:
labels:
test: test
`,
output: "name: test\ninterval: 15s\nrules:\n - record: up_rule\n expr: up{}\n - alert: up_alert\n expr: sum(up{}) > 1\n for: 30s\n labels:\n test: test\n annotations:\n test: test\n",
output: "name: test\ninterval: 15s\nsource_tenants: [t1, t2]\nrules:\n - record: up_rule\n expr: up{}\n - alert: up_alert\n expr: sum(up{}) > 1\n for: 30s\n labels:\n test: test\n annotations:\n test: test\n",
},
}

for _, tt := range tc {
t.Run(tt.name, func(t *testing.T) {
r := prepareRuler(t, tt.cfg, newMockRuleStore(make(map[string]rulespb.RuleGroupList)), withStart())
a := NewAPI(r, r.store, log.NewNopLogger())

router := mux.NewRouter()
router.Path("/prometheus/config/v1/rules/{namespace}").Methods("POST").HandlerFunc(a.CreateRuleGroup)
router.Path("/prometheus/config/v1/rules/{namespace}/{groupName}").Methods("GET").HandlerFunc(a.GetRuleGroup)
Expand All @@ -454,7 +485,7 @@ rules:

router.ServeHTTP(w, req)
require.Equal(t, 200, w.Code)
require.Equal(t, tt.output, w.Body.String())
require.YAMLEq(t, tt.output, w.Body.String())
} else {
require.Equal(t, tt.err.Error()+"\n", w.Body.String())
}
Expand Down
13 changes: 9 additions & 4 deletions pkg/ruler/manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ func NewDefaultMultiTenantManager(cfg Config, managerFactory ManagerFactory, reg
// It's not safe to call this function concurrently.
func (r *DefaultMultiTenantManager) SyncRuleGroups(ctx context.Context, ruleGroups map[string]rulespb.RuleGroupList) {
if !r.cfg.TenantFederation.Enabled {
RemoveFederatedRuleGroups(ruleGroups)
removeFederatedRuleGroups(ruleGroups, r.logger)
}

// Sync the rules to disk and then update the user's Prometheus Rules Manager.
Expand Down Expand Up @@ -318,19 +318,24 @@ func (r *DefaultMultiTenantManager) Stop() {
r.mapper.cleanup()
}

func (*DefaultMultiTenantManager) ValidateRuleGroup(g rulefmt.RuleGroup) []error {
func (r *DefaultMultiTenantManager) ValidateRuleGroup(g rulefmt.RuleGroup) []error {
var errs []error

if g.Name == "" {
errs = append(errs, errors.New("invalid rules config: rule group name must not be empty"))
errs = append(errs, errors.New("invalid rules configuration: rule group name must not be empty"))
return errs
}

if len(g.Rules) == 0 {
errs = append(errs, fmt.Errorf("invalid rules config: rule group '%s' has no rules", g.Name))
errs = append(errs, fmt.Errorf("invalid rules configuration: rule group '%s' has no rules", g.Name))
return errs
}

if !r.cfg.TenantFederation.Enabled && len(g.SourceTenants) > 0 {
errs = append(errs, fmt.Errorf("invalid rules configuration: rule group '%s' is a federated rule group, "+
"but rules federation is disabled; please contact your service administrator to have it enabled", g.Name))
}

for i, r := range g.Rules {
for _, err := range r.Validate() {
var ruleName string
Expand Down
17 changes: 15 additions & 2 deletions pkg/ruler/tenant_federation.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ import (
"flag"
"time"

"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/prometheus/prometheus/promql"
"github.com/prometheus/prometheus/rules"
"github.com/weaveworks/common/user"
Expand All @@ -20,8 +22,10 @@ type TenantFederationConfig struct {
Enabled bool `yaml:"enabled"`
}

const TenantFederationFlag = "ruler.tenant-federation.enabled"

func (cfg *TenantFederationConfig) RegisterFlags(f *flag.FlagSet) {
f.BoolVar(&cfg.Enabled, "ruler.tenant-federation.enabled", false, "Enable running rule groups against multiple tenants. The tenant IDs involved need to be in the rule group's 'source_tenants' field. If this flag is set to 'false' when there are already created federated rule groups, then these rules groups will be skipped during evaluations.")
f.BoolVar(&cfg.Enabled, TenantFederationFlag, false, "Enable rule groups to query against multiple tenants. The tenant IDs involved need to be in the rule group's 'source_tenants' field. If this flag is set to 'false' when there are federated rule groups that already exist, then these rules groups will be skipped during evaluations.")
}

type contextKey int
Expand Down Expand Up @@ -55,11 +59,20 @@ func TenantFederationQueryFunc(regularQueryable, federatedQueryable rules.QueryF
}
}

func RemoveFederatedRuleGroups(groups map[string]rulespb.RuleGroupList) {
func removeFederatedRuleGroups(groups map[string]rulespb.RuleGroupList, logger log.Logger) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't used outside the package. I also checked github code search and couldn't find any usages of it in upstream projects (including GEM)

for userID, groupList := range groups {
amended := make(rulespb.RuleGroupList, 0, len(groupList))
for _, group := range groupList {
if len(group.GetSourceTenants()) > 0 {
level.Warn(logger).Log(
"msg", "skipping federated rule group because rule federation is disabled; "+
"to enable the feature, configure the Mimir or GEM ruler instance with the following parameter "+
"or contact your service administrator: set -"+TenantFederationFlag+"=true as a CLI argument, "+
"ruler.tenant_federation.enabled: true in YAML",
"namespace", group.Namespace,
"group_name", group.Name,
"user", userID,
)
continue
}
amended = append(amended, group)
Expand Down