Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruler: validation and logging with disabled tenant federation #4555

Merged
merged 10 commits into from
Mar 27, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
* [CHANGE] Query-frontend: Cached results now contain timestamp which allows Mimir to check if cached results are still valid based on current TTL configured for tenant. Results cached by previous Mimir version are used until they expire from cache, which can take up to 7 days. If you need to use per-tenant TTL sooner, please flush results cache manually. #4439
* [CHANGE] Ingester: the `cortex_ingester_tsdb_wal_replay_duration_seconds` metrics has been removed. #4465
* [CHANGE] Query-frontend: use protobuf internal query result payload format by default. This feature is no longer considered experimental. #4557
* [CHANGE] Ruler: reject creating federated rule groups while tenant federation is disabled. Previously the rule groups would be silently dropped during bucket sync. #4555
* [FEATURE] Cache: Introduce experimental support for using Redis for results, chunks, index, and metadata caches. #4371
* [FEATURE] Vault: Introduce experimental integration with Vault to fetch secrets used to configure TLS for clients. Server TLS secrets will still be read from a file. `tls-ca-path`, `tls-cert-path` and `tls-key-path` will denote the path in Vault for the following CLI flags when `-vault.enabled` is true: #4446.
* `-distributor.ha-tracker.etcd.*`
Expand Down
2 changes: 1 addition & 1 deletion pkg/mimir/modules.go
Original file line number Diff line number Diff line change
Expand Up @@ -700,7 +700,7 @@ func (t *Mimir) initRuler() (serv services.Service, err error) {

if t.Cfg.Ruler.TenantFederation.Enabled {
if !t.Cfg.TenantFederation.Enabled {
return nil, errors.New("-ruler.tenant-federation.enabled=true requires -tenant-federation.enabled=true")
return nil, errors.New("-" + ruler.TenantFederationFlag + "=true requires -tenant-federation.enabled=true")
}
// Setting bypassForSingleQuerier=false forces `tenantfederation.NewQueryable` to add
// the `__tenant_id__` label on all metrics regardless if they're for a single tenant or multiple tenants.
Expand Down
43 changes: 37 additions & 6 deletions pkg/ruler/api_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -376,26 +376,29 @@ func TestRuler_PrometheusAlerts(t *testing.T) {
}

func TestRuler_Create(t *testing.T) {
cfg := defaultRulerConfig(t)
defaultCfg := defaultRulerConfig(t)

r := prepareRuler(t, cfg, newMockRuleStore(make(map[string]rulespb.RuleGroupList)), withStart())
a := NewAPI(r, r.store, log.NewNopLogger())
cfgWithTenantFederation := defaultRulerConfig(t)
cfgWithTenantFederation.TenantFederation.Enabled = true

tc := []struct {
name string
cfg Config
input string
output string
err error
status int
}{
{
name: "with an empty payload",
cfg: defaultCfg,
input: "",
status: 400,
err: errors.New("invalid rules config: rule group name must not be empty"),
},
{
name: "with no rule group name",
cfg: defaultCfg,
input: `
interval: 15s
rules:
Expand All @@ -407,6 +410,7 @@ rules:
},
{
name: "with no rules",
cfg: defaultCfg,
input: `
name: rg_name
interval: 15s
Expand All @@ -415,11 +419,35 @@ interval: 15s
err: errors.New("invalid rules config: rule group 'rg_name' has no rules"),
},
{
name: "with a a valid rules file",

name: "with federated rules without enabled federation",
cfg: defaultCfg,
status: 400,
input: `
name: test
interval: 15s
source_tenants: [t1, t2]
rules:
- record: up_rule
expr: up{}
- alert: up_alert
expr: sum(up{}) > 1
for: 30s
annotations:
test: test
labels:
test: test
`,
err: errors.New("invalid rules config: rule group 'test' is a federated rule group, but rules federation is disabled. To enable the feature, configure the Mimir or GEM ruler instance with the following parameter or contact your service administrator: -ruler.tenant-federation.enabled=true as a CLI argument, ruler.tenant_federation.enabled: true in YAML"),
},
{
name: "with valid rules with enabled federation",
cfg: cfgWithTenantFederation,
status: 202,
input: `
name: test
interval: 15s
source_tenants: [t1, t2]
rules:
- record: up_rule
expr: up{}
Expand All @@ -431,12 +459,15 @@ rules:
labels:
test: test
`,
output: "name: test\ninterval: 15s\nrules:\n - record: up_rule\n expr: up{}\n - alert: up_alert\n expr: sum(up{}) > 1\n for: 30s\n labels:\n test: test\n annotations:\n test: test\n",
output: "name: test\ninterval: 15s\nsource_tenants: [t1, t2]\nrules:\n - record: up_rule\n expr: up{}\n - alert: up_alert\n expr: sum(up{}) > 1\n for: 30s\n labels:\n test: test\n annotations:\n test: test\n",
},
}

for _, tt := range tc {
t.Run(tt.name, func(t *testing.T) {
r := prepareRuler(t, tt.cfg, newMockRuleStore(make(map[string]rulespb.RuleGroupList)), withStart())
a := NewAPI(r, r.store, log.NewNopLogger())

router := mux.NewRouter()
router.Path("/prometheus/config/v1/rules/{namespace}").Methods("POST").HandlerFunc(a.CreateRuleGroup)
router.Path("/prometheus/config/v1/rules/{namespace}/{groupName}").Methods("GET").HandlerFunc(a.GetRuleGroup)
Expand All @@ -454,7 +485,7 @@ rules:

router.ServeHTTP(w, req)
require.Equal(t, 200, w.Code)
require.Equal(t, tt.output, w.Body.String())
require.YAMLEq(t, tt.output, w.Body.String())
} else {
require.Equal(t, tt.err.Error()+"\n", w.Body.String())
}
Expand Down
10 changes: 8 additions & 2 deletions pkg/ruler/manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ func NewDefaultMultiTenantManager(cfg Config, managerFactory ManagerFactory, reg
// It's not safe to call this function concurrently.
func (r *DefaultMultiTenantManager) SyncRuleGroups(ctx context.Context, ruleGroups map[string]rulespb.RuleGroupList) {
if !r.cfg.TenantFederation.Enabled {
RemoveFederatedRuleGroups(ruleGroups)
removeFederatedRuleGroups(ruleGroups, r.logger)
}

// Sync the rules to disk and then update the user's Prometheus Rules Manager.
Expand Down Expand Up @@ -318,7 +318,7 @@ func (r *DefaultMultiTenantManager) Stop() {
r.mapper.cleanup()
}

func (*DefaultMultiTenantManager) ValidateRuleGroup(g rulefmt.RuleGroup) []error {
func (r *DefaultMultiTenantManager) ValidateRuleGroup(g rulefmt.RuleGroup) []error {
var errs []error

if g.Name == "" {
Expand All @@ -331,6 +331,12 @@ func (*DefaultMultiTenantManager) ValidateRuleGroup(g rulefmt.RuleGroup) []error
return errs
}

if !r.cfg.TenantFederation.Enabled && len(g.SourceTenants) > 0 {
errs = append(errs, fmt.Errorf("invalid rules config: rule group '%s' is a federated rule group, but rules federation is disabled. "+
"To enable the feature, configure the Mimir or GEM ruler instance with the following parameter or contact your service administrator: "+
"set -"+TenantFederationFlag+"=true as a CLI argument, ruler.tenant_federation.enabled: true in YAML", g.Name))
dimitarvdimitrov marked this conversation as resolved.
Show resolved Hide resolved
}

for i, r := range g.Rules {
for _, err := range r.Validate() {
var ruleName string
Expand Down
9 changes: 7 additions & 2 deletions pkg/ruler/tenant_federation.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ import (
"flag"
"time"

"github.com/go-kit/log"
"github.com/go-kit/log/level"
"github.com/prometheus/prometheus/promql"
"github.com/prometheus/prometheus/rules"
"github.com/weaveworks/common/user"
Expand All @@ -20,8 +22,10 @@ type TenantFederationConfig struct {
Enabled bool `yaml:"enabled"`
}

const TenantFederationFlag = "ruler.tenant-federation.enabled"

func (cfg *TenantFederationConfig) RegisterFlags(f *flag.FlagSet) {
f.BoolVar(&cfg.Enabled, "ruler.tenant-federation.enabled", false, "Enable running rule groups against multiple tenants. The tenant IDs involved need to be in the rule group's 'source_tenants' field. If this flag is set to 'false' when there are already created federated rule groups, then these rules groups will be skipped during evaluations.")
f.BoolVar(&cfg.Enabled, TenantFederationFlag, false, "Enable running rule groups against multiple tenants. The tenant IDs involved need to be in the rule group's 'source_tenants' field. If this flag is set to 'false' when there are already created federated rule groups, then these rules groups will be skipped during evaluations.")
dimitarvdimitrov marked this conversation as resolved.
Show resolved Hide resolved
}

type contextKey int
Expand Down Expand Up @@ -55,11 +59,12 @@ func TenantFederationQueryFunc(regularQueryable, federatedQueryable rules.QueryF
}
}

func RemoveFederatedRuleGroups(groups map[string]rulespb.RuleGroupList) {
func removeFederatedRuleGroups(groups map[string]rulespb.RuleGroupList, logger log.Logger) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't used outside the package. I also checked github code search and couldn't find any usages of it in upstream projects (including GEM)

for userID, groupList := range groups {
amended := make(rulespb.RuleGroupList, 0, len(groupList))
for _, group := range groupList {
if len(group.GetSourceTenants()) > 0 {
level.Warn(logger).Log("msg", "skipping federated rule group because rule federation is disabled", "namespace", group.Namespace, "group_name", group.Name, "user", userID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

This has the potential to be very very noisy, perhaps we should consider using rate limited logger? Also if it feels more natural to me to indicate back to the operator here that we should include To enable the feature, configure the Mimir or GEM ruler instance with the following parameter.... you had earlier on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that having the config options in the log line is more appropriate.

I'm not sure about the rate limiter. It will also mute the warning for some groups. Let's see how this works in grafana cloud, maybe we can add it afterwards.

continue
}
amended = append(amended, group)
Expand Down