PeriodLoadRule cannot Remove expired segment #13080

Open · 599166320 opened this issue Sep 13, 2022 · 14 comments

599166320 (Contributor) commented Sep 13, 2022

Recently, while deploying a hot/cold tiered Druid cluster, we found that a hot node loaded data beyond its configured time range, so the hot node's storage filled up quickly. I found the same problem reported on the Druid forum, where it has gone unanswered for a long time. Looking at RunRules.java, I believe there is a problem: PeriodLoadRule never drops expired data at all; it only drops excess replicants. Does the current implementation of PeriodLoadRule behave as intended?

The following is the current implementation in Druid:

// RunRules.run
for (Rule rule : rules) {
  if (rule.appliesTo(segment, now)) {
    if (stats.getGlobalStat("totalNonPrimaryReplicantsLoaded")
            >= paramsWithReplicationManager.getCoordinatorDynamicConfig().getMaxNonPrimaryReplicantsToLoad()
        && !paramsWithReplicationManager.getReplicationManager().isLoadPrimaryReplicantsOnly()) {
      log.info(
          "Maximum number of non-primary replicants [%d] have been loaded for the current RunRules execution. Only loading primary replicants from here on for this coordinator run cycle.",
          paramsWithReplicationManager.getCoordinatorDynamicConfig().getMaxNonPrimaryReplicantsToLoad()
      );
      paramsWithReplicationManager.getReplicationManager().setLoadPrimaryReplicantsOnly(true);
    }
    stats.accumulate(rule.run(coordinator, paramsWithReplicationManager, segment));
    foundMatchingRule = true;
    // only the first matching rule is applied to each segment
    break;
  }
}

For now, I have solved this problem by adding a dropAllExpireSegments method to PeriodLoadRule.java, but I don't know what side effects it might have.

Here is my implementation:

// RunRules.run
for (Rule rule : rules) {
  if (rule.appliesTo(segment, now)) {
    if (stats.getGlobalStat("totalNonPrimaryReplicantsLoaded")
            >= paramsWithReplicationManager.getCoordinatorDynamicConfig().getMaxNonPrimaryReplicantsToLoad()
        && !paramsWithReplicationManager.getReplicationManager().isLoadPrimaryReplicantsOnly()) {
      log.info(
          "Maximum number of non-primary replicants [%d] have been loaded for the current RunRules execution. Only loading primary replicants from here on for this coordinator run cycle.",
          paramsWithReplicationManager.getCoordinatorDynamicConfig().getMaxNonPrimaryReplicantsToLoad()
      );
      paramsWithReplicationManager.getReplicationManager().setLoadPrimaryReplicantsOnly(true);
    }
    stats.accumulate(rule.run(coordinator, paramsWithReplicationManager, segment));
    foundMatchingRule = true;
    break;
  } else {
    // Added drop logic; dropAllExpireSegments is only implemented in PeriodLoadRule
    rule.dropAllExpireSegments(paramsWithReplicationManager, segment);
  }
}
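
For illustration only, the new method might look roughly like this (a hypothetical sketch, not the actual patch; getTieredReplicants() exists on LoadRule, but dropReplicasFromTier is a made-up placeholder for whatever coordinator call actually issues the drop):

// Hypothetical sketch of the new method in PeriodLoadRule (not the actual patch).
public void dropAllExpireSegments(DruidCoordinatorRuntimeParams params, DataSegment segment)
{
  // This rule did not match the segment, i.e. the segment has aged out of the
  // rule's period, so remove any replicas this rule previously loaded on its tiers.
  for (String tier : getTieredReplicants().keySet()) {
    dropReplicasFromTier(params, segment, tier); // hypothetical helper
  }
}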

Affected Version

0.22.0

gianm (Contributor) commented Sep 16, 2022

I believe @AmatyaAvadhanula and @kfaraz were looking into this area of the code too, working to improve the Coordinator balancing behavior. Perhaps they will have some thoughts.

kfaraz (Contributor) commented Sep 16, 2022

@599166320, drops are handled by DropRules such as ForeverDropRule; none of the LoadRules are supposed to have that capability. You typically specify a list of rules for each datasource. The coordinator finds the first rule that applies to a given segment at a given time and does what that matched rule suggests. If at any point in the lifetime of a segment it matches a DropRule, it gets dropped.

So, I think the problem you are facing can be solved by simply having a ForeverDropRule at the end of your retention rule list (default or datasource-specific). Please let us know if this works for you.
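
For example, a rule list that keeps only the last week of data and explicitly drops everything older might look like this (illustrative period and replica count; the rule types are the standard Druid ones):

[
  { "type": "loadByPeriod", "period": "P7D", "tieredReplicants": { "_default_tier": 1 } },
  { "type": "dropForever" }
]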

599166320 (Contributor, Author) commented

Thanks @kfaraz for your reply. If I want the _defaultTier nodes to keep hot data from the last 3 days, and the cold nodes to keep data from the last 7 days, how should my retention rules be configured?
My retention rules are configured as follows:

loadByPeriod P3D _defaultTier
loadByPeriod P7D cold
dropForever

Will this meet my needs?
According to the current implementation of RunRules, the _defaultTier nodes will retain data for 7 days, 4 days more than intended. I think that is unreasonable; what do you think?

FrankChen021 (Member) commented

I think @kfaraz's answer makes sense.

Actually, our docs already point this out:

Load rules indicate how many replicas of a segment should exist in a server tier. Please note: If a Load rule is used to retain only data from a certain interval or period, it must be accompanied by a Drop rule. If a Drop rule is not included, data not within the specified interval or period will be retained by the default rule (loadForever).

Although the docs state this, I think the current default rule (loadForever) is a little counterintuitive.
Maybe the default rule should be dropForever? But because Druid provides flexible retention rules (both load and drop), changing the default rule to dropForever might not be easy.

In contrast to Druid, some other databases provide only load rules (not the same concept, but similar to Druid's load rules), so it makes sense that their default behaviour is effectively a dropForever rule.

kfaraz (Contributor) commented Sep 17, 2022

@599166320, for your case, what you need may look like this:

loadByPeriod P3D _defaultTier=1 replica, cold=1 replica
loadByPeriod P7D cold=1 replica
dropForever

1. Data up to 3 days old lives on both the default and cold tiers
2. Data more than 3 days old and up to 7 days old lives only on the cold tier
3. Older data is dropped

The only thing I added is the cold tier replica in the first rule. Without it, the cold tier would not have the data of the past 3 days. Is this what you intended?
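
In retention rule JSON (assuming the standard _default_tier name for the default tier), this would be roughly:

[
  { "type": "loadByPeriod", "period": "P3D", "tieredReplicants": { "_default_tier": 1, "cold": 1 } },
  { "type": "loadByPeriod", "period": "P7D", "tieredReplicants": { "cold": 1 } },
  { "type": "dropForever" }
]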

kfaraz (Contributor) commented Sep 17, 2022

@FrankChen021, maybe there is some middle ground here?
The default rule should be loadForever so that things function properly right out of the box, without having to configure anything.
But when a user explicitly modifies the rules (for a datasource or globally), it's only natural that the last rule be a forever rule, either load or drop. If the user has already specified one, good; otherwise, we tack a dropForever rule onto the end ourselves. (The user shouldn't mind us doing this, since they were particular about loading data only up to a certain period/interval.)

Translated to code, this simply means that if no rule matches, we drop the segment. No rule matching implies that the retention rules have been explicitly modified by the user; otherwise the default loadForever would have matched.
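
In the RunRules.run loop quoted above, the idea would amount to something like this after the loop (a sketch only; dropForeverRule would be a pre-built ForeverDropRule instance, not existing code):

// Sketch: after the for-loop over rules in RunRules.run
if (!foundMatchingRule) {
  // No rule matched, which implies the user has customized the rules
  // (the default loadForever would otherwise have matched), so treat
  // the segment as if a trailing dropForever rule applied.
  stats.accumulate(dropForeverRule.run(coordinator, paramsWithReplicationManager, segment));
}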

We already have an alert for segments that don't match any rule, but the behaviour should still be reconsidered, because it is indeed a little counterintuitive.

599166320 (Contributor, Author) commented

@kfaraz I found that the default tier still stores data for 7 days. Is there a problem with my configuration?

kfaraz (Contributor) commented Sep 17, 2022

@599166320, could you please share the JSON of your rules for this datasource?

599166320 (Contributor, Author) commented Sep 17, 2022

@kfaraz
You can see it in the Druid forum.

In addition, I found a problem in the following code, and I have fixed it. It has been running normally for 2 days:

// RunRules.run
for (Rule rule : rules) {
  if (rule.appliesTo(segment, now)) {
    if (stats.getGlobalStat("totalNonPrimaryReplicantsLoaded")
            >= paramsWithReplicationManager.getCoordinatorDynamicConfig().getMaxNonPrimaryReplicantsToLoad()
        && !paramsWithReplicationManager.getReplicationManager().isLoadPrimaryReplicantsOnly()) {
      log.info(
          "Maximum number of non-primary replicants [%d] have been loaded for the current RunRules execution. Only loading primary replicants from here on for this coordinator run cycle.",
          paramsWithReplicationManager.getCoordinatorDynamicConfig().getMaxNonPrimaryReplicantsToLoad()
      );
      paramsWithReplicationManager.getReplicationManager().setLoadPrimaryReplicantsOnly(true);
    }
    stats.accumulate(rule.run(coordinator, paramsWithReplicationManager, segment));
    foundMatchingRule = true;
    // Only one rule is ever matched per segment, so expired data
    // on the default tier never gets deleted
    break;
  }
}


599166320 (Contributor, Author) commented

@kfaraz Have you had a chance to look into the retention rules? Will they be improved, or is there no problem after all?

kfaraz (Contributor) commented Sep 19, 2022

@599166320, taking a look at the rules now.
Could you also elaborate on the fix you made to the piece of code you mentioned?

kfaraz (Contributor) commented Sep 19, 2022

@599166320, the configuration seems to be correct for your required behaviour.

It seems that your cold tier needs to have 2 replicas. Do you have enough historicals in the cold tier?
If you don't, then it is possible that the segments are never able to reach the target replication of 2.
Currently, not being at the target replication level can also block the cleanup from the _default_tier.

If you have a metric emitter configured, you can check these metrics emitted by the coordinator to be certain:

segment/underReplicated/count
segment/dropped/count

Alternatively, you can take a look at the logs and try to find entries which say something like:

"Loading in progress for segment [<segmentIdhere>], skipping drop from tier [<tierNameHere>] until loading is complete! "

599166320 (Contributor, Author) commented

@kfaraz The cold tier of our cluster has enough historicals. The current problem is that we don't want the _default_tier's storage to fill up so fast.

I will create a PR and let you review it.

599166320 (Contributor, Author) commented Oct 29, 2022

Today, another cluster of ours (without any modification) once again hit the problem where expired data on the hot nodes was not deleted, quickly using up the storage space.

[screenshots: hot and cold node disk usage]

It looks like the storage space of the hot node is almost full, and the cold node still has enough storage space.

The following is the effect after applying LoadDropByPeriod:

[screenshots: disk usage after applying LoadDropByPeriod]

Here is some monitoring data:

[monitoring screenshots]

After this problem occurred, I quickly emitted the monitoring data to Prometheus. After observing for two hours, I upgraded the master node and applied LoadDropByPeriod, and the cluster quickly returned to normal. Of course, we also found some issues while applying LoadDropByPeriod, such as the need to limit the speed of deletion.
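
On limiting deletion speed, one simple approach would be to cap the number of drops issued per coordinator run (a generic sketch, not Druid code; expiredSegments, dropSegment, and the limit are all made up for illustration):

// Generic sketch: bound the number of segment drops per coordinator cycle.
int dropsThisRun = 0;
final int maxDropsPerRun = 100; // made-up limit
for (DataSegment expired : expiredSegments) {
  if (dropsThisRun >= maxDropsPerRun) {
    break; // leave the rest for the next coordinator cycle
  }
  dropSegment(expired); // hypothetical drop call
  dropsThisRun++;
}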

I still have a few hours of monitoring data on my side that I can share with you if you need it. @kfaraz
