PeriodLoadRule cannot remove expired segments #13080
I believe @AmatyaAvadhanula and @kfaraz were looking into this area of the code too, working to improve the Coordinator balancing behavior. Perhaps they will have some thoughts.
@599166320 , drop is handled by drop rules rather than by the load rules themselves. So, I think the problem you are facing can be solved by simply having a `dropForever` rule at the end of your rule chain.
Thanks @kfaraz for your reply. If I want the `_default_tier` nodes to keep the hot data of the last 3 days, and the cold nodes to keep the cold data of the last 7 days, how should my retention rules be configured?
Will this meet my needs?
I think @kfaraz 's answer makes sense. Actually, our docs already point this out.
Although the docs state this, I think the current default rule (`loadForever`) is a little counterintuitive. In contrast to Druid, some other DBs only provide load rules (not the same concept, but similar to Druid's load rules), so it makes sense that their default behaviour is to drop whatever the load rules do not cover.
@599166320 , for your case, what you need may look like this:
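A rule chain along those lines might look like the following (a sketch based on the requirements above; the `cold` tier name and the replicant counts are assumptions, not values from this thread):

```json
[
  {
    "type": "loadByPeriod",
    "period": "P3D",
    "includeFuture": true,
    "tieredReplicants": { "_default_tier": 1, "cold": 1 }
  },
  {
    "type": "loadByPeriod",
    "period": "P7D",
    "includeFuture": true,
    "tieredReplicants": { "cold": 1 }
  },
  { "type": "dropForever" }
]
```

Rules are evaluated in order and only the first matching rule applies, so the first rule also places recent data on the cold tier, and the trailing `dropForever` drops anything older than 7 days instead of letting it fall through to the cluster-wide default `loadForever`.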
@FrankChen021 , maybe there is some middle ground here? Translated to code, this simply means that if no rule matches, we drop the segment. The reasoning is that no rule matching implies the retention rules have been explicitly modified by the user; otherwise the default `loadForever` rule would have matched. We already have an alert for segments that don't match any rule, but the behaviour should still be reconsidered because it is indeed a little counterintuitive.
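A rough sketch of that proposal (hypothetical code, not an actual Druid patch; the coordinator method names here are illustrative):

```java
// Hypothetical sketch of the proposed "drop when nothing matches" fallback.
// Not actual Druid code; names are illustrative.
boolean matched = false;
for (Rule rule : rules) {
  if (rule.appliesTo(segment, now)) {
    stats.accumulate(rule.run(coordinator, params, segment));
    matched = true;
    break;  // rules are evaluated in order; only the first match applies
  }
}
if (!matched) {
  // Today: only an alert is emitted for unmatched segments.
  // Proposal: treat "no matching rule" as an implicit dropForever, since a
  // user who edited the rules away from the default loadForever presumably
  // wants unmatched data gone.
  coordinator.markSegmentAsUnused(segment);
}
```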
@kfaraz I found that the default tier is storing 7 days of data. Is there a problem with my configuration?
@599166320 , could you please share the JSON of your rules for this datasource?
@kfaraz In addition, I found a problem with the following code, and I have fixed it. It has run normally for 2 days.
@kfaraz Have you checked the rules?
@599166320 , taking a look at the rules now.
@599166320 , the configuration seems to be correct for your required behaviour. It seems that your cold tier may not have enough free capacity to take the segments. If you have a metric emitter configured, you can check the segment load/drop metrics emitted by the coordinator (e.g., `segment/dropped/count` and `segment/dropQueue/count`) to verify what it is doing. Alternatively, you can take a look at the coordinator logs and try to find lines indicating that segments are being assigned or dropped.
@kfaraz The cold tier of our cluster has enough historicals. The current problem is that the `_default_tier`'s storage fills up too quickly. I will create a PR and let you review it.
Today, another cluster of ours (without any modification) once again experienced the problem that expired data on the hot nodes was not deleted, quickly using up the storage space. The hot nodes' storage was almost full while the cold nodes still had plenty of space. After this problem occurred, I emitted the monitoring data to Prometheus, and after observing for two hours I upgraded the master node and applied the fix. I still have a few hours of monitoring data on my side that I can share with you if you need it. @kfaraz
Recently, when deploying a hot/cold tiered Druid cluster, we found that hot nodes loaded data beyond the configured time range, so the hot nodes' storage filled up quickly. I found the same problem reported on the Druid forum, where it had gone unhandled for a long time. I checked `RunRules.java`, and I think there is a problem: `PeriodLoadRule` will not delete expired data at all, it only deletes excess replicants. Does the current implementation of `PeriodLoadRule` meet expectations? The following is the current implementation in Druid:
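Roughly, the matching logic in `PeriodLoadRule` (paraphrased from the 0.22-era source, not quoted verbatim):

```java
// Paraphrased from PeriodLoadRule (Druid ~0.22); details may differ.
// This method only decides whether the rule applies to a segment's interval.
// Actual loading/dropping lives in LoadRule.run(), which drops only replicas
// in excess of the configured count -- never segments that have expired.
@Override
public boolean appliesTo(Interval interval, DateTime referenceTimestamp)
{
  final Interval currInterval = new Interval(period, referenceTimestamp);
  if (includeFuture) {
    return currInterval.getStartMillis() < interval.getEndMillis();
  }
  return currInterval.overlaps(interval);
}
```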
Now, I have solved this problem by adding `dropallExpireSegments` to `PeriodLoadRule.java`, but I don't know what side effects it may have. Here is my implementation:
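A hypothetical sketch of the shape such a change could take (illustrative coordinator API names; this is not the reporter's actual patch):

```java
// Hypothetical sketch, not the actual patch from this issue.
// Idea: when a segment has fallen outside this rule's period, drop every
// loaded replica of it instead of leaving copies on the hot tier.
private CoordinatorStats dropAllExpiredSegments(
    DruidCoordinatorRuntimeParams params,
    DataSegment segment
)
{
  final CoordinatorStats stats = new CoordinatorStats();
  for (ServerHolder holder : params.getDruidCluster().getAllServers()) {
    // Skip servers that do not hold a replica of this segment.
    if (!holder.isServingSegment(segment)) {
      continue;
    }
    // Ask the server's load queue peon to drop the replica.
    holder.getPeon().dropSegment(segment, null);
    stats.addToTieredStat("droppedCount", holder.getServer().getTier(), 1L);
  }
  return stats;
}
```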
Affected Version
0.22.0