[HUDI-5823][RFC-65] RFC for Partition TTL Management #8062
Conversation
Force-pushed from 48f8ccd to b2bdc0c.
@stream2000 Thanks for putting up an RFC. Left some comments to consider. Can you also share the PR in the dev list thread?
rfc/rfc-65/rfc-65.md (outdated)
- RelativePath. Relative path to the base path, or we can call it PartitionPath.
- LastModifiedTime. Last instant time that modified the partition. We need this information to support the `KEEP_BY_TIME` policy.
- PartitionTotalFileSize. The sum of all valid data file sizes in the partition, to support the `KEEP_BY_SIZE` policy. We currently calculate only the latest file slices in the partition.
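As a sketch, the per-partition statistics described above could be held in a simple value class. All names here are illustrative placeholders, not the RFC's final schema:

```java
// Hypothetical sketch of the per-partition stats described above;
// field names are illustrative, not the RFC's actual schema.
class PartitionStat {
    private final String relativePath;      // partition path relative to the table base path
    private final String lastModifiedTime;  // last instant time that modified the partition (KEEP_BY_TIME)
    private final long totalFileSize;       // sum of latest-file-slice sizes (KEEP_BY_SIZE)

    PartitionStat(String relativePath, String lastModifiedTime, long totalFileSize) {
        this.relativePath = relativePath;
        this.lastModifiedTime = lastModifiedTime;
        this.totalFileSize = totalFileSize;
    }

    String getRelativePath() { return relativePath; }
    String getLastModifiedTime() { return lastModifiedTime; }
    long getTotalFileSize() { return totalFileSize; }
}
```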
Can you define "valid data file"? Is it just the latest version of each file id, i.e. the latest file slice?
Yes, we calculate only the latest file slice. If we want to calculate all the file slices instead of just the latest one, we can add a config to control the behavior or add another stat field. Here we choose to calculate only the latest file slice because we think it reveals the real data size of the file group.
1. Gather partition statistics.
2. Apply each TTL policy. For explicit policies, find sub-partitions matching the `PartitionSpec` defined in the policy and check if there are expired partitions according to the policy type and size. For the default policy, find partitions that do not match any explicit policy and check if they are expired.
3. For all expired partitions, call delete_partition to delete them, which will generate a replace commit and mark all files in those partitions as replaced. For pending clustering and compaction that affect a target partition to delete, [HUDI-5553](https://issues.apache.org/jira/browse/HUDI-5553) introduces a pre-check that will throw an exception; further improvement could be discussed in [HUDI-5663](https://issues.apache.org/jira/browse/HUDI-5663).
4. The clean service will then delete all replaced file groups.
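The four-step flow above can be sketched roughly as follows. Every name here (`TtlPolicy`, `findExpiredPartitions`) is a hypothetical placeholder for illustration, not an actual Hudi API; steps 3 and 4 (the delete_partition call and cleaning) are left to the caller:

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of the TTL run described above; TtlPolicy and
// findExpiredPartitions are hypothetical, not real Hudi APIs.
class TtlRunSketch {

    interface TtlPolicy {
        boolean matches(String partitionPath);   // does an explicit policy cover this partition?
        boolean isExpired(String partitionPath); // policy-specific expiry check (time/count/size)
    }

    static List<String> findExpiredPartitions(List<String> partitions,
                                              List<TtlPolicy> explicitPolicies,
                                              TtlPolicy defaultPolicy) {
        List<String> expired = new ArrayList<>();
        for (String partition : partitions) {
            // Step 2: explicit policies first; the default policy covers unmatched partitions.
            TtlPolicy policy = explicitPolicies.stream()
                .filter(p -> p.matches(partition))
                .findFirst()
                .orElse(defaultPolicy);
            if (policy.isExpired(partition)) {
                expired.add(partition);
            }
        }
        // Steps 3/4: the caller passes these to delete_partition; clean removes replaced files.
        return expired;
    }
}
```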
Would be great if you could also discuss how the read path will work. How do we maintain a consistent filesystem view for readers, given that the delete partition operation can take time?
delete_partition has already been implemented in the current hudi master branch. When we call delete_partition to delete a list of partitions, the executor will list all files in the partitions to delete and store them in the replacecommit metadata. After the replace commit is committed, all filesystem views that have seen the replace commit will exclude the files that were replaced in it.
Can you help me understand how the orchestration or scheduling of these policies is done? Is it manual via a SQL procedure? For e.g., how often is this clean-up job triggered?
The RFC should cover that as well. And definitely, we should avoid adding/storing data in ad-hoc JSON files if possible.
If we plan to run this once a day or at a much lower cadence, it's not that bad an idea to collect stats on demand, clean up, and move on, rather than storing the stats and ensuring we keep them updated after every commit. Keeping stored stats aligned with the actual data would add some overhead to regular writes.
1. Users hope to manage partition TTL not only by expiration time but also by sub-partition count and sub-partition size. So we need to support the following three different TTL policy types.
   1. **KEEP_BY_TIME**. Partitions will expire N days after their last modified time.
   2. **KEEP_BY_COUNT**. Keep N sub-partitions for a high-level partition. When the sub-partition count exceeds N, delete the partitions with smaller partition values until the sub-partition count meets the policy configuration.
Can you help us understand the use case here? I am trying to get an understanding of the sub-partitions. In hudi, we have only one partitioning, but it could be multi-level. So, trying to see if we can keep it high level.
I feel both (2) and (3) are very much catered towards multi-field partitioning, like a ProductId/datestr based partitioning. Can we lay out high-level strategies for one-level partitioning as well, in addition to multi-field partitioning?
Is it possible to simplify the strategies so that we can achieve this for both single- and multi-field partitioning? For e.g., TTL any partition whose last mod time (the last time data was added/updated) is > 1 month. This will work both for single-field partitioning (datestr) and for multi-field partitioning (productId/datestr).
Open to ideas.
We should also call out that the sub-partitioning might work only for day-based or time-based sub-partitioning, right? For e.g., let's say the partitioning is datestr/productId. How do we know, out of 1000 productIds under a given day, which 100 are older or newer (assuming all 1000 were created in the same commit)?
   1. **KEEP_BY_TIME**. Partitions will expire N days after their last modified time.
   2. **KEEP_BY_COUNT**. Keep N sub-partitions for a high-level partition. When the sub-partition count exceeds N, delete the partitions with smaller partition values until the sub-partition count meets the policy configuration.
   3. **KEEP_BY_SIZE**. Similar to KEEP_BY_COUNT, but ensures that the sum of the data size of all sub-partitions does not exceed the policy configuration.
2. Users need to set different policies for different partitions. For example, the hudi table is partitioned by two fields (user_id, ts). For partition (user_id='1'), we set the policy to keep 100G of data across all sub-partitions, and for partition (user_id='2') we set the policy that all sub-partitions will expire 10 days after their last modified time.
We should be able to add regex support and achieve this. For e.g.:
`Map<{PartitionRegex/StaticPartition} -> {TTL policy}>`
This map can have multiple entries as well.
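The suggested regex-to-policy map could look roughly like this. The class, method, and lookup semantics below are illustrative assumptions (first matching entry wins; values simplified to retention days), not an actual Hudi config class:

```java
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: maps a partition regex (or static partition value)
// to a retention in days, as suggested above. Not an actual Hudi class.
class PolicyMapSketch {
    static Integer retentionDaysFor(String partitionPath, Map<String, Integer> policies) {
        for (Map.Entry<String, Integer> e : policies.entrySet()) {
            if (Pattern.matches(e.getKey(), partitionPath)) {
                return e.getValue(); // first matching entry wins
            }
        }
        return null; // no explicit policy; fall back to the default policy
    }
}
```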
We have three main considerations when designing TTL policy:

1. Users hope to manage partition TTL not only by expiration time but also by sub-partition count and sub-partition size. So we need to support the following three different TTL policy types.
   1. **KEEP_BY_TIME**. Partitions will expire N days after their last modified time.
What is last mod time? Is it referring to new inserts, or updates as well?
Yes, both inserts and updates will be treated as modifications to the partition. And we track them by looking at the commit/deltacommit write stats.
How do we fetch this info when a commit is archived?
In the latest version of the RFC, we use the max instant time of the committed file slices in the partition as the partition's last modified time, for simplicity. Otherwise, we need some extra mechanism to get the last modified time. In our internal version, we maintain an extra JSON file and update it incrementally as new instants are committed, to get the real modified time for the partition. Alternatively, we could use the metadata table to track the last modified time. What do you think about this?
When retiring old/unused/not-accessed partitions, another approach we are taking internally is:
(a) Stash the partitions to be cleaned up in a .stashedForDeletion folder (at the .hoodie level).
(b) Partitions stashed for deletion wait in the folder for a week (or the time dictated by the policy) before actually getting deleted. In cases where we realize that something has been accidentally deleted (like a bad policy configuration, TTL exclusion not configured, etc.), we can always move it back from the stash to quickly recover from the TTL event.
(c) We can configure policies for the .stashedForDeletion// subfolders to manage the appropriate tiering level (whether to move them to a warm/cold tier, etc.).
(d) In addition to the deletePartitions() API, which would stash the folder (instead of deleting it) based on the configs, we would need a restore API to move the subfolder/files back to their original location.
(e) Metadata left by the delete operation needs to be synced with the MDT to keep the file-listing metadata in sync with the file system. (In cases where replication to a different region is supported, this also warrants applying the changes on the replicated copies of the data.)
Maybe we can add the stash/restore mechanism to the replace commit/clean process of hudi instead of dealing with it in TTL management? TTL management should only decide which partitions are outdated and call delete_partition to delete them. If we want to retain the deleted data, we can add an extra mechanism to the delete_partition method.
@stream2000, could we also introduce record ttl management? Partition ttl management and record ttl management both need the ttl policy.
So here we have the TTL policy definition:

```java
public class HoodiePartitionTTLPolicy {
```
@stream2000, could we introduce a HoodieTTLPolicy interface? Then HoodiePartitionTTLPolicy implements HoodieTTLPolicy. A HoodieRecordTTLPolicy could also implement this interface in the future.
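The suggested hierarchy could look roughly like this. Only `HoodiePartitionTTLPolicy` appears in the RFC; the interface, its method, and the fields below are a sketch of the reviewer's proposal, not actual Hudi classes:

```java
// Sketch of the suggested policy hierarchy; names follow the review comment
// above and are not (yet) actual Hudi classes.
interface HoodieTTLPolicy {
    String getPolicySpec();  // what the policy targets, e.g. a partition spec
}

class HoodiePartitionTTLPolicy implements HoodieTTLPolicy {
    private final String partitionSpec;
    private final long retainDays;

    HoodiePartitionTTLPolicy(String partitionSpec, long retainDays) {
        this.partitionSpec = partitionSpec;
        this.retainDays = retainDays;
    }

    @Override
    public String getPolicySpec() { return partitionSpec; }

    long getRetainDays() { return retainDays; }
}
// A future HoodieRecordTTLPolicy could implement the same interface.
```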
@SteNicholas Nice suggestion! I will take it into consideration when implementing the policy and make sure we can integrate more types of TTL policy in the future.
@codope @nsivabalan @SteNicholas @nbalajee Sorry for the delayed response, I've been busy with other things these days. Will address comments and update the RFC.
@SteNicholas It will be nice to have record-level ttl management~ We can open another RFC for record-level TTL management and concentrate on partition TTL management in this RFC. We can also discuss more about how to make it easy to integrate other types of ttl policy in this RFC.
Force-pushed from cd94fd5 to d3144f1.
@nsivabalan @SteNicholas Hi, thanks for your review. I have updated the RFC according to your comments. And here are the answers to your remaining questions.
I think we need to store the policy somewhere in hudi, otherwise users need to store the policy themselves. Consider a user that has 1000 partitions and wants to set TTL policies for 500 of them: it's better to store the policy metadata in hudi. Hope for another round of review!
@stream2000, thanks for the update of this RFC. I left comments on the updates.
```java
// Partition spec for which the policy takes effect; could be a regex or a static partition value
private String partitionSpec;
```
Could the partitionSpec support multi-level partitions?
Yes, it supports regex expressions for partitions as well as static partition values.
@stream2000, for example, there is a Hudi table partitioned by date and hour. Meanwhile, the user wants to configure a ttl of one year. How could the user configure this ttl with the current policy definition? Set the policyValue to one year and partitionSpec to `*/*`?
Thanks for your review!
If there is pending clustering or compaction, the ttl policy execution should clean and delete the expired file groups after the clustering and compaction complete. The user doesn't need to do anything for the pending clustering and compaction case.
@stream2000, do you have any updates?
Partition TTL can cover most scenarios and will bring a lot of convenience.
Can this feature be supported in advance? Especially hoping for the next version after 0.14.
I hope so, too! Will work on this feature in the coming weeks and push POC code to my personal repo ASAP.
@geserdugarov Thanks very much for your review! I think the most important part of the design to confirm is whether we need to provide a feature-rich but complicated TTL policy in the first place, or just implement a simple but extensible policy.
@geserdugarov @codope @nsivabalan @vinothchandar Hope for your opinions on this~
The TTL strategies are similar to existing table service strategies. We can define TTL strategies like defining a clustering/clean/compaction strategy:

```properties
hoodie.partition.ttl.management.strategy=KEEP_BY_TIME
```
We can change the word TTL to lifecycle if we want to support other types of data expiration policies. In the current version of this RFC, we do not plan to support other types of policies like KEEP_BY_SIZE, since they rely on the condition that the partition values are comparable, which is not always true. However, we do have some users with this kind of requirement, so we need to make the policy extensible.
Users can provide their own implementation of `PartitionTTLManagementStrategy` and hudi will help delete the expired partitions.
Thanks for your review~
It's just that providing TTL functionality via a custom implementation of PartitionTTLManagementStrategy is not user friendly.
For common users that want to do TTL management, they just need to add some write configs to their ingestion job. In the simplest situation they may just add the following configs to enable async ttl management:

```properties
hoodie.partition.ttl.days.retain=10
hoodie.ttl.management.async.enabled=true
```

The TTL policy will default to KEEP_BY_TIME, and we can automatically detect expired partitions by their last modified time and delete them. However, this simple policy may not suit all users; they may need to implement their own partition TTL policy, so we provide an abstraction, PartitionTTLManagementStrategy, that returns expired partitions as users need, and hudi will help delete them.
This is the main scope for TTL, and we shouldn't allow more flexibility. A customized implementation of PartitionTTLManagementStrategy would allow doing anything with partitions. It could still be a PartitionManagementStrategy, but then we shouldn't name it with the TTL part.
A customized implementation of PartitionTTLManagementStrategy will only provide a list of expired partitions, and hudi will help delete them. We can rename PartitionTTLManagementStrategy to PartitionLifecycleManagementStrategy, since we do not always delete the partitions by time.
The `KeepByTimePartitionTTLManagementStrategy` will calculate the `lastModifiedTime` for each input partition. If the duration between now and `lastModifiedTime` for the partition is larger than what `hoodie.partition.ttl.days.retain` configures, `KeepByTimePartitionTTLManagementStrategy` will mark this partition as an expired partition. We use day as the unit of expiration time since it is very commonly used for datalakes. Open to ideas on this.

We will use the largest commit time of committed file groups in the partition as the partition's `lastModifiedTime`. So any write (including normal DMLs, clustering, etc.) with a larger instant time will change the partition's `lastModifiedTime`.
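The KEEP_BY_TIME check described above reduces to a simple comparison. In this sketch, instant times are simplified to epoch milliseconds; the real implementation would parse Hudi instant timestamps, and the class and method names are hypothetical:

```java
import java.util.concurrent.TimeUnit;

// Sketch of the KEEP_BY_TIME expiry check described above, with instant
// times simplified to epoch millis; illustrative, not the Hudi implementation.
class KeepByTimeSketch {
    static boolean isExpired(long lastModifiedMillis, long nowMillis, long retainDays) {
        long retainMillis = TimeUnit.DAYS.toMillis(retainDays);
        // expired when the partition has not been modified within the retention window
        return nowMillis - lastModifiedMillis > retainMillis;
    }
}
```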
We should always think twice before making format changes. We have considered using HoodiePartitionMetadata to provide the lastModifiedTime; however, the current implementation of partition metadata does not support transactions and is never modified after its first creation. Not sure it is a good choice for maintaining the lastModifiedTime.
For the first version of TTL management, we do not plan to implement a complicated strategy (for example, using an array to store strategies, introducing partition regexes, etc.). Instead, we add a new abstract method `getPartitionPathsForTTLManagement` in `PartitionTTLManagementStrategy` and provide a new config `hoodie.partition.ttl.management.partition.selected`.

If `hoodie.partition.ttl.management.partition.selected` is set, `getPartitionPathsForTTLManagement` will return the partitions provided by this config. If not, `getPartitionPathsForTTLManagement` will return all partitions in the hudi table.
It will be a write config, and we will only apply the TTL policy to the partitions specified by this config. If this config is not specified, we will apply the ttl policy to all partitions in the hudi table.
TTL management strategies will only be applied to partitions returned by `getPartitionPathsForTTLManagement`.

Thus, if users want to apply different strategies to different partitions, they can run TTL management multiple times, selecting different partitions and applying the corresponding strategies. And we may provide a batch interface in the future to simplify this.
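The selection behavior described above can be sketched as a small pure function. The method name and the comma-separated config format here are illustrative assumptions, not the actual Hudi API:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of getPartitionPathsForTTLManagement as described above: if the
// "selected partitions" config is set, use it; otherwise use all partitions.
// Names and config parsing are simplified assumptions, not the Hudi API.
class PartitionSelectionSketch {
    static List<String> partitionsForTtl(String selectedConfig, List<String> allPartitions) {
        if (selectedConfig != null && !selectedConfig.isEmpty()) {
            return Arrays.asList(selectedConfig.split(","));
        }
        return allPartitions;
    }
}
```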
I offer to allow the user to set only the spec and TTL values, with support for add, show, and delete operations.
So, if I want to implement 6 partition policies, then I should prepare 6 PartitionTTLManagementStrategys and call them 6 times?
We can implement the advanced TTL management policy in the future to provide the ability to set different policies for different partitions. To achieve this, we can introduce a new hoodie.partition.ttl.management.strategy and implement the logic you mentioned. For most users, providing a simple policy that expires any partition whose last mod time (the last time data was added/updated) is older than the specified TTL will be enough.
@geserdugarov @nsivabalan What do you think about this?
### Apply different strategies for different partitions

Some specific users may want to apply different strategies for different partitions. For example, they may have multiple partition fields (productId, day). For partitions under `product=1` they want to keep data for 30 days, while for partitions under `product=2` they want to keep it for 7 days only.
We can implement two independent ttl policies: one simple, and one advanced, as I commented below.
Hi community, I have implemented a poc version of partition lifecycle management in this PR: #9723. This PR provides a simple
In some classic hudi use cases, users partition hudi data by time and are only interested in data from a recent period of time. The outdated data is useless and costly; we need a lifecycle management mechanism to prevent the dataset from growing infinitely.
Guess you are talking about GDPR?
To clarify: this is very different. The proposal in this doc only yields the target partitions; it does not remove the whole partition.
```properties
hoodie.partition.lifecycle.management.strategy=KEEP_BY_TIME
hoodie.partition.lifecycle.management.strategy.class=org.apache.hudi.table.action.lifecycle.strategy.KeepByTimePartitionLifecycleManagementStrategy
```
`hoodie.partition.lifecycle.management.strategy` -> `hoodie.partition.ttl.strategy`?
Using the word ttl means that we can't apply a time-unrelated partition deletion strategy in the future. However, it is more suitable for the current implementation since we only provide a time-based strategy. What do you think? Should we use ttl directly or retain some extensibility?
| Config key | Remarks | Default |
|------------|---------|---------|
| hoodie.partition.lifecycle.management.async.enabled | Enable running of the lifecycle management service asynchronously as writes happen on the table. | false |
Should we hand the cleaning duty to the cleaner service? Like we do for the delete partition cmds: you just generate a delete partition commit metadata on the timeline and the cleaner takes care of it.
What I really care about is conflict resolution for the operation. Does OCC apply here? Do we have some support for basic concurrent modification?
Yes, we will use the delete partition command to delete the expired partitions; this is mentioned in the Background section. And conflict resolution for the operation is the same as what we do for the delete partition command.
Will version 1.0 support this feature? This feature is greatly needed.
Force-pushed from 932376f to 409fbdd.
For 1.0.0 and later hudi versions, which support efficient completion time queries on the timeline (#9565), we can get the partition's last modified time from the timeline.
@danny0405 I have addressed all comments according to online/offline discussions. If there is no other concern, we can move on with this~
```properties
hoodie.partition.ttl.management.strategy=KEEP_BY_TIME
hoodie.partition.ttl.management.strategy.class=org.apache.hudi.table.action.ttl.strategy.KeepByTimePartitionTTLManagementStrategy
```
hoodie.partition.ttl.strategy.class ?
Yes, sure. Will change it later.
```java
/**
 * @return Expired partition paths.
 */
public abstract List<String> getExpiredPartitionPaths();
```
Do we need to pass around basic info like the table path or meta client so that users can customize more easily? For example, how would a customized class know the table path?
See the poc code: https://github.com/apache/hudi/pull/9723/files#diff-854a305c2cf1bbe8545bbe78866f53e84a489b60390ea1cfc7f64ed61165a0aaR33
Will provide a constructor like:

```java
public PartitionTTLManagementStrategy(HoodieTable hoodieTable) {
  this.writeConfig = hoodieTable.getConfig();
  this.hoodieTable = hoodieTable;
}
```
Then make it clear in the pseudocode.
For file groups generated by replace commits, the commit time may not reveal the real insert/update time of the file group. However, when using this strategy, we can assume that we won't cluster a partition that has had no new writes for a long time. And in the future, we may introduce a more accurate mechanism to get the `lastModifiedTime` of a partition, for example using the metadata table.

For 1.0.0 and later hudi versions, which support efficient completion time queries on the timeline (#9565), we can get a partition's `lastModifiedTime` by scanning the timeline and finding the last write commit for the partition. Also, for efficiency, we can store the partitions' last modified times and the current completion time in the replace commit metadata. The next time we need to calculate the partitions' last modified times, we can build them incrementally from the replace commit metadata of the last ttl management run.
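The incremental idea above can be sketched as: start from the per-partition times cached in the last replace commit metadata, then fold in commits completed since. The class and method names here are hypothetical, and times are simplified to longs:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of incrementally maintaining per-partition lastModifiedTime from a
// cached snapshot plus newly completed commits; names are illustrative only.
class LastModifiedSketch {
    static Map<String, Long> merge(Map<String, Long> cached, Map<String, Long> newCommits) {
        Map<String, Long> result = new HashMap<>(cached);
        // a newer completion time for a partition replaces the cached one
        newCommits.forEach((partition, time) -> result.merge(partition, time, Math::max));
        return result;
    }
}
```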
We will only support the feature in 1.0, right? Before 1.0, we do not have a good way of knowing when a log file is committed in a file slice, and that might be a snag when calculating the lastCommitTime.
You're right. We should support this feature in 1.0 only, to have an accurate lastCommitTime.
```java
} else {
  // Return all partition paths
  return FSUtils.getAllPartitionPaths(hoodieTable.getContext(), config.getMetadataConfig(), config.getBasePath());
}
```
Where does the hoodieTable get instantiated?
In the constructor of PartitionTTLManagementStrategy, like:

```java
protected final HoodieTable hoodieTable;
protected final HoodieWriteConfig writeConfig;

public PartitionTTLManagementStrategy(HoodieTable hoodieTable) {
  this.writeConfig = hoodieTable.getConfig();
  this.hoodieTable = hoodieTable;
}
```
Let's make the pseudocode clear.
Yes, you're right about the main difference in our ideas.
@danny0405 @geserdugarov Thanks for your review! I have addressed all comments, hope for your opinions.
rfc/rfc-65/rfc-65.md
Outdated
The TTL strategies are similar to the existing table service strategies. We can define a TTL strategy the same way we define a clustering/clean/compaction strategy:

```properties
hoodie.partition.ttl.management.strategy=KEEP_BY_TIME
```
It's better to provide the `hoodie.partition.ttl.strategy` abstraction so that users can implement their own strategies and reuse the infrastructure provided by RFC-65 to delete expired partitions. For example, a user may want to add a new strategy that deletes partitions by their own specified logic.
In the current design we will implement `KEEP_BY_TIME` only. However, that doesn't mean we won't need other kinds of TTL strategies in the future.
rfc/rfc-65/rfc-65.md
Outdated
```properties
hoodie.partition.ttl.management.strategy=KEEP_BY_TIME
hoodie.partition.ttl.management.strategy.class=org.apache.hudi.table.action.ttl.strategy.KeepByTimePartitionTTLManagementStrategy
```
Yes, sure. Will change it later.
```java
/**
 * @return Expired partition paths.
 */
public abstract List<String> getExpiredPartitionPaths();
```
See the PoC code: https://github.com/apache/hudi/pull/9723/files#diff-854a305c2cf1bbe8545bbe78866f53e84a489b60390ea1cfc7f64ed61165a0aaR33

Will provide a constructor like:

```java
public PartitionTTLManagementStrategy(HoodieTable hoodieTable) {
  this.writeConfig = hoodieTable.getConfig();
  this.hoodieTable = hoodieTable;
}
```
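As a sketch of what a user-supplied strategy could look like, the following is a simplified stand-in, not the real Hudi class: the abstract class is reduced to the one method discussed above, and the blocklist-based expiry logic is purely hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Simplified stand-in for the PartitionTTLManagementStrategy abstraction.
abstract class PartitionTTLStrategySketch {
  abstract List<String> getExpiredPartitionPaths();
}

// Hypothetical custom strategy: expire every partition that appears on an
// externally supplied blocklist, regardless of age.
class BlocklistTTLStrategy extends PartitionTTLStrategySketch {
  private final List<String> allPartitions;
  private final List<String> blocklist;

  BlocklistTTLStrategy(List<String> allPartitions, List<String> blocklist) {
    this.allPartitions = allPartitions;
    this.blocklist = blocklist;
  }

  @Override
  List<String> getExpiredPartitionPaths() {
    return allPartitions.stream()
        .filter(blocklist::contains)
        .collect(Collectors.toList());
  }
}
```

The point of the abstraction is exactly this: Hudi only needs the list of expired partition paths back, and everything downstream (writing the replace commit, cleaning, etc.) is shared infrastructure.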
rfc/rfc-65/rfc-65.md
Outdated
Users can provide their own implementation of `PartitionTTLManagementStrategy`, and Hudi will delete the expired partitions for them.
Unfortunately, I don't see any other strategies except KEEP_BY_TIME for partition-level TTL. Size is not applicable; it could be a trigger at lower levels.
We need to make the design extensible and make it easy for users to plug in their own expiry logic. I think it's better to have a strategy-level abstraction.
rfc/rfc-65/rfc-65.md
Outdated
The `KeepByTimePartitionTTLManagementStrategy` will calculate the `lastModifiedTime` for each input partition. If the duration between now and a partition's `lastModifiedTime` is larger than the configured `hoodie.partition.ttl.days.retain`, `KeepByTimePartitionTTLManagementStrategy` will mark that partition as expired. We use days as the unit of expiry time since it is commonly used for data lakes; open to ideas on this.

We will use the largest commit time of the committed file groups in a partition as the partition's `lastModifiedTime`, so any write (including normal DMLs, clustering, etc.) with a larger instant time will change the partition's `lastModifiedTime`.
Again, leveraging `.hoodie_partition_metadata` would bring a format change, and it doesn't support any kind of transaction currently. As discussed with @danny0405, in 1.0.0 and later versions, which support efficient completion time queries on the timeline (#9565), we will have a more elegant way to get the `lastCommitTime`.
You can see the updated RFC for details.
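The keep-by-time check itself is simple. Here is a minimal sketch using `java.time`; the class and method names are illustrative assumptions, not the actual Hudi implementation.

```java
import java.time.Duration;
import java.time.Instant;

class KeepByTimeCheck {
  // A partition is expired when its lastModifiedTime is more than
  // retainDays (hoodie.partition.ttl.days.retain) before now.
  static boolean isExpired(Instant lastModifiedTime, Instant now, long retainDays) {
    return Duration.between(lastModifiedTime, now).toDays() > retainDays;
  }
}
```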
rfc/rfc-65/rfc-65.md
Outdated
For file groups generated by a replace commit, the commit time may not reveal the real insert/update time of the file group. However, when using this strategy we can assume that we won't cluster a partition that has had no new writes for a long time. In the future, we may introduce a more accurate mechanism to get a partition's `lastModifiedTime`, for example using the metadata table.
### Apply different strategies for different partitions

Some users may want to apply different strategies to different partitions. For example, with multiple partition fields (productId, day), they may want to keep partitions under `product=1` for 30 days while keeping partitions under `product=2` for only 7 days.
You can file an RFC about your design of the wildcard-spec version of the TTL strategy, making it clear how users should use this kind of strategy, the data structure of the strategy, how to store the specs (and the problems with that), etc. We can discuss the issues of that kind of strategy in that RFC. The implementation of this advanced strategy can then be added in the future as a new kind of TTL strategy.
I prefer to provide a simple strategy that we can configure with a few simple and easy-to-understand write configs. @danny0405 What do you think?
rfc/rfc-65/rfc-65.md
Outdated
For the first version of TTL management, we do not plan to implement a complicated strategy (for example, using an array to store strategies, introducing partition regexes, etc.). Instead, we add a new abstract method `getPartitionPathsForTTLManagement` to `PartitionTTLManagementStrategy` and provide a new config `hoodie.partition.ttl.management.partition.selected`.

If `hoodie.partition.ttl.management.partition.selected` is set, `getPartitionPathsForTTLManagement` will return the partitions provided by this config. If not, `getPartitionPathsForTTLManagement` will return all partitions in the Hudi table.
We can remove `hoodie.partition.ttl.management.partition.selected` once we implement the advanced strategy in the future.
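The selection logic described above can be sketched as follows; the class and method names and the comma-separated config format are assumptions for illustration, not the actual Hudi implementation.

```java
import java.util.Arrays;
import java.util.List;

class PartitionSelector {
  // If the "selected partitions" config (assumed comma-separated here) is
  // set, restrict TTL management to those partitions; otherwise fall back
  // to all partitions of the table.
  static List<String> partitionsForTtl(String selectedConfig, List<String> allPartitions) {
    if (selectedConfig == null || selectedConfig.isEmpty()) {
      return allPartitions;
    }
    return Arrays.asList(selectedConfig.split(","));
  }
}
```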
TTL management strategies will only be applied to partitions returned by `getPartitionPathsForTTLManagement`.

Thus, if users want to apply different strategies to different partitions, they can run TTL management multiple times, selecting different partitions and applying the corresponding strategy each time. We may provide a batch interface in the future to simplify this.
You can file an RFC about this, and we can discuss your design in detail there. It doesn't conflict with the design in RFC-65.
@stream2000 Thank you for your reply. As you mentioned, I've written all my ideas in a separate RFC to simplify the detailed discussion: #10248
@geserdugarov I'm going to merge this one first. Can you file your suggested change PR incrementally based on this one? We can move the discussions there.
@danny0405 Ok, I will work on it.
Change Logs
RFC-65 Partition Lifecycle Management
Impact
None
Risk level (write none, low medium or high below)
Low
Documentation Update
None
Contributor's checklist