add new lastModifiedTime calculation method for 1.0.0 and later hudi version
stream2000 committed Nov 30, 2023
1 parent 17d36c4 commit 409fbdd
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion rfc/rfc-65/rfc-65.md
@@ -19,7 +19,6 @@ dataset from growing infinitely.
This proposal introduces Partition TTL Management strategies to Hudi. Users can configure the strategies directly through table configs or by calling commands. With proper configs set, Hudi can find out which partitions are outdated and delete them.


This proposal introduces a Partition TTL Management service to Hudi. TTL management is like other table services such as Clean/Compaction/Clustering.
Users can configure their TTL strategies through write configs, and Hudi will find expired partitions and delete them automatically.

@@ -77,6 +76,8 @@ we will to use the largest commit time of committed file groups in the partition

For file groups generated by a replace commit, the commit time may not reflect the real insert/update time of the file group. However, we can assume that a partition with no new writes for a long time will not be clustered when using this strategy. In the future, we may introduce a more accurate mechanism to get a partition's `lastModifiedTime`, for example by using the metadata table.

For Hudi 1.0.0 and later, which support efficient completion time queries on the timeline (#9565), we can get a partition's `lastModifiedTime` by scanning the timeline and finding the last write commit for the partition. For efficiency, we can also store the partitions' last modified times and the current completion time in the replace commit metadata. The next time we need to calculate the partitions' last modified times, we can build them incrementally from the replace commit metadata of the last TTL management run.
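
The incremental calculation described above could look roughly like the following sketch. It is illustrative only and does not use Hudi APIs: `CompletedWrite`, `TtlCheckpoint`, and `LastModifiedTimeSketch` are hypothetical names, the checkpoint stands in for whatever would be stored in the replace commit metadata, and completion times are assumed to be fixed-width timestamp strings that sort lexicographically.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a completed write on the timeline; not a Hudi class.
final class CompletedWrite {
    final String completionTime;   // assumed fixed-width, so string order equals time order
    final List<String> partitions; // partitions this write touched

    CompletedWrite(String completionTime, List<String> partitions) {
        this.completionTime = completionTime;
        this.partitions = partitions;
    }
}

// Hypothetical checkpoint: what the previous TTL run would have persisted.
final class TtlCheckpoint {
    final String completionTime;                 // timeline scanned up to and including this time
    final Map<String, String> lastModifiedTimes; // partition -> lastModifiedTime

    TtlCheckpoint(String completionTime, Map<String, String> lastModifiedTimes) {
        this.completionTime = completionTime;
        this.lastModifiedTimes = lastModifiedTimes;
    }
}

final class LastModifiedTimeSketch {
    // Builds a new checkpoint from the previous one by looking only at
    // writes that completed after the previous checkpoint's completion time.
    static TtlCheckpoint advance(TtlCheckpoint previous, List<CompletedWrite> timeline) {
        Map<String, String> result = new HashMap<>(previous.lastModifiedTimes);
        String scannedUpTo = previous.completionTime;
        for (CompletedWrite write : timeline) {
            if (write.completionTime.compareTo(previous.completionTime) <= 0) {
                continue; // already covered by the previous run
            }
            for (String partition : write.partitions) {
                // Keep the latest completion time seen for each partition.
                result.merge(partition, write.completionTime,
                        (a, b) -> a.compareTo(b) >= 0 ? a : b);
            }
            if (write.completionTime.compareTo(scannedUpTo) > 0) {
                scannedUpTo = write.completionTime;
            }
        }
        return new TtlCheckpoint(scannedUpTo, result);
    }
}
```

In this sketch the first run starts from a checkpoint with an empty completion time, so it scans the whole timeline; later runs pass in the previous result and only process newer writes.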

### Apply different strategies for different partitions

Some users may want to apply different strategies to different partitions. For example, a table may have multiple partition fields (productId, day). Partitions under `product=1` should be kept for 30 days, while partitions under `product=2` should be kept for only 7 days.
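
One way to picture per-partition strategies is a list of path-prefix rules with a fallback. The sketch below is an assumption for illustration, not the RFC's design or a Hudi API: the class `PartitionTtlRulesSketch`, the prefix-matching rule, and the default retention are all hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical rule holder; the first matching prefix decides the retention.
final class PartitionTtlRulesSketch {
    private final Map<String, Integer> retentionDaysByPrefix = new LinkedHashMap<>();
    private final int defaultRetentionDays; // assumed fallback when no rule matches

    PartitionTtlRulesSketch(int defaultRetentionDays) {
        this.defaultRetentionDays = defaultRetentionDays;
    }

    // The prefix should end with "/" so that "product=1/" does not match "product=10/...".
    PartitionTtlRulesSketch addRule(String partitionPathPrefix, int retentionDays) {
        retentionDaysByPrefix.put(partitionPathPrefix, retentionDays);
        return this;
    }

    int retentionDaysFor(String partitionPath) {
        for (Map.Entry<String, Integer> rule : retentionDaysByPrefix.entrySet()) {
            if (partitionPath.startsWith(rule.getKey())) {
                return rule.getValue();
            }
        }
        return defaultRetentionDays;
    }

    // A partition is expired when its age in whole days exceeds its retention.
    boolean isExpired(String partitionPath, LocalDate lastModified, LocalDate today) {
        long ageInDays = ChronoUnit.DAYS.between(lastModified, today);
        return ageInDays > retentionDaysFor(partitionPath);
    }
}
```

With rules `product=1/` at 30 days and `product=2/` at 7 days, a partition last modified 29 days ago would be kept under `product=1` and treated as expired under `product=2`.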