Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flink: Pre-create fieldGetters to avoid constructing them for each row #10565

Merged
merged 4 commits into from
Jul 10, 2024

Conversation

fengjiajie
Copy link
Contributor

@fengjiajie fengjiajie commented Jun 25, 2024

I found that RowDataUtil.clone consumes a significant amount of CPU time by analyzing the flame graph. It constructs a FieldGetter object for each field of every row being processed. This commit optimizes this by constructing all FieldGetter objects upfront, avoiding repeated object creation.

@github-actions github-actions bot added the flink label Jun 25, 2024
@fengjiajie
Copy link
Contributor Author

In my case, the average task execution time decreased from 344 seconds to 328 seconds.

reader

@fengjiajie
Copy link
Contributor Author

Hi @nastra , the failing test cases seem unrelated to this change. Could you please re-trigger the tests? Thanks!

@Fokko
Copy link
Contributor

Fokko commented Jul 3, 2024

@fengjiajie Rerunning the failed tests 👍

@@ -79,7 +79,11 @@ public static Object convertConstant(Type type, Object value) {
* the arity check.
*/
public static RowData clone(
RowData from, RowData reuse, RowType rowType, TypeSerializer[] fieldSerializers) {
RowData from,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a deprecated version of the original public method to keep compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder! I've added the original function below. The reason I didn't directly use the new function is that it requires FieldGetter for all fields, while the old function only needs to construct FieldGetter for fields with values. Constructing FieldGetter for all fields might negatively impact performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that performance degradation is acceptable for the deprecated methods if it is justified otherwise. The number of the duplicated lines seems high enough for me to try to avoid them.

The users could always do the caching of the getters themselves, or find another solution soon, since we remove the deprecated methods regularly.

WDYT?

Also CC-ed, @stevenzwu, who might be also interested in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@pvary
Copy link
Contributor

pvary commented Jul 3, 2024

Thanks for the detailed explanation and the PR @fengjiajie!

Left a few small comments.

As a general rule we create a PR for a single version. Usually the latest. So if there are any review comments, they need to be fixed only once.
Once the first PR is merged, we create a backport PR for the other supported versions.

@fengjiajie
Copy link
Contributor Author

Thanks for the detailed explanation and the PR @fengjiajie!

Left a few small comments.

As a general rule we create a PR for a single version. Usually the latest. So if there are any review comments, they need to be fixed only once. Once the first PR is merged, we create a backport PR for the other supported versions.

@pvary Thanks for taking the time to review this! Please let me know if you have any suggestions.

@fengjiajie fengjiajie requested a review from pvary July 4, 2024 04:12
Copy link
Contributor

@pvary pvary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 from me. Let's wait a bit, to see if someone else is interested in reviewing. If there're no more comments, I will merge next Tuesday. (Ping me if I forgot 😄)

@fengjiajie
Copy link
Contributor Author

+1 from me. Let's wait a bit, to see if someone else is interested in reviewing. If there're no more comments, I will merge next Tuesday. (Ping me if I forgot 😄)

Thanks

@pvary pvary merged commit 05923d6 into apache:main Jul 10, 2024
19 checks passed
@pvary
Copy link
Contributor

pvary commented Jul 10, 2024

Thanks @fengjiajie for the improvement.
Could you please create a backport PR to the other Flink versions?

fengjiajie added a commit to fengjiajie/iceberg that referenced this pull request Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants