-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flink: Pre-create fieldGetters to avoid constructing them for each row #10565
Conversation
Hi @nastra , the failing test cases seem unrelated to this change. Could you please re-trigger the tests? Thanks! |
@fengjiajie Rerunning the failed tests 👍 |
@@ -79,7 +79,11 @@ public static Object convertConstant(Type type, Object value) { | |||
* the arity check. | |||
*/ | |||
public static RowData clone( | |||
RowData from, RowData reuse, RowType rowType, TypeSerializer[] fieldSerializers) { | |||
RowData from, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create a deprecated version of the original public method to keep compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder! I've added the original function below. The reason I didn't directly use the new function is that it requires FieldGetter for all fields, while the old function only needs to construct FieldGetter for fields with values. Constructing FieldGetter for all fields might negatively impact performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that performance degradation is acceptable for the deprecated methods if it is justified otherwise. The number of the duplicated lines seems high enough for me to try to avoid them.
The users could always do the caching of the getters themselves, or find another solution soon, since we remove the deprecated methods regularly.
WDYT?
Also CC-ed, @stevenzwu, who might be also interested in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
...k/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/reader/RowDataRecordFactory.java
Show resolved
Hide resolved
flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/TestHelpers.java
Show resolved
Hide resolved
Thanks for the detailed explanation and the PR @fengjiajie! Left a few small comments. As a general rule we create a PR for a single version. Usually the latest. So if there are any review comments, they need to be fixed only once. |
@pvary Thanks for taking the time to review this! Please let me know if you have any suggestions. |
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataUtil.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me. Let's wait a bit, to see if someone else is interested in reviewing. If there're no more comments, I will merge next Tuesday. (Ping me if I forgot 😄)
Thanks |
Thanks @fengjiajie for the improvement. |
I found that RowDataUtil.clone consumes a significant amount of CPU time by analyzing the flame graph. It constructs a FieldGetter object for each field of every row being processed. This commit optimizes this by constructing all FieldGetter objects upfront, avoiding repeated object creation.