-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arrow: Add support for TimeType / UUIDType #2739
Conversation
92a551b
to
2eacfed
Compare
2eacfed
to
21af7bc
Compare
@@ -169,6 +172,9 @@ public VectorHolder read(VectorHolder reuse, int numValsToRead) { | |||
case TIMESTAMP_MILLIS: | |||
vectorizedColumnIterator.nextBatchTimestampMillis(vec, typeWidth, nullabilityHolder); | |||
break; | |||
case UUID: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come only UUID was added to this switch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIME_MICROS
and TIMESTAMP_MICROS
are evaluated as LONG
: https://github.com/nastra/iceberg/blob/21af7bcd73f450e997d1af085634567a734a16b9/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java#L240-L255
so both are basically handled in the LONG
part of that switch statement. Only UUID
needs to be handled differently
@@ -240,6 +247,12 @@ private void allocateFieldVector(boolean dictionaryEncodedVector) { | |||
this.readType = ReadType.LONG; | |||
this.typeWidth = (int) BigIntVector.TYPE_WIDTH; | |||
break; | |||
case TIME_MICROS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed offline do we want to add support for MILLIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like Parquet's TimeWriter
only writes micros in https://github.com/nastra/iceberg/blob/50f4ecca7711e69f63589fea828d26230fac8d59/parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetWriter.java#L256, so I think the answer would be that we don't need to handle TIME_MILLIS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look into supporting TIME_MILLIS
as a follow-up tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after some investigation, supporting TIME_MILLIS
might be a bit more involved. I opened #2755
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks @nastra
This is partly fixing #2486 and #2485. I didn't want to include all types as otherwise the PR would become too large. It's been a bit of a pain adding new type support. So I'm planning to refactor the code after this PR is merged in the
arrow
project in order to reduce code duplication and complexity before adding support for more types.