ColumnarBatch to CachedBatch and back #1001
Conversation
Write ColumnarBatch to CachedBatch and Read CachedBatch into ColumnarBatch

Sign off

empty-commit

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
I didn't get through all of the code; there is a lot to cover. I'll try to spend some more time on this soon.
Let me clean it up more and add docs.
Removed RapidsVectorizedColumnReader in favor of reflection on VectorizedColumnReader.
val num = Math.min(capacity.toLong, totalCountLoadedSoFar - rowsReturned).toInt
for (i <- columnReaders.indices) {
  if (columnReaders(i) != null) {
    val readBatchMethod =
I have moved this up to the class level so we don't have to look it up every time. Update coming soon.
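The idea of hoisting the reflective lookup out of the per-batch loop can be sketched as below. This is a minimal, self-contained illustration: `ToyReader` and its `readBatch` are hypothetical stand-ins for Spark's VectorizedColumnReader and its package-private method, not the actual Spark signatures.

```scala
import java.lang.reflect.Method

// Toy target standing in for VectorizedColumnReader; readBatch is private,
// so callers outside the class must go through reflection.
class ToyReader {
  var total = 0
  private def readBatch(num: Int): Unit = { total += num }
}

// Resolve the Method once, at construction time, instead of re-looking it up
// inside the loop for every batch (the change described in the comment above).
class ReflectiveReader {
  private val readBatchMethod: Method = {
    val m = classOf[ToyReader].getDeclaredMethod("readBatch", classOf[Int])
    m.setAccessible(true) // the real target method is not public either
    m
  }

  def readBatch(target: ToyReader, num: Int): Unit =
    readBatchMethod.invoke(target, Integer.valueOf(num))
}
```

`getDeclaredMethod` and `setAccessible` are comparatively expensive, so caching the `Method` at class level avoids paying that cost on every batch read.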
build

build
build
…IDIA#1001) Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Write ColumnarBatch to CachedBatch and Read CachedBatch into ColumnarBatch
When writing a ColumnarBatch to a CachedBatch, I convert it to a row iterator and essentially write each row out like an InternalRow. A more performant approach might be to write the file in a columnar fashion, but that can be explored as a follow-on.
Sign off empty-commit
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
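The columnar-to-row conversion described in the PR summary can be sketched as follows. `ToyBatch` and `ToyRow` are hypothetical stand-ins used only to show the shape of the transformation; they are not Spark's ColumnarBatch or InternalRow APIs.

```scala
// A row materialized from per-column storage.
final case class ToyRow(values: Seq[Any])

// Toy batch: data lives column by column, one array per column.
// rowIterator re-reads it row by row, which is the transformation the
// serializer performs before writing rows out one at a time.
final class ToyBatch(columns: Seq[Array[Any]], numRows: Int) {
  def rowIterator: Iterator[ToyRow] =
    (0 until numRows).iterator.map(r => ToyRow(columns.map(_(r))))
}
```

Because each emitted row gathers one element from every column array, this path trades the cache-friendly columnar layout for row-at-a-time writes, which is why a fully columnar write is flagged as the more performant follow-on.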