Add support for Delta Lake 2.4.0 [databricks] #8573
build
Moving this back to draft while I audit differences from the Delta Lake implementation.
for k in "executionTimeMs", "numOutputBytes", "rewriteTimeMs", "scanTimeMs", \
"numRemovedBytes", "numAddedBytes", "numTargetBytesAdded", "numTargetBytesInserted", \
"numTargetBytesUpdated", "numTargetBytesRemoved", \
'numTargetRowsMatchedUpdated', \
'numTargetRowsMatchedDeleted', \
'numTargetRowsNotMatchedBySourceUpdated', \
'numTargetRowsNotMatchedBySourceDeleted':
For easier maintenance in the future, I would create a separate variable holding the list of metrics, one per line:
metrics_to_remove = [
'executionTimeMs',
'numOutputBytes',
...
]
nit: I'd stick to a consistent choice of single or double quotes, at least within a single code fragment.
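A minimal sketch of the suggested refactor, assuming the loop's purpose is to drop these metric keys from a dictionary before comparing results (the loop body is not shown in the review excerpt). `strip_metrics` is a hypothetical helper; the metric names are the ones from the loop above, listed one per line with consistent single quotes.

```python
# Metric names taken from the loop in the PR, one per line. They are assumed
# here to be metrics whose values can differ between runs, so they are
# excluded before comparison.
metrics_to_remove = [
    'executionTimeMs',
    'numOutputBytes',
    'rewriteTimeMs',
    'scanTimeMs',
    'numRemovedBytes',
    'numAddedBytes',
    'numTargetBytesAdded',
    'numTargetBytesInserted',
    'numTargetBytesUpdated',
    'numTargetBytesRemoved',
    'numTargetRowsMatchedUpdated',
    'numTargetRowsMatchedDeleted',
    'numTargetRowsNotMatchedBySourceUpdated',
    'numTargetRowsNotMatchedBySourceDeleted',
]


def strip_metrics(metrics):
    """Return a copy of `metrics` without any key listed in metrics_to_remove.

    Hypothetical helper for illustration; the PR's actual loop body is not shown.
    """
    return {k: v for k, v in metrics.items() if k not in metrics_to_remove}
```

For example, `strip_metrics({'executionTimeMs': 12, 'numOutputRows': 5})` returns `{'numOutputRows': 5}`. Adding or removing a metric then touches exactly one line, which keeps future diffs small.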
…lta/delta24x/DeleteCommandMeta.scala Co-authored-by: Jason Lowe <jlowe@nvidia.com>
…/rapids/delta24x/GpuMergeIntoCommand.scala Co-authored-by: Jason Lowe <jlowe@nvidia.com>
...lake/delta-24x/src/main/scala/com/nvidia/spark/rapids/delta/delta24x/DeleteCommandMeta.scala
Integration tests are failing on Databricks 12.2 due to GC pauses causing timeouts. I can reproduce this manually by running delta_lake_merge_test.py. Here are examples of warnings that I see:
We had problems in the past due to aggressive caching of Delta Log snapshots. I wonder whether the same thing, or something similar, is happening here. We should be specifying
It looks like I went too far when refactoring.
Part of #8547
Closes #8556
This PR adds shims for Delta Lake 2.4.0, which is compatible only with Spark 3.4.x.
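To illustrate the version constraint, here is a minimal sketch that maps a Delta Lake version to a shim directory name and rejects Spark versions that release does not support. The `delta-24x` name follows the module path seen in this PR; `delta-22x` is assumed by analogy. The `DELTA_SPARK_COMPAT` table and the `shim_dir`/`major_minor` helpers are hypothetical, and the 2.2 to Spark 3.3 entry comes from Delta Lake's published compatibility matrix rather than from this PR.

```python
# Hypothetical helper, not part of spark-rapids: pick a Delta Lake shim
# directory and verify the Spark version that Delta release requires.
DELTA_SPARK_COMPAT = {
    "2.2": "3.3",  # Delta Lake 2.2.x targets Spark 3.3.x
    "2.4": "3.4",  # Delta Lake 2.4.x requires Spark 3.4.x (per this PR)
}


def major_minor(version):
    """Return the 'X.Y' prefix of a version string such as '3.4.1'."""
    return ".".join(version.split(".")[:2])


def shim_dir(delta_version, spark_version):
    """Return a shim directory name like 'delta-24x', or raise ValueError."""
    delta_mm = major_minor(delta_version)
    required_spark = DELTA_SPARK_COMPAT.get(delta_mm)
    if required_spark is None:
        raise ValueError(f"no shim for Delta Lake {delta_version}")
    if major_minor(spark_version) != required_spark:
        raise ValueError(
            f"Delta Lake {delta_version} needs Spark {required_spark}.x, "
            f"got {spark_version}")
    return "delta-" + delta_mm.replace(".", "") + "x"
```

With this sketch, `shim_dir("2.4.0", "3.4.1")` returns `'delta-24x'`, while `shim_dir("2.4.0", "3.3.2")` raises `ValueError` because Delta Lake 2.4 cannot run on Spark 3.3.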
This follows the existing approach for Delta Lake shims, where each version's shim is largely the same code with minor differences. This makes it easier to diff against the Delta Lake project.
I started by copying the 22x shims as the basis for 24x, then diffed them against the Delta Lake 2.4 source and pulled in some changes.
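The copy-then-diff workflow described above can be approximated with Python's standard `difflib` module. This is a minimal sketch, not the author's tooling: `diff_against_upstream` is a hypothetical helper that takes the text of an upstream Delta Lake source file and the text of the corresponding shim file and returns a unified diff, so that changes pulled in from upstream can be reviewed line by line.

```python
import difflib


def diff_against_upstream(upstream_src, shim_src,
                          upstream_name="upstream", shim_name="shim"):
    """Return a unified diff (as one string) from upstream source to the shim.

    Hypothetical helper; an empty string means the two texts are identical.
    """
    lines = difflib.unified_diff(
        upstream_src.splitlines(keepends=True),
        shim_src.splitlines(keepends=True),
        fromfile=upstream_name,
        tofile=shim_name,
    )
    return "".join(lines)
```

For example, diffing `"a\nb\n"` against `"a\nc\n"` yields a hunk containing `-b` and `+c`. Keeping each shim close to the upstream file is what makes this kind of diff short enough to audit.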
The scope of this PR is to get the current Delta Lake tests passing with 2.4; it does not aim to support all new Delta Lake 2.4 features on the GPU.
Delta Lake 2.4.0 release notes: https://github.com/delta-io/delta/releases/tag/v2.4.0
Follow-on issues: