Avoid unapply on PromotePrecision [databricks] #4583

Merged: 2 commits, Jan 21, 2022

Conversation

jlowe
Member

@jlowe jlowe commented Jan 20, 2022

Fixes #4581. DB9.1 added an extra argument to PromotePrecision that is not critical to our usage, so this change avoids calling unapply on PromotePrecision, which sidesteps the incompatibility between DB9.1 and other Spark versions.

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
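
One way to avoid unapply, as the description above puts it, is to match on the type and read the needed field through its accessor instead of destructuring the constructor. The sketch below is only an illustration under stated assumptions: `Expression`, `Literal`, and this one-argument `PromotePrecision` are hypothetical stand-ins rather than the real Spark or Databricks classes, and the actual change in this PR may look different.

```scala
// Hypothetical stand-ins for Catalyst expression classes; NOT the real Spark API.
sealed trait Expression
final case class Literal(value: Any) extends Expression
// Imagine another runtime declares this class with one extra constructor argument.
final case class PromotePrecision(child: Expression) extends Expression

object PromotePrecisionSketch {
  // Constructor pattern: relies on unapply, so the number of bound names must
  // equal the number of constructor fields on every runtime the code is built for.
  def childViaUnapply(e: Expression): Option[Expression] = e match {
    case PromotePrecision(child) => Some(child)
    case _ => None
  }

  // Type pattern: relies only on the class and its `child` accessor, so an extra
  // constructor argument elsewhere would not change how this match compiles.
  def childViaTypeMatch(e: Expression): Option[Expression] = e match {
    case p: PromotePrecision => Some(p.child)
    case _ => None
  }
}
```

Both helpers return the same result for this stand-in class; only the type-pattern form would keep compiling if the case class gained a field that the caller does not need.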
@jlowe jlowe added the build Related to CI / CD or cleanly building label Jan 20, 2022
@jlowe jlowe added this to the Jan 10 - Jan 28 milestone Jan 20, 2022
@jlowe jlowe self-assigned this Jan 20, 2022
@jlowe
Member Author

jlowe commented Jan 20, 2022

build

tgravescs
tgravescs previously approved these changes Jan 20, 2022
revans2
revans2 previously approved these changes Jan 20, 2022
@jlowe
Member Author

jlowe commented Jan 20, 2022

Looks like an abrupt connection loss on the Databricks instance. Rekicking.

@jlowe
Member Author

jlowe commented Jan 20, 2022

build

@revans2
Collaborator

revans2 commented Jan 20, 2022

Looks like the Databricks test timed out in some way:

....compute.amazonaws.com closed by remote host.

@pxLi
Collaborator

pxLi commented Jan 21, 2022

build

@pxLi
Collaborator

pxLi commented Jan 21, 2022

Increasing the timeout did not help. It looks like the parallel test on DB 9.1 can get stuck for a long time:

[2022-01-21T03:41:22.324Z] [gw1] [ 99%] PASSED ../../src/main/python/window_function_test.py::test_multi_types_window_aggs_for_rows[partAndOrderBy:Timestamp-String][IGNORE_ORDER({'local': True}), APPROXIMATE_FLOAT] 

[2022-01-21T03:41:23.248Z] ../../src/main/python/window_function_test.py::test_multi_types_window_aggs_for_rows[partAndOrderBy:Decimal(18,1)-String][IGNORE_ORDER({'local': True}), APPROXIMATE_FLOAT] 

[2022-01-21T03:41:23.249Z] [gw1] [ 99%] PASSED ../../src/main/python/window_function_test.py::test_multi_types_window_aggs_for_rows[partAndOrderBy:Decimal(18,1)-String][IGNORE_ORDER({'local': True}), APPROXIMATE_FLOAT] 

[2022-01-21T03:41:23.249Z] ../../src/main/python/window_function_test.py::test_window_aggs_lead_ignore_nulls_fallback[partBy:Long-orderBy:Long-orderBy:LongRange(not_null)-agg:Byte][IGNORE_ORDER({'local': True}), ALLOW_NON_GPU(WindowExec,Alias,WindowExpression,Lead,Literal,WindowSpecDefinition,SpecifiedWindowFrame)] 

[2022-01-21T03:41:23.249Z] [gw1] [ 99%] SKIPPED ../../src/main/python/window_function_test.py::test_window_aggs_lead_ignore_nulls_fallback[partBy:Long-orderBy:Long-orderBy:LongRange(not_null)-agg:Byte][IGNORE_ORDER({'local': True}), ALLOW_NON_GPU(WindowExec,Alias,WindowExpression,Lead,Literal,WindowSpecDefinition,SpecifiedWindowFrame)] 

[2022-01-21T03:41:23.249Z] ../../src/main/python/window_function_test.py::test_window_aggs_lead_ignore_nulls_fallback[partBy:Long-orderBy:Long-orderBy:LongRange(not_null)-agg:Short][IGNORE_ORDER({'local': True}), ALLOW_NON_GPU(WindowExec,Alias,WindowExpression,Lead,Literal,WindowSpecDefinition,SpecifiedWindowFrame)]

The time now is 2022-01-21T04:25:02, compared to 2022-01-21T03:41:23.249Z on the last log line above.
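
The length of that stall can be worked out from the two timestamps quoted in this comment. A small sketch, assuming both are UTC (the "time now" value carries no zone suffix, so that part is an assumption); the object and value names are made up for illustration.

```scala
import java.time.{Duration, Instant}

object StallGap {
  // Timestamp of the last test log line quoted in the comment.
  val lastLogLine: Instant = Instant.parse("2022-01-21T03:41:23.249Z")
  // "Time now" from the comment; the trailing Z is an assumption (UTC).
  val observedAt: Instant = Instant.parse("2022-01-21T04:25:02Z")

  // Elapsed time with no new test output.
  val gap: Duration = Duration.between(lastLogLine, observedAt)
}
```

Under that assumption `StallGap.gap.toMinutes` is 43, so the run had produced no new output for a little under 44 minutes.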

This reminds me of an older issue, #1811. I'm not sure what else they changed in the 9.1 runtime besides the function signature.

@pxLi
Collaborator

pxLi commented Jan 21, 2022

For an internal test, per the test log:

Thread-27 WARN RunningWindowFunctionExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
...
...
executor-heartbeater WARN HangingTaskDetector: Doing a full thread dump to debug potential hanging tasks.
22/01/21 07:43:53.734 dispatcher-event-loop-3 WARN TaskSetManager: Stage 12306 contains a task of very large size (3000 KiB). The maximum recommended task size is 1000 KiB.
22/01/21 07:43:56.819 Thread-27 WARN RunningWindowFunctionExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:43:56.819 Thread-27 WARN RunningWindowFunctionExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:43:56.819 Thread-27 WARN RunningWindowFunctionExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:43:56.819 Thread-27 WARN RunningWindowFunctionExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:00.360 Thread-27 WARN RunningWindowFunctionExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:00.361 Thread-27 WARN RunningWindowFunctionExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:00.361 Thread-27 WARN RunningWindowFunctionExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:00.361 Thread-27 WARN RunningWindowFunctionExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:07.852 Thread-27 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:07.853 Thread-27 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:07.853 Thread-27 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:07.853 Thread-27 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:07.967 Thread-27 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:07.967 Thread-27 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:07.967 Thread-27 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:07.967 Thread-27 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
22/01/21 07:44:08.270 dispatcher-event-loop-0 WARN TaskSetManager: Stage 12308 contains a task of very large size (3000 KiB). The maximum recommended task size is 1000 KiB.
22/01/21 07:44:38.206 dispatcher-event-loop-1 WARN TaskSetManager: Stage 12310 contains a task of very large size (5012 KiB). The maximum recommended task size is 1000 KiB.
22/01/21 07:45:01.808 dispatcher-event-loop-3 WARN TaskSetManager: Stage 12312 contains a task of very large size (5012 KiB). The maximum recommended task size is 1000 KiB.
22/01/21 07:45:27.189 dispatcher-event-loop-2 WARN TaskSetManager: Stage 12314 contains a task of very large size (3411 KiB). The maximum recommended task size is 1000 KiB.
22/01/21 07:45:49.333 dispatcher-event-loop-2 WARN TaskSetManager: Stage 12316 contains a task of very large size (3411 KiB). The maximum recommended task size is 1000 KiB.
22/01/21 07:45:55.100 executor-heartbeater WARN HangingTaskDetector: Task 85937 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 7267, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:35:45 UTC 2022
22/01/21 07:45:55.101 executor-heartbeater WARN HangingTaskDetector: Doing a full thread dump to debug potential hanging tasks.
22/01/21 08:52:36.430 executor-heartbeater WARN HangingTaskDetector: Task 181296 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 2729, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:36:26 UTC 2022
22/01/21 08:52:36.430 executor-heartbeater WARN HangingTaskDetector: Task 181297 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 3387, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:36:26 UTC 2022
22/01/21 08:52:45.097 executor-heartbeater WARN HangingTaskDetector: Task 85937 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 7267, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:35:45 UTC 2022
22/01/21 08:52:45.097 executor-heartbeater WARN HangingTaskDetector: Task 85936 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 7231, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:35:45 UTC 2022
22/01/21 08:52:45.097 executor-heartbeater WARN HangingTaskDetector: Task 85939 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 6504, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:35:45 UTC 2022
22/01/21 08:52:45.097 executor-heartbeater WARN HangingTaskDetector: Task 85938 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 6823, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:35:45 UTC 2022
22/01/21 08:52:46.430 executor-heartbeater WARN HangingTaskDetector: Task 181298 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 3105, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:36:26 UTC 2022
22/01/21 08:52:46.430 executor-heartbeater WARN HangingTaskDetector: Task 181299 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 3196, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:36:26 UTC 2022
22/01/21 08:52:46.430 executor-heartbeater WARN HangingTaskDetector: Task 181296 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 2729, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:36:26 UTC 2022
22/01/21 08:52:46.430 executor-heartbeater WARN HangingTaskDetector: Task 181297 is probably not making progress because its metrics (Map(internal.metrics.shuffle.read.localBlocksFetched -> 4, internal.metrics.shuffle.read.remoteBytesReadToDisk -> 0, internal.metrics.shuffle.write.bytesWritten -> 0, internal.metrics.output.recordsWritten -> 0, internal.metrics.shuffle.write.recordsWritten -> 0, internal.metrics.memoryBytesSpilled -> 0, internal.metrics.shuffle.read.remoteBytesRead -> 0, internal.metrics.diskBytesSpilled -> 0, internal.metrics.shuffle.read.localBytesRead -> 3387, internal.metrics.shuffle.read.recordsRead -> 4, internal.metrics.output.bytesWritten -> 0, internal.metrics.input.bytesRead -> 0, internal.metrics.input.recordsRead -> 0, internal.metrics.shuffle.read.remoteBlocksFetched -> 0)) has not changed since Fri Jan 21 07:36:26 UTC 2022

To download the log: scala-test-detailed-output.log

@jlowe
Member Author

jlowe commented Jan 21, 2022

I think I found the tests that are hanging, see #4599. I'll skip these tests in this PR to unblock the Databricks build.

@jlowe jlowe dismissed stale reviews from revans2 and tgravescs via b4ceb5a January 21, 2022 16:45
@jlowe
Member Author

jlowe commented Jan 21, 2022

build

@jlowe jlowe merged commit 2090ecc into NVIDIA:branch-22.02 Jan 21, 2022
@jlowe jlowe deleted the fix-promote-unapply branch January 21, 2022 19:51
Labels
build Related to CI / CD or cleanly building
Development

Successfully merging this pull request may close these issues.

[BUG] Build error "GpuOverrides.scala:924: wrong number of arguments" on DB9.1.x spark-3.1.2
4 participants