Add command-line interface for TPC-* for use with spark-submit #823
integration_tests/src/main/scala/com/nvidia/spark/rapids/tests/tpch/TpchLikeBench.scala
 *
 * @param spark The Spark session
 * @param query The name of the query to run e.g. "q5"
 * @param iterations The number of times to run the query.
Similar here and in the rest of the functions: any reason not to have all parameters in the docs?
  }
}

nit: remove extra newline
@tgravescs Thanks for the review. I've updated the javadocs to add the missing parameters and also addressed the nits.
@tgravescs @abellina This is ready for re-review. It is working well in general, but there is the known issue #852 with comparing ordering across partitioned files. That bug already exists in branch-0.3 and is not made any worse by this PR (which just adds a CLI for this utility), so I'd like to get this one in and follow up with a separate PR to fix the underlying bug, because it may require a different approach entirely.
I added a check for multiple partitions, so it now at least fails if ignoreOrdering=true rather than reporting incorrect differences.
integration_tests/src/main/scala/com/nvidia/spark/rapids/tests/common/CompareResults.scala
@@ -0,0 +1,197 @@
/*
 * Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
2020
So this file already existed and I renamed it. Let me see if I can rebase so git recognizes this is a rename rather than a delete and add.
Yeah, if it was a git mv then it's ok. We should try to make sure it's a move so we don't lose history.
I have rebased. If you look at the individual commits, you can see that I did a git mv in the first commit and then changed the file contents in the second commit, but the "files changed" tab in the GitHub UI for this PR still shows it as a delete and add for some reason. I do see the correct history locally.
As long as it's a git move, it's fine.
3c5bcc2 to f1f79d7
Signed-off-by: Andy Grove <andygrove@nvidia.com>
f1f79d7 to a0ca562
…A#823) * rename tpch benchmark file Signed-off-by: Andy Grove <andygrove@nvidia.com> * Add command-line interface to benchmarks Signed-off-by: Andy Grove <andygrove@nvidia.com>
* Revert "Update JNI submodule ref to cudf v22.12.01 (NVIDIA#816)" This reverts commit 353e2f9. * re-target cudf submodule to tag v22.12.00 Signed-off-by: Peixin Li <pxli@nyu.edu> Signed-off-by: Peixin Li <pxli@nyu.edu>
Provides a command-line interface for TPC-* benchmarks so that they can be run from spark-submit as an alternative to running them interactively via spark-shell. This makes it easy to run a batch of queries from a bash script. For example:
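A minimal sketch of what such a batch script might look like. The main class name is inferred from the file path touched by this PR; the jar path, data path, and the `--input`, `--query`, and `--iterations` argument names are hypothetical placeholders and may not match the CLI this PR actually adds. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: run a batch of TPC-H-like queries via spark-submit.
# ASSUMPTIONS: BENCH_JAR, INPUT_DIR, and the --input/--query/--iterations
# argument names are hypothetical; check the benchmark's actual usage text.

# Class name inferred from the path of TpchLikeBench.scala in this PR.
BENCH_CLASS="com.nvidia.spark.rapids.tests.tpch.TpchLikeBench"
BENCH_JAR="${BENCH_JAR:-/path/to/integration-tests.jar}"   # hypothetical path
INPUT_DIR="${INPUT_DIR:-/path/to/tpch-data}"               # hypothetical path
DRY_RUN="${DRY_RUN:-1}"                                    # 1 = print commands only

# Print the spark-submit command line for a single query name, e.g. "q5".
submit_cmd() {
  local query="$1"
  echo "spark-submit --class ${BENCH_CLASS} ${BENCH_JAR}" \
       "--input ${INPUT_DIR} --query ${query} --iterations 3"
}

# Loop over a few query numbers; extend the list to cover a full batch.
for i in 1 5 9; do
  cmd="$(submit_cmd "q${i}")"
  echo "${cmd}"
  if [ "${DRY_RUN}" != "1" ]; then
    # Relies on word splitting, so the paths must not contain spaces.
    ${cmd} || echo "q${i} failed" >&2
  fi
done
```

Setting `DRY_RUN=0` would execute each command instead of only printing it, assuming `spark-submit` is on the `PATH` and the placeholder paths are replaced with real ones.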
This closes #795
TPC-H had fallen a bit behind so this PR also makes TPC-H consistent with the other benchmarks.