Support comparing ORC data #1545

Merged
merged 4 commits into from
Jan 22, 2021

Conversation

wjxiz1992
Collaborator

Signed-off-by: Allen Xu <allxu@nvidia.com>

This is to resolve #1544

Signed-off-by: Allen Xu <allxu@nvidia.com>
gerashegalov
gerashegalov previously approved these changes Jan 20, 2021
Collaborator

@gerashegalov gerashegalov left a comment

LGTM, nits

Comment on lines 525 to 529
          path: String => spark.read.csv(path)
        case "parquet" =>
          path: String => spark.read.parquet(path)
        case "orc" =>
          path: String => spark.read.orc(path)
Collaborator

nit: this block L523-L530 can be a one-liner:

val readPathAction = (path: String) => spark.read.format(inputFormat).load(path)
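
A minimal sketch of how the suggested one-liner could be exercised on its own, assuming a local Spark session with the built-in ORC source available. The object name `ReadByFormat`, the `reader` helper, and the `supported` set are hypothetical and are not part of this PR. The explicit check is there because a generic `format(...).load(...)` call no longer lists the accepted formats in the code, so an unsupported name would otherwise only fail inside Spark when the source is resolved.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadByFormat {
  // Hypothetical allow-list: the three formats named in this review thread.
  private val supported = Set("csv", "parquet", "orc")

  // Returns a path => DataFrame function, mirroring the suggested readPathAction.
  def reader(spark: SparkSession, inputFormat: String): String => DataFrame = {
    require(supported.contains(inputFormat), s"Unsupported input format: $inputFormat")
    (path: String) => spark.read.format(inputFormat).load(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("read-by-format-sketch")
      .getOrCreate()
    try {
      import spark.implicits._
      // Write a tiny ORC dataset to a fresh temp location, then read it back.
      val dir = java.nio.file.Files.createTempDirectory("orc-demo").resolve("data").toString
      Seq((1, "a"), (2, "b")).toDF("id", "name").write.orc(dir)
      val df = reader(spark, "orc")(dir)
      assert(df.count() == 2)
    } finally {
      spark.stop()
    }
  }
}
```

For ORC, `spark.read.format("orc").load(path)` and `spark.read.orc(path)` go through the same data source, so the one-liner should behave like the original `match` for the three formats shown.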

Collaborator Author

updated.

@@ -53,6 +53,8 @@ object CompareResults {
(spark.read.csv(conf.input1()), spark.read.csv(conf.input2()))
Collaborator

Similarly, lines R51-R68 (https://github.com/NVIDIA/spark-rapids/pull/1545/files#diff-bc99ae208fb44d4abcf91b6e5b44515fbebeb90367c5f9eb1719e28226d22bf9R51-R68) can be:

    val format = spark.read.format(conf.inputFormat())
    BenchUtils.compareResults(
      format.load(conf.input1()),
      format.load(conf.input2()),
      conf.inputFormat(),
      conf.ignoreOrdering(),
      conf.useIterator(),
      conf.maxErrors(),
      conf.epsilon())
  }
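
For readers unfamiliar with the parameters passed above, here is a rough, Spark-free sketch of what a tolerance-based row comparison could look like. It is an illustration under stated assumptions, not the implementation of `BenchUtils.compareResults`: it assumes `epsilon` is an absolute tolerance applied only to floating-point values and that `maxErrors` caps how many mismatches are reported. The object `CompareSketch` and all of its method names are hypothetical.

```scala
object CompareSketch {
  // Assumption: epsilon is an absolute tolerance used only for Float/Double values;
  // every other type is compared with plain equality.
  def valuesMatch(a: Any, b: Any, epsilon: Double): Boolean = (a, b) match {
    case (x: Double, y: Double) =>
      x == y || (x.isNaN && y.isNaN) || math.abs(x - y) <= epsilon
    case (x: Float, y: Float) =>
      x == y || (x.isNaN && y.isNaN) || math.abs(x - y) <= epsilon
    case _ => a == b
  }

  // Two rows match when they have the same width and every column matches.
  def rowsMatch(r1: Seq[Any], r2: Seq[Any], epsilon: Double): Boolean =
    r1.length == r2.length &&
      r1.zip(r2).forall { case (a, b) => valuesMatch(a, b, epsilon) }

  // Indices of the first maxErrors mismatching rows.
  // Assumes both inputs are already in the same order and have the same length;
  // a row-count difference would need a separate check.
  def mismatches(
      left: Seq[Seq[Any]],
      right: Seq[Seq[Any]],
      epsilon: Double,
      maxErrors: Int): Seq[Int] =
    left.zip(right).zipWithIndex
      .collect { case ((l, r), i) if !rowsMatch(l, r, epsilon) => i }
      .take(maxErrors)
}
```

With an option like `ignoreOrdering`, one plausible approach is to sort both sides before calling something like `mismatches`; whether the actual utility does that is not shown in this thread.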

Collaborator Author

updated.

@jlowe jlowe added the test Only impacts tests label Jan 20, 2021
Signed-off-by: Allen Xu <allxu@nvidia.com>
jlowe
jlowe previously approved these changes Jan 21, 2021
@jlowe
Member

jlowe commented Jan 21, 2021

build

@wjxiz1992
Collaborator Author

build

Collaborator

@gerashegalov gerashegalov left a comment

LGTM

@jlowe jlowe merged commit 7a76023 into NVIDIA:branch-0.4 Jan 22, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Support comparing ORC data

Signed-off-by: Allen Xu <allxu@nvidia.com>

* clean code

* Add 2021 copyright

Signed-off-by: Allen Xu <allxu@nvidia.com>

* fix bug

Co-authored-by: Allen Xu <allxu@nvidia.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#1545)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Labels
test Only impacts tests
Development

Successfully merging this pull request may close these issues.

[FEA] Support comparing ORC data using Benchmark utility
3 participants