Add --use-decimals flag to TPC-DS ConvertFiles #1506

Merged


andygrove
Contributor

This adds the ability to use decimal types when converting TPC-DS data to Parquet with the ConvertFiles utility via spark-submit.
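The PR text does not show how ConvertFiles handles its arguments, so the following is only a minimal Scala sketch, assuming a boolean --use-decimals switch that defaults to off. The object UseDecimalsSketch, the Args case class, the --input/--output option names, and the string type names are all hypothetical, chosen so the example runs without Spark or an argument-parsing library. It also assumes that, without the flag, TPC-DS decimal(p,s) columns fall back to double.

```scala
// Hypothetical sketch: how a --use-decimals switch could select column types
// when converting TPC-DS CSV data to Parquet. Names are illustrative only and
// are not the actual ConvertFiles API.
object UseDecimalsSketch {

  // Parsed options; only the fields this sketch needs (hypothetical).
  final case class Args(input: String, output: String, useDecimals: Boolean)

  // Hand-rolled parser: "--flag value" pairs plus the boolean
  // --use-decimals switch, which is false when absent.
  def parse(argv: Seq[String]): Args = {
    def valueOf(flag: String): String = {
      val i = argv.indexOf(flag)
      require(i >= 0 && i + 1 < argv.length, s"missing value for $flag")
      argv(i + 1)
    }
    Args(
      input = valueOf("--input"),
      output = valueOf("--output"),
      useDecimals = argv.contains("--use-decimals"))
  }

  // Matches TPC-DS style types such as "decimal(7,2)" (whole string).
  private val DecimalPattern = """decimal\((\d+),\s*(\d+)\)""".r

  // With the switch on, keep the exact decimal type; otherwise fall back to
  // double (an assumption about the non-decimal behavior). Other types pass
  // through unchanged.
  def sparkType(tpcdsType: String, useDecimals: Boolean): String =
    tpcdsType match {
      case DecimalPattern(p, s) if useDecimals => s"decimal($p,$s)"
      case DecimalPattern(_, _)                => "double"
      case other                               => other
    }
}
```

In a real Spark job the mapping would presumably yield DecimalType(p, s) or DoubleType rather than strings; strings keep this sketch dependency-free.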

andygrove added the benchmark Benchmarking, benchmarking tools label Jan 13, 2021
andygrove added this to the Jan 4 - Jan 15 milestone Jan 13, 2021
andygrove self-assigned this Jan 13, 2021
Signed-off-by: Andy Grove <andygrove@nvidia.com>
andygrove force-pushed the benchmark-convert-files-use-decimal branch from 084153b to 528b53a January 13, 2021 19:14
@andygrove
Contributor Author

build

Signed-off-by: Andy Grove <andygrove@nvidia.com>
gerashegalov previously approved these changes Jan 13, 2021
Collaborator

gerashegalov left a comment


LGTM

conf.coalesce,
conf.repartition,
conf.withPartitioning())
baseInput = conf.input(),
Collaborator

nit: I think it's better to use named parameters consistently, so spark should also be passed as spark = spark. Or is that too redundant?

Contributor Author

My main goal was to make sure the configuration options were being passed to the correct arguments, but it makes sense to use named args for all of them, so I have updated this.
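To illustrate the point about named arguments, here is a hypothetical Scala sketch; convert and its parameters are illustrative stand-ins, not the real helper that ConvertFiles calls, and spark is a plain String rather than a SparkSession. When two parameters share a type (here two Map[String, Int] settings, loosely mirroring conf.coalesce and conf.repartition), a positional call that swaps them still compiles. Named arguments, including the seemingly redundant spark = spark, bind each value to its parameter visibly and in any order.

```scala
// Hypothetical sketch of why named arguments help. The signature is
// illustrative; it is not the real ConvertFiles / TPC-DS helper signature.
object NamedArgsSketch {

  // coalesce and repartition share a type, so swapping them positionally
  // still compiles; the compiler cannot catch the mistake.
  def convert(
      spark: String, // stand-in for a SparkSession (hypothetical)
      baseInput: String,
      baseOutput: String,
      coalesce: Map[String, Int],
      repartition: Map[String, Int],
      useDecimalType: Boolean): String =
    s"$spark: $baseInput -> $baseOutput, coalesce=$coalesce, " +
      s"repartition=$repartition, decimals=$useDecimalType"

  val spark = "local-spark"
  val coalesceConf = Map("store_sales" -> 8)
  val repartitionConf = Map("catalog_sales" -> 16)

  // Positional call with the two maps accidentally swapped: compiles fine.
  val swapped = convert(spark, "in", "out", repartitionConf, coalesceConf, true)

  // Named call: each value is bound to its parameter by name (spark = spark
  // included), so the order of the arguments no longer matters.
  val named = convert(
    spark = spark,
    baseInput = "in",
    baseOutput = "out",
    repartition = repartitionConf,
    coalesce = coalesceConf,
    useDecimalType = true)
}
```

The trade-off the reviewer raises is verbosity: spark = spark adds no safety on its own, but it keeps the call uniform so that every argument can be checked the same way.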

nartal1 previously approved these changes Jan 13, 2021
@andygrove
Contributor Author

build

andygrove dismissed stale reviews from nartal1 and gerashegalov via 2ca8133 January 13, 2021 22:28
@andygrove
Contributor Author

build

andygrove merged commit 291a20a into NVIDIA:branch-0.4 Jan 14, 2021
andygrove deleted the benchmark-convert-files-use-decimal branch January 14, 2021 19:55
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add --use-decimals flag to TPC-DS ConvertFiles

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* update copyright

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* make named args consistent
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add --use-decimals flag to TPC-DS ConvertFiles

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* update copyright

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* make named args consistent
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…VIDIA#1506)

* Prevent optimization

* Add comment

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

* Revert "Add comment"

This reverts commit 025fad50bf317b8608e9f41ea637d3131181eadc.

---------

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Labels
benchmark Benchmarking, benchmarking tools
Projects
None yet

3 participants