
Change databricks build to dynamically create a cluster #981

Merged: 46 commits into NVIDIA:branch-0.3 from dbcirebase, Oct 21, 2020

Conversation

@tgravescs (Collaborator) commented Oct 19, 2020:

This changes it so we dynamically create a new Databricks cluster every time we kick off a nightly build, and delete the cluster at the end of the run. This is the basic functionality we have now; we can continue to enhance it later.
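
As a rough illustration of that create-then-delete cycle (not the actual build scripts), the sketch below assumes the Databricks REST API 2.0 clusters endpoints; the workspace URL, token, and cluster settings are placeholders, not the values the real nightly build uses.

```python
# Rough sketch of the nightly create-then-delete cycle, assuming the Databricks
# REST API 2.0 clusters endpoints; host, token, and cluster settings are placeholders.
import requests

HOST = "https://example.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "<api-token>"                          # placeholder personal access token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def create_cluster() -> str:
    """Create a fresh cluster for this nightly run and return its cluster id."""
    resp = requests.post(
        f"{HOST}/api/2.0/clusters/create",
        headers=HEADERS,
        json={
            "cluster_name": "ci-nightly",              # illustrative settings only
            "spark_version": "7.0.x-gpu-ml-scala2.12",
            "node_type_id": "g4dn.xlarge",
            "num_workers": 1,
        },
    )
    resp.raise_for_status()
    return resp.json()["cluster_id"]


def delete_cluster(cluster_id: str) -> None:
    """Tear the cluster down at the end of the run."""
    resp = requests.post(
        f"{HOST}/api/2.0/clusters/delete",
        headers=HEADERS,
        json={"cluster_id": cluster_id},
    )
    resp.raise_for_status()


if __name__ == "__main__":
    cid = create_cluster()
    try:
        pass  # the nightly build and tests would run against `cid` here
    finally:
        delete_cluster(cid)
```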

@tgravescs (Collaborator, Author) commented:

Added a create.py script and split cluster creation off from run-tests.py. Added a number of options to make it more configurable. The create script passes the cluster id back on stdout, and the Jenkinsfile then passes it along via an environment variable.
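
A minimal sketch of that stdout hand-off (not the actual create.py): diagnostics go to stderr so the cluster id is the only thing on stdout and can be captured cleanly by the pipeline; the function and variable names here are illustrative.

```python
# Sketch of the stdout contract (not the actual create.py): the cluster id is the only
# line written to stdout, so a Jenkins pipeline step could capture it, for example:
#   env.CLUSTERID = sh(returnStdout: true, script: 'python create.py ...').trim()
# All diagnostic output goes to stderr so it cannot pollute the captured value.
import sys


def report_cluster_id(cluster_id: str) -> None:
    print(f"created cluster {cluster_id}", file=sys.stderr)  # log line on stderr only
    print(cluster_id)                                        # the lone stdout line


if __name__ == "__main__":
    # In the real script the id would come from the cluster-create call; placeholder here.
    report_cluster_id("0000-000000-example0")
```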

@tgravescs (Collaborator, Author) commented Oct 20, 2020:

build

revans2 previously approved these changes Oct 20, 2020

Review comments on jenkins/databricks/create.py (outdated) and jenkins/databricks/run-tests.py were resolved.
@tgravescs (Collaborator, Author) commented:

build

@tgravescs (Collaborator, Author) commented:

The 3.1.0 failure is one we already have a PR up for.

@tgravescs (Collaborator, Author) commented:

build

@tgravescs (Collaborator, Author) commented:

Same 3.1.0 ParquetRowConverter failure that should already have been fixed; this is the second time I've built. Will try again.

@tgravescs (Collaborator, Author) commented:

build

@jlowe (Member) commented Oct 21, 2020:

> Same 3.1.0 ParquetRowConverter failure that should already have been fixed; this is the second time I've built. Will try again.

The PR needs to be upmerged with the latest on branch-0.3 to pick up the fix.

@tgravescs (Collaborator, Author) commented:

build

@tgravescs tgravescs merged commit e05b3f4 into NVIDIA:branch-0.3 Oct 21, 2020
@tgravescs tgravescs deleted the dbcirebase branch October 21, 2020 16:08
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
* Add some more checks to databricks build scripts

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* remove extra newline

* use the right -gt for bash

* Add new python file for databricks cluster utils

* Fix up scripts

* databricks scripts working

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Pass in sshkey

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* cluster creation script mods

* fix

* fix pub key

* fix missing quote

* fix $

* update public key to be param

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Add public key value

* clenaup

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* modify permissions

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* change loc cluster id file

* fix extra /

* quote public key

* try different setting cluster id

* debug

* try again

* try readfile

* try again

* try quotes

* cleanup

* Add option to control number of partitions when converting from CSV to Parquet (NVIDIA#915)

* Add command-line arguments for applying coalesce and repartition on a per-table basis

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Move command-line validation logic and address other feedback

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Update copyright years and fix import order

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Update docs/benchmarks.md

Co-authored-by: Jason Lowe <jlowe@nvidia.com>

* Remove withPartitioning option from TPC-H and TPC-xBB file conversion

Signed-off-by: Andy Grove <andygrove@nvidia.com>

Co-authored-by: Jason Lowe <jlowe@nvidia.com>

* Benchmark runner script (NVIDIA#918)

* Benchmark runner script

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add argument for number of iterations

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Fix docs

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* add license

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* improve documentation for the configuration files

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add missing line-continuation symbol in example

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Remove hard-coded spark-submit-template.txt and add --template argument. Also make all arguments required.

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Update benchmarking guide to link to the benchmark python script

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add --template to example and fix markdown header

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add legacy config to clear active Spark 3.1.0 session in tests (NVIDIA#970)

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* XFail tests until final fix can be put in (NVIDIA#968)

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

* Stop reporting totalTime metric for GpuShuffleExchangeExec (NVIDIA#973)

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add some more checks to databricks build scripts

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Pass in sshkey

* Add create script, add more parameters, etc

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* add create script

* rework some scripts

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* fix is_cluster_running

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* put slack back in

* update text

* cleanup

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* remove datetime

* send output to stderr

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

Co-authored-by: Andy Grove <andygrove@users.noreply.github.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Co-authored-by: Robert (Bobby) Evans <bobby@apache.org>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Labels: build (Related to CI / CD or cleanly building)
Projects: None yet
4 participants