
Split DB scripts to make them common for the build and IT pipeline #1933

Merged · 4 commits merged into NVIDIA:branch-0.5 on Mar 16, 2021

Conversation

NvTimLiu (Collaborator):

Split DB scripts to make them common for the build and IT pipeline.

Issue: #1568

Related PR #1829
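
For orientation, the pieces touched by this PR fit together roughly as follows (a sketch reconstructed from the diff hunks and comments below; the directory name is an assumption):

    jenkins/databricks/
        build.sh        # build-only steps (mvn clean package)
        test.sh         # integration-test steps, shared by the build & IT pipelines
        params.py       # common parameter parsing for run-build.py / run-test.py
        create.py       # cluster creation, common to AWS and Azure Databricks
        run-build.py
        run-test.py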

@NvTimLiu (Collaborator Author):

build

@@ -103,36 +103,5 @@ mvn -B install:install-file \

mvn -B -P${BUILD_PROFILES} clean package -DskipTests

# Copy so we pick up new built jar and latest CuDF jar. Note that the jar names have to be
@NvTimLiu (Collaborator Author):

Move the integration test scripts into 'test.sh', to make them common to the DB build & IT pipelines.

@@ -0,0 +1,71 @@
# Copyright (c) 2021, NVIDIA CORPORATION.
@NvTimLiu (Collaborator Author):

Make the parameter parsing common to both 'run-build.py' and 'run-test.py'.



def main():
workspace = 'https://dbc-9ff9942e-a9c4.cloud.databricks.com'
@NvTimLiu (Collaborator Author) · Mar 15, 2021:

Make the parameter parsing common to both 'run-build.py' and 'run-test.py' by moving the parsing code into params.py.
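
As a rough illustration, a shared params.py can define the defaults and the getopt loop once, and both entry points just import the parsed values. The getopt approach and the jar_path option come from this PR's commits; the exact option letters and variable set here are assumptions, not the PR's actual code:

    # params.py -- illustrative sketch only
    import getopt
    import sys

    workspace = 'https://dbc-9ff9942e-a9c4.cloud.databricks.com'
    token = ''
    clusterid = ''
    jar_path = ''

    usage = 'params.py -w <workspace> -t <token> -c <clusterid> -j <jarpath>'
    try:
        opts, args = getopt.getopt(sys.argv[1:], 'hw:t:c:j:',
                                   ['workspace=', 'token=', 'clusterid=', 'jarpath='])
    except getopt.GetoptError:
        print(usage)
        sys.exit(2)

    for opt, arg in opts:
        if opt == '-h':
            print(usage)
            sys.exit()
        elif opt in ('-w', '--workspace'):
            workspace = arg
        elif opt in ('-t', '--token'):
            token = arg
        elif opt in ('-c', '--clusterid'):
            clusterid = arg
        elif opt in ('-j', '--jarpath'):
            jar_path = arg

run-build.py and run-test.py could then share this with a plain 'import params' and read params.workspace, params.jar_path, and so on.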

LOCAL_JAR_PATH=$1

# tests
export PATH=/databricks/conda/envs/databricks-ml-gpu/bin:/databricks/conda/condabin:$PATH
@NvTimLiu (Collaborator Author):

Move the integration test scripts into 'test.sh', to make them common to the DB build & IT pipelines.


set -e

LOCAL_JAR_PATH=$1
@NvTimLiu (Collaborator Author):

The build pipeline builds the jars from source, so LOCAL_JAR_PATH is null.

The IT pipeline sets LOCAL_JAR_PATH to the local directory the jars have been downloaded into.
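
A hypothetical sketch of that contract; the ssh invocation, remote user, and helper name are illustrative assumptions rather than the PR's actual code:

    import subprocess

    def run_test_sh(master_addr, ssh_key_file, local_jar_path=''):
        # Build pipeline: local_jar_path is left empty, so test.sh runs
        # against the jars just built from source on the cluster.
        # IT pipeline: local_jar_path names the directory the pre-built
        # jars were downloaded into, and test.sh picks those up instead.
        subprocess.check_call([
            'ssh', '-i', ssh_key_file, 'ubuntu@' + master_addr,
            'bash test.sh ' + local_jar_path])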

@@ -23,21 +23,23 @@ class ClusterUtils(object):

@staticmethod
def generate_create_templ(sshKey, cluster_name, runtime, idle_timeout,
num_workers, driver_node_type, worker_node_type,
num_workers, driver_node_type, worker_node_type, cluster_type,
@NvTimLiu (Collaborator Author):

Add a 'cluster_type' parameter to make create.py common to AWS and Azure Databricks.

Collaborator:

Don't we need whatever the equivalent is for Azure, or are those just not specified?

Collaborator:

Also, perhaps a better name for this would be something like cloud_provider.

@NvTimLiu (Collaborator Author):

> Don't we need whatever the equivalent is for Azure, or are those just not specified?

I think we don't need to specify an equivalent for Azure, since the default (unspecified) value of cluster_type is 'aws'.

Also, cluster_type only gates the AWS-specific cluster-creation configs shown below; any other value means those configs are simply omitted. So we only need to make 'aws' the default (unspecified) value of cluster_type in create.py.

    if cluster_type == 'aws':
        templ['aws_attributes'] = {
            "zone_id": "us-west-2a",
            "first_on_demand": 1,
            "availability": "SPOT_WITH_FALLBACK",
            "spot_bid_price_percent": 100,
            "ebs_volume_count": 0
        }

@NvTimLiu (Collaborator Author):

> Also, perhaps a better name for this would be something like cloud_provider.

Sounds good, let me change it.
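
For reference, a minimal sketch of the rename: the guard and the aws_attributes block are taken from the snippet above, the helper name is hypothetical, and only the parameter name changes from cluster_type to cloud_provider:

    # Only AWS Databricks needs these attributes; any other
    # cloud_provider value (e.g. 'azure') leaves the template untouched.
    def add_cloud_attributes(templ, cloud_provider='aws'):
        if cloud_provider == 'aws':
            templ['aws_attributes'] = {
                "zone_id": "us-west-2a",
                "first_on_demand": 1,
                "availability": "SPOT_WITH_FALLBACK",
                "spot_bid_price_percent": 100,
                "ebs_volume_count": 0
            }
        return templ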

@@ -33,20 +33,21 @@ def main():
num_workers = 1
worker_type = 'g4dn.xlarge'
driver_type = 'g4dn.xlarge'
cluster_type = 'aws'
@NvTimLiu (Collaborator Author):

Add a 'cluster_type' parameter to make create.py common to AWS and Azure Databricks.

NvTimLiu marked this pull request as a draft on March 15, 2021 09:56

Add 'cluster_type' parameter to make create.py common for aws and azure Databricks

Signed-off-by: Tim Liu <timl@nvidia.com>

NvTimLiu marked this pull request as ready for review on March 15, 2021 11:43
@NvTimLiu (Collaborator Author):

build

@tgravescs (Collaborator):

Changes overall look fine, a few minor nits. One thing I want to mention: these scripts are also used by devs to build and test on Databricks manually, so I'd like any changes to keep that easy to do. It looks like that is still the case with these changes; they just split it into separate build and test steps.

change the var name from cluster_type to cloud_provider.

Signed-off-by: Tim Liu <timl@nvidia.com>
@NvTimLiu (Collaborator Author) commented Mar 15, 2021:

> Changes overall look fine, a few minor nits. One thing I want to mention: these scripts are also used by devs to build and test on Databricks manually, so I'd like any changes to keep that easy to do. It looks like that is still the case with these changes; they just split it into separate build and test steps.

Got it. I'll try to change the scripts as little as possible.

sameerz added the 'build' label (Related to CI / CD or cleanly building) on Mar 16, 2021
sameerz added this to the Mar 15 - March 26 milestone on Mar 16, 2021
@NvTimLiu (Collaborator Author):

build

NvTimLiu merged commit 97ccad7 into NVIDIA:branch-0.5 on Mar 16, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Split DB scripts to make them common for the build and IT pipeline (NVIDIA#1933)

* Split DB scripts to make them common for the build and IT pipeline

Signed-off-by: Tim Liu <timl@nvidia.com>

* update getopt for jar_path

* Add 'cluster_type' parameter to make create.py common for aws and azure Databricks

Signed-off-by: Tim Liu <timl@nvidia.com>

* Change to use a more readable var name

change the var name from cluster_type to cloud_provider.

Signed-off-by: Tim Liu <timl@nvidia.com>