
[BUILD] cleanup unused jenkins files and scripts #1568

Closed
tgravescs opened this issue Jan 22, 2021 · 13 comments
Labels: build (Related to CI / CD or cleanly building), P1 (Nice to have for release)

Comments

@tgravescs
Collaborator

Is your feature request related to a problem? Please describe.
Directory:
https://github.com/NVIDIA/spark-rapids/tree/branch-0.4/jenkins

contains Jenkins files, Dockerfiles, and scripts. I believe many of them are no longer used (like the Databricks ones). We should remove any files from here that aren't used.

I'm not sure whether one of the Dockerfiles was meant as an example for users. We already have one for k8s here: https://github.com/NVIDIA/spark-rapids/tree/branch-0.4/docs/get-started. If we want it visible to users, perhaps we should document it and put it in a similar location, or create a separate directory for it.

@tgravescs added the feature request (New feature or request), ? - Needs Triage (Need team to review and classify), and build (Related to CI / CD or cleanly building) labels, and removed the ? - Needs Triage and feature request labels, on Jan 22, 2021
@tgravescs added the P1 (Nice to have for release) label on Jan 22, 2021
@tgravescs
Collaborator Author

@GaryShen2008

@NvTimLiu
Collaborator

NvTimLiu commented Jan 22, 2021

@tgravescs
1. The two Dockerfiles below are unused; we'll remove them from the GitHub spark-rapids repo:

  • Dockerfile.integration.ubuntu16
  • Dockerfile.ubuntu16

2. The other Dockerfiles/Jenkinsfiles/scripts are necessary for the spark-rapids pre-merge and integration jobs.
3. We keep these Dockerfiles/Jenkinsfiles/scripts on GitHub so the dev team can update them conveniently.
--- For example, if someone needs to install a Python module for the integration tests, they can change Dockerfile-blossom.integration.centos7 directly in the GitHub repo, as sketched below.
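A minimal sketch of such a one-line change (the base image and module shown are hypothetical, not the actual file contents):

    # Dockerfile-blossom.integration.centos7 (hypothetical excerpt)
    FROM centos:7
    # ... existing test environment setup ...
    # add the extra Python module a new integration test needs
    RUN python3 -m pip install pytest-xdist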

So after cleaning up the unused Dockerfiles from item 1, do we need to move the whole jenkins dir, or just some subdirs (e.g. jenkins/databricks) or files, out of the spark-rapids GitHub repo and into the internal GitLab repo?

@tgravescs
Collaborator Author

tgravescs commented Jan 22, 2021

  1. OK, I guess Jenkinsfile.databricks301nightly is still used here, but it's inconsistent with the other nightly builds now, so I would rather see it moved out to be consistent. I do see that some of the other scripts are used from here.
  2. The problem is that some of the scripts are here and some are in the other repo, so dev doesn't know where to update all the builds, which is confusing. For instance, when we wanted to add a build argument, the scripts were updated for the nightly builds but not for all the IT tests. It would be good to have a single script for running the IT tests, for instance, that others could call with parameters if needed (a sketch follows below). I don't think it matters which repo this is in as much as keeping it consistent and documenting where things are.
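A minimal sketch of what such a single entry point could look like, assuming run_pyspark_from_build.sh reads TEST_TYPE and any spark-submit flags from the environment (the wrapper name and parameters are hypothetical):

    #!/bin/bash
    # run_it.sh -- hypothetical single entry point for all integration test runs.
    # Nightly, pre-merge, and CSP pipelines would call this with parameters
    # instead of each maintaining its own copy of the test invocation.
    set -ex

    TEST_TYPE=${1:-nightly}     # which class of tests to run
    SUBMIT_FLAGS=${2:-}         # cluster-specific spark-submit flags, if any

    TEST_TYPE="$TEST_TYPE" SPARK_SUBMIT_FLAGS="$SUBMIT_FLAGS" \
        ./integration_tests/run_pyspark_from_build.sh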

@pxLi
Collaborator

pxLi commented Jan 25, 2021

  1. OK, I guess Jenkinsfile.databricks301nightly is still used here, but it's inconsistent with the other nightly builds now, so I would rather see it moved out to be consistent. I do see that some of the other scripts are used from here.
  2. The problem is that some of the scripts are here and some are in the other repo, so dev doesn't know where to update all the builds, which is confusing. For instance, when we wanted to add a build argument, the scripts were updated for the nightly builds but not for all the IT tests. It would be good to have a single script for running the IT tests, for instance, that others could call with parameters if needed. I don't think it matters which repo this is in as much as keeping it consistent and documenting where things are.

Hi, I totally agree that putting all this stuff in two different places could confuse people, but the blossom and security teams require moving at least the Jenkins files to the internal repo.

And yes, we could keep the Jenkins files and scripts in the same place if possible:

  1. Leave the integration test scripts in github/spark-rapids, since developers can still benefit from them (I'm assuming most developers care about the test scripts but not the Jenkins files), and put all the CSP-related scripts in the internal repo;
  2. or move ALL CI/CD-related stuff back to internal (people would have to remember to update the test scripts in the internal repo after submitting code changes to the external repo).
    IMHO, I would prefer the first one. Any thoughts?

@tgravescs
Collaborator Author

The problem comes in the definition of "we put all CSP-related scripts": what does that include? Right now it includes the scripts for running tests, and I don't want those split. If we can split and commonize them such that the scripts for running tests live in spark-rapids while the Jenkinsfiles and CSP setup scripts live in the private repo, I'm good with that; see the sketch below.
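Concretely, the split being discussed might look something like this (purely illustrative):

    spark-rapids (public GitHub)
        integration_tests/run_pyspark_from_build.sh   # the common test runner
        jenkins/            # only the build/test scripts developers need to touch
    internal GitLab
        Jenkinsfiles        # pipeline definitions
        CSP setup scripts   # Databricks/EMR/Dataproc provisioning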

@tgravescs
Collaborator Author

tgravescs commented Feb 8, 2021

Also, it seems we aren't running all of the tests here. I'm fixing the integration test jar to include the resources directory, but all the test environments should be using run_pyspark_from_build.sh where appropriate.
We may need to modify it to support running against other clusters like EMR; a rough sketch follows.
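A rough sketch of what pointing the script at an EMR cluster could look like; the variables shown are illustrative of the idea, not a confirmed interface:

    # hypothetical: run the same pytest suite against an existing YARN/EMR
    # cluster instead of the local Spark the script defaults to
    export SPARK_HOME=/usr/lib/spark            # EMR's Spark installation
    export SPARK_SUBMIT_FLAGS="--master yarn --deploy-mode client"
    export TEST_TYPE=nightly
    ./integration_tests/run_pyspark_from_build.sh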

@tgravescs
Collaborator Author

Note that this should go along with #1640; ideally we commonize so the same script runs all the tests. See the comments in that issue.

NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this issue Feb 28, 2021
Fix issue: NVIDIA#1568
Move Databricks scripts to GitLab so we can use the common scripts for the nightly build job and integration tests job
Remove unused Dockerfiles

Signed-off-by: Tim Liu <timl@nvidia.com>
NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this issue Feb 28, 2021
Fix issue: NVIDIA#1568
Move Databricks scripts to GitLab so we can use the common scripts for the nightly build job and integration tests job
Remove unused Dockerfiles

Signed-off-by: Tim Liu <timl@nvidia.com>
@NvTimLiu
Collaborator

NvTimLiu commented Feb 28, 2021

MR on GitLab: 123

@sameerz added this to the Mar 1 - Mar 12 milestone on Mar 1, 2021
NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this issue Mar 1, 2021
NVIDIA#1568

Move Databricks scripts to GitLab so we can use the common scripts for the nightly build job and integration tests job

Remove unused Dockerfiles

Signed-off-by: Tim Liu <timl@nvidia.com>
pxLi pushed a commit that referenced this issue Mar 2, 2021
* Cleanup unused Jenkins files and scripts

#1568

Move Databricks scripts to GitLab so we can use the common scripts for the nightly build job and integration tests job

Remove unused Dockerfiles

Signed-off-by: Tim Liu <timl@nvidia.com>

* rm Dockerfile.integration.ubuntu16

* Restore Databricks nightly scripts

Signed-off-by: Tim Liu <timl@nvidia.com>
@NvTimLiu
Collaborator

Todo

  1. Move the Databricks IT scripts into GitHub
  2. Make cluster creation, pytest setup, ... common via a Python API

@tgravescs
Collaborator Author

@NvTimLiu are you still working on commonizing some of the scripts for blossom?

@NvTimLiu
Collaborator

NvTimLiu commented Mar 29, 2021

@tgravescs I've finished the common scripts for all the cloud environments.

@jlowe Do we need to add a tag similar to [-Dpytest.TEST_TYPE="nightly"](https://github.com/NVIDIA/spark-rapids/blob/branch-0.5/jenkins/spark-nightly-build.sh#L25) for all the integration test pipelines? If not, I suppose we're good to close this issue.

@jlowe
Member

jlowe commented Mar 30, 2021

Do we need to add a tag similar to -Dpytest.TEST_TYPE="nightly" for all the integration test pipelines?

Yes, all of the CI pipelines should specify something for TEST_TYPE so that tests which are a bit tricky to set up can't keep being skipped; otherwise we risk inadvertently never running those tests before we release. A sketch follows.
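For example (a sketch; the "pre-commit" value and the exact mvn goal are assumptions), Maven-driven and script-driven pipelines could each make the test type explicit:

    # Maven-driven pipelines: pass the property explicitly
    mvn -B verify -Dpytest.TEST_TYPE="nightly"

    # script-driven pipelines: set the environment variable explicitly
    TEST_TYPE="pre-commit" ./integration_tests/run_pyspark_from_build.sh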

@NvTimLiu
Collaborator

NvTimLiu commented Apr 6, 2021

Closing, since PR 2059 has been merged.

@NvTimLiu closed this as completed on Apr 6, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this issue Jun 9, 2021
* Cleanup unused Jenkins files and scripts

NVIDIA#1568

Move Databricks scripts to GitLab so we can use the common scripts for the nightly build job and integration tests job

Remove unused Dockerfiles

Signed-off-by: Tim Liu <timl@nvidia.com>

* rm Dockerfile.integration.ubuntu16

* Restore Databricks nightly scripts

Signed-off-by: Tim Liu <timl@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this issue Jun 9, 2021
* Cleanup unused Jenkins files and scripts

NVIDIA#1568

Move Databricks scripts to GitLab so we can use the common scripts for the nightly build job and integration tests job

Remove unused Dockerfiles

Signed-off-by: Tim Liu <timl@nvidia.com>

* rm Dockerfile.integration.ubuntu16

* Restore Databricks nightly scripts

Signed-off-by: Tim Liu <timl@nvidia.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…p ci] [bot] (NVIDIA#1568)

* Update submodule cudf to 8deb3dd7573000e7d87f18a9e2bbe39cf2932e10

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

* Update submodule cudf to 9e7f8a5fdd03d6a24630687621d0ee14c2db26d7

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

* Update submodule cudf to f9c586d48aa2a879b2267318088d3cc38f398662

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

* Update submodule cudf to 53127de4d9e06f9fa172ac34952f85104eb7bac9

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

* Update submodule cudf to 8e1ef05b2b96775ce7e1a2f22894ec7a8ebb65a4

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

* Update submodule cudf to bf63d1049db70c28ea961b677ad5f207aa648860

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

* Update submodule cudf to ba5ec4080be38b795053d11bf46cb3688c201893

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

* Update submodule cudf to 6c2e972cefff05f6ffbba4fd9ba894e6849b041e

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

* Update submodule cudf to 723c565f7a03e3e9a842526cd4cc94bcf6f582e5

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

* Update submodule cudf to 823d3214a9489e3c496aa31041b5d29f650e94b3

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>

---------

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>