[RELEASE] docker v24.10 #714

Merged
merged 13 commits into from
Oct 10, 2024

Conversation

raydouglass
Member

❄️ Code freeze for branch-24.10 and v24.10 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-24.10 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-24.10 into main for the release

raydouglass and others added 13 commits July 19, 2024 15:17
Forward-merge branch-24.08 into branch-24.10
Forward-merge branch-24.08 into branch-24.10
Forward-merge branch-24.08 into branch-24.10
Reviewing #706, I noticed the following on the Files tab (https://github.com/rapidsai/docker/pull/706/files):

> FromAsCasing: 'as' and 'FROM' keywords' casing do not match
> More info: https://docs.docker.com/go/dockerfile/rule/from-as-casing/

<img width="1285" alt="image" src="https://github.com/user-attachments/assets/66191c7f-77d9-415c-b965-b7e039df49fa">

This resolves those warnings. The changes don't alter anything functionally, but I do slightly agree with the style suggestion, and it'd be nice to remove the visual noise of all those warnings in PR reviews.
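
For illustration only, here is a minimal Python sketch of the kind of check that the `FromAsCasing` rule describes: it flags `FROM ... AS ...` lines where the two keywords are not both upper-case or both lower-case. The function name `from_as_casing_mismatches` and the regular expression are assumptions made for this example; this is not Docker's own implementation and may not agree with it on every edge case.

```python
import re

# Matches a Dockerfile FROM instruction that names a stage, e.g. "FROM image AS name".
# Optional flags such as --platform=... are allowed before the image reference.
_FROM_AS = re.compile(
    r"^\s*(from)\s+(?:--\S+\s+)*\S+\s+(as)\s+\S+\s*$",
    re.IGNORECASE,
)


def from_as_casing_mismatches(dockerfile_text):
    """Return (line_number, line) pairs where FROM and AS differ in casing."""
    mismatches = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        match = _FROM_AS.match(line)
        if match is None:
            # Not a FROM line with a stage alias; nothing to compare.
            continue
        from_kw, as_kw = match.group(1), match.group(2)
        # Simplification: the pair is consistent only if both keywords are
        # fully upper-case or both are fully lower-case.
        consistent = (from_kw.isupper() and as_kw.isupper()) or (
            from_kw.islower() and as_kw.islower()
        )
        if not consistent:
            mismatches.append((number, line))
    return mismatches


if __name__ == "__main__":
    sample = "FROM ubuntu:22.04 as base\nFROM base AS final\n"
    for number, line in from_as_casing_mismatches(sample):
        print(f"line {number}: {line}")
```

In this sketch, a line such as `FROM ubuntu:22.04 as base` is reported, while `FROM base AS final` is not.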

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #707
A few small tweaks to `update-version.sh` for alignment across RAPIDS.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Mike Sarahan (https://github.com/msarahan)

URL: #701
Follow-up to #702 and #693.

Created based on #696 (comment)

`test` jobs are not currently running on pull requests here, because they require `build-multiarch-manifest` jobs, which have this condition that causes such jobs to be skipped on PR builds:

https://github.com/rapidsai/docker/blob/1c27d9245fd9d99ee35981b970acaf10961ca45b/.github/workflows/build-test-publish-images.yml#L171-L172

This PR ensures that `test` jobs always run on PRs, and that merging is blocked until they succeed.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - Ray Douglass (https://github.com/raydouglass)

URL: #708
Follow-up to #708.

Proposes completely removing the `delete-temp-images` job, in favor of relying on the scheduled nightly cleanup at https://github.com/rapidsai/workflows/blob/main/.github/workflows/cleanup_staging.yaml.

## Notes for Reviewers

### Details

CI here writes images to the `rapidsai/staging` repo on DockerHub, then later copies them to individual user-facing repos.
To avoid those temporary CI artifacts piling up in the `rapidsai/staging` repo, pull requests and branch builds run a job called `delete-temp-images` which does what it sounds like.

In exchange for more aggressive cleaning, this job introduces significant complexity for development here. Most notably, we've observed several instances where that job deletes images before all CI jobs needing them have completed successfully, leading to all of CI needing to be re-run.

Significant effort has been put into trying to avoid that, and we've found it's been difficult to get it right:

some attempts:

* #702
* #708

a recent example:

* #696 (comment)

### Ok so how will we clean up?

The workflow at https://github.com/rapidsai/workflows/blob/main/.github/workflows/cleanup_staging.yaml.

It runs once a day and deletes anything from `rapidsai/staging` that's more than 30 days old.
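
As a rough illustration of that retention rule, the Python sketch below selects entries older than 30 days. It is not the code in `cleanup_staging.yaml` and makes no DockerHub API calls; the function name `tags_to_delete`, the input shape (a mapping of tag name to last-pushed timestamp), and the example tag names are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def tags_to_delete(tags, now, max_age_days=30):
    """Return the names of tags last pushed more than `max_age_days` before `now`.

    `tags` is a hypothetical mapping of tag name -> timezone-aware datetime.
    """
    cutoff = now - timedelta(days=max_age_days)
    # Anything pushed strictly before the cutoff is considered stale.
    return sorted(name for name, pushed in tags.items() if pushed < cutoff)


if __name__ == "__main__":
    now = datetime(2024, 10, 10, tzinfo=timezone.utc)
    example_tags = {
        # Hypothetical tag names, for illustration only.
        "example-old-image": datetime(2024, 8, 1, tzinfo=timezone.utc),
        "example-recent-image": datetime(2024, 10, 1, tzinfo=timezone.utc),
    }
    print(tags_to_delete(example_tags, now))
```

Because deletion depends only on age, an image that a re-run job still needs stays available as long as it was pushed within the retention window.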

### Benefits of these changes

As described in #708 (comment) ...

CI here will work as it does in other RAPIDS repos: if any jobs fail for retryable reasons (like network issues), you can safely click "re-run failed jobs" and make incremental progress towards all builds passing.

Also reduces the need to maintain code that has to keep up with the DockerHub API in two places (by deleting `ci/delete-temp-images.sh` here).

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ray Douglass (https://github.com/raydouglass)
  - https://github.com/jakirkham

URL: #709
Following rapidsai/docs#526, we can remove CUDA 12.2 from the RAPIDS 24.10 Docker images.

Authors:
  - Bradley Dice (https://github.com/bdice)
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - https://github.com/jakirkham
  - Ray Douglass (https://github.com/raydouglass)

URL: #696
This is common information asked for by Conda issue templates (like [`conda/infrastructure` bug reports](https://github.com/conda/infrastructure/issues/new?assignees=&labels=type%3A%3Abug&projects=&template=0_bug.yml)). Query it during the build so we have this information in the logs when sharing them with others to debug issues.

Authors:
  - https://github.com/jakirkham
  - Ray Douglass (https://github.com/raydouglass)
  - Mike Sarahan (https://github.com/msarahan)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Mike Sarahan (https://github.com/msarahan)

URL: #649
Nightly builds of `rapidsai/raft-ann-bench` failed like this:

> ImportError: /lib/aarch64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /opt/conda/lib/python3.11/site-packages/libmambapy/bindings.cpython-311-aarch64-linux-gnu.so)

([build link](https://github.com/rapidsai/docker/actions/runs/10739898324/job/29789780257))

I suspect that's because those images use the same pattern for initializing a conda environment that led to the issues described in rapidsai/ci-imgs#185.

This proposes the same fix that we applied in `ci-imgs` (rapidsai/ci-imgs#186).

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Mike Sarahan (https://github.com/msarahan)

URL: #710
…711)

Contributes to rapidsai/build-planning#40.

* adds Python 3.12 images
* defaults to latest Python (3.12) and CUDA (12.5[.1]) in docs and comments

## Notes for Reviewers

Builds here will fail until all RAPIDS libraries support Python 3.12, but I figured we don't need to wait on that to come to an agreement about the building and testing matrices.

Blocked by:

* [x] rapidsai/cuml#6060
* [x] rapidsai/cugraph#4647
* [x] rapidsai/integration#719

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #711
@raydouglass raydouglass requested a review from a team as a code owner October 4, 2024 19:46
@raydouglass raydouglass requested review from bdice and removed request for a team October 4, 2024 19:46
@raydouglass raydouglass merged commit 4a97818 into main Oct 10, 2024
678 of 790 checks passed
6 participants