Add script to test nightly environments are solvable and using recent nightlies. #690

Merged: 25 commits, Sep 10, 2024
Changes from 2 commits
97 changes: 97 additions & 0 deletions ci/check_conda_nightly_env.py
@@ -0,0 +1,97 @@
import json
import re
import subprocess
import sys
from datetime import datetime, timedelta


OLD_PACKAGE_THRESHOLD_DAYS = 3


def is_rapids_nightly_package(package_info):
    return package_info["channel"] == "rapidsai-nightly"


def get_package_date(package):
    # Matches 6 digits starting with "2", which should be YYMMDD
    date_re = r"_(2\d{5})_"

    # Use regex to find the date string in the input
    match = re.search(date_re, package["build_string"])

    if match:
        # Convert the date string to a datetime object
        date_string = match.group(1)
        date_object = datetime.strptime(date_string, "%y%m%d")
        return date_object

    print(
        f"Date string not found for {package['name']} "
        f"in the build string '{package['build_string']}'."
    )
    return None


def check_env(json_path):
    """Validate rapids conda environments.

    Parses JSON output of `conda create` and checks the dates on the RAPIDS
    packages to ensure nightlies are relatively new.
    """

    with open(json_path) as f:
        try:
            json_data = json.load(f)
        except ValueError as e:
            print("Error: JSON data file from conda failed to load:")
            print(e)
            sys.exit(1)

    if "error" in json_data:
        print("Error: conda failed:")
        print()
        print(json_data["error"])
        sys.exit(1)

    package_data = json_data["actions"]["LINK"]

    rapids_package_data = list(filter(is_rapids_nightly_package, package_data))

    # Dictionary to store the packages and their dates
    rapids_package_dates = {
        package["name"]: get_package_date(package)
        for package in rapids_package_data
    }

    today = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
    old_threshold = today - timedelta(days=OLD_PACKAGE_THRESHOLD_DAYS)
    old_packages = {
        package: date
        for package, date in rapids_package_dates.items()
        if date is not None and date < old_threshold
    }

    # If there are old packages, raise an error
    if old_packages:
        print()
        print(
            "Error: The following nightly packages are more than "
            f"{OLD_PACKAGE_THRESHOLD_DAYS} days old:"
        )
        for package, date in old_packages.items():
            date_string = date.strftime("%Y-%m-%d")
            print(f" - {package}: {date_string}")
        sys.exit(1)

    print(f"All packages are less than {OLD_PACKAGE_THRESHOLD_DAYS} days old.")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(
            "Provide only one argument, the filepath to a JSON output from "
            "conda."
        )
        sys.exit(1)

    check_env(sys.argv[1])
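
To illustrate how the date check in this file behaves, here is a minimal sketch that applies the same regex and threshold to a few sample build strings. The build strings and the fixed `today` value are made up for illustration (real RAPIDS nightly build strings may be laid out differently), and `parse_build_date` is a hypothetical helper, not part of the PR.

```python
import re
from datetime import datetime, timedelta

OLD_PACKAGE_THRESHOLD_DAYS = 3
DATE_RE = r"_(2\d{5})_"  # same pattern used by get_package_date


def parse_build_date(build_string):
    """Return the YYMMDD date embedded in a build string, or None."""
    match = re.search(DATE_RE, build_string)
    return datetime.strptime(match.group(1), "%y%m%d") if match else None


# Hypothetical build strings, for illustration only.
samples = {
    "rmm": "cuda12_py311_240909_gabc1234_10",
    "cudf": "cuda12_py311_240901_gdef5678_42",
    "no-date-package": "py311_0",
}

# Fixed "today" so the example is reproducible.
today = datetime(2024, 9, 10)
old_threshold = today - timedelta(days=OLD_PACKAGE_THRESHOLD_DAYS)

dates = {name: parse_build_date(bs) for name, bs in samples.items()}

# Packages without a parseable date are skipped, as in check_env.
old = sorted(
    name
    for name, date in dates.items()
    if date is not None and date < old_threshold
)
print(old)
```

Note that a package whose build string has no date is reported but never counted as old, so it cannot fail the check on its own.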
1 change: 1 addition & 0 deletions ci/release/update-version.sh
@@ -41,6 +41,7 @@ function sed_runner() {
}

sed_runner "/RAPIDS_VER=/ s/[0-9][0-9].[0-9][0-9]/${NEXT_SHORT_TAG}/" ci/conda-pack.sh
sed_runner "/RAPIDS_VERSION=/ s/[0-9][0-9].[0-9][0-9]/${NEXT_SHORT_TAG}/" ci/test_conda_nightly_env.sh

for FILE in .github/workflows/*.yaml; do
    sed_runner "/shared-workflows/ s/@.*/@branch-${NEXT_SHORT_TAG}/g" "${FILE}"
29 changes: 29 additions & 0 deletions ci/test_conda_nightly_env.sh
@@ -0,0 +1,29 @@
#!/bin/bash
# Copyright (c) 2023, NVIDIA CORPORATION.

set -euo pipefail

RAPIDS_VERSION="23.12"
CUDA_VERSION=${RAPIDS_CUDA_VERSION%.*}

JSON_FILENAME="rapids_cuda${CUDA_VERSION}_py${RAPIDS_PY_VERSION}.json"

echo "Creating conda environment with rapids=${RAPIDS_VERSION}, python=${RAPIDS_PY_VERSION}, cuda-version=${CUDA_VERSION}"
#rapids-logger "Creating conda environment with rapids=${RAPIDS_VERSION}, python=${RAPIDS_PY_VERSION}, cuda-version=${CUDA_VERSION}"

#rapids-conda-retry \
conda \
    create \
    --solver=libmamba \
    -n rapids-${RAPIDS_VERSION} \
    -c rapidsai-nightly \
    -c conda-forge \
    -c nvidia \
    rapids=${RAPIDS_VERSION} \
Member

(was going to put this as my review, realized a thread would probably be better)

I totally support adding something like this, based on your description offline:

... to ensure that we are able to solve the full RAPIDS conda environment with recent packages (in other words, ensure there are no recent conflicts causing fallback to older conda packages)

But I don't think it offers 100% protection against the case described in https://github.com/rapidsai/ops/issues/2947.

That issue is not about just packages building against too-old versions of dependencies... it's about packages across RAPIDS building against very different versions of dependencies.

It looks to me that the code in this PR would catch cases like these:

  • "rmm nightlies haven't been published in the last 3 days"
  • "the latest versions of cugraph and pylibraft can't be installed in the same environment"

These aren't captured by the existing nightly tests at https://github.com/rapidsai/workflows/actions/workflows/nightly-pipeline.yaml.

But this wouldn't be guaranteed to catch a case like this:

  • "the latest cuml nightly built against an rmm from 5 days ago, but the latest cudf nightly built against an rmm from yesterday"

Because this test with the rapids package is solving across all of the packages' runtime dependencies, but they could have ended up building against older versions based on conflicts in their individual build environments, right?

And those types of conflicts might not show up here if we use pin_compatible(max_pin="x.x") in run dependencies, e.g. pylibraft==24.10.* is going to have a runtime dependency on rmm=24.10.* regardless of which specific nightly of rmm it pulled in at build time. (pylibraft nightly files)

I think detecting that other case would have to happen at build time (or by post-processing of logs from build time). And I don't know how complex that would be, so can't say with confidence that the complexity would be worth it.

I totally support the approach this PR is pursuing, just wanted to be sure to note this other possible avenue for version mismatches to get through.

Contributor Author

@bdice bdice Aug 5, 2024

You are correct about what this will and will not cover. I think it's worth pursuing because (1) it prevents runtime problems from being hidden and (2) sometimes runtime conflicts will also affect build environments, so this may give us a bit of signal into deeper problems happening at build time, should they arise.

Member

Ok great! I totally support moving forward with this.

    python=${RAPIDS_PY_VERSION} \
    cuda-version=${CUDA_VERSION} \
    --dry-run \
    --json \
    | tee "${JSON_FILENAME}"

python ci/check_conda_nightly_env.py "${JSON_FILENAME}"
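
The shell script and the Python checker are connected only through the JSON file. The sketch below builds a tiny payload containing just the fields the checker reads (`actions.LINK[*].channel`, `name`, `build_string`, and a top-level `error` key) and filters it the same way. The package entries are invented, `nightly_packages` is a hypothetical helper, and the real `conda create --dry-run --json` output carries more fields than shown here.

```python
import json

# Hypothetical, trimmed-down payload; values are made up for illustration.
payload = json.loads("""
{
  "actions": {
    "LINK": [
      {"name": "rmm", "channel": "rapidsai-nightly",
       "build_string": "cuda12_py311_240909_gabc1234_10"},
      {"name": "numpy", "channel": "conda-forge",
       "build_string": "py311h64a7726_0"}
    ]
  }
}
""")


def nightly_packages(data):
    """Return names of packages coming from the rapidsai-nightly channel."""
    if "error" in data:
        # Mirrors check_env: a top-level "error" key is treated as a failure.
        raise RuntimeError(data["error"])
    return [
        pkg["name"]
        for pkg in data["actions"]["LINK"]
        if pkg["channel"] == "rapidsai-nightly"
    ]


print(nightly_packages(payload))
```

Only packages from `rapidsai-nightly` are date-checked. Dependencies pulled from `conda-forge` or `nvidia` are ignored, which matches the scope described in the review thread.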