
Added support for artifacts building for bundles #583

Merged
merged 13 commits into main on Jul 25, 2023
Conversation

@andrewnester (Contributor) commented Jul 18, 2023

Changes

Added support for artifacts building for bundles.

It is now possible to specify an artifacts block in bundle.yml and define a resource (at the moment, a Python wheel) to be built and uploaded during bundle deploy.

The built artifact is automatically attached to the corresponding job task or pipeline where it is used as a library.

Follow-ups:

  1. If an artifact is used in a job or pipeline but not found in the config, try to infer and build it anyway
  2. If the build command is not provided for a Python wheel artifact, infer it
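The second follow-up (inferring the build command) might work roughly as sketched below. This is a minimal illustration only: the file checks and fallback commands are assumptions, not the CLI's actual logic.

```python
from typing import Optional

def infer_build_command(project_files: list) -> Optional[str]:
    """Sketch: pick a conventional wheel build command based on which
    build files are present in the artifact's directory."""
    if "setup.py" in project_files:
        # Classic setuptools layout.
        return "python setup.py bdist_wheel"
    if "pyproject.toml" in project_files:
        # PEP 517 layout; requires the "build" package to be installed.
        return "python -m build --wheel"
    # No recognizable build file: caller must provide an explicit command.
    return None
```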

Tests

Case 1. Explicitly defined libraries block

bundle.yml

bundle:
  name: wheel-task

artifacts:
  my_test_code:
    type: whl
    build: "/usr/local/bin/python setup.py bdist_wheel"

workspace:
  host: https://******.com

resources:
  jobs:
    test_job:
      name: "[${bundle.environment}] My Wheel Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: "0717-*****-********"
          python_wheel_task:
            package_name: "my_test_code"
            entry_point: "run"
          libraries:
          - whl: ./dist/my_test_code-*.whl

andrew.nester@HFW9Y94129 wheel % cli bundle deploy
artifacts.whl.Build(my_test_code): Building...
artifacts.whl.Build(my_test_code): Build succeeded
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/wheel-task/default/files!

artifacts.Upload(my_test_code-0.0.1-py2-none-any.whl): Uploading...
artifacts.Upload(my_test_code-0.0.1-py2-none-any.whl): Upload succeeded
Starting resource deployment
Resource deployment completed!

The Python wheel was successfully built, uploaded, and attached to the matching job.

Case 2. Explicitly defined libraries block, but with a non-matching path

....
          python_wheel_task:
            package_name: "my_test_code"
            entry_point: "run"
          libraries:
          - whl: ./dist/not-exists-*.whl
...
artifacts.whl.Build(my_test_code): Building...
artifacts.whl.Build(my_test_code): Build succeeded
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/wheel-task/default/files!

Error: no file matches found for library ./dist/notexists-*.whl
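The error above comes from glob-matching the configured library path against the wheels that were actually built. A minimal sketch of that matching step (the function name and data shapes are assumptions, not the CLI's actual code):

```python
from fnmatch import fnmatch

def match_library(pattern: str, built_files: list) -> list:
    """Treat the configured library path as a glob pattern and return the
    built files that match it; an empty result is what triggers the
    "no file matches found" error."""
    return [f for f in built_files if fnmatch(f, pattern)]
```

For example, matching "./dist/not-exists-*.whl" against a build that only produced my_test_code-0.0.1-py2-none-any.whl yields no matches, hence the error.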

Case 3. No libraries block, artifact is not used

....
          python_wheel_task:
            package_name: "my_test_code"
            entry_point: "run"
...
artifacts.whl.Build(my_test_code): Building...
artifacts.whl.Build(my_test_code): Build succeeded
Starting upload of bundle files
Uploaded bundle files at /Users/andrew.nester@databricks.com/.bundle/wheel-task/default/files!

Error: task 'TestTask' is missing required libraries. Please include your package code in task libraries block

Integration test

2023/07/20 17:28:19 [INFO] Locating tests in /Users/andrew.nester/cli
2023/07/20 17:28:19 [INFO] found test in /Users/andrew.nester/cli/bundle/tests/bundle
...
=== RUN   TestAccBundlePythonWheelBuild
    wheel_test.go:15
2023/07/20 17:28:25 INFO Phase: build
artifacts.whl.Build(my_test_code): Building...
artifacts.whl.Build(my_test_code): Build succeeded
--- PASS: TestAccBundlePythonWheelBuild (0.29s)
PASS
ok      github.com/databricks/cli/bundle/tests/bundle   1.063s

@lennartkats-db (Contributor)

Nice! For your example, there doesn't seem to be a whl library referenced in the job?

@andrewnester (Contributor, Author)

@lennartkats-db you're right, we do not need to explicitly reference the library in the same manner as is done now in DBX. What happens underneath is that the CLI checks which package is used in a job, finds the corresponding artifact, and attaches it as a library:

https://github.com/databricks/cli/pull/583/files#diff-80cac01f5e27a35089510718bd80e3d1964b4c2c4eac9d9eb7b283a0f6a83c3cR38
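That matching step can be sketched roughly as follows; the data shapes and names here are illustrative only, and the linked Go code is the source of truth:

```python
def attach_matching_artifacts(tasks: list, built: dict) -> None:
    """For each python_wheel_task, look up a built artifact whose name
    equals the task's package_name and attach its wheel as a library."""
    for task in tasks:
        pkg = task.get("python_wheel_task", {}).get("package_name")
        if pkg in built:
            # Append rather than overwrite, so explicit libraries survive.
            task.setdefault("libraries", []).append({"whl": built[pkg]})
```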

@lennartkats-db (Contributor)

That seems like it's a bit too magical to me... It would become really hard to find out what is used in which jobs if you have a DAB with several (>1) jobs. I'd rather have an explicit reference to the library just like we always have in job definitions for libraries.

To me it would be ideal if the job had an explicit reference to a whl file, just like it does today, without even using variables. That way we could even support visual editing of such a job.

Resolved review threads: bundle/artifacts/whl/build.go (×3, outdated), bundle/libraries/libraries.go (outdated), python/utils.go
@pietern (Contributor) commented Jul 20, 2023

@lennartkats-db It's not unexpected though. If I specify a package foo in my job, and I have a wheel definition for a package foo in my local directory, this command puts them together and makes it work. If you have more than one job or more than one wheel, the same still works and you can unambiguously refer to either wheel.

@pietern (Contributor) left a comment

Please add an integration test with a minimal setup.py just to check that performing the build works if the parameters are set correctly. I suspect it would have caught an issue with python.FindFileWithSuffixInPath("dist", ".whl").
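For reference, a helper like python.FindFileWithSuffixInPath presumably behaves along these lines; this is a Python sketch of the Go helper's likely behavior over a directory listing, not its actual implementation:

```python
from typing import Optional

def find_file_with_suffix(file_names: list, suffix: str) -> Optional[str]:
    """Return the first file name (in sorted order) that ends with the
    given suffix, or None when nothing matches."""
    for name in sorted(file_names):
        if name.endswith(suffix):
            return name
    return None
```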

Resolved review threads (outdated): bundle/artifacts/whl/build.go, bundle/config/artifact.go, bundle/libraries/libraries.go (×2)
@pietern (Contributor) left a comment

Last couple of comments that I didn't catch in the previous round.

Everything else LGTM.

Resolved review threads (outdated): bundle/artifacts/artifacts.go, bundle/artifacts/build.go, bundle/config/artifact.go, bundle/libraries/libraries.go
@lennartkats-db (Contributor) left a comment

I'm really not comfortable with this amount of magic ... we should discuss what we want to do here. So far we have designed DABs to be explicit and have directly followed the API specs for the format. Automagically inserting a library dependency into one of the jobs seems contrary to that design, and contrary to the Python mantra of "explicit is better than implicit". cc @pietern

Resolved review threads (outdated): bundle/artifacts/artifacts.go (×3), bundle/artifacts/build.go
artifactPath := b.Config.Workspace.ArtifactsPath
b.WorkspaceClient().Workspace.Delete(ctx, workspace.Delete{
	Path:      artifactPath,
	Recursive: true,
})
Reviewer (Contributor):

Not too keen on a recursive delete without validation. If someone configures artifact_path to ~ then this would remove everything underneath. Could we do some kind of validation prior to a recursive delete, or better yet, do some kind of tracking which files we have uploaded and only remove those?

@andrewnester (Contributor, Author):

I'm not a big fan of tracking the uploaded files, as it can end up complex and error-prone.
Instead, I suggest uploading artifacts to a subfolder of artifact_path; then we can safely delete that subfolder each time.
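The suggestion above can be sketched as follows; the subfolder name ".internal" is an assumption for illustration, not necessarily what the CLI uses:

```python
import posixpath

def artifact_upload_dir(artifact_path: str) -> str:
    """Scope uploads (and the later recursive delete) to a dedicated
    subfolder of artifact_path, instead of artifact_path itself."""
    # ".internal" is a hypothetical subfolder name.
    return posixpath.join(artifact_path, ".internal")
```

With this, cleanup deletes only the subfolder, so a misconfigured artifact_path like "~" no longer risks wiping unrelated files underneath it.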

Reviewer (Contributor):

That works. Agree file tracking would add a lot of complexity.

return fmt.Errorf("artifacts.whl.Build(%s): cannot find built wheel in %s", m.name, dir)
}
artifact.File = filepath.Join(dir, wheel)
for _, wheel := range wheels {
Reviewer (Contributor):

When can we expect multiple wheels?

@andrewnester (Contributor, Author):

For Python projects I can imagine some monorepo setups and custom builds which yield multiple Python wheels.

Reviewer (Contributor):

In that case I'd expect multiple artifacts to be defined, one for each wheel. But it's fine for symmetry with the Java case you describe below.

Resolved review threads: bundle/config/artifact.go (×2), bundle/libraries/libraries.go (×2, one outdated)
@pietern (Contributor) left a comment

Case 3 is unexpected. Could we figure out a way to display a warning?

@andrewnester dismissed lennartkats-db’s stale review July 25, 2023 11:34

Addressed in the follow-up commits

@andrewnester merged commit 9a88fa6 into main on Jul 25, 2023
4 checks passed
@pietern deleted the artifact-block branch July 25, 2023 13:59
@mgyucht mentioned this pull request Jul 27, 2023
github-merge-queue bot pushed a commit that referenced this pull request Jul 27, 2023
Breaking Change:
* Require include glob patterns to be explicitly defined
([#602](#602)).

Bundles:
* Add support for more SDK config options
([#587](#587)).
* Add template renderer for Databricks templates
([#589](#589)).
* Fix formatting in renderer.go
([#593](#593)).
* Fixed python wheel test
([#608](#608)).
* Auto detect Python wheel packages and infer build command
([#603](#603)).
* Added support for artifacts building for bundles
([#583](#583)).
* Add support for cloning repositories
([#544](#544)).
* Add regexp compile helper function for templates
([#601](#601)).
* Add unit test that raw strings are printed as is
([#599](#599)).

Internal:
* Fix tests under ./cmd/configure if DATABRICKS_TOKEN is set
([#605](#605)).
* Remove dependency on global state in generated commands
([#595](#595)).
* Remove dependency on global state for the root command
([#606](#606)).
* Add merge_group trigger for build
([#612](#612)).
* Added support for build command chaining and error on missing wheel
([#607](#607)).
* Add TestAcc prefix to filer test and fix any failing tests
([#611](#611)).
* Add url parse helper function for templates
([#600](#600)).
* Remove dependency on global state for remaining commands
([#613](#613)).
* Update CHANGELOG template
([#588](#588)).