Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added transformation mutator for Python wheel task for them to work on DBR <13.1 #635

Merged
merged 10 commits into from
Aug 30, 2023

Conversation

andrewnester
Copy link
Contributor

@andrewnester andrewnester commented Aug 3, 2023

Changes

Note: this PR relies on sync.include functionality from here: #671

Added transformation mutator for Python wheel task for them to work on DBR <13.1

Using wheels upload to Workspace file system as cluster libraries is not supported in DBR < 13.1

In order to make Python wheel work correctly on DBR < 13.1 we do the following:

  1. Build and upload python wheel as usual
  2. Transform python wheel task into special notebook task which does the following
    a. Installs all necessary wheels with %pip magic
    b. Executes defined entry point with all provided parameters
  3. Upload this notebook file to workspace file system
  4. Deploy transformed job task

This is also beneficial for executing on existing clusters because this notebook always reinstall wheels so if there are any changes to the wheel package, they are correctly picked up

Tests

bundle.yml

bundle:
  name: wheel-task

workspace:
  host: ****

resources:
  jobs:
    test_job:
      name: "[${bundle.environment}] My Wheel Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: "***"
          python_wheel_task:
            package_name: "my_test_code"
            entry_point: "run"
            parameters: ["first argument","first value","second argument","second value"]
          libraries:
          - whl: ./dist/*.whl

Output

andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job
Run URL: ***

2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS 
Output:
=======
Task TestTask:
Hello from my func
Got arguments v1:
['python', 'first argument', 'first value', 'second argument', 'second value']

bundle/python/transform.go Outdated Show resolved Hide resolved
bundle/python/transform.go Show resolved Hide resolved
bundle/python/transform.go Outdated Show resolved Hide resolved
bundle/python/transform.go Outdated Show resolved Hide resolved
bundle/python/transform.go Outdated Show resolved Hide resolved
bundle/python/transform.go Outdated Show resolved Hide resolved
bundle/artifacts/artifacts.go Outdated Show resolved Hide resolved
@andrewnester andrewnester changed the title Added transformation mutator for Python wheel task for them to work on DBR <13.1 Added Sync.Include and Sync.Excluded + Added transformation mutator for Python wheel task for them to work on DBR <13.1 Aug 14, 2023
bundle/bundle.go Outdated Show resolved Hide resolved
bundle/config/root.go Outdated Show resolved Hide resolved
bundle/deploy/files/sync.go Outdated Show resolved Hide resolved
bundle/python/transform.go Show resolved Hide resolved
bundle/python/transform.go Show resolved Hide resolved
libs/fileset/glob.go Outdated Show resolved Hide resolved
libs/fileset/glob.go Outdated Show resolved Hide resolved
libs/fileset/glob.go Outdated Show resolved Hide resolved
libs/sync/sync.go Outdated Show resolved Hide resolved
libs/sync/sync.go Outdated Show resolved Hide resolved
@andrewnester andrewnester changed the title Added Sync.Include and Sync.Excluded + Added transformation mutator for Python wheel task for them to work on DBR <13.1 Added transformation mutator for Python wheel task for them to work on DBR <13.1 Aug 16, 2023
bundle/python/transform.go Show resolved Hide resolved
bundle/python/transform.go Outdated Show resolved Hide resolved
bundle/python/transform.go Outdated Show resolved Hide resolved
bundle/python/transform_test.go Show resolved Hide resolved
bundle/python/transform.go Outdated Show resolved Hide resolved
github-merge-queue bot pushed a commit that referenced this pull request Aug 17, 2023
#674)

## Changes
Fixes #673 

It also includes a change for `libraries` from #635 to get the list of
wheel tasks
Copy link
Contributor

@mgyucht mgyucht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be a good idea to wait for @pietern to get back, but this looks good to me. A couple small (non-blocking) suggestions from me.

bundle/config/mutator/trampoline.go Show resolved Hide resolved
bundle/config/mutator/trampoline.go Outdated Show resolved Hide resolved
bundle/config/mutator/trampoline.go Outdated Show resolved Hide resolved
@andrewnester andrewnester added this pull request to the merge queue Aug 25, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 25, 2023
bundle/config/mutator/trampoline.go Outdated Show resolved Hide resolved
bundle/config/mutator/trampoline.go Outdated Show resolved Hide resolved
remotePath := path.Join(b.Config.Workspace.FilesPath, filepath.ToSlash(internalDirRel), notebookName)

task.NotebookTask = &jobs.NotebookTask{
NotebookPath: remotePath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment on how we expect the wheel args, or task params, to be passed in the template?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a reference example, in Python trampoline we do it with sys.argv as you can see here
https://github.com/databricks/cli/pull/635/files#diff-8ddf2564fcd580399d61df5283d0c0c7b614c476e0803f63f9f785b62dc69a04R23

I'd expect it should work similarly for task params

bundle/python/transform_test.go Show resolved Hide resolved
bundle/python/transform.go Show resolved Hide resolved
@andrewnester andrewnester added this pull request to the merge queue Aug 30, 2023
Merged via the queue into main with commit 12368e3 Aug 30, 2023
4 checks passed
@andrewnester andrewnester deleted the transform-wheel-task branch August 30, 2023 12:32
andrewnester added a commit that referenced this pull request Aug 30, 2023
CLI:
 * Fixed --environment flag ([#705](#705)).

Bundles:
 * Support cluster overrides with cluster_key and compute_key ([#696](#696)).
 * Allow referencing local Python wheels without artifacts section defined ([#703](#703)).
 * Correctly identify local paths in libraries section ([#702](#702)).
 * Fixed path joining in FindFilesWithSuffixInPath ([#704](#704)).
 * Added transformation mutator for Python wheel task for them to work on DBR <13.1 ([#635](#635)).

Internal:
 * Add a foundation for built-in templates ([#685](#685)).
@andrewnester andrewnester mentioned this pull request Aug 30, 2023
pietern added a commit that referenced this pull request Aug 30, 2023
github-merge-queue bot pushed a commit that referenced this pull request Aug 30, 2023
## Changes

Fixes issue introduced in #635.

## Tests

Added new unit test to confirm correct behavior.

Manually deployed sample bundle.
andrewnester added a commit that referenced this pull request Aug 30, 2023
Bundles:
 * Support cluster overrides with cluster_key and compute_key ([#696](#696)).
 * Allow referencing local Python wheels without artifacts section defined ([#703](#703)).
 * Fixed --environment flag ([#705](#705)).
 * Correctly identify local paths in libraries section ([#702](#702)).
 * Fixed path joining in FindFilesWithSuffixInPath ([#704](#704)).
 *  Added transformation mutator for Python wheel task for them to work on DBR <13.1 ([#635](#635)).

Internal:
 * Add a foundation for built-in templates ([#685](#685)).
 * Test transform when no Python wheel tasks defined ([#714](#714)).
 * Pin Terraform binary version to 1.5.5 ([#715](#715)).
 * Cleanup after "Add a foundation for built-in templates" ([#707](#707)).
 * Filter down to Python wheel tasks only for trampoline ([#712](#712)).
 * Update Terraform provider schema structs from 1.23.0 ([#713](#713)).
@andrewnester andrewnester mentioned this pull request Aug 30, 2023
github-merge-queue bot pushed a commit that referenced this pull request Aug 30, 2023
Bundles:
* Support cluster overrides with cluster_key and compute_key
([#696](#696)).
* Allow referencing local Python wheels without artifacts section
defined ([#703](#703)).
* Fixed --environment flag
([#705](#705)).
* Correctly identify local paths in libraries section
([#702](#702)).
* Fixed path joining in FindFilesWithSuffixInPath
([#704](#704)).
* Added transformation mutator for Python wheel task for them to work on
DBR <13.1 ([#635](#635)).

Internal:
* Add a foundation for built-in templates
([#685](#685)).
* Test transform when no Python wheel tasks defined
([#714](#714)).
* Pin Terraform binary version to 1.5.5
([#715](#715)).
* Cleanup after "Add a foundation for built-in templates"
([#707](#707)).
* Filter down to Python wheel tasks only for trampoline
([#712](#712)).
* Update Terraform provider schema structs from 1.23.0
([#713](#713)).
lennartkats-db pushed a commit that referenced this pull request Sep 1, 2023
…on DBR <13.1 (#635)

## Changes
***Note: this PR relies on sync.include functionality from here:
#671

Added transformation mutator for Python wheel task for them to work on
DBR <13.1

Using wheels upload to Workspace file system as cluster libraries is not
supported in DBR < 13.1

In order to make Python wheel work correctly on DBR < 13.1 we do the
following:
1. Build and upload python wheel as usual
2. Transform python wheel task into special notebook task which does the
following
a. Installs all necessary wheels with %pip magic
b. Executes defined entry point with all provided parameters
3. Upload this notebook file to workspace file system
4. Deploy transformed job task

This is also beneficial for executing on existing clusters because this
notebook always reinstall wheels so if there are any changes to the
wheel package, they are correctly picked up

## Tests
bundle.yml

```yaml
bundle:
  name: wheel-task

workspace:
  host: ****

resources:
  jobs:
    test_job:
      name: "[${bundle.environment}] My Wheel Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: "***"
          python_wheel_task:
            package_name: "my_test_code"
            entry_point: "run"
            parameters: ["first argument","first value","second argument","second value"]
          libraries:
          - whl: ./dist/*.whl
```

Output
```
andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job
Run URL: ***

2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS 
Output:
=======
Task TestTask:
Hello from my func
Got arguments v1:
['python', 'first argument', 'first value', 'second argument', 'second value']

```
lennartkats-db pushed a commit that referenced this pull request Sep 1, 2023
## Changes

Fixes issue introduced in #635.

## Tests

Added new unit test to confirm correct behavior.

Manually deployed sample bundle.
lennartkats-db pushed a commit that referenced this pull request Sep 1, 2023
Bundles:
* Support cluster overrides with cluster_key and compute_key
([#696](#696)).
* Allow referencing local Python wheels without artifacts section
defined ([#703](#703)).
* Fixed --environment flag
([#705](#705)).
* Correctly identify local paths in libraries section
([#702](#702)).
* Fixed path joining in FindFilesWithSuffixInPath
([#704](#704)).
* Added transformation mutator for Python wheel task for them to work on
DBR <13.1 ([#635](#635)).

Internal:
* Add a foundation for built-in templates
([#685](#685)).
* Test transform when no Python wheel tasks defined
([#714](#714)).
* Pin Terraform binary version to 1.5.5
([#715](#715)).
* Cleanup after "Add a foundation for built-in templates"
([#707](#707)).
* Filter down to Python wheel tasks only for trampoline
([#712](#712)).
* Update Terraform provider schema structs from 1.23.0
([#713](#713)).
arpitjasa-db pushed a commit to arpitjasa-db/cli that referenced this pull request Sep 7, 2023
…on DBR <13.1 (databricks#635)

## Changes
***Note: this PR relies on sync.include functionality from here:
databricks#671

Added transformation mutator for Python wheel task for them to work on
DBR <13.1

Using wheels upload to Workspace file system as cluster libraries is not
supported in DBR < 13.1

In order to make Python wheel work correctly on DBR < 13.1 we do the
following:
1. Build and upload python wheel as usual
2. Transform python wheel task into special notebook task which does the
following
a. Installs all necessary wheels with %pip magic
b. Executes defined entry point with all provided parameters
3. Upload this notebook file to workspace file system
4. Deploy transformed job task

This is also beneficial for executing on existing clusters because this
notebook always reinstall wheels so if there are any changes to the
wheel package, they are correctly picked up

## Tests
bundle.yml

```yaml
bundle:
  name: wheel-task

workspace:
  host: ****

resources:
  jobs:
    test_job:
      name: "[${bundle.environment}] My Wheel Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: "***"
          python_wheel_task:
            package_name: "my_test_code"
            entry_point: "run"
            parameters: ["first argument","first value","second argument","second value"]
          libraries:
          - whl: ./dist/*.whl
```

Output
```
andrew.nester@HFW9Y94129 wheel % databricks bundle run test_job
Run URL: ***

2023-08-03 15:58:04 "[default] My Wheel Job" TERMINATED SUCCESS
Output:
=======
Task TestTask:
Hello from my func
Got arguments v1:
['python', 'first argument', 'first value', 'second argument', 'second value']

```

Signed-off-by: Arpit Jasapara <arpit.jasapara@databricks.com>
arpitjasa-db pushed a commit to arpitjasa-db/cli that referenced this pull request Sep 7, 2023
## Changes

Fixes issue introduced in databricks#635.

## Tests

Added new unit test to confirm correct behavior.

Manually deployed sample bundle.

Signed-off-by: Arpit Jasapara <arpit.jasapara@databricks.com>
arpitjasa-db pushed a commit to arpitjasa-db/cli that referenced this pull request Sep 7, 2023
Bundles:
* Support cluster overrides with cluster_key and compute_key
([databricks#696](databricks#696)).
* Allow referencing local Python wheels without artifacts section
defined ([databricks#703](databricks#703)).
* Fixed --environment flag
([databricks#705](databricks#705)).
* Correctly identify local paths in libraries section
([databricks#702](databricks#702)).
* Fixed path joining in FindFilesWithSuffixInPath
([databricks#704](databricks#704)).
* Added transformation mutator for Python wheel task for them to work on
DBR <13.1 ([databricks#635](databricks#635)).

Internal:
* Add a foundation for built-in templates
([databricks#685](databricks#685)).
* Test transform when no Python wheel tasks defined
([databricks#714](databricks#714)).
* Pin Terraform binary version to 1.5.5
([databricks#715](databricks#715)).
* Cleanup after "Add a foundation for built-in templates"
([databricks#707](databricks#707)).
* Filter down to Python wheel tasks only for trampoline
([databricks#712](databricks#712)).
* Update Terraform provider schema structs from 1.23.0
([databricks#713](databricks#713)).

Signed-off-by: Arpit Jasapara <arpit.jasapara@databricks.com>
@stawo
Copy link

stawo commented Sep 19, 2023

@andrewnester , @pietern , I believe this PR introduced some breaking behavior. See #783

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants