bundle run fails for non unity-catalog workspace #1000

Closed
Eulenator opened this issue Nov 21, 2023 · 6 comments · Fixed by #1037
Labels: DABs (DABs related issues), Enhancement (New feature or request), Question (Further information is requested)

Comments

@Eulenator

I have deployed my pipelines with `databricks bundle deploy`, and this worked without problems.

Now, when I want to trigger a pipeline with `databricks bundle run`, it fails with the following error message:

Library installation failed for library due to user error. Error messages: Library from /Workspace is not allowed on non Unity Catalog cluster. Please switch to DBR 13.1+ Shared cluster or 13.2+ Assigned cluster to use /Workspace libraries.

My problem is that we do not plan to adopt Unity Catalog anytime soon.
So is there an option I have missed for using Databricks Asset Bundles with non-Unity-Catalog workspaces?

@Eulenator added the DABs (DABs related issues) label Nov 21, 2023
@andrewnester added the Question (Further information is requested) label Nov 21, 2023
@andrewnester
Contributor

@Eulenator what type of resources are you deploying and trying to run?

@Eulenator
Author

Eulenator commented Nov 21, 2023

@andrewnester I have a spark-python project with the necessary entry points for my data pipelines. I build a wheel file and use a deployment.yaml (under /resources) like this to push it to Databricks with Asset Bundles:

build:
  no_build: true
environments:
  dev:
    strict_path_adjustment_policy: true
    workflows:
    - access_control_list:
      - permission_level: CAN_MANAGE
        user_name: service-principal://some-sp
      job_clusters:
      - job_cluster_key: default
        new_cluster:
          custom_tags:
            ResourceClass: SingleNode
          node_type_id: Standard_F4s
          num_workers: 0
          spark_conf:
            spark.databricks.cluster.profile: singleNode
            spark.master: local[*, 4]
          spark_version: 13.3.x-scala2.12
      name: Ingest
      tasks:
      - job_cluster_key: default
        python_wheel_task:
          entry_point: ingest
          package_name: demo_data_pipelines
        task_key: ingest_landing
    - access_control_list:
      - permission_level: CAN_MANAGE
        user_name: service-principal://some-sp

My databricks.yaml for the bundle then looks like this:

bundle:
  name: demo_data_pipelines

include:
  - resources/*.yml

targets:
  dev:
    default: true
    workspace:
      host: DB_HOST

I already experimented with root_path and artifact_path to point to a DBFS path, so the cluster can access the files without Unity Catalog, but no luck so far.
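
Concretely, one of the variants I tried looked roughly like this under the dev target (just a sketch; the DBFS path here is illustrative, not my real one):

targets:
  dev:
    default: true
    workspace:
      host: DB_HOST
      # illustrative path, not the real one
      artifact_path: dbfs:/bundles/demo_data_pipelines/artifacts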

@andrewnester
Contributor

@Eulenator thanks! Since you're using Python wheel tasks, you can enable the following:

experimental:
  python_wheel_wrapper: true

It allows installing Python wheel tasks on non-UC clusters.
You can see more details here:
#635
#797
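
In your databricks.yaml it goes at the bundle root, so roughly like this (a sketch based on the config you posted):

bundle:
  name: demo_data_pipelines

# wrapper flag added at the bundle root
experimental:
  python_wheel_wrapper: true

include:
  - resources/*.yml

targets:
  dev:
    default: true
    workspace:
      host: DB_HOST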

@Eulenator
Author

Thanks @andrewnester, this fixes one problem.

The other problem is that we want to trigger the pipeline via the Jobs API with dynamic --python-params, like this:

databricks jobs run-now $JOB_ID --json "{ \"job_id\": $JOB_ID, \"python_params\": [ \"--conf-file\", \"/some/path/conf.yaml\", \"--dynamic-file-path\", \"/dynamic/file/path/test.parquet\"] }"

But it looks like the wrapping notebook that installs the libraries overwrites the sys.argv arguments with the ones defined in the deployment.yaml:

import sys
sys.argv = ["..."]

Is there a way to run the job with the CLI client and still configure dynamic params?

@andrewnester
Contributor

Sorry for the delay in the reply. At the moment it's not possible to do so, but when this PR lands (#1037) you will be able to use the `databricks bundle run <JOB_KEY> --python-params` command, which will work the same with or without `python_wheel_wrapper` on.

@andrewnester added the Enhancement (New feature or request) label and removed the Response Requested label Nov 30, 2023
github-merge-queue bot pushed a commit that referenced this issue Dec 1, 2023
…heel_wrapper` is true (#1037)

## Changes
It makes the behaviour consistent with or without `python_wheel_wrapper` on when a job is run with the `--python-params` flag.

In `python_wheel_wrapper` mode it converts dynamic `python_params` into a specially named dynamic `notebook_param`, and the wrapper reads them with `dbutils` and passes them to `sys.argv`.

Fixes #1000

## Tests
Added an integration test.

Integration tests pass.
@andrewnester
Contributor

The fix has been merged and will go out in the upcoming release next week. In the meantime you can try out the snapshot version, which already contains the fix: https://github.com/databricks/cli/releases/tag/snapshot
