Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add allow list for resources when bundle run_as is set #1233

Merged
merged 28 commits into from
Mar 27, 2024
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
9adc6e6
wip
shreyas-goenka Feb 5, 2024
39c1f08
Merge remote-tracking branch 'origin' into fix-run-as
shreyas-goenka Feb 22, 2024
76fc145
Merge remote-tracking branch 'origin' into fix-run-as
shreyas-goenka Feb 23, 2024
b2450c1
Merge remote-tracking branch 'origin' into fix-run-as
shreyas-goenka Feb 28, 2024
4da90f2
wip
shreyas-goenka Mar 4, 2024
05c9d38
further along the way
shreyas-goenka Mar 7, 2024
66f288d
Merge remote-tracking branch 'origin' into fix-run-as
shreyas-goenka Mar 7, 2024
335d0c9
add test for when both sp and user are defined
shreyas-goenka Mar 7, 2024
776af33
fix test
shreyas-goenka Mar 7, 2024
20a0274
final cleanup
shreyas-goenka Mar 7, 2024
5886a65
fix test
shreyas-goenka Mar 7, 2024
f2d24fc
fix windows tests
shreyas-goenka Mar 7, 2024
61f08cc
Merge remote-tracking branch 'origin' into fix-run-as
shreyas-goenka Mar 11, 2024
a4cc92d
rename vars
shreyas-goenka Mar 11, 2024
e7416ef
use typed configuration instead
shreyas-goenka Mar 13, 2024
aafd767
fix tests
shreyas-goenka Mar 13, 2024
9d6949a
cleanup
shreyas-goenka Mar 13, 2024
fab62f3
add error for when neither are specified
shreyas-goenka Mar 13, 2024
89c86c8
lint
shreyas-goenka Mar 13, 2024
dcdb87f
-
shreyas-goenka Mar 13, 2024
9c2bfc6
comment refine
shreyas-goenka Mar 25, 2024
a83453b
Merge remote-tracking branch 'origin' into fix-run-as
shreyas-goenka Mar 25, 2024
e1e5a36
merge diagnostics changes
shreyas-goenka Mar 25, 2024
98f8be2
more diagnostics
shreyas-goenka Mar 25, 2024
d28ce06
Merge remote-tracking branch 'origin' into fix-run-as
shreyas-goenka Mar 27, 2024
9e9e18a
TryLocation -> GetLocation
shreyas-goenka Mar 27, 2024
dcaf247
address comments about comments
shreyas-goenka Mar 27, 2024
a50117e
remove branching for windows assertions
shreyas-goenka Mar 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
141 changes: 116 additions & 25 deletions bundle/config/mutator/run_as.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,19 +2,23 @@ package mutator

import (
"context"
"fmt"
"slices"
"strings"

"github.com/databricks/cli/bundle"
"github.com/databricks/cli/bundle/config/resources"
"github.com/databricks/cli/libs/dyn"
"github.com/databricks/databricks-sdk-go/service/jobs"
)

type setRunAs struct {
}

// SetRunAs mutator is used to go over defined resources such as Jobs and DLT Pipelines
// And set correct execution identity ("run_as" for a job or "is_owner" permission for DLT)
// if top-level "run-as" section is defined in the configuration.
// This mutator does two things:
// 1. It sets the run_as field for jobs to the value of the run_as field in the bundle.
// 2. Validates the types of resources used in the bundle. Not all Databricks resource
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
// types are supported when the current deployment identity is different from
// the bundle's run_as identity.
func SetRunAs() bundle.Mutator {
return &setRunAs{}
}
Expand All @@ -23,12 +27,120 @@ func (m *setRunAs) Name() string {
return "SetRunAs"
}

// Resources that satisfy one of the following conditions:
// 1. Allow to set run_as for the resources to a different user from the current
// deployment user. For example, jobs.
// 2. Does not make sense for these resources to run_as a different user. We do not
// have plans to add platform side support for `run_as` for these resources.
// For example, experiments or registered models.
var allowListForRunAsOther = []string{"jobs", "models", "registered_models", "experiments"}
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved

// Resources that do not allow setting a run_as identity to a different user but
// have plans to add platform side support for `run_as` for these resources at
// some point in the future. For example, pipelines, model serving endpoints or lakeview dashboards.
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
//
// We expect the allow list and the deny list to form mutually exclusive and exhaustive
// sets of all resource types that are supported by DABs.
var denyListForRunAsOther = []string{"pipelines", "model_serving_endpoints"}
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved

type errorUnsupportedResourceTypeForRunAs struct {
resourceType string
resourceValue dyn.Value
currentUser string
runAsUser string
}

// TODO(6 March 2024): This error message is big. We should split this once
// diag.Diagnostics is ready.
// TODO(6 March 2024): Link the docs page describing run_as semantics in the error below
// once the page is ready.
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
func (e errorUnsupportedResourceTypeForRunAs) Error() string {
return fmt.Sprintf("%s are not supported when the current deployment user is different from the bundle's run_as identity. Please deploy as the run_as identity. List of supported resources: [%s]. Location of the unsupported resource: %s. Current identity: %s. Run as identity: %s", e.resourceType, strings.Join(allowListForRunAsOther, ", "), e.resourceValue.Location(), e.currentUser, e.runAsUser)
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
}
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved

func getRunAsIdentity(runAs dyn.Value) (string, error) {
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
// Get service principal name and user name from run_as section
runAsSp, err := dyn.Get(runAs, "service_principal_name")
if err != nil && !dyn.IsNoSuchKeyError(err) {
return "", err
}
runAsUser, err := dyn.Get(runAs, "user_name")
if err != nil && !dyn.IsNoSuchKeyError(err) {
return "", err
}

sp, spIsDefined := runAsSp.AsString()
user, userIsDefined := runAsUser.AsString()

switch {
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
case spIsDefined && userIsDefined:
return "", fmt.Errorf("run_as section must specify exactly one identity. A service_principal_name %q is specified at %s. A user_name %q is defined at %s", sp, runAsSp.Location(), user, runAsUser.Location())
case spIsDefined:
return sp, nil
case userIsDefined:
return user, nil
default:
return "", nil
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
}
}

func (m *setRunAs) Apply(_ context.Context, b *bundle.Bundle) error {
// Return early if run_as is not defined in the bundle
runAs := b.Config.RunAs
if runAs == nil {
return nil
}

currentUser := b.Config.Workspace.CurrentUser.UserName

// Assert the run_as configuration is valid in the context of the bundle
err := b.Config.Mutate(func(v dyn.Value) (dyn.Value, error) {
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
// Get run_as from the bundle
runAs, err := dyn.Get(v, "run_as")
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved

// If run_as is not defined in the bundle, return early
if dyn.IsNoSuchKeyError(err) {
return v, nil
}
if err != nil {
return dyn.InvalidValue, err
}

runAsIdentity, err := getRunAsIdentity(runAs)
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
if err != nil {
return dyn.InvalidValue, err
}

// If run_as is the same as the current user, return early. All resource
// types are allowed in this case.
if runAsIdentity == currentUser {
return v, nil
}

rv, err := dyn.Get(v, "resources")
if err != nil {
return dyn.NilValue, err
}

r := rv.MustMap()
for k, v := range r {
// If the resource type is not in the allow list, return an error
if !slices.Contains(allowListForRunAsOther, k) {
return dyn.InvalidValue, errorUnsupportedResourceTypeForRunAs{
resourceType: k,
resourceValue: v,
currentUser: currentUser,
runAsUser: runAsIdentity,
}
}
}
return v, nil
})
if err != nil {
return err
}

// Set run_as for jobs
for i := range b.Config.Resources.Jobs {
job := b.Config.Resources.Jobs[i]
if job.RunAs != nil {
Expand All @@ -40,26 +152,5 @@ func (m *setRunAs) Apply(_ context.Context, b *bundle.Bundle) error {
}
}

me := b.Config.Workspace.CurrentUser.UserName
// If user deploying the bundle and the one defined in run_as are the same
// Do not add IS_OWNER permission. Current user is implied to be an owner in this case.
// Otherwise, it will fail due to this bug https://github.com/databricks/terraform-provider-databricks/issues/2407
if runAs.UserName == me || runAs.ServicePrincipalName == me {
return nil
}

for i := range b.Config.Resources.Pipelines {
pipeline := b.Config.Resources.Pipelines[i]
pipeline.Permissions = slices.DeleteFunc(pipeline.Permissions, func(p resources.Permission) bool {
return (runAs.ServicePrincipalName != "" && p.ServicePrincipalName == runAs.ServicePrincipalName) ||
(runAs.UserName != "" && p.UserName == runAs.UserName)
})
pipeline.Permissions = append(pipeline.Permissions, resources.Permission{
Level: "IS_OWNER",
ServicePrincipalName: runAs.ServicePrincipalName,
UserName: runAs.UserName,
})
}

return nil
}
46 changes: 46 additions & 0 deletions bundle/config/mutator/run_as_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
package mutator

import (
"slices"
"testing"

"github.com/databricks/cli/bundle/config"
"github.com/databricks/cli/libs/dyn"
"github.com/databricks/cli/libs/dyn/convert"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"golang.org/x/exp/maps"
)

// Every new resource type added to DABs should either be classified in the allow
// list or the deny list for run_as other support.
func TestAllResourceTypesAreClassifiedForRunAs(t *testing.T) {
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
// Compute supported resource types based on the `Resources{}` struct.
r := config.Resources{}
rv, err := convert.FromTyped(r, dyn.NilValue)
require.NoError(t, err)
normalized, _ := convert.Normalize(r, rv, convert.IncludeMissingFields)
resourceTypes := maps.Keys(normalized.MustMap())
slices.Sort(resourceTypes)

// Assert that all resource types are classified in either the allow list or the deny list.
for _, resourceType := range resourceTypes {
if slices.Contains(allowListForRunAsOther, resourceType) && !slices.Contains(denyListForRunAsOther, resourceType) {
continue
}
if !slices.Contains(allowListForRunAsOther, resourceType) && slices.Contains(denyListForRunAsOther, resourceType) {
continue
}
if slices.Contains(allowListForRunAsOther, resourceType) && slices.Contains(denyListForRunAsOther, resourceType) {
t.Errorf("Resource type %s is classified in both allow list and deny list for run_as other support", resourceType)
}
if !slices.Contains(allowListForRunAsOther, resourceType) && !slices.Contains(denyListForRunAsOther, resourceType) {
t.Errorf("Resource type %s is not classified in either allow list or deny list for run_as other support", resourceType)
}
}

// Assert the total list of resource supported, as a sanity check that using
// the dyn library gives us the correct list of all resources supported. Please
// also update this check when adding a new resource
assert.Equal(t, []string{"experiments", "jobs", "model_serving_endpoints", "models", "pipelines", "registered_models"}, resourceTypes)
shreyas-goenka marked this conversation as resolved.
Show resolved Hide resolved
}
Original file line number Diff line number Diff line change
Expand Up @@ -11,20 +11,6 @@ targets:
user_name: "my_user_name"

resources:
pipelines:
nyc_taxi_pipeline:
name: "nyc taxi loader"

permissions:
- level: CAN_VIEW
service_principal_name: my_service_principal
- level: CAN_VIEW
user_name: my_user_name

libraries:
- notebook:
path: ./dlt/nyc_taxi_loader

jobs:
job_one:
name: Job One
Expand Down Expand Up @@ -52,3 +38,15 @@ resources:
- task_key: "task_three"
notebook_task:
notebook_path: "./test.py"

models:
model_one:
name: "skynet"

registered_models:
model_two:
name: "skynet (in UC)"

experiments:
experiment_one:
name: "experiment_one"
17 changes: 17 additions & 0 deletions bundle/tests/run_as/not_allowed/both_sp_and_user/databricks.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
bundle:
name: "run_as"

# This is not allowed because both service_principal_name and user_name are set
run_as:
service_principal_name: "my_service_principal"
user_name: "my_user_name"

resources:
jobs:
job_one:
name: Job One

tasks:
- task_key: "task_one"
notebook_task:
notebook_path: "./test.py"
14 changes: 14 additions & 0 deletions bundle/tests/run_as/not_allowed/model_serving/databricks.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
bundle:
name: "run_as"

run_as:
service_principal_name: "my_service_principal"

targets:
development:
run_as:
user_name: "my_user_name"

resources:
model_serving_endpoints:
name: "skynet"
25 changes: 25 additions & 0 deletions bundle/tests/run_as/not_allowed/pipelines/databricks.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
bundle:
name: "run_as"

run_as:
service_principal_name: "my_service_principal"

targets:
development:
run_as:
user_name: "my_user_name"

resources:
pipelines:
nyc_taxi_pipeline:
name: "nyc taxi loader"

permissions:
- level: CAN_VIEW
service_principal_name: my_service_principal
- level: CAN_VIEW
user_name: my_user_name

libraries:
- notebook:
path: ./dlt/nyc_taxi_loader
Loading
Loading