Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add internal-frontend role #3706

Merged
merged 11 commits into from
Jan 11, 2023
Merged

Add internal-frontend role #3706

merged 11 commits into from
Jan 11, 2023

Conversation

dnr
Copy link
Member

@dnr dnr commented Dec 13, 2022

What changed?

  • Add a new server role, internal-frontend
    • This is intended for the worker role to connect to
    • This does what the frontend does, but acts as an "internode" role for the purposes of TLS
    • It also fixes the claim mapper to one that always returns admin permission
  • Change development configs to enable internal-frontend (and remove publicClient hostPort)
  • Integration tests do not use internal-frontend
  • Change docker config template to add internal-frontend service, conditional on new environment variable
  • Also to remove publicClient section if internal-frontend is present or not using claim mapper/authorizer
  • If publicClient.hostPort is not set, frontend connections use the membership resolver, pointing to internal-frontend (using internode TLS) if present, else frontend (using frontend TLS)
  • Remove the temporal-system special case in the default authorizer

Why?
The server worker needs to make connections to the frontend with effective "admin" access (to all namespaces). With pluggable claim mappers and authorizers, we can't control the logic in them, and requiring all users to set up mTLS with special claims is a large burden. Adding internal frontends as an alternative is easier and more reliable.

How did you test it?
Updated unit tests, lots of manual testing with https://github.com/temporalio/samples-server/tree/main/tls/tls-full

Potential risks
Users who are currently using the default authorizer and relying on the temporal-system special case will need to switch their setup to use internal frontends (but this is good, since the special case could be a security hole). Other configurations that I haven't thought of could also break.

Overall, this creates more configuration options, which increases the support burden. Eventually I'd like to eliminate most of them and require internal frontends (with internode TLS if used), to reduce the number of options and make things even simpler than they were before this change, but that can come later.

Is hotfix candidate?

@dnr dnr requested a review from a team as a code owner December 13, 2022 17:29
@@ -561,6 +577,12 @@ func (c *Config) Validate() error {
return err
}

switch c.PublicClient.ForceTLSConfig {
case ForceTLSConfigDefault, ForceTLSConfigInternode, ForceTLSConfigFrontend:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is default (empty string) mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means use "internode" if there is an internal-frontend section, otherwise use "frontend". Maybe I should name it ForceTLSConfigAuto?

I really don't think anyone should ever have to use this, it's just for completeness.

if hasIFE && cfg.PublicClient.HostPort == "" {
forceTLS = config.ForceTLSConfigInternode
} else {
forceTLS = config.ForceTLSConfigFrontend
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if hasIFE && cfg.PublicClient.HostPort != "", then it will use public frontend config?
It means if publicClient.hostPort is configured it will be highest priority, and it will use public frontend, the internal-frontend will be ignored. Shall we error out if internal-frontend is configured with publicClient.hostPort?
We need clear documentation about this behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I considered making that combination fail validation but thought maybe we should trust the user. But maybe failing on that would avoid some user error.

I'm trying to think of any reason you might want to have an internal-frontend config section present but also have an explicit hostPort... maybe someone doesn't like the membership-based resolver and wants to use k8s dns or something like that?

I think I added && cfg.PublicClient.HostPort == "" because I considered the case where someone explicitly put membership://frontend as HostPort. But on second thought that config doesn't really make sense either. Yeah, I'll simplify this.

// Frontend controls SDK Client to Frontend communication TLS settings.
// Frontend controls frontend server TLS settings. To control system worker -> frontend
// TLS, use the SystemWorker field. (Frontend.Client is accepted for backwards
// compatibility.)
Frontend GroupTLS `yaml:"frontend"`
// SystemWorker controls TLS setting for System Workers connecting to Frontend.
SystemWorker WorkerTLS `yaml:"systemWorker"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have inner node frontend, do we still need the system worker?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a bunch more commits on a branch where I ripped out SystemWorker, GetFrontendClientConfig, and everything that touches those. Total diff stats: 11 files changed, 57 insertions(+), 365 deletions(-). And we could clean up even more. I'd like to push that through some day. However, in the interest of a smooth 1.20 release, I agree with Yimin that we should be backwards compatible for now, i.e. any existing config should not change behavior, we should only add new options.

I will note that this addition of SystemWorker (as a more flexible alternative to Frontend.Client) never made it to the docs, or docker/config_template.yaml. So probably only a very small number of users are using it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think we should wait, and deprecate-remove SystemWorker at a later point.

service/frontend/fx.go Show resolved Hide resolved
// we should remove "temporal-system" from here. Handling of call with
// no namespace will need to be performed at the API level, so that data would
// be filtered based of caller's permissions to namespaces and system.
if target.Namespace == "temporal-system" || target.Namespace == "" {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for reviewers: I'm confident that this handles the issues around temporal-system. But this default authorizer also allowed all empty-namespace calls through. Now, those will require claims.System >= RoleReader. This makes logical sense: System is the field for stuff that applies outside of namespaces. The question is might this break anyone's expectations (in a way that's hard to fix)?

// Frontend controls SDK Client to Frontend communication TLS settings.
// Frontend controls frontend server TLS settings. To control system worker -> frontend
// TLS, use the SystemWorker field. (Frontend.Client is accepted for backwards
// compatibility.)
Frontend GroupTLS `yaml:"frontend"`
// SystemWorker controls TLS setting for System Workers connecting to Frontend.
SystemWorker WorkerTLS `yaml:"systemWorker"`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a bunch more commits on a branch where I ripped out SystemWorker, GetFrontendClientConfig, and everything that touches those. Total diff stats: 11 files changed, 57 insertions(+), 365 deletions(-). And we could clean up even more. I'd like to push that through some day. However, in the interest of a smooth 1.20 release, I agree with Yimin that we should be backwards compatible for now, i.e. any existing config should not change behavior, we should only add new options.

I will note that this addition of SystemWorker (as a more flexible alternative to Frontend.Client) never made it to the docs, or docker/config_template.yaml. So probably only a very small number of users are using it.

@@ -561,6 +577,12 @@ func (c *Config) Validate() error {
return err
}

switch c.PublicClient.ForceTLSConfig {
case ForceTLSConfigDefault, ForceTLSConfigInternode, ForceTLSConfigFrontend:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means use "internode" if there is an internal-frontend section, otherwise use "frontend". Maybe I should name it ForceTLSConfigAuto?

I really don't think anyone should ever have to use this, it's just for completeness.

if hasIFE && cfg.PublicClient.HostPort == "" {
forceTLS = config.ForceTLSConfigInternode
} else {
forceTLS = config.ForceTLSConfigFrontend
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I considered making that combination fail validation but thought maybe we should trust the user. But maybe failing on that would avoid some user error.

I'm trying to think of any reason you might want to have an internal-frontend config section present but also have an explicit hostPort... maybe someone doesn't like the membership-based resolver and wants to use k8s dns or something like that?

I think I added && cfg.PublicClient.HostPort == "" because I considered the case where someone explicitly put membership://frontend as HostPort. But on second thought that config doesn't really make sense either. Yeah, I'll simplify this.

), nil
}

func getFrontendConnectionDetails(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: I'm pretty unhappy about the existence of this function: tlsConfigProvider already has a whole bunch of switching logic, and now I'm adding another layer of switching on top of that. It would seem better to integrate this logic into tlsConfigProvider.

But the problem is that this needs the whole config (for PublicClient and Services) and tlsConfigProvider only gets the TLS section. Also, this deals with grpc endpoints and TLS config in a linked way, where tlsConfigProvider deals only with TLS. So it makes some sense to keep it separate.

@@ -357,10 +366,14 @@ namespaceDefaults:
state: "disabled"
URI: "file:///tmp/temporal_vis_archival/development"

{{- if or (.Env.USE_INTERNAL_FRONTEND) (and (not .Env.TEMPORAL_AUTH_AUTHORIZER) (not .Env.TEMPORAL_AUTH_CLAIM_MAPPER)) }}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for reviewers: is this logic sufficient? The idea is that if you're using internal-frontend, then you just use membership to find that and you're good. If using external frontend only, but you have no authorizer and claim mapper (no security or maybe using an external validating proxy or something), then you can use membership to find frontend and it'll work (even with TLS).

Otherwise you're on your own and you can set PUBLIC_FRONTEND_ADDRESS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic looks good to me.

Copy link
Member

@yiminc yiminc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

// Membership resolver (new in 1.18): Leave this empty, and other nodes will use the
// membership service resolver to find the frontend.
// TODO: remove this and always use membership resolver
// PublicClient is the config for internal nodes (history/matching/worker) connecting to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, does history/matching ever need to connect to frontend?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe, not.

Copy link
Contributor

@yux0 yux0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But it would be good to let @sergeybykov take a look.

// Membership resolver (new in 1.18): Leave this empty, and other nodes will use the
// membership service resolver to find the frontend.
// TODO: remove this and always use membership resolver
// PublicClient is the config for internal nodes (history/matching/worker) connecting to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe, not.

@@ -357,10 +366,14 @@ namespaceDefaults:
state: "disabled"
URI: "file:///tmp/temporal_vis_archival/development"

{{- if or (.Env.USE_INTERNAL_FRONTEND) (and (not .Env.TEMPORAL_AUTH_AUTHORIZER) (not .Env.TEMPORAL_AUTH_CLAIM_MAPPER)) }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic looks good to me.

@dnr dnr merged commit 1a9695e into temporalio:master Jan 11, 2023
@dnr dnr deleted the feport2 branch January 11, 2023 01:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants