Add internal-frontend role #3706
```diff
@@ -26,6 +26,7 @@ package config
 import (
 	"bytes"
 	"fmt"
 	"strings"
 	"time"
@@ -38,6 +39,7 @@ import (
 	"go.temporal.io/server/common/masker"
 	"go.temporal.io/server/common/metrics"
 	"go.temporal.io/server/common/persistence/visibility/store/elasticsearch/client"
+	"go.temporal.io/server/common/primitives"
 	"go.temporal.io/server/common/telemetry"
 )
```
```diff
@@ -111,9 +113,12 @@ type (
 	// RootTLS contains all TLS settings for the Temporal server
 	RootTLS struct {
-		// Internode controls backend service communication TLS settings.
+		// Internode controls backend service (history, matching, internal-frontend)
+		// communication TLS settings.
 		Internode GroupTLS `yaml:"internode"`
-		// Frontend controls SDK Client to Frontend communication TLS settings.
+		// Frontend controls frontend server TLS settings. To control system worker -> frontend
+		// TLS, use the SystemWorker field. (Frontend.Client is accepted for backwards
+		// compatibility.)
 		Frontend GroupTLS `yaml:"frontend"`
 		// SystemWorker controls TLS setting for System Workers connecting to Frontend.
 		SystemWorker WorkerTLS `yaml:"systemWorker"`
```

> **Review comment:** if we have inner node frontend, do we still need the system worker?
>
> **Reply:** I made a bunch more commits on a branch where I ripped out … I will note that this addition of SystemWorker (as a more flexible alternative to Frontend.Client) never made it to the docs, or docker/config_template.yaml. So probably only a very small number of users are using it.
>
> **Reply:** +1 I think we should wait, and deprecate-remove SystemWorker at a later point.
```diff
@@ -481,16 +486,30 @@ type (
 	S3ForcePathStyle bool `yaml:"s3ForcePathStyle"`
 }

-	// PublicClient is config for internal nodes (history/matching/worker) connecting to
-	// temporal frontend. There are two methods of connecting:
-	// Explicit endpoint: Supply a host:port to connect to. This can resolve to multiple IPs,
-	// or a single IP that is a load-balancer.
-	// Membership resolver (new in 1.18): Leave this empty, and other nodes will use the
-	// membership service resolver to find the frontend.
-	// TODO: remove this and always use membership resolver
+	// PublicClient is the config for internal nodes (history/matching/worker) connecting to
+	// frontend. There are three methods of connecting:
+	// 1. Use membership to locate "internal-frontend" and connect to them using the Internode
+	//    TLS config (which can be "no TLS"). This is recommended for deployments that use an
+	//    Authorizer and ClaimMapper. To use this, leave this section out of your config, and
+	//    make sure there is an "internal-frontend" section in Services.
+	// 2. Use membership to locate "frontend" and connect to them using the Frontend TLS config
+	//    (which can be "no TLS"). This is recommended for deployments that don't use an
+	//    Authorizer or ClaimMapper, or have implemented a custom ClaimMapper that correctly
+	//    identifies the system worker using mTLS and assigns it an Admin-level claim.
+	//    To use this, leave this section out of your config and make sure there is _no_
+	//    "internal-frontend" section in Services.
+	// 3. Connect to an explicit endpoint using the SystemWorker (falling back to Frontend) TLS
+	//    config (which can be "no TLS"). You can use this if you want to force frontend
+	//    connections to go through an external load balancer. If you use this with a
+	//    ClaimMapper+Authorizer, you need to ensure that your ClaimMapper assigns Admin
+	//    claims to worker nodes, and your Authorizer correctly handles those claims.
 	PublicClient struct {
-		// HostPort is the host port to connect on. Host can be DNS name
+		// HostPort is the host port to connect on. Host can be DNS name. See the above
+		// comment: in many situations you can leave this empty.
 		HostPort string `yaml:"hostPort"`
+		// Force selection of either the "internode" or "frontend" TLS configs for these
+		// connections (only those two strings are valid).
+		ForceTLSConfig string `yaml:"forceTLSConfig"`
 	}

 	// NamespaceDefaults is the default config for each namespace
```

> **Review comment** (on the PublicClient doc comment): Currently, does history/matching ever need to connect to frontend?
>
> **Reply:** I believe, not.
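The three connection methods documented on `PublicClient` can be sketched as a small decision helper. This is an illustrative sketch only: `connectionMode`, its arguments, and the returned labels are invented for this example and are not part of the server.

```go
package main

import "fmt"

// connectionMode summarizes which of the three documented connection
// methods applies, given whether an "internal-frontend" service section
// exists and whether publicClient.hostPort is set explicitly.
func connectionMode(hasInternalFrontend bool, hostPort string) string {
	switch {
	case hostPort != "":
		// Method 3: explicit endpoint (e.g. an external load balancer),
		// using the SystemWorker (falling back to Frontend) TLS config.
		return "explicit endpoint via SystemWorker/Frontend TLS"
	case hasInternalFrontend:
		// Method 1: membership lookup of "internal-frontend", Internode TLS.
		return "membership: internal-frontend via Internode TLS"
	default:
		// Method 2: membership lookup of "frontend", Frontend TLS.
		return "membership: frontend via Frontend TLS"
	}
}

func main() {
	fmt.Println(connectionMode(true, ""))
	fmt.Println(connectionMode(false, ""))
	fmt.Println(connectionMode(false, "lb.example.com:7233"))
}
```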
```diff
@@ -551,6 +570,12 @@ const (
 	ClusterMDStoreName DataStoreName = "ClusterMDStore"
 )

+const (
+	ForceTLSConfigAuto      = ""
+	ForceTLSConfigInternode = "internode"
+	ForceTLSConfigFrontend  = "frontend"
+)

 // Validate validates this config
 func (c *Config) Validate() error {
 	if err := c.Persistence.Validate(); err != nil {
@@ -561,6 +586,17 @@ func (c *Config) Validate() error {
 		return err
 	}

+	_, hasIFE := c.Services[string(primitives.InternalFrontendService)]
+	if hasIFE && (c.PublicClient.HostPort != "" || c.PublicClient.ForceTLSConfig != "") {
+		return fmt.Errorf("when using internal-frontend, publicClient must be empty")
+	}
+
+	switch c.PublicClient.ForceTLSConfig {
+	case ForceTLSConfigAuto, ForceTLSConfigInternode, ForceTLSConfigFrontend:
+	default:
+		return fmt.Errorf("invalid value for publicClient.forceTLSConfig: %q", c.PublicClient.ForceTLSConfig)
+	}
+
 	return nil
 }
```
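The two checks added to `Config.Validate` can be exercised in isolation. A standalone sketch: the `validatePublicClient` helper here is invented to mirror those checks and is not the server's API.

```go
package main

import "fmt"

// Allowed values for publicClient.forceTLSConfig, as in the PR.
const (
	ForceTLSConfigAuto      = ""
	ForceTLSConfigInternode = "internode"
	ForceTLSConfigFrontend  = "frontend"
)

// validatePublicClient mirrors the two new validation rules:
// internal-frontend deployments must leave publicClient empty, and
// forceTLSConfig only accepts the three known values.
func validatePublicClient(hasInternalFrontend bool, hostPort, forceTLS string) error {
	if hasInternalFrontend && (hostPort != "" || forceTLS != "") {
		return fmt.Errorf("when using internal-frontend, publicClient must be empty")
	}
	switch forceTLS {
	case ForceTLSConfigAuto, ForceTLSConfigInternode, ForceTLSConfigFrontend:
		return nil
	default:
		return fmt.Errorf("invalid value for publicClient.forceTLSConfig: %q", forceTLS)
	}
}

func main() {
	fmt.Println(validatePublicClient(true, "localhost:7233", "")) // rejected
	fmt.Println(validatePublicClient(false, "", "bogus"))         // rejected
	fmt.Println(validatePublicClient(false, "", "internode"))     // accepted
}
```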
```diff
@@ -1305,4 +1305,5 @@ const (
 	History
 	Matching
 	Worker
+	InternalFrontend
 )
```
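Because the service list is an `iota` block, appending `InternalFrontend` at the end leaves the existing constants' numeric values unchanged, so anything comparing or persisting the old values keeps working. A minimal illustration (the starting values here are illustrative; the real block may begin at a different offset):

```go
package main

import "fmt"

// Appending a new member at the end of an iota block preserves the
// values of the members that came before it.
const (
	History = iota
	Matching
	Worker
	InternalFrontend // new member; earlier values are unaffected
)

func main() {
	fmt.Println(History, Matching, Worker, InternalFrontend) // 0 1 2 3
}
```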
```diff
@@ -26,6 +26,7 @@ package resource
 import (
 	"context"
 	"crypto/tls"
 	"fmt"
 	"net"
 	"os"
@@ -93,7 +94,6 @@ type (
 // See LifetimeHooksModule for detail
 var Module = fx.Options(
 	persistenceClient.Module,
-	fx.Provide(SnTaggedLoggerProvider),
 	fx.Provide(HostNameProvider),
 	fx.Provide(TimeSourceProvider),
 	cluster.MetadataLifetimeHooksModule,
@@ -136,7 +136,7 @@ var DefaultOptions = fx.Options(
 	fx.Provide(DCRedirectionPolicyProvider),
 )

-func SnTaggedLoggerProvider(logger log.Logger, sn primitives.ServiceName) log.SnTaggedLogger {
+func DefaultSnTaggedLoggerProvider(logger log.Logger, sn primitives.ServiceName) log.SnTaggedLogger {
 	return log.With(logger, tag.Service(sn))
 }
```
```diff
@@ -395,19 +395,13 @@ func SdkClientFactoryProvider(
 	logger log.SnTaggedLogger,
 	resolver membership.GRPCResolver,
 ) (sdk.ClientFactory, error) {
-	tlsFrontendConfig, err := tlsConfigProvider.GetFrontendClientConfig()
+	frontendURL, frontendTLSConfig, err := getFrontendConnectionDetails(cfg, tlsConfigProvider, resolver)
 	if err != nil {
-		return nil, fmt.Errorf("unable to load frontend TLS configuration: %w", err)
-	}
-
-	hostPort := cfg.PublicClient.HostPort
-	if hostPort == "" {
-		hostPort = resolver.MakeURL(primitives.FrontendService)
+		return nil, err
 	}

 	return sdk.NewClientFactory(
-		hostPort,
-		tlsFrontendConfig,
+		frontendURL,
+		frontendTLSConfig,
 		metricsHandler,
 		logger,
 	), nil
```
```diff
@@ -426,24 +420,70 @@ func RPCFactoryProvider(
 	svcName primitives.ServiceName,
 	logger log.Logger,
 	tlsConfigProvider encryption.TLSConfigProvider,
 	dc *dynamicconfig.Collection,
 	resolver membership.GRPCResolver,
 	traceInterceptor telemetry.ClientTraceInterceptor,
-) common.RPCFactory {
+) (common.RPCFactory, error) {
 	svcCfg := cfg.Services[string(svcName)]
-	hostPort := cfg.PublicClient.HostPort
-	if hostPort == "" {
-		hostPort = resolver.MakeURL(primitives.FrontendService)
+	frontendURL, frontendTLSConfig, err := getFrontendConnectionDetails(cfg, tlsConfigProvider, resolver)
+	if err != nil {
+		return nil, err
 	}
 	return rpc.NewFactory(
 		&svcCfg.RPC,
 		svcName,
 		logger,
 		tlsConfigProvider,
 		dc,
-		hostPort,
+		frontendURL,
+		frontendTLSConfig,
 		[]grpc.UnaryClientInterceptor{
 			grpc.UnaryClientInterceptor(traceInterceptor),
 		},
-	)
+	), nil
 }
```
> **Review comment** (on `getFrontendConnectionDetails`): Note for reviewers: I'm pretty unhappy about the existence of this function: tlsConfigProvider already has a whole bunch of switching logic, and now I'm adding another layer of switching on top of that. It would seem better to integrate this logic into tlsConfigProvider. But the problem is that this needs the whole config (for PublicClient and Services) and tlsConfigProvider only gets the TLS section. Also, this deals with grpc endpoints and TLS config in a linked way, where tlsConfigProvider deals only with TLS. So it makes some sense to keep it separate.

The new function, added in full:

```go
func getFrontendConnectionDetails(
	cfg *config.Config,
	tlsConfigProvider encryption.TLSConfigProvider,
	resolver membership.GRPCResolver,
) (string, *tls.Config, error) {
	// To simplify the static config, we switch default values based on whether the config
	// defines an "internal-frontend" service. The default for TLS config can be overridden
	// with publicClient.forceTLSConfig, and the default for hostPort can be overridden by
	// explicitly setting hostPort to "membership://internal-frontend" or
	// "membership://frontend".
	_, hasIFE := cfg.Services[string(primitives.InternalFrontendService)]

	forceTLS := cfg.PublicClient.ForceTLSConfig
	if forceTLS == config.ForceTLSConfigAuto {
		if hasIFE {
			forceTLS = config.ForceTLSConfigInternode
		} else {
			forceTLS = config.ForceTLSConfigFrontend
		}
	}

	var frontendTLSConfig *tls.Config
	var err error
	switch forceTLS {
	case config.ForceTLSConfigInternode:
		frontendTLSConfig, err = tlsConfigProvider.GetInternodeClientConfig()
	case config.ForceTLSConfigFrontend:
		frontendTLSConfig, err = tlsConfigProvider.GetFrontendClientConfig()
	default:
		err = fmt.Errorf("invalid forceTLSConfig")
	}
	if err != nil {
		return "", nil, fmt.Errorf("unable to load TLS configuration: %w", err)
	}

	frontendURL := cfg.PublicClient.HostPort
	if frontendURL == "" {
		if hasIFE {
			frontendURL = resolver.MakeURL(primitives.InternalFrontendService)
		} else {
			frontendURL = resolver.MakeURL(primitives.FrontendService)
		}
	}

	return frontendURL, frontendTLSConfig, nil
}
```

> **Review comment:** if hasIFE && cfg.PublicClient.HostPort != "", then it will use public frontend config?
>
> **Reply:** Correct. I considered making that combination fail validation but thought maybe we should trust the user. But maybe failing on that would avoid some user error. I'm trying to think of any reason you might want to have an internal-frontend config section present but also have an explicit hostPort... maybe someone doesn't like the membership-based resolver and wants to use k8s dns or something like that? I think I added …
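The defaulting behavior of `getFrontendConnectionDetails` can be summarized in a dependency-free sketch. The `frontendDefaults` helper and its string results are invented for illustration; the real function returns a `*tls.Config` from the TLS provider and a URL from the membership resolver.

```go
package main

import "fmt"

// frontendDefaults mirrors the switching in getFrontendConnectionDetails:
// when an "internal-frontend" service is configured, the defaults become
// the internode TLS config and the internal-frontend membership URL.
// Explicit forceTLS and hostPort values always win over the defaults.
func frontendDefaults(hasIFE bool, forceTLS, hostPort string) (tlsName, url string) {
	tlsName = forceTLS
	if tlsName == "" { // ForceTLSConfigAuto
		if hasIFE {
			tlsName = "internode"
		} else {
			tlsName = "frontend"
		}
	}
	url = hostPort
	if url == "" {
		if hasIFE {
			url = "membership://internal-frontend"
		} else {
			url = "membership://frontend"
		}
	}
	return tlsName, url
}

func main() {
	fmt.Println(frontendDefaults(true, "", ""))
	fmt.Println(frontendDefaults(false, "", ""))
	fmt.Println(frontendDefaults(false, "frontend", "lb:7233"))
}
```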
> **Review comment:** Question for reviewers: I'm confident that this handles the issues around `temporal-system`. But this default authorizer also allowed all empty-namespace calls through. Now, those will require `claims.System >= RoleReader`. This makes logical sense: `System` is the field for stuff that applies outside of namespaces. The question is: might this break anyone's expectations (in a way that's hard to fix)?