Nested SPIRE Architecture: NestedA workload fails to invoke NestedB workload in one case. #5317

penghuazhou · 2024-07-20T07:20:16Z

How to reproduce:
1. Scale up a new Root Server pod; it generates a new CA.
2. Scale up a new NestedB Server pod; it generates a new intermediate CA.
3. Scale up the NestedB Agent; workload SVIDs are now signed by the new CA.
4. The NestedA workload gets an error when it invokes the NestedB workload.

Background knowledge:
1. A replacement for the intermediate or root certificate is prepared at ttl/2. The new intermediate or root certificate is only activated at ttl/6.
2. Preparing an intermediate certificate only succeeds once the root certificate has been synchronized to the nested server.
3. The SPIRE agent synchronizes the trust bundle every 5 seconds.
4. The SPIRE agent notifies workloads of trust bundle changes after anywhere from 5 seconds to 8 minutes.

Version: 1.9.6
Platform: linux-amd64
Subsystem: spire-agent, spire-server, DataStore mysql, NodeAttestor k8s_psat, UpstreamAuthority spire, Notifier k8sbundle

MarcosDY · 2024-07-25T19:10:48Z

The force rotation feature may help you update the current bundle intermediates inside each nested SPIRE.
It is still under development; you can track its status in the force rotation project.
Original issue: #1934

penghuazhou · 2024-07-29T02:09:26Z

> Force rotation feature may be able to help you to update the current bundle intermediates inside each nested SPIRE, this is still under development, you can track the status in force rotation project. Original issue: #1934

@MarcosDY The force rotation feature updates the current bundle intermediates inside each nested SPIRE, but that still takes several seconds. During that time, the CA key generated by the newly scaled-up root server may already have issued a new nested server intermediate certificate, and that intermediate may already have signed a workload's SVID. If this workload communicates with workloads whose bundle has not yet been synchronized, the TLS handshake fails.
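
The failure window described here can be reproduced in miniature with Go's standard `crypto/x509` package. This is only a sketch under simplifying assumptions: the certificates are throwaway ones with made-up common names, they carry no SPIFFE ID URI SAN, and no SPIRE code is involved. It shows a leaf signed under a new root failing verification against a bundle that only holds the old root, then passing once the bundle also contains the new root.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCert issues a certificate signed by parent/parentKey.
// A nil parent produces a self-signed certificate (a root).
func newCert(cn string, serial int64, isCA bool, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  isCA,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageDigitalSignature,
	}
	if isCA {
		tmpl.KeyUsage |= x509.KeyUsageCertSign
	}
	if parent == nil {
		parent, parentKey = tmpl, key // self-signed
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verify checks leaf against a trust bundle made of the given root certificates.
func verify(leaf, intermediate *x509.Certificate, bundle ...*x509.Certificate) error {
	roots := x509.NewCertPool()
	for _, c := range bundle {
		roots.AddCert(c)
	}
	inters := x509.NewCertPool()
	inters.AddCert(intermediate)
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: inters,
		KeyUsages:     []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return err
}

func check(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	// Old root: the only CA that NestedA workloads currently trust.
	oldRoot, _, err := newCert("root-old", 1, true, nil, nil)
	check(err)
	// New root: generated by the scaled-up root server pod.
	newRoot, newRootKey, err := newCert("root-new", 2, true, nil, nil)
	check(err)
	// NestedB intermediate and workload certificate issued under the new root.
	inter, interKey, err := newCert("nestedB-intermediate", 3, true, newRoot, newRootKey)
	check(err)
	leaf, _, err := newCert("nestedB-workload", 4, false, inter, interKey)
	check(err)

	fmt.Println("stale bundle (old root only):", verify(leaf, inter, oldRoot))
	fmt.Println("synced bundle (old + new):   ", verify(leaf, inter, oldRoot, newRoot))
}
```

The first verification returns an unknown-authority error and the second returns nil, which mirrors what a NestedA workload sees before and after its bundle catches up.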

penghuazhou · 2024-07-29T02:22:26Z

I see two possible solutions to this problem. Which one would the community prefer? I can submit a PR.
1. When spire-server is scaled up, the new spire-server pod copies the CA from an existing pod. After that, the new spire-server rotates its CA independently.
2. Let all spire-server instances share one CA key. Only the spire-server that holds the lock can rotate the CA.

sorindumitru · 2024-07-29T13:15:33Z

I think the force rotation API by itself doesn't help, since it looks like you can only tell an existing server instance to prepare or rotate a CA. It would be good to have something (even within the force rotation APIs) that allows preparing a CA for use by a specific server instance at a later time. So you can:

1. Prepare a CA for server instances N+1 and N+2.
2. Wait for some amount of time for them to be propagated to all workloads.
3. Start instances N+1 and N+2 and have them use the prepared CA.

Alternatively, maybe this could be a CLI command that starts the new instance, runs it up to the point where the CA is prepared and activated, and then exits. That way you can run that new command as step 1 in the previous sequence.

evan2645 · 2024-07-30T19:27:57Z

Thanks for reporting this @penghuazhou, and thanks @sorindumitru for jumping in.

We discussed this issue during SPIRE contributor sync today, and the consensus is that liveness and readiness checks in SPIRE should be solving this problem (but don't currently). When a new SPIRE Server boots at the root, readiness check should fail for ~some amount of time to allow the new root to propagate. After it's propagated, the readiness check can succeed and signing can begin.

I think there's a couple gotchas that need to be figured out as part of this work:

Probably don't want to do this if it's the first SPIRE Server being turned on ... under what conditions do we want the behavior, and when do we want to shortcut?
For how long should the readiness check fail? I feel there's no "good" answer because root servers don't have a full picture of bundle propagation and thus the decision will be open-loop

I'll move this issue to the backlog as unscoped ... once we have answers to the above two questions, I think we'll better understand the scope and be ready to accept the change. Thank you for volunteering to work on this @penghuazhou! I will go ahead and assign it to you as well.

penghuazhou · 2024-08-01T02:02:11Z

Thanks, I'm glad to be able to participate.

MarcosDY added the triage/in-progress label Jul 23, 2024

evan2645 added the priority/backlog and unscoped labels and removed the triage/in-progress label Jul 30, 2024

evan2645 assigned penghuazhou Jul 30, 2024
