Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: support Root CA rotation on server side #13307

Closed
wants to merge 1 commit into from

Conversation

yishuT
Copy link
Contributor

@yishuT yishuT commented Aug 28, 2021

  • In TlsConfig creation, use GetConfigForClient instead of GetCertificate, so that we can load the latest CA.
  • Add a go routine to refresh tls config from given cert/key paths, such reload loop can be enabled through flags.
  • Change TlsInfo to pointer receiver to avoid unnecessary copy
  • Fix all TlsInfo pass by values because sync.Once cannot be copied. Provide TlsInfo.Clone to copy TlsInfo
  • Unit tests and integration tests

Fixes #11555

@yishuT yishuT changed the title client: support root CA rotation on server side client: support CA rotation on server side Sep 13, 2021
@yishuT yishuT marked this pull request as ready for review September 13, 2021 17:19
@yishuT
Copy link
Contributor Author

yishuT commented Nov 1, 2021

Not sure about if we still need e2e tests. In the integration test, server start and client conn with new certs is already tested.

@yishuT yishuT changed the title client: support CA rotation on server side client: support Root CA rotation on server side Nov 1, 2021
@serathius
Copy link
Member

Can you please cleanup commits? There are over 22 commits (most of them called "fix ..."), could you squash them and maybe rename them so they could be used for review?

@yishuT yishuT force-pushed the reload-server-config branch 2 times, most recently from df8f6b6 to 34b794d Compare November 8, 2021 00:30
@yishuT
Copy link
Contributor Author

yishuT commented Jan 6, 2022

@serathius ping on this. Happy NY

@yishuT
Copy link
Contributor Author

yishuT commented Apr 7, 2022

ping again.

I think we need this feature anyway as root ca rotation is one of the fundamental operations.

cc @serathius bcz you helped review this PR before. Please review it or help me find the person to review. Thanks

@serathius
Copy link
Member

Sorry for delayed response, It's in my review queue. Can you fix the conflicts and rebase the change?

@yishuT yishuT marked this pull request as draft April 11, 2022 05:12
@yishuT
Copy link
Contributor Author

yishuT commented Apr 11, 2022

change to Draft until I fix the broken tests

@yishuT yishuT force-pushed the reload-server-config branch 2 times, most recently from 1d39e1a to 8aeb985 Compare April 17, 2022 14:01
@codecov-commenter
Copy link

codecov-commenter commented Apr 17, 2022

Codecov Report

Merging #13307 (8aeb985) into main (3dce380) will increase coverage by 0.48%.
The diff coverage is 76.00%.

@@            Coverage Diff             @@
##             main   #13307      +/-   ##
==========================================
+ Coverage   72.58%   73.06%   +0.48%     
==========================================
  Files         469      469              
  Lines       38389    39178     +789     
==========================================
+ Hits        27864    28627     +763     
- Misses       8754     8757       +3     
- Partials     1771     1794      +23     
Flag Coverage Δ
all 73.06% <76.00%> (+0.48%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
etcdctl/ctlv2/command/util.go 3.93% <0.00%> (ø)
server/config/config.go 79.76% <ø> (ø)
server/etcdmain/util.go 6.00% <0.00%> (ø)
server/etcdserver/api/rafthttp/transport.go 85.71% <ø> (ø)
server/etcdserver/api/v2discovery/discovery.go 63.31% <0.00%> (ø)
server/embed/etcd.go 74.95% <40.00%> (ø)
pkg/proxy/server.go 61.08% <100.00%> (+0.06%) ⬆️
server/embed/config.go 73.71% <100.00%> (+0.13%) ⬆️
server/embed/serve.go 63.01% <100.00%> (ø)
server/etcdmain/config.go 85.36% <100.00%> (+0.10%) ⬆️
... and 29 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3dce380...8aeb985. Read the comment docs.

@yishuT yishuT marked this pull request as ready for review April 17, 2022 19:51
@yishuT
Copy link
Contributor Author

yishuT commented Apr 17, 2022

Finally make tests passed. I think the one failed is unrelated to my change

@@ -0,0 +1,233 @@
package integration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is major feature, we should have both integration and e2e tests. Please take a look into https://github.com/etcd-io/etcd/issues/13637.on how to write one code that with run both.

@serathius
Copy link
Member

Hey @yishuT,
Great work on the server rotation and test. Would love to have this change in v3.6, however I might not have enough time to properly review this PR. I will be busy addressing the recent data inconsistency https://github.com/etcd-io/etcd/blob/main/Documentation/postmortems/v3.5-data-inconsistency.md, so it might take me couple of weeks before I get back to reviewing this one. Etcd needs to prioritize stability over new features now.

Sorry that I'm unable to this is PR attention it deserves.

@yishuT
Copy link
Contributor Author

yishuT commented Apr 25, 2022

Hey, that's totally fine. Thanks for the review.

@yishuT
Copy link
Contributor Author

yishuT commented Apr 25, 2022

@serathius qq, to test root CA rotation, I need to create clients with tls config signed by different root CAs. The current Cluster interface in test framework does not allow me to do that. Therefore, I might need to make change to the interface. Do you prefer it to be in this PR which already changes many pieces or in a separate PR?

@stale
Copy link

stale bot commented Jul 30, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 30, 2022
@yishuT
Copy link
Contributor Author

yishuT commented Jul 30, 2022

Not stale

@serathius
Copy link
Member

cc @spzala @ahrtr to help review

@ahrtr
Copy link
Member

ahrtr commented Aug 8, 2022

Thanks @yishuT for the PR and sorry for the late response.

I will take a look after you resolve the conflict.

@stale stale bot removed the stale label Aug 8, 2022
@yishuT
Copy link
Contributor Author

yishuT commented Aug 12, 2022

thanks. Will try to get to this PR by end of next week

Signed-off-by: Yi-Shu Tai <echo80313@gmail.com>
@yishuT
Copy link
Contributor Author

yishuT commented Aug 31, 2022

@ahrtr please review

@ahrtr
Copy link
Member

ahrtr commented Aug 31, 2022

Sorry for the late response, I am working on #14370 right now. This PR will be a priority when I finish that one.

I do not get time to review this PR so far, but 1K+ lines of code change seems too big to me. Please consider to break the PR if possible. Will have a closer look later.

@yishuT
Copy link
Contributor Author

yishuT commented Aug 31, 2022

Sorry for the late response, I am working on #14370 right now. This PR will be a priority when I finish that one.

I do not get time to review this PR so far, but 1K+ lines of code change seems too big to me. Please consider to break the PR if possible. Will have a closer look later.

this PR touches the TlsInfo, so I need to make change everywhere...

@yishuT
Copy link
Contributor Author

yishuT commented Oct 29, 2022

@ahrtr I have a proposal to break this PR into

  1. TlsInfo -> *TlsInfo so that it's not copied everywhere
  2. Introduce the root CA rotation logic

sg?

@yishuT
Copy link
Contributor Author

yishuT commented Jan 17, 2023

@ahrtr hey, I would like to finish this PR. As discussed before, this PR is too big to review. I am going to split this pr into couple parts.

  1. Make TlsInfo -> *TlsInfo. The reason is that to reload root ca, we need to have a background goroutine.
  2. do the real root ca rotation.

wdyt?

@stale
Copy link

stale bot commented May 21, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label May 21, 2023
@ahrtr ahrtr added stage/tracked and removed stale labels May 21, 2023
@hongbin
Copy link

hongbin commented Jun 6, 2023

Hi,

I saw this PR has been opened for two years without significant progress. I wonder if anything I can help to move forward this PR.

From my side, I am from Azure Kubernetes Service team. We do have requirement to rotate etcd CA in zero-downtime. This PR seems to be able to solve the problem. If the community needs additional help to merge this PR, please don't hesitate to ping me.

@yishuT
Copy link
Contributor Author

yishuT commented Jun 6, 2023

I can resume the work if the community can review the PR. I didn't receive any response earlier if they like my proposal. maybe @ahrtr ?

@hongbin
Copy link

hongbin commented Aug 18, 2023

We can split the PR into two as @ahrtr mentioned. @yishuT you need help with that?

@yishuT
Copy link
Contributor Author

yishuT commented Aug 22, 2023

We can split the PR into two as @ahrtr mentioned. @yishuT you need help with that?

I don't have time to work on it recently. Feel free to take it over

@hongbin
Copy link

hongbin commented Aug 28, 2023

Fixes #11555

Thanks. After an analysis on this PR, I think we can split this PR in this way:

  • Provide implementation for "GetConfigForClient". This will allow server CA rotation to work in term of functionality.
  • Optimize the loading of CA files. In particular, avoid loading the CA files repeatedly on each request.

I am working on the first part: #16500

@yishuT yishuT closed this Mar 24, 2024
@aauren
Copy link

aauren commented Mar 24, 2024

@yishuT

Why did this issue get closed? The problem still exists and all of the PRs that were created are still outstanding waiting for maintainer review.

@yishuT
Copy link
Contributor Author

yishuT commented Mar 24, 2024

@yishuT

Why did this issue get closed? The problem still exists and all of the PRs that were created are still outstanding waiting for maintainer review.

This pr is pending review for yrs. also I don't have time to work on it recently while there is another pr trying to solve the same problem so I closed this pr

@aauren
Copy link

aauren commented Mar 24, 2024

My bad, I thought this was the primary issue tracking the problem not a PR. I should have paid more attention.

Thanks for trying to make this better for all of us. I wish that it had been accepted back when you had opened it.

@hongbin
Copy link

hongbin commented Mar 25, 2024

It is unfortunate that this feature didn't make any progress for years. It is quite frustrating to see the slowness on the code reviewing process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

ETCD doesn't automatically load changes to ca bundles for peer-trusted-ca-file or trusted-ca-file
6 participants