Support multiple Azure Key Vault instances as fallback #1433

Open
JorTurFer opened this issue Jan 22, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@JorTurFer

Describe the solution you'd like
Yesterday there was an incident in the Azure Key Vault service in West Europe (probably maintenance or something similar, because ALL our vaults were affected, regardless of subscription). The health monitors showed something like:
[image: health monitor screenshot]

Although the service issue isn't the responsibility of this driver, having a plan B to mitigate it would have been nice. In theory, Azure Key Vault is transparently replicated to the paired region with automatic failover in read-only mode, but that failover didn't happen.

We use multiple regions to be resilient to regional failures, but currently secrets-store-csi is a single point of failure because it doesn't support any kind of fallback at any level.

Given that, I'd like to propose extending the current behavior to support additional Azure Key Vaults as failover targets if the primary instance fails.

The current configuration looks like:

parameters:
    keyvaultName: ......
    tenantId: ......
    useVMManagedIdentity: 'true'
    userAssignedIdentityID: .....
    objects: |
      array:
        - |
          objectName: ...
          objectType: secret

and it could easily be extended with an array of fallback Key Vaults (or just one 🤷):

parameters:
    keyvaultName: ......
    tenantId: ......
    userAssignedIdentityID: .....
    fallback: |
      array:
        - |
          keyvaultName: ......
          tenantId: ......
          userAssignedIdentityID: .....
    objects: |
      array:
        - |
          objectName: ...
          objectType: secret

This approach would improve the resiliency of the component: if there is any error on the primary instance, it would simply fall back to the other Azure Key Vault instances without disrupting the service.

Anything else you would like to add:

As CSI volumes can't be marked optional, problems with the upstream service will block pods from starting (with a potentially huge impact in production environments if this happens during high-load peaks). I've reviewed the secrets-store-csi-driver documentation and haven't found anything that handles these scenarios, but maybe I've missed something.

Environment:

  • Secrets Store CSI Driver version: (use the image tag): v1.4.0
  • Azure Key Vault provider version: (use the image tag): v1.5.0
  • Kubernetes version: (use kubectl version): 1.27
  • Cluster type: (e.g. AKS, aks-engine, etc): AKS
@JorTurFer JorTurFer added the enhancement New feature or request label Jan 22, 2024
@enj
Member

enj commented Jan 22, 2024

Linking the slack thread here for future reference.

@enj
Member

enj commented Jan 22, 2024

Writing down recommendations from the slack thread:

  • Use the upcoming secret sync controller to have full offline support (i.e. tolerate any form of failure, not just one AKV region having an issue)
  • Write a generic provider that can multiplex across N providers (which could be different types altogether)
  • Work with sig-storage folks to expand CSI drivers to support optional volume mounts so that N providers can be used to provide the same secrets

@JorTurFer
Author

This is the slack thread in sig-storage.

With respect, all the options seem to boil down to: "do it on your side or go somewhere else".
It is a fact that the component isn't resilient to any kind of disruption, which can be a no-go for production scenarios.

Not storing the secrets in the Kubernetes API is the main reason for using the CSI driver. Storing the secrets in the Kubernetes API instead of having a fallback/failover/whatever-you-call-it is already handled by other third-party tools, and it's considerably less secure than using CSI.

@JorTurFer
Author

JorTurFer commented Jan 25, 2024

Hi again ✋ !
I presented the topic at the SIG-Storage meeting (Jan 25th), where a SIG lead was present, and the SIG's conclusion is that the CSI driver (this component) has to handle failures and high-availability features, because making the volume optional only moves the problem from the Kubernetes layer to the application layer.

Is this now open for discussion, or is doing it myself the only option left?

Currently, the component isn't resilient to Azure Key Vault failures and is indeed a single point of failure, which is a problem at least for us (and that's why we are willing to contribute to this).

@JorTurFer
Author

Hello @enj !
Is there any update related to this?

@JorTurFer
Author

JorTurFer commented Mar 24, 2024

Hello @enj !
Have you had a chance to look at this?

Projects
None yet
Development

No branches or pull requests

2 participants