Lease cache not working for PKI engine #238

Open
dnlopes opened this issue Jan 25, 2024 · 6 comments

@dnlopes

dnlopes commented Jan 25, 2024

Hello,

I have been struggling to integrate the PKI engine with csi-driver using Vault.

For context, this is what I have on my test setup:

  • csi-driver polling interval set to 1m and auto-rotation enabled
  • a MySQL secrets engine and a role with lease time of 5m
  • a PKI engine and a role with lease time of 1h

For the MySQL use case everything works smoothly: my pod initializes and fetches credentials, and every minute the csi-driver-vault returns the cached response instead of requesting new credentials from Vault. Close to the end of the lease duration, the csi-driver requests new credentials from Vault and they are refreshed inside the pod.

For my PKI use case, I'm not able to make this work. Every minute, the csi-driver ignores the lease of the certificate and always fetches a new one from Vault.

See the logs below:

MySQL logs

2024-01-25T12:00:14.800Z [INFO]  agent.apiproxy: received request: method=GET path=/v1/mysql/creds/god
2024-01-25T12:00:14.800Z [DEBUG] agent.cache.leasecache: returning cached response: path=/v1/mysql/creds/god
2024-01-25T12:00:59.849Z [DEBUG] agent.cache.leasecache: secret renewed: path=/v1/mysql/creds/god
2024-01-25T12:01:14.801Z [INFO]  agent.apiproxy: received request: method=GET path=/v1/mysql/creds/god
2024-01-25T12:01:14.802Z [DEBUG] agent.cache.leasecache: returning cached response: path=/v1/mysql/creds/god

PKI logs

2024-01-25T11:48:23.567Z [INFO]  agent.apiproxy: received request: method=POST path=/v1/nats/issue/david
2024-01-25T11:48:23.568Z [DEBUG] agent.cache.leasecache: forwarding request from cache: method=POST path=/v1/nats/issue/david
2024-01-25T11:48:23.568Z [INFO]  agent.apiproxy: forwarding request to Vault: method=POST path=/v1/nats/issue/david
2024-01-25T11:48:23.568Z [DEBUG] agent.apiproxy.client: performing request: method=POST url=http://vault-server.vault-server.svc.cluster.local:8200/v1/nats/issue/david
2024-01-25T11:48:27.188Z [DEBUG] agent.cache.leasecache: pass-through response; secret not renewable: method=POST path=/v1/nats/issue/david

The log entry that catches my attention is this one: pass-through response; secret not renewable. Indeed, the PKI secret is not renewable, but looking at the documentation here I was expecting a new certificate to be fetched only when close to 85% of the lease duration.
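
To make the difference between the two excerpts explicit, here is a minimal Python sketch that labels agent log lines by the message fragments seen above. The fragments are copied from the logs in this issue; the labels (`cache_hit`, `not_cached`, `forwarded`) and the `classify` helper are illustrative names, not anything Vault defines.

```python
from typing import Optional

# Message fragments copied from the log excerpts in this issue.
# The labels on the right are our own shorthand, not Vault terminology.
FRAGMENTS = {
    "returning cached response": "cache_hit",
    "pass-through response; secret not renewable": "not_cached",
    "forwarding request to Vault": "forwarded",
}


def classify(line: str) -> Optional[str]:
    """Return a label for a known log message, or None if unrecognised."""
    for fragment, label in FRAGMENTS.items():
        if fragment in line:
            return label
    return None


if __name__ == "__main__":
    sample = (
        "2024-01-25T11:48:27.188Z [DEBUG] agent.cache.leasecache: "
        "pass-through response; secret not renewable: "
        "method=POST path=/v1/nats/issue/david"
    )
    print(classify(sample))
```

Counting `not_cached` against `cache_hit` per path over a few polling intervals would show whether a given secret is ever served from the cache.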

I have gone through a lot of issues around this topic (#90, #82, #202), but I can't figure out what's wrong in my setup.

Can you help?

Thanks.

@VioletHynes

Hey there! We chatted about this internally and we think this might be a bug. It might also be related to this issue. You're right, as the docs say, this should be the expected behaviour:

If a secret or token is non-renewable but leased, Vault Agent will fetch the secret when 85% of the secrets time-to-live (TTL) is reached

To help our investigation, could you let us know if you're using pkiCert or secret to render the certificate? Likewise, it might be that changing from one to the other fixes this issue as a workaround, so I'd love to hear if that does change the behaviour in any meaningful way.
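
As a worked example of the 85% rule quoted above, the following Python sketch computes when a refetch would be due for the two lease times from the original post (1h for PKI, 5m for MySQL). It assumes the rule is applied as a plain fraction of the TTL; the function name and constant are illustrative, and this is not taken from the Vault Agent source.

```python
from datetime import timedelta

REFRESH_FRACTION = 0.85  # taken from the documentation quote above


def refetch_after(ttl: timedelta, fraction: float = REFRESH_FRACTION) -> timedelta:
    """Elapsed time after which a refetch would be due under the quoted rule."""
    return ttl * fraction


if __name__ == "__main__":
    print(refetch_after(timedelta(hours=1)))    # 0:51:00
    print(refetch_after(timedelta(minutes=5)))  # 0:04:15
```

Under that assumption, a certificate with a 1h lease would be reissued roughly once every 51 minutes rather than on every one-minute poll.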

@VioletHynes

I realize I might have gotten my wires crossed here -- are you using Vault Agent's templating to retrieve the PKI cert? You linked the templating docs so I might have made an assumption that you were.

@dnlopes
Author

dnlopes commented Jan 27, 2024

Thanks for the quick reply @VioletHynes. I'm actually using the csi-driver for Vault with the SecretProviderClass K8s resource. My link to the documentation might have sent you in the wrong direction, sorry about that.

Still, the behavior I see with MySQL credentials aligns with the documentation: a MySQL secret is NOT generated on every csi-driver poll, but only when it is approaching the end of its TTL.

Also, the reason I thought that the csi-driver behavior would match that of vault-agent is this comment, from 2021. That comment links to #90, and that was when I started going down the rabbit hole of old issues and such.

In the end, what I understood is that in the meantime (sometime between 2021 and today), the caching mechanism was added to the csi-driver as well.

PS: I was able to set up a configuration with PKI using the vault-agent injector without problems. In that setup, the PKI certificate is only refreshed close to the end of its TTL, just as expected.

@VioletHynes

Hey there! Apologies, it does seem like I got a bit mixed up. The behaviour you're experiencing is expected behaviour.

The root of why this doesn't work is that the request goes through Vault Agent's caching, which today does cache renewable, leased secrets, but does not cache non-renewable leased secrets.

The cache may support these kinds of secrets in the future, but it does not today. It only supports caching secrets it can renew.
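
The rule described above can be written down as a toy model in Python. This is only a sketch of the behaviour as stated in this comment, not the actual lease cache implementation; `SecretResponse`, `is_cached`, and the lease ID strings are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SecretResponse:
    """Hypothetical, simplified stand-in for a secret returned by Vault."""
    lease_id: str
    renewable: bool


def is_cached(resp: SecretResponse) -> bool:
    """Toy rule: only leased secrets that can be renewed are cached."""
    return bool(resp.lease_id) and resp.renewable


if __name__ == "__main__":
    # Lease IDs below are made-up placeholders for illustration only.
    mysql = SecretResponse(lease_id="mysql/creds/god/example", renewable=True)
    pki = SecretResponse(lease_id="nats/issue/david/example", renewable=False)
    print(is_cached(mysql))  # True
    print(is_cached(pki))    # False
```

In this model the MySQL credentials are served from the cache between polls, while the non-renewable PKI response is passed through and requested again on the next poll, which matches the logs in the original post.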

@UXabre

UXabre commented Apr 2, 2024

So, do I understand correctly that a certificate will always be renewed, regardless? For instance, if we look at the current renew time in our Kubernetes cluster, it is set (and I think this is the default when enabled but not specified) to two minutes. Meaning, every two minutes a new certificate will be created, regardless of the lifetime of the "current" certificate? If this is true, it is a pity :-( as I wanted to use this CSI with Stakater/reloader and restart our deployment/statefulset when the certificate is about to expire. Now it restarts our pods every two minutes, which is... a bit too fast for my taste. The other secrets (e.g. password for database, kv secrets, ...) seem to behave as expected. Is there nothing that can be done about this? Or some workaround to make the Vault CSI driver behave more as expected?

I've also tried cert-manager btw, but there they don't really want to expose the CA chain in a meaningful way, meaning that, for mTLS scenarios, this won't work either.

Or, does anybody know if I can do a hybrid? Meaning: cert-manager for certificates and CSI for fetching CA chains (hopefully without restarting every two minutes, but only if a change has been detected?)

@harveyxia

Any updates from the project maintainers on the likelihood of this issue being prioritized? We wish to continue using Vault CSI for our Kubernetes workloads but this issue incurs such load on our Vault servers that it potentially poses a non-starter scalability issue.
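
To put a rough number on the load discussed in this thread, the Python sketch below counts how many certificates one mounted secret would request during a single certificate lifetime if every poll issues a new one. The inputs are taken from earlier comments (a 1h lease, with one-minute and two-minute polling intervals); the function name is illustrative, and the model ignores retries and pod restarts.

```python
def issuances_per_lifetime(cert_ttl_s: int, poll_interval_s: int) -> int:
    """Certificates issued during one certificate lifetime when every poll
    results in a fresh issuance (no caching)."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return cert_ttl_s // poll_interval_s


if __name__ == "__main__":
    print(issuances_per_lifetime(3600, 60))   # 60
    print(issuances_per_lifetime(3600, 120))  # 30
```

Multiplying that figure by the number of pods mounting the certificate gives a first estimate of the extra issue requests reaching Vault per hour.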
