
PKI: Progressive performance degradation upon CA/issuer rotation #29083

Open
aescaler-raft opened this issue Dec 3, 2024 · 8 comments
Labels: reproduced (This issue has been reproduced by a Vault engineer), secret/pki


@aescaler-raft commented Dec 3, 2024

Describe the bug

An HA Vault PKI engine with raft storage experiences progressive and permanent performance degradation after CA/issuer generate (root) or import (root or intermediate) operations.
Separate PKI engines are not impacted by one another, but each experiences the same progressive and permanent performance degradation independently.
Performance can be restored by disabling the engine and re-enabling it at the same path.

Testing parameters:

  • CA private key (root or intermediate):
    • type: internal
    • algorithm: RSA or EC
    • bits: EC 256, 384, 521; RSA 2048, 4096, 8192
  • Vault cluster: 3 nodes, raft integrated storage
  • Platform:
    • distro: RKE2 v1.24.15+rke2r1
    • storage: Longhorn
  • Workload:
    • type: StatefulSet
    • requests/limits: none (unlimited)

This performance issue is not experienced for CA CSR generation.
To me, this indicates that this issue is not tied to the private key, but to the issuer.
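
For comparison, a CSR-only generation (which produces a key and CSR but does not create an issuer) can be timed with something like the following; the common name is an illustrative placeholder and the parameters mirror the root-generation calls in the reproduction steps below:

time curl -k -X POST "https://<vault_url>/v1/pki/intermediate/generate/internal" -H "X-Vault-Token: <vault_token>" --data '{"common_name": "TEST CURL CSR", "key_type": "rsa", "key_bits": 2048}'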

Observations:

  • CPU Usage: Peaks at ~900m on the raft leader pod (vault-0) during operation (25% of the 4 vCPUs available)
  • Memory Usage: less than 5% of the node's allocated memory across all pods
  • Disk I/O: negligible (%iowait less than ~0.07 on all pods)
  • Inter-Pod Latency:
    • Ping tests between Vault pods show negligible latency (<1ms)
    • Pods are in the same availability zone
  • OpenSSL Benchmark:
    • RSA 4096 key generation on cluster nodes is fast, completing 60 keys in less than 50 seconds (see the sketch after this list)
    • Indicates no issues with cryptographic library performance or hardware acceleration
  • AES-NI Availability:
    • Verified inside Vault containers using /proc/cpuinfo
    • The aes flag is present, confirming support for hardware-accelerated crypto
  • Entropy
    • sudo cat /proc/cpuinfo | grep rand showed rdrand available on all nodes
    • sudo cat /proc/sys/kernel/random/entropy_avail showed >3500 on all cluster nodes
    • rng-tools package installed to test /dev/random and /dev/urandom
    • rngtest -c 10000 </dev/urandom showed >1000x speed over rngtest -c 10000 </dev/random (expected)
    • started rngd service with sudo systemctl enable --now rngd, observed significant performance improvement in /dev/random speed with rngtest
    • rngtest on /dev/random and /dev/urandom now within one order of magnitude (26s vs 2.6s, respectively)
    • no observable impact in Vault PKI keypair generation speed
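
For reference, the OpenSSL benchmark mentioned above was along the lines of the following (a hypothetical reconstruction, not the exact command that was run):

time (for i in $(seq 1 60); do openssl genrsa 4096 > /dev/null 2>&1; done)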

Tracing:

On a Vault cluster that is experiencing PKI engine performance degradation, I called the trace endpoint /v1/sys/pprof/trace with the parameter seconds=60, after which I called the PKI engine's generate root endpoint.
Analysis under go tool trace <file> showed high latency in the synchronization blocking profile, specifically this goroutine: github.com/hashicorp/raft.(*raftState).goFunc.func1.
Under the synchronization blocking profile page, I noticed the graph showed edges with times greater than 60s and nodes showed 0 of <time> (<percent>%) where <time> and <percent> are non-zero.
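
For reference, the capture and analysis can be reproduced with something like the following (assuming a token with sufficient privileges for the sys/pprof endpoints; the generate-root call is issued from a second shell while the trace is collecting):

curl -sk -H "X-Vault-Token: <vault_token>" "https://<vault_url>/v1/sys/pprof/trace?seconds=60" -o trace.out
go tool trace trace.out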

To Reproduce

Steps to reproduce the behavior:

  1. Deploy Vault in HA mode on Kubernetes with the Helm chart, using raft integrated storage and an awskms seal
  2. Run vault secrets enable pki
  3. Run for i in $(seq 1 60); do time curl -k -X POST "https://<vault_url>/v1/pki/root/generate/internal" -H "X-Vault-Token: <vault_token>" --data '{"common_name": "TEST CURL ROOT", "key_type": "rsa", "key_bits": 2048}'; done
  4. Observe increasing delay in responses
  5. Run sleep 600; time curl -k -X POST "https://<vault_url>/v1/pki/root/generate/internal" -H "X-Vault-Token: <vault_token>" --data '{"common_name": "TEST CURL ROOT", "key_type": "rsa", "key_bits": 2048}'
  6. Observe no decrease in delay in responses
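
To make the trend easier to see, the loop in step 3 can be reduced to printing only the total request time via curl's %{time_total} write-out variable; a minimal sketch using the same placeholders:

for i in $(seq 1 60); do curl -sk -o /dev/null -w "root generate #$i: %{time_total}s\n" -X POST "https://<vault_url>/v1/pki/root/generate/internal" -H "X-Vault-Token: <vault_token>" --data '{"common_name": "TEST CURL ROOT", "key_type": "rsa", "key_bits": 2048}'; done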

Expected behavior

Vault PKI engine performance should not degrade, or should at least recover.

Environment

  • Vault Server Version (retrieve with vault status): 1.18.1 built 2024-10-29T14:21:31Z
  • Vault CLI Version (retrieve with vault version): Vault v1.18.1 (f479e5c), built 2024-10-29T14:21:31Z
  • Server Operating System/Architecture: RHEL 8.8 4.18.0-477.15.1.el8_8.x86_64
  • Platform Architecture: RKE2 v1.24.15+rke2r1 on AWS EC2 (5+ nodes), 3 dedicated C6i.xlarge nodes for Vault

Vault server configuration file(s)

disable_mlock = true
ui = true

listener "tcp" {
  tls_disable = true
  address = "[::]:8200"
  cluster_address = "[::]:8201"
}

storage "raft" {
  path = "/vault/data"
}

seal "awskms" {
  region = "eu-west-2"
  kms_key_id = "<clipped>"
  endpoint     = "http://local-kms:8081"
  access_key   = "dummy"
  secret_key   = "dummy"
}

service_registration "kubernetes" {}

Additional context

Attempted remediations:

  • Improve single-core performance: pinned Vault pods to faster nodes (t3.xlarge to c6i.xlarge) - very minor effect, permanent
  • Isolate Vault pods: isolated the Vault workload to dedicated c6i.xlarge nodes - very minor effect, permanent
  • Increase worker count: vault write sys/mounts/pki/config/tune options=worker_count=4 - no effect
  • Decrease max lease: vault secrets tune -max-lease-ttl=1h pki - no effect
  • Tidy PKI engine: vault write pki/tidy tidy_cert_store=true tidy_revoked_certs=true - no effect
  • Snapshot and import raft log: vault operator raft snapshot save snapshot.snap; vault operator raft snapshot restore snapshot.snap
  • Enable and disable PKI engine: vault secrets disable pki; vault secrets enable -path=pki pki - greatest effect, non-permanent
@aescaler-raft (Author)

I realize that performing 60 CA/issuer rotations as quickly as Vault can process them is an unrealistic workload; however, the fact that performance never recovers indicates that real deployments will eventually hit this as well.

@aescaler-raft (Author)

I've also tested this on a fresh EKS cluster with the ebs-csi-provisioner platform storage backend and an actual AWS KMS key, and observed the same effect.

stevendpclark added the secret/pki and reproduced (This issue has been reproduced by a Vault engineer) labels Dec 3, 2024
@stevendpclark (Contributor)

Hi @aescaler-raft, thanks for filing the issue.

I could have sworn we had an open issue around this problem already, but my search turned up empty. This is a known issue with having many issuers: all of the CRLs are rebuilt whenever a new issuer is created.

This shouldn't have a huge impact on day-to-day operations if the issuer count is kept low, which we highly recommend for various reasons; see: https://developer.hashicorp.com/vault/docs/secrets/pki/considerations#one-ca-certificate-one-secrets-engine

I'll keep the issue open for visibility, and as another reminder that we need to make the CRL building smarter and more efficient within the PKI engine.
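
For reference, the number of issuers that have accumulated on a mount and its CRL settings can be inspected with something like the following (assuming the default mount path pki):

vault list pki/issuers      # one key per issuer; all of their CRLs are rebuilt whenever a new issuer is created
vault read pki/config/crl   # the mount's CRL expiry and rebuild settings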

stevendpclark changed the title from "HA Vault (raft storage) progressive performance degradation upon CA/issuer rotation" to "PKI: Progressive performance degradation upon CA/issuer rotation" Dec 3, 2024
@aescaler-raft (Author)

Hi @stevendpclark,

Thanks for validating my observations, and providing a recommended path forward.
Introducing additional engines/endpoints would dramatically increase the complexity of an app I'm building for a customer, so I'll have a conversation with them and make the recommendation that we contribute to the PKI engine to resolve this issue.
I'll take a look at the contributors to the PKI engine codebase and figure out who will need to review our proposed changes; I'd like to start a dialogue early in this effort.
Can you tell me if there's anyone else we might need to involve from the HashiCorp side to sign off on this, assuming it becomes a series of architectural decisions?
Do you all have a Slack channel that I (and possibly members of my team) can join?

Regarding the documentation, is there any specific reason why this isn't documented in the page linked?
I'd be happy to submit a PR to do so in the meantime.

@stevendpclark (Contributor)

Hi @aescaler-raft,

I really appreciate the offer to work on this. This won't be a trivial fix to address within the PKI engine at this stage. Historically the PKI engine only supported a single issuer; we added multi-issuer support to help with rotating that issuer, but it was never meant to hold a large number of distinct issuers within a single mount.

CRL rebuilding on initial CA creation is one item, but depending on how many issuers you are talking about, the following might also need to be tackled. Off the top of my head (this isn't an exhaustive list):

  • There's a single config object that stores the mapping of issuer to CRL, this might hit limitations on storage size if too many issuers are active within a mount
  • OCSP queries iterate over all issuers within the mount to identify the proper issuer which might become an issue with a large list
  • All revocations are stored in a single folder for all issuers, which will lead to lengthy CRL rebuilds if you have a lot of revocations across many issuers.
  • Within the various operational paths, if an issuer name and/or key name is used instead of a UUID, we iterate over the list of issuers to identify the one requested, which, with a large enough list, could add further delays and storage hits.
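
As an aside on that last point, operational paths that take an issuer reference also accept the issuer UUID directly (as returned by an issuer listing), which sidesteps the name-to-UUID scan; for example:

vault read pki/issuer/<issuer_uuid>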

So this will be a pretty significant effort to make the PKI engine do what you want.

Regarding the documentation, is there any specific reason why this isn't documented in the page linked?

I can't think of any particular reason; this should probably be called out in that same section as another reason we do not recommend running a large number of issuers within an individual mount.

@aescaler-raft (Author) commented Dec 4, 2024

@stevendpclark,

I can certainly appreciate the history of the PKI engine.

In order to meet my customer's requirements, I don't see any options other than:

  • build an additional service that re-implements features Vault already has, routes data/operations to and from multiple PKI engines, and persists state in an additional external system

--or--

  • implement the changes needed directly in Vault

I'm partial to contributing directly to Vault, but this of course assumes appetite from maintainers and contributors such as yourself.

Please advise.

@stevendpclark (Contributor) commented Dec 4, 2024

I've brought it up internally for discussion. Out of curiosity, what sort of time frame would you need this for?

@aescaler-raft (Author)

@stevendpclark before Q2 of 2025 would be ideal.
