Huge Consul KV update ends in resource spikes

Last week we had to migrate about 1000 customers in our DB, which led to over 3000 new resources being created in Consul KV that were then picked up by Traefik.
This led to enormous resource spikes that eventually ended in Traefik being forcefully stopped by our EKS cluster.

Now we are back to normal, but I am still wondering what caused these spikes.
I went through thousands of log lines, but the best I can do is guess.
However, I want to understand it and prevent future events like this.

Can anyone point me in a direction?
I searched GitHub for possibly related memory leaks or similar issues in combination with KV, but found nothing relevant.
Were there simply too many updates at once for Traefik to handle? Or could something like an error in a router cause spikes like this?
I ask because the way we fixed it was by rolling back our changes and removing all the newly created routes.
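One thing I am also wondering about is batching: Traefik watches the whole KV prefix, so many individual writes mean many watch wakeups, whereas Consul's transaction endpoint (`/v1/txn`) can apply up to 64 operations atomically per request. A rough sketch of how the writes could be grouped into transaction payloads (a hypothetical helper, not our actual migration code):

```python
import base64

def txn_batches(pairs, max_ops=64):
    """Split (key, value) pairs into payloads for Consul's /v1/txn endpoint,
    which accepts at most 64 operations per transaction.
    Values must be base64-encoded in the transaction body."""
    batches = []
    for start in range(0, len(pairs), max_ops):
        batches.append([
            {"KV": {"Verb": "set", "Key": key,
                    "Value": base64.b64encode(value.encode()).decode()}}
            for key, value in pairs[start:start + max_ops]
        ])
    return batches
```

Each batch would then be sent as a JSON body in a single PUT to the agent's `/v1/txn` endpoint, so 3000+ keys would arrive in a few dozen requests instead of thousands.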

The environment is:

  • EKS version 1.32
  • Traefik 3.6.7

Thanks in advance for any hint!
Leo

In the meantime, I created some Python scripts to simulate the behaviour in an isolated environment. I even tested the churn approach from otlp memory leak · Issue #12232 · traefik/traefik · GitHub.
The result: while there is a significant CPU spike, the memory spike is very low, and even the CPU spike is nowhere near what we saw in production.
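For anyone who wants to reproduce this, a minimal sketch of that kind of churn looks roughly like the following (the key layout under `traefik/http` and the host names are illustrative; the `put` callable is injected so the same loop can run against a stub or against a real local agent):

```python
import urllib.request

def route_entries(i, prefix="traefik/http"):
    """KV entries for one hypothetical customer router/service pair,
    loosely mirroring Traefik's KV provider key layout."""
    return {
        f"{prefix}/routers/customer-{i}/rule": f"Host(`customer-{i}.example.com`)",
        f"{prefix}/routers/customer-{i}/service": f"customer-{i}",
        f"{prefix}/services/customer-{i}/loadBalancer/servers/0/url":
            f"http://10.0.0.{i % 250 + 1}:8080",
    }

def churn(put, n=3000):
    """Write n customers' worth of KV entries one key at a time
    and return the total number of writes."""
    written = 0
    for i in range(n):
        for key, value in route_entries(i).items():
            put(key, value)
            written += 1
    return written

def consul_put(key, value, consul="http://localhost:8500"):
    """Single PUT against a Consul agent (address is an assumption)."""
    req = urllib.request.Request(
        f"{consul}/v1/kv/{key}", data=value.encode(), method="PUT")
    urllib.request.urlopen(req)

if __name__ == "__main__":
    churn(consul_put)  # requires a reachable Consul agent
```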
Any ideas, or is this something for a GitHub issue?

You can reach the devs via the Traefik GitHub for bugs, but you should provide a reproducible example.

Really useful breakdown. I like how the discussion covers both the theory and the practical side of things, which is not always the case in these threads.