Burst of requests hangs Traefik

radicaled42 · February 11, 2023, 5:48pm

Let me try to explain my problem to see if you have any suggestions.
I'm stress testing an internal Docker registry.
I'm using Traefik 2.9 as my ingress controller. I installed it with the Helm chart.
I have set the resource limits to 3000m CPU and 8Gi memory.
At the moment I'm using letsencrypt-acme, so I'm not able to run multiple replicas.
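
For readers following along, the setup described above might look roughly like this in the chart values. This is only a sketch, not the original file: it assumes the official traefik/traefik chart, where `resources` and `deployment.replicas` are top-level values, and it leaves out the ACME configuration entirely.

```yaml
# values.yaml (sketch; assumes the traefik/traefik Helm chart)
deployment:
  replicas: 1          # single instance, as described in the post
resources:
  limits:
    cpu: "3000m"       # limit stated in the post
    memory: "8Gi"      # limit stated in the post
```

No resource requests are shown because the post only mentions limits.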

I'm using a Docker image of 1.15 GB for the stress test.
For the test itself, I have created a Job that pulls the image and sleeps for 10s. I'm running 50 pods in parallel until 150 completions are reached.
I'm using AWS Fargate to ensure that every time a pod gets created, it pulls the image.
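
A Job along these lines would reproduce the test described above. It is a sketch under assumptions, not the original manifest: the Job name, container name, and image reference are placeholders, and it assumes the image ships a `sleep` binary and that the namespace is matched by a Fargate profile.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: registry-pull-stress          # hypothetical name
spec:
  parallelism: 50                     # 50 pods running at the same time
  completions: 150                    # stop after 150 successful pods
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pull-and-sleep        # hypothetical name
          # Placeholder image reference; point it at the registry behind Traefik.
          image: registry.example.internal/stress/large-image:latest
          imagePullPolicy: Always     # pull on every pod start
          command: ["sleep", "10"]    # assumes the image contains `sleep`
```

If the stated 1.15 GB is what actually goes over the wire, the first wave of 50 pods asks for roughly 50 × 1.15 GB ≈ 57.5 GB through the single Traefik pod at once.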

The problem starts with the first burst of pods created by the Job. Of the 50, at least 40 get an ImagePullBackOff. After a few minutes the situation starts to stabilize and the ImagePullBackOff errors start to clear up.
Sometimes the Traefik pod dies and gets recreated; other times it doesn't release its memory (it stays at 3.5 GB of RAM continuously).

I have some questions:

Is there any way to tune the Ingress buffer?
Is there any way to use replicas and Let's Encrypt at the same time?

Any suggestions would be much appreciated.

bluepuma77 · February 11, 2023, 7:54pm

Just for my understanding: you are running a private Docker registry behind Traefik as a reverse proxy, and the Job is pulling the 1.15 GB image through Traefik?

Let's Encrypt with multiple Traefik instances should work in Kubernetes; tutorials are available.

radicaled42 · February 11, 2023, 11:20pm

You got the idea; I should have added a TL;DR.
I've seen similar tutorials, but they force you to create a Certificate, in which case you need to create the DNS records manually. We are using external DNS for that.
I will try to test it on Monday.

Thanks
