Hi,
I have Traefik v2 deployed via the Helm chart, with an Ingress set up to route traffic to my service. I've noticed that during load tests my service becomes unresponsive, almost as if Traefik is holding connections. Traefik logs the following errors, indicating that my service has no available endpoints:
time="2023-12-18T19:54:30Z" level=error msg="Skipping service: no endpoints found" providerName=kubernetes ingress=my-app-domain.com namespace=iana serviceName=my-service servicePort="&ServiceBackendPort{Name:,Number:8080,}"
time="2023-12-18T19:54:40Z" level=error msg="Skipping service: no endpoints found" namespace=iana servicePort="&ServiceBackendPort{Name:,Number:8080,}" serviceName=my-service providerName=kubernetes ingress=my-app-domain.com
time="2023-12-18T20:06:30Z" level=error msg="Skipping service: no endpoints found" serviceName=my-service servicePort="&ServiceBackendPort{Name:,Number:8080,}" providerName=kubernetes ingress=my-app-domain.com namespace=iana
time="2023-12-18T20:06:30Z" level=error msg="Skipping service: no endpoints found" ingress=my-app-domain.com namespace=iana serviceName=my-service servicePort="&ServiceBackendPort{Name:,Number:8080,}" providerName=kubernetes
When I check resources, neither Traefik nor my service has any resource limitations. Traefik is running on two nodes (with other pods running on the same nodes).
my-service pods:
I can see that the service's pods are going in and out of a warning state with the following event:
Readiness probe failed: Get "http://10.100.48.20:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
When I check the health endpoint myself, the curl requests sometimes take up to 30 seconds:
➜ for i in {1..100}; do curl -o /dev/null -s https://my-app-domain.com/health -w "%{time_starttransfer}\n"; done
1.990471
0.722385
12.936152
8.418795
7.588562
2.960248
3.471391
1.830841
2.197170
3.050710
3.646034
4.062799
8.711729
15.350525
0.833441
0.557816
14.358396
0.233381
0.733933
0.808577
8.264995
4.577606
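To make these timings easier to compare between runs, here is a minimal sketch of a helper that condenses the per-request values into a one-line summary. The function name summarize_ttfb is hypothetical (it is not part of curl or Traefik), and the 95th percentile uses the simple nearest-rank method, which is an assumption on my part rather than anything the tools prescribe.

```shell
# Hypothetical helper: summarize time-to-first-byte samples read from stdin,
# one value in seconds per line (the format printed by
# curl -w "%{time_starttransfer}\n").
summarize_ttfb() {
  # Sort numerically first so min, max and percentile can be read by position.
  LC_ALL=C sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no samples"; exit }
      # Nearest-rank 95th percentile: ceil(0.95 * NR), in integer arithmetic.
      idx = int((95 * NR + 99) / 100)
      printf "n=%d min=%.3f mean=%.3f p95=%.3f max=%.3f\n", NR, v[1], sum / NR, v[idx], v[NR]
    }'
}

# Usage (same loop as above, piped into the helper):
#   for i in {1..100}; do curl -o /dev/null -s https://my-app-domain.com/health -w "%{time_starttransfer}\n"; done | summarize_ttfb
```

Running the same loop against the ingress hostname and, separately, against a pod address directly should show whether the delay is added at Traefik or already present at the service.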
Can anyone provide some insight into configuration changes that might root out this issue?