I have a setup where Traefik runs in Kubernetes behind a corporate PLB and handles all traffic from there.
For the service that is failing, we run three instances of it (ms-1, ms-2, ms-3) through a Kubernetes Deployment.
Traffic is distributed across them in round robin.
Sometimes we observe random 502 responses for ONE of the nodes, and the failed requests are not forwarded to any of the other nodes. After a while it auto-corrects and the connection to the failing node is restored. The Traefik debug log shows:
level=debug msg="'502 Bad Gateway' caused by: dial tcp x.x.x.x:8100: connect: connection refused"
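As far as I understand, "connect: connection refused" means the TCP connection to that pod IP and port was actively rejected, which usually happens when nothing is listening there at that moment. Below is a minimal sketch of how I would check this by hand from another pod in the cluster. The `probe` helper is hypothetical (my own name, not something Traefik or Kubernetes provides), and it only tests TCP reachability, not the HTTP response.

```python
import socket


def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Attempt a plain TCP connect and report the outcome as a short string."""
    try:
        # create_connection performs the TCP handshake; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target host answered, but no process is listening on that port.
        return "refused"
    except OSError as exc:
        # Timeouts, unreachable hosts, DNS failures and similar errors land here.
        return f"error: {exc}"
```

Calling `probe` with the pod IP from the log and port 8100 while the 502s are happening should return "refused" if the pod really is not accepting connections at that moment.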
- Does Traefik retry by forwarding the request to another replica if one of them is failing? That doesn't seem to happen here, since the HTTP request returns 502.
- Why would this issue occur, and how can it be fixed?
I have read other topics, and most of them suggest checking for the correct port, IP, service name, etc. That doesn't seem to be the case here, since only one of the three nodes fails, and only sometimes.