I am running on GKE and using traefik as the only ingress controller into the cluster. I enabled synthetic tests on my APIs a few days ago, and every now and then I get 504 errors that take about 30s to return a response.
From some preliminary digging, it doesn't look like either traefik or the GCP load balancer is crashing. Instead I think it might be a stale traefik record, but I don't know how to investigate that further.
How does traefik gather DNS records under the hood in order to forward traffic correctly? Is there a way to force a refetch of said records (say every second)?
Traefik does not use DNS records for anything. Traefik queries the Kubernetes API for service endpoints and forwards traffic directly to the backend pods.
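To see the same pod addresses that the Kubernetes API reports for a service, you can inspect its Endpoints object (for example the JSON from `kubectl get endpoints <service> -o json`). Below is a minimal Python sketch, not anything Traefik itself runs, that splits such an object into ready and not-ready `ip:port` targets. The function name `backend_addresses` and the sample data are made up for illustration; only the `subsets`, `addresses`, `notReadyAddresses` and `ports` fields come from the Kubernetes Endpoints schema.

```python
import json


def backend_addresses(endpoints: dict) -> dict:
    """Split a Kubernetes Endpoints object into ready / not-ready "ip:port" targets."""
    ready, not_ready = [], []
    for subset in endpoints.get("subsets") or []:
        ports = [p["port"] for p in subset.get("ports") or []]
        # Pods listed under "addresses" are passing their readiness checks.
        for addr in subset.get("addresses") or []:
            ready.extend(f"{addr['ip']}:{port}" for port in ports)
        # Pods listed under "notReadyAddresses" exist but should not get traffic.
        for addr in subset.get("notReadyAddresses") or []:
            not_ready.extend(f"{addr['ip']}:{port}" for port in ports)
    return {"ready": sorted(ready), "not_ready": sorted(not_ready)}


# Hypothetical sample in the shape of an Endpoints object (invented IPs and port).
SAMPLE = {
    "kind": "Endpoints",
    "subsets": [
        {
            "addresses": [{"ip": "10.0.0.4"}, {"ip": "10.0.0.5"}],
            "notReadyAddresses": [{"ip": "10.0.0.6"}],
            "ports": [{"port": 8080, "protocol": "TCP"}],
        }
    ],
}

print(json.dumps(backend_addresses(SAMPLE), indent=2))
```

If a pod IP that has already gone away still shows up under the ready addresses around the time of a 504, that would point at endpoint staleness rather than DNS.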
Can you see if your situation is similar to [Traefik 1.7.12 ingress requests ending with a HTTP 504 gateway timeout]?
Thank you very much for pointing that out, they do appear to be very similar.
I am not sure how to start debugging this, so any pointers on where to look or what to read would be helpful. I thought this might have been a stale DNS record issue, but since traefik goes straight to the pods, I don't think that can be it. Alternatively, is there any way to configure traefik to retry the request?