Backend DNS resolution is cached forever

Greetings! We're using Traefik within an Azure Service Fabric cluster. SF does internal load balancing through DNS (so internally, http://api resolves to one of several IPs running that service). We're seeing a problem where, when Traefik boots up, it resolves "api" to one of 9 different IPs and then sticks with that IP forever. We have 6 Traefik instances running and 9 API instances running, which means at least 3 API instances will never even get used. We noticed this yesterday when all 6 Traefik instances picked the same API instance and overloaded it! My question: is there a way to control how long (if at all) Traefik will cache DNS resolution? I would think that by default it should obey the TTL on the DNS record. We are using Traefik 1.7.9 in a Windows Server Core 1809 container.

Hello @mike,

Can you please provide your Traefik configuration and a sanitized service configuration?

The reason I ask is that Traefik doesn't normally use DNS names for backend communication; it normally uses IPs. When it does use DNS names for backend communication, it does no caching at all.

I worked with Daniel offline, and we've confirmed this isn't anything Traefik or Go is doing; it is happening at the local OS layer. The suggested course of action is to move to the Service Fabric provider and not use DNS.

Traefik usage is well known to generate a lot of subtle Traefik-unrelated issues :wink:
