Update Traefik without service disruption?

I have Traefik running on a VPS with a number of services set up through the dynamic configuration file. The domain for each service points to the IP address of the Traefik server. Everything works perfectly.

When it’s time to update the server running Traefik (or the version of Traefik itself), what’s the best way to go about it without impacting the services?

I can mirror the entire setup to a second VPS, but then how do I manage temporarily “pointing” the services to the other server without updating the DNS entries for each and every service?

I’m just having a hard time understanding how to accomplish this. Thanks for any info or insights you can provide.

We run a load balancer in front of our Traefik nodes, so we can update them one after another.

If you need to switch DNS for multiple domain names, then I would recommend pointing your subdomains via CNAME at a single hostname, so you only need to change the IP once.

Note that DNS changes propagate slowly and are also retained for some time by caches, so plan for an overlap period (with both Traefik instances running) of at least 1 hour after the changes.
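
To make the overlap concrete, here is a minimal sketch that computes the earliest time the old Traefik server could be shut down after a DNS change. The function name `earliest_shutdown` and the example TTL are assumptions for illustration. Some resolvers hold records longer than the TTL, so treat the result as a lower bound rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone


def earliest_shutdown(change_time, ttl_seconds, min_overlap=timedelta(hours=1)):
    """Return the earliest time the old server should be taken offline.

    The overlap is the larger of the record's TTL and a fixed minimum
    (1 hour here, following the rule of thumb above).
    """
    overlap = max(timedelta(seconds=ttl_seconds), min_overlap)
    return change_time + overlap


if __name__ == "__main__":
    changed_at = datetime.now(timezone.utc)
    # 300 s is an assumed TTL; use the value from your own DNS records.
    print("Keep the old server running until:", earliest_shutdown(changed_at, 300))
```

Lowering the TTL on the records well before the planned switch shortens the time that caches keep serving the old address.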

Thanks for the reply, @bluepuma77.

But what's the load balancer running on? Doesn't it have to be updated occasionally as well? How do you manage that?

I don't have any sub-domains. They're all "apex" domains (or whatever the proper term is). Can a bare domain be pointed to another domain, with the "master" domain pointing to Traefik? I don't have much experience in this realm, and I'm not quite sure where to start researching.

If I could somehow temporarily swap the IP addresses of two different VPS servers, that would work; but I have no idea if that's even possible (or feasible).

The LB update is managed by the provider. They will probably move the IP over to a different instance (a "virtual IP" or "failover IP").

You can switch (virtual) IPs automatically with tools like keepalived.

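You can also use a CNAME for example.com, but it is not considered best practice: the DNS standard does not allow a CNAME at the zone apex alongside the SOA and NS records that must exist there, although some DNS providers offer workarounds such as ALIAS records or CNAME flattening.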


Something else just occurred to me...

What if I added the IP address of a second VPS server to the DNS records of the various service domains - i.e. have two "A" records for each domain, each record pointing to a different IP address?

Would that mean the first one listed is always tried first? And would it result in the second IP address being used "automagically" if the server at the first address is unavailable (completely offline)?

I know I've had to enter multiple IP addresses for certain hosting providers but don't fully understand the rationale behind it or how it works "behind the scenes".

Can anyone speak to this setup?

Ok, I think ChatGPT answered this for me. It won't work as I was hoping it might...

So still researching.

A default HTTP client will typically pick one of the A records from DNS at random to connect to, but will often not try another one if that connection fails.
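
For illustration, here is a minimal sketch of what a client would have to do to get failover out of multiple A records: resolve all addresses and try each one in turn. The helper names `resolve_ipv4` and `connect_with_fallback` are made up for this example. Since you don't control the clients your visitors use, you can't rely on this behavior, which is why two A records alone are not a dependable failover mechanism.

```python
import socket


def resolve_ipv4(host, port):
    """Return every distinct IPv4 address the resolver reports for host."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        ip = info[4][0]  # sockaddr is (ip, port) for AF_INET
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_with_fallback(addresses, port, timeout=3.0,
                          connect=socket.create_connection):
    """Try each address in order and return the first successful connection."""
    last_error = None
    for ip in addresses:
        try:
            return connect((ip, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and move on to the next IP
    raise ConnectionError(f"no address reachable: {addresses}") from last_error
```

A call such as `connect_with_fallback(resolve_ipv4("example.com", 443), 443)` would fall through to the second address when the first is offline, at the cost of waiting for the timeout first.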

Thanks again. I’m now looking into a load balancing provider. Seems like the way to go.

Or a VIP (virtual IP), which you can switch from one server to another.

Thanks. Sounds like that could work as well, but given the added benefits of the LB service I'm looking into - such as DDoS protection, statistics, failover, very reasonable price, etc. - I'm gonna give that a go. Gotta get my 2nd server set up first though.

Note that Traefik doesn’t easily work with clustered LetsEncrypt, so running multiple Traefik instances in parallel with LetsEncrypt is tricky.

Hmm, I hadn’t considered that at all. Thanks for bringing it to my attention.

How do you manage it? If I kept the Let’s Encrypt directories in sync between the two servers and only one server was ever available at any given time, would that work?

The challenge is the challenge :laughing:

When using httpChallenge or tlsChallenge, the LetsEncrypt server will try to connect to check a token, and it usually hits the wrong Traefik server: the one that hasn't initiated the cert creation.

You can try dnsChallenge instead. You might end up with a different cert on each Traefik instance, but in general that should not be a problem.

So far we have used purchased wildcard certs. We have been looking into this topic for a long time; see the certbot POC :slight_smile:

Running only a single instance at a time is of course a possibility. You then need to keep acme.json in sync between the servers.
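
As a rough sketch of the sync step, the following copies acme.json to a replica path only when the content differs, and keeps the file mode at 600, which Traefik requires for this file. The function names and the idea of a locally reachable replica path (for example a mounted share) are assumptions. Across two VPS hosts you would more likely use rsync or scp for the transfer itself.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def sync_acme(src, dst):
    """Copy src to dst if the content differs. Return True if a copy was made."""
    src, dst = Path(src), Path(dst)
    if dst.exists() and file_digest(src) == file_digest(dst):
        return False  # already in sync
    # Write to a temp file in the target directory, then swap it in atomically
    # so the standby never sees a half-written acme.json.
    fd, tmp = tempfile.mkstemp(dir=dst.parent, prefix=".acme-")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        os.chmod(tmp, 0o600)  # Traefik rejects acme.json with looser permissions
        os.replace(tmp, dst)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return True
```

Only the active instance should be requesting or renewing certificates. The standby just needs the latest copy before it takes over.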


You might try using Cloudflare's origin masking to handle instant DNS changes. They have an API, so you could write a small bash script or web interface to rsync and carry out the bulk change. If you're on AWS EC2, you can mount your services volume on the failover instance while it's mounted on the live instance, so you wouldn't even have to rsync. That can also be scripted.
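
A bulk change through the Cloudflare API could be sketched roughly as follows. This only builds the requests and does not send them. The function name `build_updates`, the record dictionaries, and all IDs are hypothetical, and the endpoint and payload reflect my understanding of the v4 DNS records API, so check them against Cloudflare's current API reference before relying on this.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_updates(api_token, records, new_ip):
    """Build one PATCH request per DNS record, pointing each at new_ip.

    `records` is a list of dicts with the (hypothetical) keys
    "zone_id", "record_id" and "name".
    """
    requests = []
    for rec in records:
        url = f"{API_BASE}/zones/{rec['zone_id']}/dns_records/{rec['record_id']}"
        body = json.dumps({"type": "A", "name": rec["name"], "content": new_ip})
        requests.append(urllib.request.Request(
            url,
            data=body.encode("utf-8"),
            method="PATCH",
            headers={
                "Authorization": f"Bearer {api_token}",
                "Content-Type": "application/json",
            },
        ))
    return requests
```

Each request could then be sent with `urllib.request.urlopen(req)`. With the records proxied through Cloudflare, visitors keep resolving to Cloudflare's addresses, so the switch doesn't depend on their DNS caches expiring.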
