I have an internet-facing (public) load balancer which distributes requests to a Docker Swarm. At the moment the swarm runs Traefik v2 on only one of the manager nodes, and the load balancer directs all traffic to that node.
To utilise the load balancer to full effect, I would like to run Traefik v2 on each of the manager nodes.
Is this possible with the open-source Traefik Proxy, or is it an Enterprise Edition (EE) feature?
You can use multiple Traefik instances behind a real load balancer without a problem. The only issue is that Let's Encrypt is no longer straightforward.
You cannot use the regular Let's Encrypt HTTP or TLS challenge, because there is no guarantee that the validation requests from the Let's Encrypt servers will hit the same node (Traefik instance) that initiated the certificate request, so many validations will fail. This is easily solved with the DNS challenge, where Let's Encrypt validates the challenge through a DNS record instead.
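To illustrate why the DNS challenge sidesteps the routing problem: under ACME (RFC 8555) the client publishes a TXT record at `_acme-challenge.<domain>` whose value is the base64url-encoded SHA-256 digest of the key authorization (`token + "." + account key thumbprint`). Let's Encrypt then queries DNS rather than any Traefik node, so it does not matter which node started the order. This minimal sketch only computes the record; `token` and `account_thumbprint` are placeholder inputs that the ACME client (Traefik or lego) would normally supply, and publishing the record through a DNS provider API is left out.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME (RFC 8555) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return the (name, value) of the TXT record for a DNS-01 challenge.

    token and account_thumbprint are placeholders here; a real ACME client
    gets the token from the CA and derives the thumbprint from its account key.
    """
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # A wildcard order (*.example.com) is validated on the base domain.
    name = "_acme-challenge." + domain.removeprefix("*.")
    return name, b64url(digest)
```

Because every Traefik instance (or a single external client) can publish this record through the DNS provider, the load balancer never has to route the validation to a particular node.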
There is another "issue": only one Traefik instance is aware of a certificate change. The others simply do not react to changes in the acme.json file and keep serving the old certificate (or a self-signed one if there was none). You can restart all Traefik instances so that each one rereads all certificates from the file, but this is not a good solution in production: restarting Traefik means downtime, and until the restart the instances remain out of sync.
From what I know, the Enterprise Edition solves this problem.
The embedded Traefik Let's Encrypt support only works on a single node; for multiple nodes you need the Enterprise Edition, starting at 3000€/year.
There are workarounds, but they take some effort. The best solution is probably a separate certbot or lego container writing a dynamic certificates file to a folder shared by all Traefik instances. See discussions like this one.
I have Traefik on 3 swarm managers and a single lego container floating in the swarm to do the ACME work, with the shared files sitting on NFS and a tls/certificates file that soft-updates the Traefik containers on file change.
This works well, but only after the Traefik containers have managed to soft-update. At first, some Traefik containers often serve the default Traefik certificate while others are already serving the legitimate LE certificate.
I had a quick look through the logs and they say "unknown authority -> serving default traefik cert".
After a few clicks, once all Traefik proxies have the certs, it works rock solid, but at this stage it is unacceptable.
Do you have this problem in your proof of concept?
Strange: your dynamic configuration and cert files should be there from the beginning, so why is there a problem? Is Traefik listening on ports and answering requests before loading the dynamic config?
Do you have certs inline or as separate files? Are the files always there and only updated for new domains or renewed certs?
So far we only use LE for internal stuff; external services all use paid wildcard certs, so we haven't done extensive testing with LE.
The swarm is a live environment, so certs need to be added after Traefik has started up and is running.
Traefik watches providers.file.directory=/lego/traefik for an updated certificates.yaml file, which contains separate cert/key pairs in the following format:
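A file like this can be generated with a small script. The following is a minimal Python sketch, not the poster's own approach, assuming lego's usual output layout of `<domain>.crt` / `<domain>.key` pairs (plus a separate `<domain>.issuer.crt`) in one directory. It emits Traefik v2's `tls.certificates` list of `certFile`/`keyFile` entries, writes it to a temporary `.tmp` file in the same directory, and then renames it over `certificates.yaml`, so on a POSIX filesystem a watching Traefik instance should not read a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path


def render_certificates_yaml(cert_dir: Path) -> str:
    """Build a Traefik v2 dynamic config listing every cert/key pair in cert_dir."""
    entries = []
    for crt in sorted(cert_dir.glob("*.crt")):
        if crt.name.endswith(".issuer.crt"):
            continue  # assumed lego layout: the issuer chain is stored separately
        key = crt.with_suffix(".key")  # example.com.crt -> example.com.key
        if not key.exists():
            continue  # skip incomplete pairs, e.g. while lego is still writing
        # json.dumps yields a double-quoted string, which is also valid YAML
        entries.append(f"    - certFile: {json.dumps(str(crt))}\n"
                       f"      keyFile: {json.dumps(str(key))}\n")
    if not entries:
        return "tls:\n  certificates: []\n"
    return "tls:\n  certificates:\n" + "".join(entries)


def write_atomically(path: Path, content: str) -> None:
    """Write to a temp file next to path, then rename it over path in one step."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the Traefik containers read it
        os.replace(tmp, path)  # atomic rename within the same directory
    except BaseException:
        os.unlink(tmp)
        raise
```

For example, after each lego run or renewal: `write_atomically(Path("/lego/traefik/certificates.yaml"), render_certificates_yaml(Path("/lego/certificates")))`, where `/lego/certificates` is a hypothetical location for lego's output directory.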
Thanks for the info. I'll have a look at your link to see what sort of awk wizardry you conjure up to create this file.
On a side note, I enquired about a feature request for lego to watch the Docker socket, so that on changes it would create certs à la Traefik; however, at this stage the project has no intention of supporting Docker (which is understandable).
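Without Docker support in lego, a small sidecar could derive the domain list itself. The sketch below is a hypothetical helper, not an existing tool: it extracts hostnames from the `Host()` matchers in Traefik v2 router rule labels (`traefik.http.routers.<name>.rule`). The labels dict is left as an input; on Swarm it could be read, for example, from each service's `Spec.Labels` via the Docker Engine API, and the resulting domains passed to lego. `HostRegexp` and directly negated `!Host(...)` matchers are skipped, since they do not name a concrete domain to request a certificate for.

```python
import re

# Router rules live in labels such as traefik.http.routers.<name>.rule
RULE_LABEL = re.compile(r"^traefik\.http\.routers\.[^.]+\.rule$")
# Host(`a.com`) or Host(`a.com`, `b.com`); HostRegexp(...) and !Host(...) are not matched
HOST_CALL = re.compile(r"(?<![!\w])Host\(([^)]*)\)")
BACKTICKED = re.compile(r"`([^`]+)`")


def domains_from_labels(labels: dict[str, str]) -> set[str]:
    """Collect the hostnames used in Host() matchers of Traefik v2 router rules."""
    domains: set[str] = set()
    for key, rule in labels.items():
        if not RULE_LABEL.match(key):
            continue  # not a router rule label
        for args in HOST_CALL.findall(rule):
            domains.update(BACKTICKED.findall(args))
    return domains
```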
This is random, so sometimes Traefik is aware of the certificate and sometimes not. I imagine that 1 or 2 of the 3 Traefik containers are not picking up the change in providers.file.directory=/lego/traefik.
Usually one or two refreshes bring up the correct certificate.
I am running browser/cert checks in an incognito window so no previous requests should be affecting the situation
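Rather than relying on browser refreshes through the load balancer, each manager's Traefik can be queried directly with the same SNI name and the served certificates compared. The sketch below uses only the Python standard library; node addresses and the hostname are hypothetical. It fetches the certificate without verification, so Traefik's default self-signed certificate is still returned, and reports which nodes disagree with the majority.

```python
import hashlib
import socket
import ssl
from collections import Counter


def fetch_cert_der(node: str, servername: str, port: int = 443, timeout: float = 5.0) -> bytes:
    """Connect to one node directly and return the DER certificate served for servername (SNI)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # must be disabled before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE   # still return Traefik's default cert instead of failing
    with socket.create_connection((node, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=servername) as tls:
            return tls.getpeercert(binary_form=True)


def fingerprint(der: bytes) -> str:
    """SHA-256 fingerprint in colon-separated upper-case hex."""
    return ":".join(f"{b:02X}" for b in hashlib.sha256(der).digest())


def odd_ones_out(fps: dict[str, str]) -> list[str]:
    """Return the nodes whose fingerprint differs from the most common one."""
    if not fps:
        return []
    majority, _ = Counter(fps.values()).most_common(1)[0]
    return sorted(node for node, fp in fps.items() if fp != majority)
```

For example, `fps = {n: fingerprint(fetch_cert_der(n, "app.example.com")) for n in ["10.0.0.11", "10.0.0.12", "10.0.0.13"]}` followed by `odd_ones_out(fps)` lists the nodes still serving a different (likely the default) certificate.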
As this thread is now drifting slightly off-topic, I have created a new thread which describes a specific problem, rather than discussing Traefik as a global proxy in general.