Traefik Proxy on Docker Swarm Multiple Managers

I have an internet-facing (public) load balancer which distributes requests to a Docker Swarm. At the moment the swarm runs Traefik v2 on only one of the manager nodes, and the load balancer directs all traffic to that node.

To utilise the load balancer to full effect, I would like to run Traefik v2 on each of the manager nodes.
Is this possible with the open source Traefik Proxy, or is it a feature of EE (Enterprise Edition)?

Thanks for feedback

You can use multiple Traefik instances behind a real load balancer without a problem. The only issue is that Let's Encrypt is no longer straightforward.

Thanks, do you have more info on the problems with Let's Encrypt?

You cannot use the regular Let's Encrypt challenges (HTTP or TLS), because there is no guarantee that the validation requests from the Let's Encrypt servers will hit the same node (Traefik instance) that initiated the certificate request, so many attempts will fail. This is easily solved with the DNS challenge, where Let's Encrypt validates the challenge against DNS records.

There is another "issue": only one Traefik instance is aware of a certificate change. The others simply do not react to changes in the acme.json file and keep serving the old certificate (or a self-signed one if there was none). You can restart all Traefik instances so that the certificates are reread from the file, but that is not a good solution in production: restarting Traefik means downtime, and until the restart the instances are still out of sync.

From what I know, Enterprise Edition solves this problem.

The embedded Traefik Let's Encrypt support only works on a single node; for multiple nodes you need the Enterprise Edition, starting at 3000€/year.

There are workarounds, but they take some effort. The best solution is probably a separate certbot or lego container with a shared folder holding a dynamic certificates file for all Traefik instances. See discussions like this one.

I have Traefik on 3 swarm managers and a single lego container floating in the swarm to do the ACME work, with shared files sitting on NFS and a tls/certificates file to soft-update the Traefik containers on file change.
This works well, but only once all the Traefik containers have managed to soft-update. At first, some containers often serve the default Traefik cert while others are already serving the legitimate LE cert files.

I had a quick look through the logs, which say "unknown authority -> serving default traefik cert".
After a few clicks, once all Traefik proxies have the certs, this works rock solid, but at this stage it is unacceptable.

Do you have this problem in your proof of concept?

Strange, your dynamic configuration and cert files should be there from the beginning, why is there a problem? Is Traefik listening to ports and answering requests before loading the dynamic config?

Do you have certs inline or as separate files? Are the files always there and only updated for new domains or renewed certs?

So far we only use LE for internal stuff; external services all use paid wildcard certs, so we haven't done extensive testing with LE.

The swarm is a live environment, so certs need to be added after Traefik is up and running.
Traefik watches providers.file.directory=/lego/traefik for an updated certificates.yaml file, which references separate cert/key files in the following format:

# Dynamic configuration

tls:
  certificates:
    - certFile: /path/to/domain.cert
      keyFile: /path/to/domain.key
    - certFile: /path/to/other-domain.cert
      keyFile: /path/to/other-domain.key

So far I have only tested creating new domains, using a post hook that creates the certificates.yaml file, such as:

lego --email="you@example.com" --domains="example.com" --http run --run-hook="./myscript.sh"
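
What such a hook does could be sketched in Python roughly as follows. This is only a sketch under assumptions, not my actual myscript.sh: the function names are made up, it assumes the certs sit in one directory as `<domain>.crt` / `<domain>.key` pairs (with the `<domain>.issuer.crt` files skipped), and it writes the file via a temporary file plus rename so that a watcher never sees a half-written certificates.yaml.

```python
import os
import tempfile
from pathlib import Path


def render_certificates_yaml(cert_dir: Path) -> str:
    """List every complete .crt/.key pair in cert_dir as a tls.certificates entry."""
    lines = ["tls:", "  certificates:"]
    for cert in sorted(cert_dir.glob("*.crt")):
        # Assumption: issuer chains are stored as <domain>.issuer.crt; skip them.
        if cert.name.endswith(".issuer.crt"):
            continue
        key = cert.with_suffix(".key")
        if not key.exists():
            continue  # skip incomplete pairs rather than emit a broken entry
        lines.append(f"    - certFile: {cert}")
        lines.append(f"      keyFile: {key}")
    return "\n".join(lines) + "\n"


def write_atomically(target: Path, content: str) -> None:
    """Write to a temp file in the target's directory, then rename it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".certificates-")
    with os.fdopen(fd, "w") as tmp:
        tmp.write(content)
    os.chmod(tmp_name, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_name, target)
```

The hook would then call something like `write_atomically(Path("/lego/traefik/certificates.yaml"), render_certificates_yaml(Path("/path/to")))`, with both paths adjusted to the actual setup.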

I was not aware that certs can be inline. Would you care to share a link? I would like to test this.

See my post about certbot:

Traefik dynamic configuration with certificates inline:

tls:
  options:
    default:
      minVersion: VersionTLS12
  certificates:
    # CERT FILE /etc/letsencrypt/live/example.com
    - certFile: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      keyFile: |-
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
    # CERT FILE /etc/letsencrypt/live/example.org
    - certFile: |-
        -----BEGIN CERTIFICATE-----
        ...
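
For illustration, a file in this shape could be generated with a few lines of Python. This is not the script from the linked post, just a minimal sketch with hypothetical function names; it assumes the PEM contents have already been read into strings. The one thing it has to get right is that every PEM line, including the END markers, carries the same indentation under `|-`.

```python
def indent_block(pem_text: str, spaces: int = 8) -> list[str]:
    """Indent every non-blank PEM line by the same amount for a YAML block scalar."""
    pad = " " * spaces
    return [pad + line.strip() for line in pem_text.splitlines() if line.strip()]


def render_inline_entry(cert_pem: str, key_pem: str) -> list[str]:
    """Render one tls.certificates entry with the cert chain and key inline."""
    lines = ["    - certFile: |-"]
    lines += indent_block(cert_pem)
    lines.append("      keyFile: |-")
    lines += indent_block(key_pem)
    return lines


def render_inline_config(pairs: list[tuple[str, str]]) -> str:
    """Render the whole dynamic configuration for (cert_pem, key_pem) pairs."""
    lines = ["tls:", "  certificates:"]
    for cert_pem, key_pem in pairs:
        lines += render_inline_entry(cert_pem, key_pem)
    return "\n".join(lines) + "\n"
```

Each pair would be read from the cert directory, for example the full chain and private key files of one domain, and the result written to the file Traefik watches.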

Thanks for the info, I'll have a look at your link to see what sort of awk wizardry you conjure to create this file.

On a side note, I enquired about a feature request for lego: for it to watch the Docker socket, so that on change it would create certs à la Traefik. However, at this stage the project has no intention of supporting Docker (which is understandable).

Loading certs inline has not changed the behaviour :frowning:

Here is the error I get:

time="2023-02-23T08:35:59Z" level=debug msg="Serving default certificate for request: \"pinky.example.com\""
time="2023-02-23T08:35:59Z" level=debug msg="http: TLS handshake error from 10.0.0.3:46772: remote error: tls: unknown certificate"

This is random, so Traefik is aware of the certificate some of the time. I would imagine that 1 or 2 of the 3 Traefik containers are not recognizing the change in providers.file.directory=/lego/traefik.

Usually one or two refreshes bring up the correct certificate.

I am running the browser/cert checks in an incognito window, so no previous requests should be affecting the situation.

As this thread is now going slightly off-topic, I have created a new thread that defines a specific problem, rather than discussing Traefik as a global proxy in general.

The new thread is here: "providers.file.directory" and NFS

Feel free to help me fix my problem :stuck_out_tongue: