"providers.file.directory" and NFS

I have Traefik on 3 swarm managers, and a single lego container floating in the swarm to do the ACME foo, with the shared files/certificates sitting on NFS. Each manager has the certificates volume mounted.

Swarm is a live environment, so certs need to be added after Traefik is already up and running.

Traefik watches providers.file.directory=/lego/traefik for an updated certificates.yaml file, which contains separate cert/keys in the following format:

# Dynamic configuration

tls:
  certificates:
    - certFile: /path/to/domain.cert
      keyFile: /path/to/domain.key
    - certFile: /path/to/other-domain.cert
      keyFile: /path/to/other-domain.key

I have also tried inline certificates but the behavior is still the same.

This works well, but only after the Traefik containers have managed to soft update. At first, the default Traefik cert is often served by some containers while others are already serving the legit LE cert files. Something tells me that not all of the Traefik containers are noticing the update to the shared dynamic configuration that lives in the /lego/traefik directory.

From the logs, here is the error I get:

time="2023-02-23T08:35:59Z" level=debug msg="Serving default certificate for request: \"pinky.example.com\""
time="2023-02-23T08:35:59Z" level=debug msg="http: TLS handshake error from 10.0.0.3:46772: remote error: tls: unknown certificate"

And this is random, so Traefik is aware of the certificate sometimes. I would imagine that 1 or 2 of the 3 Traefik containers are not recognizing the change from providers.file.directory=/lego/traefik.

I am running browser/cert checks in an incognito window, so no previous requests should be affecting the situation.

If I hard reset traefik then certificates are served without fail.
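(For example, one way to hard reset is to force the swarm service to redeploy; the service name traefik_traefik is an assumption here:)

docker service update --force traefik_traefik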

So my question :).

What is the nature of providers.file.directory, and is it compatible with NFS?

OK, I see this issue for fsnotify, which is used by providers.file.directory: File Notification not working in NFS storage · Issue #306 · fsnotify/fsnotify · GitHub

Therefore, I will explore systemd.path and report back with any success.

Have you mounted NFS on the host or directly with a Docker volume driver?

It is an external NFS server, i.e. not defined in the Docker Compose files. There is also no possibility of changing this.

IIRC, you are polling an HTTP file for certs.

After a hard reset of Traefik everything works without fail, so this points to my theory that the other two Traefik proxies are not picking up the file changes on a soft reload (i.e. the other 2 are not actually soft reloading).

Not sure if it makes a difference, but asking again:
Have you mounted NFS on the host/server and used a folder bind into the container, or did you mount it directly into the container with a Docker volume driver for NFS?
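For clarity, by the latter I mean something like defining the NFS share as a named volume with the local driver (a sketch; the server address and export path are placeholders):

volumes:
  lego-certs:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.0.0.10,rw,nfsvers=4"
      device: ":/exports/lego"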

We have an NFS mount on /var/lib/docker/volumes. Docker does not know anything about the underlying file system for this and sees it as a regular directory.

I appreciate your interest in this, and also your initial post which explained the whole concept of the global traefik proxy.

If I touch /var/lib/lego/traefik/certificates.yaml (which is the NFS mount/file for the Traefik/lego containers) on each node, then so far it looks like all Traefik proxies soft reload and I no longer have any problems with default certificates.

So if I run a systemd.path unit on the nodes and watch a file (not certificates.yaml itself, so as to avoid an infinite loop), I can touch /var/lib/lego/traefik/certificates.yaml and everything should be good.
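Roughly what I have in mind, as a sketch (the unit names and the trigger file are hypothetical, and something would need to touch the trigger file after a renewal):

# /etc/systemd/system/lego-reload.path
[Unit]
Description=Watch the lego reload trigger file

[Path]
PathModified=/var/lib/lego/traefik/reload.trigger
Unit=lego-reload.service

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/lego-reload.service
[Unit]
Description=Touch certificates.yaml so the local Traefik notices the change

[Service]
Type=oneshot
ExecStart=/usr/bin/touch /var/lib/lego/traefik/certificates.yaml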

I will also make use of /var/run/docker.sock:ro to get the nodename of the container running lego, so as not to apply the touch twice on that node.
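Something along these lines for the per-node check, as a sketch (it assumes the lego container's name contains "lego" and that the Docker socket is available to whatever runs the script, whether mounted :ro into a helper container or used directly on the host):

#!/bin/sh
# If lego runs on this node, its own write to certificates.yaml already
# triggers a local event here, so skip the extra touch on this node.
if [ -n "$(docker ps -q --filter name=lego)" ]; then
  exit 0
fi
touch /var/lib/lego/traefik/certificates.yaml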

Some people state they can enable inotify on NFS by disabling caching. Also check Stack Overflow.
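The flags usually meant there are the NFS attribute-cache mount options, something like this in /etc/fstab (a sketch; the server and export path are placeholders):

nfsserver:/exports/lego  /var/lib/lego  nfs4  rw,noac,lookupcache=none  0  0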

I have gone with the HTTP provider (Traefik HTTP Documentation - Traefik provider), as it is an inherent problem that inotify does not pick up changes on the NFS file system.
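For reference, the switch is essentially a different provider block in the static configuration, roughly like this (a sketch; the endpoint URL and poll interval are placeholders for my setup):

providers:
  http:
    endpoint: "http://lego:8080/certificates.yaml"
    pollInterval: "15s"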

From my understanding, setting the flags mentioned in the Stack Overflow thread would not resolve this problem, as the default cache times are between 30 and 60 seconds. If that were the issue, I would have had a working system after that period of time, which was not the case.

thanks for keeping this place alive :stuck_out_tongue:
