Traefik on docker - not turning back on after a host restart

I'm running Traefik as a Docker container, and it works great... until I need to restart the Docker host server (e.g. for updates).

After the server restarts, I find the Traefik container is stopped and doesn't start automatically, unlike all the other Docker containers. And, before anyone asks, yes, it does have the restart: unless-stopped setting.

Here's the log from Traefik itself:

2024-10-25T07:36:09Z INF I have to go...
2024-10-25T07:36:09Z INF Stopping server gracefully
2024-10-25T07:36:09Z ERR error="accept tcp [::]:8899: use of closed network connection" entryPointName=metrics
2024-10-25T07:36:09Z ERR error="accept tcp [::]:443: use of closed network connection" entryPointName=websecure
2024-10-25T07:36:09Z ERR Error while starting server error="accept tcp [::]:8899: use of closed network connection" entryPointName=metrics
2024-10-25T07:36:09Z ERR Error while starting server error="accept tcp [::]:443: use of closed network connection" entryPointName=websecure
2024-10-25T07:36:09Z ERR error="accept tcp [::]:8080: use of closed network connection" entryPointName=traefik
2024-10-25T07:36:09Z ERR error="accept tcp [::]:80: use of closed network connection" entryPointName=web
2024-10-25T07:36:09Z ERR error="close tcp [::]:8080: use of closed network connection" entryPointName=traefik
2024-10-25T07:36:09Z ERR error="close tcp [::]:80: use of closed network connection" entryPointName=web
2024-10-25T07:36:19Z INF Server stopped
2024-10-25T07:36:19Z INF Shutting down
2024-10-25T07:36:19Z ERR Failed to list containers for docker error="Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json\": context canceled" providerName=docker

And here's the compose file:

services:

  traefik:
    image: traefik:v3.1.4
    container_name: "traefik"
    env_file:
      - ./azuredns/.env
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "./letsencrypt/:/letsencrypt/"
      - "./traefik.yml:/etc/traefik/traefik.yml"
      - "./services/:/services/"
      - "./azuredns/:/azuredns/"
      - "./certs/:/certs/"
      - "/var/log/traefik/:/var/log/traefik/"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.rule=Host(`traefik.domain.local`)"
      - "traefik.http.routers.traefik.entrypoints=web"
      - "traefik.http.routers.traefik.middlewares=redirect_to_https@file"
      - "traefik.http.routers.traefik-https.rule=Host(`traefik.arp.local`)"
      - "traefik.http.routers.traefik-https.entrypoints=websecure"
      - "traefik.http.routers.traefik-https.service=api@internal"
      - "traefik.http.routers.traefik-https.tls.certResolver=local"
      - "traefik.http.routers.traefik-https.middlewares=local-limit@file"
    healthcheck:
      test: "traefik healthcheck"
      retries: 12
      interval: 5s

And here's the traefik.yml config (slightly redacted for privacy):

global:
  checkNewVersion: true
  sendAnonymousUsage: false

entryPoints:
  web:
    address: :80

  websecure:
    asDefault: true
    address: :443
    http:
      tls:
        certResolver: le
        domains:
          - main: "*.domain.com"
            sans: 
              - "domain.com"
  metrics:
    address: :8899

serversTransport:
  insecureSkipVerify: true
  rootCAs:
    - /certs/domain.ca.crt

log:
  level: INFO

accessLog:
  filePath: /var/log/traefik/access.log
  bufferingSize: 100

api:
  insecure: true

ping: {}

providers:
  file:
    directory: /services/
    watch: true
  docker:
    network: traefik-proxy
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false

certificatesResolvers:
  le:
    acme:
      email: admin@domain.com
      storage: /letsencrypt/le.json
      caserver: "https://acme-v02.api.letsencrypt.org/directory"
      dnsChallenge:
        provider: azuredns

  le-staging:
    acme:
      email: admin@domain.com
      storage: /letsencrypt/le-staging.json
      caserver: "https://acme-staging-v02.api.letsencrypt.org/directory"
      dnsChallenge:
        provider: azuredns

  local:
    acme:
      email: admin@domain.pl
      storage: /letsencrypt/domain.json
      caserver: "https://acme.domain.local"
      httpChallenge:
        entryPoint: web

metrics:
  prometheus:
    buckets:
      - 0.1
      - 0.3
      - 1.2
      - 5.0
    addEntryPointsLabels: true
    addServicesLabels: true
    entrypoint: metrics

experimental:
  plugins:
    GeoBlock:
      moduleName: "github.com/PascalMinder/geoblock"
      version: "v0.2.8"

Any idea why Traefik fails to restart when the Docker host is back up?

Does it run ok when you start it manually?

I would use absolute paths for the bind mounts.
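
For illustration, this is roughly what the volumes section would look like with absolute paths. The base directory /opt/traefik is a made-up placeholder, not something from the original post; substitute whatever directory the compose file actually lives in.

```yaml
services:
  traefik:
    volumes:
      # /opt/traefik is a hypothetical project directory (an assumption);
      # replace it with the real location of the compose project.
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "/opt/traefik/letsencrypt/:/letsencrypt/"
      - "/opt/traefik/traefik.yml:/etc/traefik/traefik.yml"
      - "/opt/traefik/services/:/services/"
      - "/opt/traefik/azuredns/:/azuredns/"
      - "/opt/traefik/certs/:/certs/"
      - "/var/log/traefik/:/var/log/traefik/"
```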

I can manually shut down and restart the container without any issues.

This isn't the only container with relative paths, so I'm not sure why that would be an issue (the others are working fine after a host restart). AFAIK the paths should be "resolved" and linked appropriately when the container is first created, so I don't see why they would matter when that container is stopped and then started again.

The only interesting line in the logs is the last one:

2024-10-25T07:36:19Z ERR Failed to list containers for docker error="Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json\": context canceled" providerName=docker

If I manually restart the container this line doesn't show up. I'm not sure why this would be an issue, however - I've also got Portainer linked with docker.sock (it's a web-based GUI for managing Docker instances) and that container "survives" a restart just fine.
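
One experiment that might narrow this down: Docker's documentation describes unless-stopped as behaving like always, except that a container which is in a stopped state is not started again when the daemon restarts. If the host shutdown stops Traefik before the Docker daemon itself goes down, Docker may be treating it as stopped. That is only a guess; nothing in the logs above confirms it. Temporarily switching the policy for one reboot would show whether that is the difference. A minimal sketch, changing only the restart line of the compose file above:

```yaml
services:
  traefik:
    image: traefik:v3.1.4
    container_name: "traefik"
    # Experiment only: "always" starts the container whenever the Docker
    # daemon starts, even if it was in a stopped state before the reboot.
    restart: always
```

The container has to be recreated (for example with docker compose up -d) for the new policy to apply. If Traefik comes back after a reboot with always but not with unless-stopped, the problem is in how the container is being stopped at shutdown rather than in the bind mounts or docker.sock.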