Traefik on docker - not turning back on after a host restart

I'm running Traefik as a Docker container, and it works great... until I need to restart the Docker host server (e.g. for updates).

After the server restarts I find the Traefik instance is stopped and doesn't start automatically, unlike all the other docker containers. And, before anyone asks, yes, it does have the restart: unless-stopped setting.

Here's the log from Traefik itself:

2024-10-25T07:36:09Z INF I have to go...
2024-10-25T07:36:09Z INF Stopping server gracefully
2024-10-25T07:36:09Z ERR error="accept tcp [::]:8899: use of closed network connection" entryPointName=metrics
2024-10-25T07:36:09Z ERR error="accept tcp [::]:443: use of closed network connection" entryPointName=websecure
2024-10-25T07:36:09Z ERR Error while starting server error="accept tcp [::]:8899: use of closed network connection" entryPointName=metrics
2024-10-25T07:36:09Z ERR Error while starting server error="accept tcp [::]:443: use of closed network connection" entryPointName=websecure
2024-10-25T07:36:09Z ERR error="accept tcp [::]:8080: use of closed network connection" entryPointName=traefik
2024-10-25T07:36:09Z ERR error="accept tcp [::]:80: use of closed network connection" entryPointName=web
2024-10-25T07:36:09Z ERR error="close tcp [::]:8080: use of closed network connection" entryPointName=traefik
2024-10-25T07:36:09Z ERR error="close tcp [::]:80: use of closed network connection" entryPointName=web
2024-10-25T07:36:19Z INF Server stopped
2024-10-25T07:36:19Z INF Shutting down
2024-10-25T07:36:19Z ERR Failed to list containers for docker error="Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json\": context canceled" providerName=docker

And here's the compose file:

services:

  traefik:
    image: traefik:v3.1.4
    container_name: "traefik"
    env_file:
      - ./azuredns/.env
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "./letsencrypt/:/letsencrypt/"
      - "./traefik.yml:/etc/traefik/traefik.yml"
      - "./services/:/services/"
      - "./azuredns/:/azuredns/"
      - "./certs/:/certs/"
      - "/var/log/traefik/:/var/log/traefik/"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.rule=Host(`traefik.domain.local`)"
      - "traefik.http.routers.traefik.entrypoints=web"
      - "traefik.http.routers.traefik.middlewares=redirect_to_https@file"
      - "traefik.http.routers.traefik-https.rule=Host(`traefik.arp.local`)"
      - "traefik.http.routers.traefik-https.entrypoints=websecure"
      - "traefik.http.routers.traefik-https.service=api@internal"
      - "traefik.http.routers.traefik-https.tls.certResolver=local"
      - "traefik.http.routers.traefik-https.middlewares=local-limit@file"
    healthcheck:
      test: "traefik healthcheck"
      retries: 12
      interval: 5s

And here's the traefik.yml config (slightly redacted for privacy):

global:
  checkNewVersion: true
  sendAnonymousUsage: false

entryPoints:
  web:
    address: :80

  websecure:
    asDefault: true
    address: :443
    http:
      tls:
        certResolver: le
        domains:
          - main: "*.domain.com"
            sans:
              - "domain.com"
  metrics:
    address: :8899

serversTransport:
  insecureSkipVerify: true
  rootCAs:
    - /certs/domain.ca.crt

log:
  level: INFO

accessLog:
  filePath: /var/log/traefik/access.log
  bufferingSize: 100

api:
  insecure: true

ping: {}

providers:
  file:
    directory: /services/
    watch: true
  docker:
    network: traefik-proxy
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false

certificatesResolvers:
  le:
    acme:
      email: admin@domain.com
      storage: /letsencrypt/le.json
      caserver: "https://acme-v02.api.letsencrypt.org/directory"
      dnsChallenge:
        provider: azuredns

  le-staging:
    acme:
      email: admin@domain.com
      storage: /letsencrypt/le-staging.json
      caserver: "https://acme-staging-v02.api.letsencrypt.org/directory"
      dnsChallenge:
        provider: azuredns

  local:
    acme:
      email: admin@domain.pl
      storage: /letsencrypt/domain.json
      caserver: "https://acme.domain.local"
      httpChallenge:
        entryPoint: web

metrics:
  prometheus:
    buckets:
      - 0.1
      - 0.3
      - 1.2
      - 5.0
    addEntryPointsLabels: true
    addServicesLabels: true
    entrypoint: metrics

experimental:
  plugins:
    GeoBlock:
      moduleName: "github.com/PascalMinder/geoblock"
      version: "v0.2.8"

Any idea why Traefik fails to re-start when the docker host is back up?

Does it run ok when you start it manually?

I would use absolute paths for the bind mounts.
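As a rough sketch of what that could look like, assuming the compose project lives in a hypothetical /opt/traefik directory (that path is my assumption, not something from your post; substitute wherever your compose file actually sits). This is just one thing to rule out, not a confirmed fix:

```yaml
services:
  traefik:
    volumes:
      # The socket and log mounts were already absolute in the original file.
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "/var/log/traefik/:/var/log/traefik/"
      # /opt/traefik is a placeholder for the real project directory.
      - "/opt/traefik/letsencrypt/:/letsencrypt/"
      - "/opt/traefik/traefik.yml:/etc/traefik/traefik.yml"
      - "/opt/traefik/services/:/services/"
      - "/opt/traefik/azuredns/:/azuredns/"
      - "/opt/traefik/certs/:/certs/"
```

The container has to be recreated (not just restarted) for changed mounts to take effect.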

I can manually shut down and restart the container without any issues.

This isn't the only container with relative paths so I'm not sure why that would be an issue (the others are working fine after a host restart). AFAIK the paths should be "resolved" and linked appropriately when the container is first created, so it makes no sense why that would be an issue when said container is stopped and then started.

The only interesting line in the logs is the last one:

2024-10-25T07:36:19Z ERR Failed to list containers for docker error="Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json\": context canceled" providerName=docker

If I manually restart the container this line doesn't show up. I'm not sure why this would be an issue, however - I've also got Portainer linked with docker.sock (it's a web-based GUI for managing Docker instances) and that container "survives" a restart just fine.

I am getting the exact same behavior from Traefik: upon server restart, the Traefik container will not start, while all other containers boot just fine.

Have you been able to find the root cause of this issue?

Share your full Traefik static and dynamic config, and Docker compose file if used.

Enable and check Traefik debug log.

Check Docker daemon logs.

Did you set restart?

@bluepuma77
Most of these things have been provided in the original post...

I suppose a better thread title would have been "On docker restart Traefik container starts but then stops / crashes". Because it's not an issue related to a docker container "just" not starting - if it were, it would be docker-specific. Instead, on restart the container starts but then immediately stops and doesn't restart any more.

Sorry, it was intended for the new post of @hu24ebr.

If your issue still persists, I recommend setting the log level to DEBUG.
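With a static config laid out like the traefik.yml posted at the top of the thread, that should be a one-line change. A minimal sketch, assuming the log section is otherwise left as it is:

```yaml
# traefik.yml (static config)
log:
  level: DEBUG   # was INFO in the posted config
```

The static config is only read at startup, so Traefik needs a restart before the extra output shows up. The lines to look at are the ones logged right after the host comes back up, not the shutdown messages.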