Traefik stops working after redeploying any service (Docker Swarm)

Hey,

This is a follow-up to my earlier topic, "Traefik stops routing after some minutes (Docker Swarm)".

Traefik stops routing after some minutes. I suspect it happens when a service (any service) gets redeployed.

I'm running 3 nodes in Docker Swarm. Restarting each server fixes the problem until I redeploy a service, or until one service gets redeployed automatically after a while.
I'm running Ubuntu 22.04 on two servers and 22.10 on one.

It seems like no request reaches Traefik. The Traefik logs contain no entries related to routing.

It still occurs with version 2.11.2.
Here's what I've done over the past 10 days:

  1. Reviewed entrypoints setup
  2. Added lbswarm=true label to every container (this didn't fix the issue)
  3. Edited /etc/resolv.conf on each node to make sure it points to my Pi-hole instance and not the router
  4. Re-read the documentation twice ( :exploding_head: )
  5. Made sure that ports: 80 in the docker-compose.yml stack is set only on the Traefik container
  6. Examined the Traefik logs (I know there's one entry about a missing middleware, but I don't think it's related, since the container stops routing about 30 minutes after start-up)
  7. Copied my rootCA.pem certificate to /etc/ssl/certs on each node
  8. Set up trusted Let's Encrypt Certificates
  9. Set up default swarm: network: web in traefik.conf
  10. Switched to not using :latest image
  11. Switched to Traefik v2.11.2
  12. Removed every ports: section in other docker-stack.yml files except traefik
  13. Disabled every firewall rule on the router and disabled ufw (Although now I see that Docker bypasses ufw and firewalld)
  14. Checked the journalctl -p 3 and there are no errors regarding Docker Swarm
  15. Tried the minimal Traefik config by bluepuma77 (traefik-best-practice/docker-traefik-dashboard-letsencrypt in the bluepuma77/traefik-best-practice GitHub repository); it didn't fix the issue
  16. Captured packets with tcpdump and analyzed them with Wireshark: from what I can tell, the TCP handshake completes, but after that there are only keep-alive packets
  17. Checked with netstat -ntlp whether Docker Swarm ports 2377 and 7946 are open, and they are
  18. Reinstalled the Docker runtime as described in "Install Docker Engine on Ubuntu" (Docker Docs)
  19. Removed all Docker services and deployed only Traefik and one additional service
  • With this, my two services work properly all the time.
  • However, when I increased the number to 4 services, the issue still occurred
  • I thought the issue was related to monitoring and a huge amount of ICMP packets, so I removed Grafana, Prometheus logging and Uptime Kuma, but the issue still occurs
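
The port check in step 17 uses netstat -ntlp, which lists TCP listeners only. Docker's Swarm documentation also lists 7946/udp (communication among nodes) and 4789/udp (overlay network traffic), so a UDP-aware check may be worth a try. This is a minimal sketch, assuming ss from iproute2 is installed; missing_ports is a hypothetical helper name, not part of Docker or Traefik:

```shell
# Read `ss -tuln` output on stdin and print every proto/port pair given as
# an argument (for example tcp/2377) that has no matching socket.
missing_ports() {
  awk -v want="$*" '
    BEGIN { n = split(want, w, " ") }
    # Data rows start with the protocol; the header row is skipped.
    $1 == "tcp" || $1 == "udp" {
      k = split($5, part, ":")        # the port is the last ":"-separated piece
      seen[$1 "/" part[k]] = 1
    }
    END {
      for (i = 1; i <= n; i++)
        if (!(w[i] in seen)) print w[i]
    }
  '
}
```

On a node it could be used as: ss -tuln | missing_ports tcp/2377 tcp/7946 udp/7946 udp/4789. Port 2377 is only expected on manager nodes, and 4789/udp may only show up while an overlay network is active on the node. An empty result means the sockets exist locally; it does not prove that traffic passes between nodes.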

traefik-stack.yml

version: '3.5'

services:
  reverse-proxy:
    image: traefik:2.11.2
    ports:
      - "8080:8080"
      - "80:80"
      - "443:443"
    #  - "222:222"
    volumes:
      - /home/swarm/traefik/traefik-conf.yml:/etc/traefik/traefik.yml:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - /home/swarm/traefik/configuration/:/configuration/
      - /etc/localtime:/etc/localtime:ro # To get timezones
      - /home/swarm/traefik/acme.json:/acme.json
    environment:
      - TZ=Europe/Warsaw
      - CLOUDFLARE_EMAIL=<REDACTED>
      - CF_DNS_API_TOKEN=<REDACTED> # Added new variable
    networks:
      - web

    deploy:
      labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.entrypoints=web"
      - "traefik.http.routers.traefik.rule=Host(`traefik.swarm.<REDACTED>`)"
      - "traefik.http.services.traefik.loadbalancer.server.port=8080"
      - "traefik.http.middlewares.traefik-auth.basicauth.users=admin:<REDACTED>"
      - "traefik.http.middlewares.traefik-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.middlewares.sslheader.headers.customrequestheaders.X-Forwarded-Proto=https"
      - "traefik.http.routers.traefik.middlewares=traefik-https-redirect"
      - "traefik.http.routers.traefik-secure.entrypoints=websecure"
      - "traefik.http.routers.traefik-secure.rule=Host(`traefik.swarm.<REDACTED>`)"
      - "traefik.http.services.traefik-secure.loadbalancer.server.port=8080"
      - "traefik.http.routers.traefik-secure.middlewares=traefik-auth"
      - "traefik.http.routers.traefik-secure.tls=true"
      - "traefik.http.routers.traefik-secure.tls.certresolver=cloudflare"
      - "traefik.http.routers.traefik-secure.tls.domains[0].main=swarm.<REDACTED>"
      - "traefik.http.routers.traefik-secure.tls.domains[0].sans=*.swarm.<REDACTED>"
      - "traefik.http.routers.traefik-secure.service=api@internal"
      # Dummy service for Swarm port detection. The port can be any valid integer value.
      - "traefik.http.services.dummy-svc.loadbalancer.server.port=9999"
      mode: global
      placement:
        constraints: [node.role == manager]

networks:
  web:
    driver: overlay
    attachable: true
    name: web

traefik-conf.yml

global:
  sendAnonymousUsage: true
accessLog: {}

log:
  level: DEBUG

api:
  dashboard: true
  debug: true

entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"

serversTransport:
  insecureSkipVerify: true

providers:
  docker:
    swarmMode: true
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    network: web
  # file:
  #   filename: /external-services.yml
certificatesResolvers:
  cloudflare:
    acme:
      email: <REDACTED>
      storage: acme.json
      caServer: https://acme-v02.api.letsencrypt.org/directory # prod (default)
      # caServer: https://acme-staging-v02.api.letsencrypt.org/directory # staging
      dnsChallenge:
        provider: cloudflare
        # disablePropagationCheck: true # uncomment this if you have issues pulling certificates through Cloudflare. Setting this flag to true disables the need to wait for the propagation of the TXT record to all authoritative name servers.
        # delayBeforeCheck: 600s # uncomment along with disablePropagationCheck if needed to ensure the TXT record is ready before verification is attempted
        resolvers:
          - 1.1.1.1:53
          - 8.8.8.8:53

traefik-logs

traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Configuration loaded from file: /etc/traefik/traefik.yml"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Traefik version 2.11.2 built on 2024-04-11T15:38:45Z"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=debug msg="Static configuration loaded {\"global\":{\"checkNewVersion\":true,\"sendAnonymousUsage\":true},\"serversTransport\":{\"insecureSkipVerify\":true,\"maxIdleConnsPerHost\":200},\"entryPoints\":{\"web\":{\"address\":\":80\",\"transport\":{\"lifeCycle\":{\"graceTimeOut\":\"10s\"},\"respondingTimeouts\":{\"readTimeout\":\"1m0s\",\"idleTimeout\":\"3m0s\"}},\"forwardedHeaders\":{},\"http\":{\"redirections\":{\"entryPoint\":{\"to\":\"websecure\",\"scheme\":\"https\",\"permanent\":true,\"priority\":9223372036854775806}}},\"http2\":{\"maxConcurrentStreams\":250},\"udp\":{\"timeout\":\"3s\"}},\"websecure\":{\"address\":\":443\",\"transport\":{\"lifeCycle\":{\"graceTimeOut\":\"10s\"},\"respondingTimeouts\":{\"readTimeout\":\"1m0s\",\"idleTimeout\":\"3m0s\"}},\"forwardedHeaders\":{},\"http\":{},\"http2\":{\"maxConcurrentStreams\":250},\"udp\":{\"timeout\":\"3s\"}}},\"providers\":{\"providersThrottleDuration\":\"2s\",\"docker\":{\"watch\":true,\"endpoint\":\"unix:///var/run/docker.sock\",\"defaultRule\":\"Host(`{{ normalize .Name }}`)\",\"swarmMode\":true,\"network\":\"web\",\"swarmModeRefreshSeconds\":\"15s\"}},\"api\":{\"dashboard\":true,\"debug\":true},\"log\":{\"level\":\"DEBUG\",\"format\":\"common\"},\"accessLog\":{\"format\":\"common\",\"filters\":{},\"fields\":{\"defaultMode\":\"keep\",\"headers\":{\"defaultMode\":\"drop\"}}},\"certificatesResolvers\":{\"cloudflare\":{\"acme\":{\"email\":\"<REDACTED>\",\"caServer\":\"https://acme-v02.api.letsencrypt.org/directory\",\"storage\":\"acme.json\",\"keyType\":\"RSA4096\",\"certificatesDuration\":2160,\"dnsChallenge\":{\"provider\":\"cloudflare\",\"resolvers\":[\"1.1.1.1:53\",\"8.8.8.8:53\"]}}}}}"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Stats collection is enabled."
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Many thanks for contributing to Traefik's improvement by allowing us to receive anonymous information from your configuration."
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Help us improve Traefik by leaving this feature on :)"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="More details on: https://doc.traefik.io/traefik/contributing/data-collection/"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Account URI does not match the current CAServer. The account will be reset." providerName=cloudflare.acme
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Starting provider aggregator aggregator.ProviderAggregator"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=debug msg="Starting TCP Server" entryPointName=websecure
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=debug msg="Starting TCP Server" entryPointName=web
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Starting provider *acme.ChallengeTLSALPN"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=debug msg="*acme.ChallengeTLSALPN provider configuration: {}"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Starting provider *traefik.Provider"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=debug msg="*traefik.Provider provider configuration: {}"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Starting provider *docker.Provider"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=debug msg="*docker.Provider provider configuration: {\"watch\":true,\"endpoint\":\"unix:///var/run/docker.sock\",\"defaultRule\":\"Host(`{{ normalize .Name }}`)\",\"swarmMode\":true,\"network\":\"web\",\"swarmModeRefreshSeconds\":\"15s\"}"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=info msg="Starting provider *acme.Provider"
traefik-swarm_reverse-proxy.0.mdkkygeb0msx@swarm-lite    | time="2024-06-01T08:14:28+02:00" level=debug msg="*acme.Provider provider configuration: {\"email\":\"<REDACTED>\",\"caServer\":\"https://acme-v02.api.letsencrypt.org/directory\",\"storage\":\"acme.json\",\"keyType\":\"RSA4096\",\"certificatesDuration\":2160,\"dnsChallenge\":{\"provider\":\"cloudflare\",\"resolvers\":[\"1.1.1.1:53\",\"8.8.8.8:53\"]},\"ResolverName\":\"cloudflare\",\"store\":{},\"TLSChallengeProvider\":{},\"HTTPChallengeProvider\":{}}"
traefik-swarm_reverse-proxy.0.qgw3cuz1b6o1@swarm2    | time="2024-06-01T08:14:37+02:00" level=info msg="Configuration loaded from file: /etc/traefik/traefik.yml"
traefik-swarm_reverse-proxy.0.qgw3cuz1b6o1@swarm2    | time="2024-06-01T08:14:37+02:00" level=info msg="Traefik version 2.11.2 built on 2024-04-11T15:38:45Z"

dashy-stack.yml

version: "3.5"

services:
  dashy:
    image: lissy93/dashy:latest
    networks:
      - web
      - dashy
    volumes:
      - /home/swarm/dashy/dashy-conf.yml:/app/user-data/conf.yml
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Warsaw
    deploy:
      mode: replicated
      replicas: 3
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.dashy.rule=Host(`dashy.swarm.<REDACTED>`)"
        - "traefik.http.services.dashy.loadbalancer.server.port=8080"
        - "traefik.docker.network=web"
        - "io.portainer.accesscontrol.users=admin"
        - "traefik.http.routers.dashy.tls=true"
        - "traefik.http.routers.dashy.entrypoints=websecure"
        - "traefik.docker.lbswarm=true" # Disables load-balancing in Traefik and delegates it to Docker Swarm

networks:
  dashy:
    driver: overlay
    attachable: true
    name: dashy
  web:
    external: true
    name: web

volumes:
  dashy-volume:

journalctl -p 3 on node 1

-- Boot b6ec750fee204cc6bee99eec73241d9c --
May 30 09:26:30 swarm2 systemd-networkd-wait-online[575]: Timeout occurred while waiting for network connectivity.
May 30 09:26:30 swarm2 systemd[1]: Failed to start Wait for Network to be Configured.
May 30 09:26:31 swarm2 kernel: FS-Cache: Duplicate cookie detected
May 30 09:26:31 swarm2 kernel: FS-Cache: O-cookie c=00000003 [p=00000002 fl=222 nc=0 na=1]
May 30 09:26:31 swarm2 kernel: FS-Cache: O-cookie d=00000000cf4b6adb{CIFS.server} n=000000007a526fdc
May 30 09:26:31 swarm2 kernel: FS-Cache: O-key=[8] '020001bdc0a8010a'
May 30 09:26:31 swarm2 kernel: FS-Cache: N-cookie c=00000004 [p=00000002 fl=2 nc=0 na=1]
May 30 09:26:31 swarm2 kernel: FS-Cache: N-cookie d=00000000cf4b6adb{CIFS.server} n=0000000010ba92a2
May 30 09:26:31 swarm2 kernel: FS-Cache: N-key=[8] '020001bdc0a8010a'
May 30 09:26:31 swarm2 kernel: FS-Cache: Duplicate cookie detected
May 30 09:26:31 swarm2 kernel: FS-Cache: O-cookie c=00000003 [p=00000002 fl=222 nc=0 na=1]
May 30 09:26:31 swarm2 kernel: FS-Cache: O-cookie d=00000000cf4b6adb{CIFS.server} n=000000007a526fdc
May 30 09:26:31 swarm2 kernel: FS-Cache: O-key=[8] '020001bdc0a8010a'
May 30 09:26:31 swarm2 kernel: FS-Cache: N-cookie c=00000006 [p=00000002 fl=2 nc=0 na=1]
May 30 09:26:31 swarm2 kernel: FS-Cache: N-cookie d=00000000cf4b6adb{CIFS.server} n=0000000082c04e43
May 30 09:26:31 swarm2 kernel: FS-Cache: N-key=[8] '020001bdc0a8010a'
May 30 09:26:31 swarm2 kernel: FS-Cache: Duplicate cookie detected
May 30 09:26:31 swarm2 kernel: FS-Cache: O-cookie c=00000003 [p=00000002 fl=222 nc=0 na=1]
May 30 09:26:31 swarm2 kernel: FS-Cache: O-cookie d=00000000cf4b6adb{CIFS.server} n=000000007a526fdc
May 30 09:26:31 swarm2 kernel: FS-Cache: O-key=[8] '020001bdc0a8010a'
May 30 09:26:31 swarm2 kernel: FS-Cache: N-cookie c=00000005 [p=00000002 fl=2 nc=0 na=1]
May 30 09:26:31 swarm2 kernel: FS-Cache: N-cookie d=00000000cf4b6adb{CIFS.server} n=00000000875921c9
May 30 09:26:31 swarm2 kernel: FS-Cache: N-key=[8] '020001bdc0a8010a'
May 30 09:26:31 swarm2 kernel: FS-Cache: Duplicate cookie detected
May 30 09:26:31 swarm2 kernel: FS-Cache: O-cookie c=00000003 [p=00000002 fl=222 nc=0 na=1]
May 30 09:26:31 swarm2 kernel: FS-Cache: O-cookie d=00000000cf4b6adb{CIFS.server} n=000000007a526fdc

journalctl -p 3 on node 2

-- Boot e3b660167dce4eafae3f731a296b0b04 --
May 30 09:26:48 swarm3 systemd-networkd-wait-online[563]: Timeout occurred while waiting for network connectivity.
May 30 09:26:48 swarm3 systemd[1]: Failed to start Wait for Network to be Configured.
May 30 09:26:51 swarm3 kernel: FS-Cache: Duplicate cookie detected
May 30 09:26:51 swarm3 kernel: FS-Cache: O-cookie c=00000003 [p=00000002 fl=222 nc=0 na=1]
May 30 09:26:51 swarm3 kernel: FS-Cache: O-cookie d=00000000a60920a8{CIFS.server} n=000000004c3ed402
May 30 09:26:51 swarm3 kernel: FS-Cache: O-key=[8] '020001bdc0a8010a'
May 30 09:26:51 swarm3 kernel: FS-Cache: N-cookie c=00000004 [p=00000002 fl=2 nc=0 na=1]
May 30 09:26:51 swarm3 kernel: FS-Cache: N-cookie d=00000000a60920a8{CIFS.server} n=00000000d6db6c67
May 30 09:26:51 swarm3 kernel: FS-Cache: N-key=[8] '020001bdc0a8010a'
May 30 09:26:51 swarm3 kernel: FS-Cache: Duplicate cookie detected
May 30 09:26:51 swarm3 kernel: FS-Cache: O-cookie c=00000003 [p=00000002 fl=222 nc=0 na=1]
May 30 09:26:51 swarm3 kernel: FS-Cache: O-cookie d=00000000a60920a8{CIFS.server} n=000000004c3ed402
May 30 09:26:51 swarm3 kernel: FS-Cache: O-key=[8] '020001bdc0a8010a'
May 30 09:26:51 swarm3 kernel: FS-Cache: N-cookie c=00000005 [p=00000002 fl=2 nc=0 na=1]
May 30 09:26:51 swarm3 kernel: FS-Cache: N-cookie d=00000000a60920a8{CIFS.server} n=00000000d6639c2e
May 30 09:26:51 swarm3 kernel: FS-Cache: N-key=[8] '020001bdc0a8010a'
May 30 09:26:51 swarm3 kernel: FS-Cache: Duplicate cookie detected
May 30 09:26:51 swarm3 kernel: FS-Cache: O-cookie c=00000003 [p=00000002 fl=222 nc=0 na=1]
May 30 09:26:51 swarm3 kernel: FS-Cache: O-cookie d=00000000a60920a8{CIFS.server} n=000000004c3ed402
May 30 09:26:51 swarm3 kernel: FS-Cache: O-key=[8] '020001bdc0a8010a'
May 30 09:26:51 swarm3 kernel: FS-Cache: N-cookie c=00000006 [p=00000002 fl=2 nc=0 na=1]
May 30 09:26:51 swarm3 kernel: FS-Cache: N-cookie d=00000000a60920a8{CIFS.server} n=00000000aa517f44
May 30 09:26:51 swarm3 kernel: FS-Cache: N-key=[8] '020001bdc0a8010a'
May 30 09:26:51 swarm3 kernel: FS-Cache: Duplicate cookie detected

journalctl -p 3 on node1

-- Boot 0de23155652648edb6fb5d1149791747 --
May 16 07:21:01 swarm-lite systemd-udevd[386]: event_source: Failed to get device name: No such file or directory
May 16 07:21:16 swarm-lite ntpd[693]: CONFIG: restrict nopeer ignored
May 16 07:21:16 swarm-lite ntpd[693]: CLOCK: leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): expired less than 141 days ago
May 16 07:21:16 swarm-lite ntpd[693]: statistics directory /var/log/ntpsec/ does not exist or is unwriteable, error No such file or directory
May 16 07:21:16 swarm-lite systemd[1]: Failed to start nginx.service - A high performance web server and a reverse proxy server.
May 16 17:26:02 swarm-lite kernel: IPVS: rr: FWM 259 0x00000103 - no destination available
May 16 17:26:03 swarm-lite kernel: IPVS: rr: FWM 259 0x00000103 - no destination available

curl -vk https://dashy.swarm.REDACTED:443

<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width,initial-scale=1"><!--[if IE]><link rel="icon" type="image/png" sizes="64x64" href="//web-icons/favicon-64x64.png"><![endif]--><link rel="icon" type="image/png" sizes="32x32" href="web-icons/favicon-32x32.png"><link rel="icon" type="image/png" href="/favicon.ico"><link rel="stylesheet" type="text/css" href="/loading-screen.css"><title>Dashy</title><link href="/css/chunk-0044633e.0e433876.css" rel="prefetch"><link href="/css/chunk-0248a1e9.2af758e1.css" rel="prefetch"><link href="/css/chunk-0367deae.0f98d711.css" rel="prefetch"><link href="/css/chunk-0387fd77.7aa83618.css" rel="prefetch"><link href="/css/chunk-03c5a0ba.fdf5ccee.css" rel="prefetch"><link href="/css/chunk-class="loading-placeholder" id="loader"><h1>Dashy</h1><p class="loading">Loading...</p><div class="catastrophic-error" id="err-wrap" style="display:none;"><p class="err-l1">It looks like something's gone wrong...</p><p class="err-l2">This is likely caused by the app source not being found at the current domain</p><p class="err-l2">If you need additional support, check the browser console then <a href="https://github.co

curl -vk https://chatgpt.swarm.REDACTED:443 | tee curl.log

<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/><link rel="stylesheet" href="/_next/static/css/b0ebe1618ba2d39f.css" data-precedence="next"/><link rel="stylesheet" href="/_next/static/css/6320fcde60ec292e.css" data-precedence="next"/><link rel="preload" href="/_next/static/chunks/webpack-4ae5a32a166d69a6.js" as="script"/><link rel="preload" href="/_next/static/chunks/bce60fc1-e55b90606913faf1.js" as="script"/><link rel="preload" href="/_next/static/chunks/7698-bb5d18468650f39a.js" as="script"/><link rel="preload" href="/_next/static/chunks/main-app-bf1f72eb5224e6ea.js" as="script"/><title>NextChat</title><meta name="description" content="Your personal ChatGPT Chat Bot."/><meta name="theme-color" media="(prefers-color-scheme: light)" content="#fafafa"/><meta name="theme-color" media="(prefers-color-scheme: dark)" content="#151515"/><meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1"/><meta name="apple-mobile-web-app-capable" content="yes"/><meta name="apple-mobile-web-app-title" content="NextChat"/><meta name="apple-mobile-web-app-status-bar-style" content="default"/><meta name="config" content="{&quot;version&quot;:&quot;v2.12.2&quot;,&quot;commitDate&quot;:&quot;1714454910000&quot;,&quot;commitHash&quot;:&quot;52312dbd23b0080cc8efd21c92fab2cf6a5ef832&quot;,&quot;buildMode&quot;:&quot;standalone&quot;,&quot;isApp&quot;:false}"/><meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no"/><link rel="manifest" href="/site.webmanifest"/><script src="/serviceWorkerRegister.js" defer=""></script><script src="/_next/static/chunks/polyfills-78c92fac7aa8fdd8.js" noModule=""></script></head><body><div class="home_loading-content__7_JjP no-dark"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="30" height="30" fill="none"><defs><path id="bot_svg__a" d="M0 0h30v30H0z"></path><path id="bot_svg__c" d="M0 0h20.455v20.455H0z"></path></defs><g><rect 
width="30" height="30" fill="#E7F8FF" rx="10"></rect><mask id="bot_svg__b" fill="#fff"><use xlink:href="#bot_svg__a"></use></mask><g mask="url(#bot_svg__b)"><g transform="translate(4.773 4.773)"><mask id="bot_svg__d" fill="#fff"><use xlink:href="#bot_svg__c"></use></mask><g mask="url(#bot_svg__d)"><path fill-rule="evenodd" style="fill:#1f948c" d="M19.11 8.37c.17-.52.26-1.06.26-1.61 0-.9-.24-1.79-.71-2.57a5.24 5.24 0 0 0-4.53-2.59c-.37 0-.73.04-1.09.11A5.201 5.201 0 0 0 9.17 0h-.04C6.86 0 4.86 1.44 4.16 3.57A5.11 5.11 0 0 0 .71 6.04C.24 6.83 0 7.72 0 8.63c0 1.27.48 2.51 1.35 3.45-.18.52-.27 1.07-.27 1.61 0 .91.25 1.8.71 2.58 1.13 1.94 3.41 2.94 5.63 2.47a5.18 5.18 0 0 0 3.86 1.71h.05c2.26 0 4.27-1.44 4.97-3.57a5.132 5.132 0 0 0 3.45-2.47c.46-.78.7-1.67.7-2.58 0-1.28-.48-2.51-1.34-3.46ZM8.947 18.158c-.04.03-.08.05-.12.07.7.58 1.57.89 2.48.89h.01c2.14 0 3.88-1.72 3.88-3.83v-4.76c0-.02-.02-.04-.04-.05l-1.74-.99v5.75c0 .23-.13.45-.34.57l-4.13 2.35Zm-.67-1.153 4.17-2.38c.02-.01.03-.03.03-.05v-1.99l-5.04 2.87c-.21.12-.47.12-.68 0l-4.13-2.35c-.04-.02-.09-.06-.12-.07-.04.21-.06.43-.06.65 0 .67.18 1.33.52 1.92v-.01c.7 1.19 1.98 1.92 3.37 1.92.68 0 1.35-.18 1.94-.51ZM3.903 5.168v-.14c-.85.31-1.57.9-2.02 1.68a3.78 3.78 0 0 0-.52 1.91c0 1.37.74 2.64 1.94 3.33l4.17 2.37c.02.01.04.01.06 0l1.75-1-5.04-2.87a.64.64 0 0 1-.34-.57v-4.71Zm13.253 3.337-4.18-2.38c-.02 0-.04 0-.06.01l-1.74.99 5.04 2.87c.21.12.34.34.34.58v4.85c1.52-.56 2.54-1.99 2.54-3.6 0-1.37-.74-2.63-1.94-3.32ZM8.014 5.83c-.02.01-.03.03-.03.05v1.99L13.024 5a.692.692 0 0 1 .68 0l4.13 2.35c.04.02.08.05.12.07.03-.21.05-.43.05-.65 0-2.11-1.74-3.83-3.88-3.83-.68 0-1.35.18-1.94.51l-[...]

tcpdump
From what I can tell, the three-way TCP handshake completes, but after that I only see keep-alive packets.

Compare to simple Traefik Swarm example.

Note that providers.docker (v2) in swarm mode only works on Docker Swarm manager nodes.

Also note that Traefik CE LetsEncrypt only works with a single Traefik instance. If you have multiple Traefik instances and want to use LE, you need to use a workaround.

I've adjusted the traefik docker-compose.yml file and added this section:

    ports:
      # listen on host ports without ingress network
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host

Now I've noticed something.
When the error occurs and I run sudo lsof -i tcp:443, I see many established connections:

docker-pr 3394 root  924u  IPv4 104366      0t0  TCP 172.18.0.1:38506->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  929u  IPv4 105753      0t0  TCP 192.168.REDACTED:https->172.18.0.32:54296 (ESTABLISHED)
docker-pr 3394 root  930u  IPv4 105755      0t0  TCP 172.18.0.1:38508->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  935u  IPv4 105137      0t0  TCP 192.168.REDACTED:https->172.18.0.32:54302 (ESTABLISHED)
docker-pr 3394 root  936u  IPv4 105139      0t0  TCP 172.18.0.1:38516->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  939u  IPv4 105141      0t0  TCP 192.168.REDACTED:https->172.18.0.32:54312 (ESTABLISHED)
docker-pr 3394 root  940u  IPv4 105143      0t0  TCP 172.18.0.1:38520->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  945u  IPv4 105145      0t0  TCP 192.168.REDACTED:https->172.18.0.32:54326 (ESTABLISHED)
docker-pr 3394 root  946u  IPv4 106866      0t0  TCP 172.18.0.1:38530->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  951u  IPv4 104382      0t0  TCP 192.168.REDACTED:https->172.18.0.32:54340 (ESTABLISHED)
docker-pr 3394 root  952u  IPv4 104384      0t0  TCP 172.18.0.1:38532->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  957u  IPv4 118943      0t0  TCP 192.168.REDACTED:https->172.18.0.32:34190 (ESTABLISHED)
docker-pr 3394 root  958u  IPv4 118945      0t0  TCP 172.18.0.1:48354->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  963u  IPv4 116967      0t0  TCP 192.168.REDACTED:https->172.18.0.32:34206 (ESTABLISHED)
docker-pr 3394 root  964u  IPv4 117990      0t0  TCP 172.18.0.1:48362->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  969u  IPv4 118947      0t0  TCP 192.168.REDACTED:https->172.18.0.32:34212 (ESTABLISHED)
docker-pr 3394 root  970u  IPv4 118949      0t0  TCP 172.18.0.1:48364->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  975u  IPv4 116972      0t0  TCP 192.168.REDACTED:https->172.18.0.32:34222 (ESTABLISHED)
docker-pr 3394 root  976u  IPv4 116974      0t0  TCP 172.18.0.1:48376->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  981u  IPv4 116698      0t0  TCP 192.168.REDACTED:https->172.18.0.32:34232 (ESTABLISHED)
docker-pr 3394 root  982u  IPv4 116700      0t0  TCP 172.18.0.1:48382->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  987u  IPv4 109124      0t0  TCP 192.168.REDACTED:https->172.18.0.32:34234 (ESTABLISHED)
docker-pr 3394 root  988u  IPv4 109126      0t0  TCP 172.18.0.1:48392->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  993u  IPv4 118956      0t0  TCP 192.168.REDACTED:https->172.18.0.32:34240 (ESTABLISHED)
docker-pr 3394 root  994u  IPv4 118957      0t0  TCP 192.168.REDACTED:https->172.18.0.32:34242 (ESTABLISHED)
docker-pr 3394 root  995u  IPv4 118009      0t0  TCP 172.18.0.1:48418->172.18.0.5:https (ESTABLISHED)
docker-pr 3394 root  996u  IPv4 118959      0t0  TCP 172.18.0.1:48402->172.18.0.5:https (ESTABLISHED)
docker-pr 3402 root    4u  IPv6  22564      0t0  TCP *:https (LISTEN)

I still have both IPv4 and IPv6 listening on ports 80 and 443.
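
To compare a healthy state with the stuck state, it may help to reduce the lsof listing to a count per process and connection state. This is a rough sketch, assuming the default lsof -i column layout shown above, where the state is the last field in parentheses; summarize_lsof is a hypothetical helper name:

```shell
# Read `lsof -i` output on stdin and print "<count> <command> <state>"
# for each combination of process name and TCP state.
summarize_lsof() {
  awk '
    $1 == "COMMAND" { next }                  # skip the header row
    $NF ~ /^\(.*\)$/ {                        # last field looks like (STATE)
      state = substr($NF, 2, length($NF) - 2) # strip the parentheses
      count[$1 " " state]++
    }
    END { for (k in count) print count[k], k }
  ' | sort -k2
}
```

For example: sudo lsof -i tcp:443 | summarize_lsof. A steadily growing ESTABLISHED count for docker-pr between two runs would point to connections piling up rather than being closed.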

I've also done:

  1. Upgraded the distro to Ubuntu 24.04 LTS
  2. Along with that, upgraded Docker Engine to the newest version available on Ubuntu
  3. Debugged the logs all day, but there is no sign of Docker or Traefik malfunctioning in the logs (other than the lsof output above)
  4. Switched to a single-node environment

I think that this line in traefik-conf.yml:

accessLog: {}

caused the issue.
When I simply removed the curly brackets and added

accessLog:
  filePath: /logs/access.log
  bufferingSize: 100

After a reboot, the issue was fixed.

It might also be because I had a DNS misconfiguration. (I'm not going to go into much detail here.)
