Docker Swarm: Traefik only works on one node and ignores every other node

Hi guys! I hope someone can help me with this. Otherwise, I am considering living in the forest, far away from all technology.
I have 3 VPSs running, and each one is a Docker Swarm manager node. Everything works fantastically as long as all the containers are on the same node. However, as soon as I deployed all three nodes and spread the containers across them, I noticed that Traefik was only listening on the ports of the host it was running on. Traefik also only communicates with the containers on its own node.

I have tried several things, but nothing seems to work. I am trying to solve it by following a similar problem, but maybe I am just too dumb to do it right. I don't have any error log from any container either; Traefik is just not communicating with anything outside its own VPS. The files mounted as volumes work on every node because they are on Gluster, so that part is not the problem.
Please help. Here are my yml files. The comments describe other configurations that also worked on a single node.

This is my docker-stack.yml for Traefik.

networks:
  proxy:
    name: proxy
    driver: overlay
    attachable: true
  internal: 
    name: internal 
    driver: overlay
#the internal network is for my services that communicate internally;
#it does not work with or without it

services:
  traefik:
    image: traefik:v3.2
    hostname: '{{.Node.Hostname}}'
    cap_drop: #I tried without this cap_drop but nothing changes.
      - ALL

    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
      - target: 1870
        published: 1870
        protocol: tcp
        mode: host
      # I had my ports like this
      #- 80:80
      #- 443:443
      #- 1870:1870 

    logging:
      driver: "fluentd"
      options:
        fluentd-address: localhost:24224
        tag: traefik
        fluentd-async-connect: "true"
        fluentd-retry-wait: "5s"

    env_file: ../.env
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /mnt/config/traefik/traefik.yml:/etc/traefik/traefik.yml:ro
      - /mnt/config/traefik/acme.json:/acme.json
    environment:
      CF_DNS_API_TOKEN: Mytoken
      TRAEFIK_DASHBOARD_CREDENTIALS: myuser:myencriptedpassword

    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.entrypoints=http"
      - "traefik.http.routers.traefik.rule=Host(`mydomain`)"
      - "traefik.http.middlewares.traefik-auth.basicauth.users=myuser:myencriptedpassword"
      - "traefik.http.middlewares.traefik-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.middlewares.sslheader.headers.customrequestheaders.X-Forwarded-Proto=https"
      - "traefik.http.routers.traefik.middlewares=traefik-https-redirect"
      - "traefik.http.routers.traefik-secure.entrypoints=https"
      - "traefik.http.routers.traefik-secure.rule=Host(`mydomain`)"
      - "traefik.http.routers.traefik-secure.middlewares=traefik-auth"
      - "traefik.http.routers.traefik-secure.tls=true"
      - "traefik.http.routers.traefik-secure.tls.certresolver=cloudflare"
      - "traefik.http.routers.traefik-secure.tls.domains[0].main=mydomain"
      - "traefik.http.routers.traefik-secure.tls.domains[0].sans=*.mydomain"
      - "traefik.http.routers.traefik-secure.service=api@internal"

    networks:
      - proxy
#      - internal 

    deploy:
      replicas: 1
      resources:
        limits:
          cpus: "0.5"
          memory: "256M"
          pids: 100
      restart_policy:
        condition: any 
        delay: 1s
        max_attempts: 3
        window: 90s

      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback 
        monitor: 60s 
        max_failure_ratio: 0
        order: start-first

    healthcheck:
      test: ["CMD", "traefik", "healthcheck"]
      interval: 10s
      timeout: 2s
      retries: 3
      start_period: 30s

This is my traefik.yml.

global:
  checkNewVersion: false
  sendAnonymousUsage: false
log:
  level: INFO
api:
  dashboard: true
  debug: true
ping: {} 

entryPoints:
  http:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
  https:
    address: ":443"
  myentry:
    address: ":1870"

certificatesResolvers:
  cloudflare:
    acme:
      email: mymail@gmail.com
      storage: acme.json
      caServer: https://acme-staging-v02.api.letsencrypt.org/directory 
      dnsChallenge:
        provider: cloudflare
        resolvers:
          - "1.1.1.1:53"
          - "1.0.0.1:53"

serversTransport:
  insecureSkipVerify: true 

providers:
  docker: 
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    network: proxy

This is an example of one of my services that uses Traefik.

volumes:
  vol-sisyphus:
    name: foo-sisyphus

  vol-alexandria:
    name: foo-alexandria

networks:
  proxy:
    external: true
  internal:
    name: internal
    driver: overlay

services:
  sisyphus:
    hostname: '{{.Node.Hostname}}'
    image: nodered/node-red:4.0.3-22-minimal

    logging:
      driver: "fluentd"
      options:
        fluentd-address: localhost:24224
        tag: sisyphus
        fluentd-async-connect: "true"
        fluentd-retry-wait: "5s"

    cap_drop:
      - ALL

    volumes:
      - /mnt/volumes/vol-sisyphus:/data
      - /mnt/data/thunder.bin:/data/thunder.bin
      - /mnt/config/nodered/node-red-settings2.js:/data/settings.js

    environment:
      NODE_CREDENTIALS: myuser:mypassencripted

    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=proxy"
      - "traefik.http.routers.sisyphus-secure.rule=Host(`sisyphus.mydomain`) && Path(`/happy/`)"
      - "traefik.http.routers.sisyphus-secure.entrypoints=https"
      - "traefik.http.routers.sisyphus-secure.service=sisyphus"
      - "traefik.http.routers.sisyphus-secure.tls=true"
      - "traefik.http.routers.sisyphus-secure.middlewares=sisyphus-auth"
      - "traefik.http.middlewares.sisyphus-auth.basicauth.users=myuser:mypassencripted"
      - "traefik.http.routers.sisyphus-other.rule=Host(`sisyphus.mydomain`) && PathPrefix(`/`)"
      - "traefik.http.routers.sisyphus-other.entrypoints=https"
      - "traefik.http.routers.sisyphus-other.service=sisyphus"
      - "traefik.http.routers.sisyphus-other.tls=true"

      - "traefik.http.services.sisyphus.loadbalancer.server.port=1890"

    deploy:
      replicas: 3
      resources:
        limits:
          cpus: "0.4"
          memory: "512M"
          pids: 100

      restart_policy:
        condition: on-failure
        delay: 1s
        max_attempts: 3
        window: 60s 

      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback 
        monitor: 60s 
        max_failure_ratio: 0
        order: start-first

    networks:
      proxy:
        aliases:
          - sisyphus_network
      internal:
        aliases:
          - sisyphus_internal

I ran docker inspect proxy and this is the result.
It only seems to be communicating with MAGI_magi, a service running on the same node as Traefik. Only 2 nodes appear under Peers because I drained the third one.

[
    {
        "Name": "proxy",
        "Id": "929vhoiy6ns7to6uhf2xkt4md",
        "Created": "2024-10-25T20:23:19.111101573Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.1.0/24",
                    "Gateway": "10.0.1.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "6b5d47121c69fbfc475a81a61b716d04a454b74c7e23e704653f4b4883194dbe": {
                "Name": "MAGI_magi.1.ph4uirt7a1ls3fk6rm9v028i4",
                "EndpointID": "1a4f872dc7c25e8e1ada07eae66dea672d112ef5e58d136ae66cd69c9b60a330",
                "MacAddress": "02:42:0a:00:01:0b",
                "IPv4Address": "10.0.1.11/24",
                "IPv6Address": ""
            },
            "74e6f05f45e7e1f2861ccffaf769e2d70b125184cbb63c38d574acce53963aa5": {
                "Name": "TRAEFIK_traefik.1.e30u6cc9slq00gaxp1eoohga4",
                "EndpointID": "cc40c63f5cb6e0a28a523fc393787afd0f9ee9572e5a5c51ddcd01f651753cc2",
                "MacAddress": "02:42:0a:00:01:03",
                "IPv4Address": "10.0.1.3/24",
                "IPv6Address": ""
            },
            "lb-proxy": {
                "Name": "proxy-endpoint",
                "EndpointID": "3c2524276d86a9ef42738e9c1c3a3de46ef0ae59c024f3d7f235e78eca47a3e0",
                "MacAddress": "02:42:0a:00:01:04",
                "IPv4Address": "10.0.1.4/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4138"
        },
        "Labels": {
            "com.docker.stack.namespace": "TRAEFIK"
        },
        "Peers": [
            {
                "Name": "a1e70c16d451",
                "IP": "217.219.95.99"
            },
            {
                "Name": "888b24a56e04",
                "IP": "163.246.22.142"
            }
        ]
    }
]

When using Docker Swarm, you need to place the labels inside deploy and use docker stack deploy.
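A minimal sketch of what that means for the sisyphus service from the question, trimmed to one router for illustration. Only the position of the labels changes: they move from the service level to deploy.labels, which Compose applies to the Swarm service rather than to the individual containers, and the Swarm provider reads service labels. This is not a complete config; the other settings from the original stack would stay as they were.

```yaml
services:
  sisyphus:
    image: nodered/node-red:4.0.3-22-minimal
    networks:
      - proxy
    deploy:
      replicas: 3
      # Swarm service labels: these are what Traefik's Swarm provider sees.
      # Labels at the service level (outside deploy) end up on the
      # containers and are not used by the Swarm provider.
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.sisyphus-other.rule=Host(`sisyphus.mydomain`)"
        - "traefik.http.routers.sisyphus-other.entrypoints=https"
        - "traefik.http.routers.sisyphus-other.tls=true"
        - "traefik.http.routers.sisyphus-other.service=sisyphus"
        - "traefik.http.services.sisyphus.loadbalancer.server.port=1890"

networks:
  proxy:
    external: true
```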

In Traefik v3, you need to use providers.swarm.
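For reference, a sketch of the providers section of the traefik.yml above rewritten for the v3 Swarm provider, keeping the same endpoint, exposedByDefault and network values. It assumes the file is still mounted at /etc/traefik/traefik.yml.

```yaml
# /etc/traefik/traefik.yml (static configuration, Traefik v3)
providers:
  swarm:
    # Must be the Docker API of a manager node to list Swarm services
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    # Default network Traefik uses to reach the service tasks
    network: proxy
```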

Compare with the simple Traefik Swarm example.


@bluepuma77 I made these changes according to the documentation, but Traefik stops after about 2 minutes. Did I miss something? I also tried providers.docker instead of providers.swarm, without success.
I know that the healthcheck is what kills the container, but if I remove the healthcheck, Traefik keeps running yet doesn't even generate the certificates.

    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
#      - /mnt/config/traefik/traefik.yml:/etc/traefik/traefik.yml:ro
      - /mnt/config/traefik/acme.json:/acme.json
      #- ./data/config.yml:/config.yml:ro
    command:
#      - "--global.checkNewVersion=false"
#      - "--global.sendAnonymousUsage=false"
      - --log.level=INFO
      - --api.dashboard=true
#      - "--api.debug=true"
#      - "--ping=true"
      - --entrypoints.http.address=:80
      - --entrypoints.https.address=:443
      - --entrypoints.entrynode.address=:1870
      - --entrypoints.entryemqxdash.address=:18083
      - --entrypoints.http.http.redirections.entryPoint.to=https
      - --entrypoints.http.http.redirections.entryPoint.scheme=https
      - --certificatesResolvers.cloudflare.acme.email=mymail@gmail.com
      - --certificatesResolvers.cloudflare.acme.storage=acme.json
      - --certificatesResolvers.cloudflare.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory
      - --certificatesResolvers.cloudflare.acme.dnsChallenge.provider=cloudflare
      - --certificatesResolvers.cloudflare.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53
#      - --serversTransport.insecureSkipVerify=true
#      - --providers.swarm.endpoint=unix:///var/run/docker.sock
      - --providers.swarm.exposedByDefault=false
      - --providers.swarm.network=proxy

These are the logs

2024-10-25T22:01:53Z INF Starting provider aggregator aggregator.ProviderAggregator
2024-10-25T22:01:53Z INF Starting provider *traefik.Provider
2024-10-25T22:01:53Z INF Starting provider *docker.Provider
2024-10-25T22:01:53Z INF Starting provider *acme.ChallengeTLSALPN
2024-10-25T22:01:53Z INF Starting provider *acme.Provider
2024-10-25T22:01:53Z INF Testing certificate renew... acmeCA=https://acme-staging-v02.api.letsencrypt.org/directory providerName=cloudflare.acme
2024-10-25T22:02:48Z INF I have to go...
2024-10-25T22:02:48Z INF Stopping server gracefully
2024-10-25T22:02:48Z ERR error="accept tcp [::]:443: use of closed network connection" entryPointName=https
2024-10-25T22:02:48Z ERR Error while starting server error="accept tcp [::]:443: use of closed network connection" entryPointName=https
2024-10-25T22:02:48Z ERR error="accept tcp [::]:1870: use of closed network connection" entryPointName=entrynode
2024-10-25T22:02:48Z ERR Error while starting server error="accept tcp [::]:1870: use of closed network connection" entryPointName=entrynode
2024-10-25T22:02:48Z ERR error="accept tcp [::]:80: use of closed network connection" entryPointName=http
2024-10-25T22:02:48Z ERR Error while starting server error="accept tcp [::]:80: use of closed network connection" entryPointName=http
2024-10-25T22:02:48Z ERR error="accept tcp [::]:18083: use of closed network connection" entryPointName=entryemqxdash
2024-10-25T22:02:48Z ERR error="close tcp [::]:18083: use of closed network connection" entryPointName=entryemqxdash
2024-10-25T22:02:48Z INF Server stopped
2024-10-25T22:02:48Z INF Shutting down

The "use of closed network connection" error is a message that usually appears during shutdown, so it's not really relevant.
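A likely explanation for the healthcheck killing the container, offered as an assumption rather than a confirmed diagnosis: traefik healthcheck works by calling the /ping endpoint, but in the command list above --ping=true is commented out, and the traefik.yml that contained ping: {} is no longer mounted. The healthcheck subcommand also runs as a separate process, so it does not see the flags passed to the main Traefik process and needs --ping itself. A sketch of the relevant parts of the stack file:

```yaml
services:
  traefik:
    command:
      # Enables /ping on the default "traefik" entrypoint (:8080 inside the container)
      - --ping=true
      # ...the rest of the flags from the command list above, unchanged...
    healthcheck:
      # "traefik healthcheck" only knows its own flags, so pass --ping here too
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 10s
      timeout: 2s
      retries: 3
      start_period: 30s
```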

Thanks, @bluepuma77, I made it work by following your example. But I have a question: I noticed that you use mode: global. Is there a reason for that? With your configuration, Traefik can communicate with containers that are not on its own node, which solves my original problem. However, if I deploy only 1 replica, Traefik cannot listen on the ports of the other nodes, so it seems it has to run in global mode.

On the production system we run 3 manager nodes, each running a Traefik instance (deploy mode: global with a constraint), which only listens on its local node (ports mode: host).
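A sketch of that layout. The post only says "with constraint"; node.role == manager is an assumption here, chosen because the Swarm provider needs a manager's Docker API. With mode: global, each matching node runs exactly one Traefik task, and mode: host on the ports means each task answers only on its own node's 80/443, so an external load balancer can send traffic to any of them.

```yaml
services:
  traefik:
    image: traefik:v3.2
    ports:
      # mode: host binds on the node itself, bypassing the Swarm routing mesh
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    deploy:
      # One Traefik task on every node that matches the constraint
      mode: global
      placement:
        constraints:
          # Assumption: pin Traefik to the manager nodes
          - node.role == manager
```

Note that replicas: 1 from the original stack has to be removed, because replicas only applies to replicated mode.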

In front of the 3 we have a managed load balancer, in the hope of better uptime :sunglasses:

Note that Let's Encrypt with multiple Traefik CE instances only works with dnsChallenge, and each instance gets its own individual cert. So you had better not lose your certs, because Let's Encrypt only issues 5 certs for the same set of domain names per week.


I have another question: why did you use port 1337 in traefik.http.services.mydashboard.loadbalancer.server.port=1337?
I created wildcard certificates for *.mydomain, but it did not work until I assigned a load balancer port (the same one as yours).

      - "traefik.http.routers.traefik-secure.tls.domains[0].main=babel.cercaelectrica.com.mx"
      - "traefik.http.routers.traefik-secure.tls.domains[0].sans=*.mydomain"
      - "traefik.http.routers.traefik-secure.service=api@internal"
      - "traefik.http.services.api@internal.loadbalancer.server.port=1337"

I think you need to create a fake port entry for Traefik internal services for Traefik Swarm discovery to work.
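Some background that supports this: the Traefik Swarm provider documentation says Docker Swarm does not provide port detection information, so each service needs an explicit traefik.http.services.<name>.loadbalancer.server.port label. A sketch of the dashboard labels under deploy with such a placeholder; the service name dashboard is hypothetical and 1337 is arbitrary. Since the router targets api@internal, the placeholder service should never actually be used.

```yaml
services:
  traefik:
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.traefik-secure.entrypoints=https"
        - "traefik.http.routers.traefik-secure.rule=Host(`mydomain`)"
        - "traefik.http.routers.traefik-secure.middlewares=traefik-auth"
        - "traefik.http.middlewares.traefik-auth.basicauth.users=myuser:myencriptedpassword"
        - "traefik.http.routers.traefik-secure.tls.certresolver=cloudflare"
        - "traefik.http.routers.traefik-secure.service=api@internal"
        # Placeholder: hypothetical service "dashboard", no router points at it
        - "traefik.http.services.dashboard.loadbalancer.server.port=1337"
```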

