Hello!
I know this question has been asked many times before, but it keeps coming back because every case has its own nuances.
Before you embark on this painful journey, let me give you the checklist that I assembled, the hard way:
- Are the containers on the same network?
$ docker network inspect <network-name>
- Is the overlay network shared by Traefik and the other containers an 'attachable' network? Go and try an attachable network.
- Disable TLS on all the routers and in the static configuration, and see if that works. Start with the most basic non-TLS example.
- Are you using the same domain on several routers? Try a different domain and see if that works.
- Do you have any labels that restrict where Traefik is spun up? Remove them and see if that helps.
Basically, build the simplest working example possible, then add pieces of functionality one by one and see which one causes trouble.
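As a starting point, a minimal non-TLS stack could look roughly like the sketch below. It is not the configuration from this post: the whoami service, the traefik/whoami image and the whoami.example.com host name are placeholders chosen for illustration, and the public network is assumed to exist already.

```yaml
# Minimal non-TLS sketch for `docker stack deploy`; service and host names are placeholders.
version: "3.1"
services:
  traefik:
    image: traefik:2.3
    ports:
      - 80:80
    command:
      - --providers.docker
      - --providers.docker.swarmMode=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --api
      # Debug logging shows which services the provider accepts or filters out.
      - --log.level=DEBUG
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - public
    deploy:
      placement:
        constraints:
          # The Swarm provider has to talk to a manager node's Docker socket.
          - node.role == manager
  whoami:
    # Placeholder backend that listens on port 80.
    image: traefik/whoami
    networks:
      - public
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        - traefik.http.routers.whoami.entrypoints=web
        - traefik.http.routers.whoami.rule=Host(`whoami.example.com`)
        # In Swarm mode the backend port has to be declared explicitly.
        - traefik.http.services.whoami.loadbalancer.server.port=80
networks:
  public:
    external: true
```

Once this routes correctly, TLS, constraints and extra routers can be added back one at a time.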
Traefik not discovering any services in a 3 node Swarm Stack cluster
tl;dr;
The 3-node cluster is made up of 3 Droplets on DigitalOcean. Each Droplet has a DNS A record (all with the same host name) pointing to its public IP.
Aim
I'm trying to get the example at dockerswarm.rocks/traefik working in a 3-node Swarm cluster.
Current status:
/api/entrypoints
[
  {
    address: ':80',
    transport: {
      lifeCycle: { graceTimeOut: 10000000000 },
      respondingTimeouts: { idleTimeout: 180000000000 },
    },
    forwardedHeaders: {},
    http: {},
    name: 'web',
  },
  {
    address: ':443',
    transport: {
      lifeCycle: { graceTimeOut: 10000000000 },
      respondingTimeouts: { idleTimeout: 180000000000 },
    },
    forwardedHeaders: {},
    http: {},
    name: 'websecure',
  },
];
/api/routers
[
  {
    entryPoints: ['websecure'],
    service: 'api@internal',
    rule: 'Host(`${DOMAIN}?Variable not set`)',
    tls: { certResolver: 'le' },
    status: 'enabled',
    using: ['websecure'],
    name: 'traefik@docker',
    provider: 'docker',
  },
];
/api/services
[
  {
    status: 'enabled',
    usedBy: ['traefik@docker'],
    name: 'api@internal',
    provider: 'internal',
  },
  { status: 'enabled', name: 'dashboard@internal', provider: 'internal' },
  { status: 'enabled', name: 'noop@internal', provider: 'internal' },
  {
    loadBalancer: {
      servers: [{ url: 'http://10.0.16.4:8080' }],
      passHostHeader: true,
    },
    status: 'enabled',
    name: 'traefik@docker',
    provider: 'docker',
    type: 'loadbalancer',
  },
];
Configuration files
docker-compose.yml
version: "3.1"
services:
  traefik:
    image: traefik:2.3
  api:
    image: secretexpress/api
  postgres:
    image: postgres
docker-compose.staging.yml
version: "3.1"
services:
  traefik:
    ports:
      - 80:80
      - 443:443
    deploy:
      placement:
        constraints:
          - node.labels.traefik-certificates == true
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        - traefik.constraint-label=public
        - traefik.http.routers.traefik.tls=true
        - traefik.http.routers.traefik.tls.certresolver=le
        - traefik.http.routers.traefik.entrypoints=websecure
        - traefik.http.routers.traefik.rule=Host(`${DOMAIN?Variable not set}`)
        - traefik.http.routers.traefik.service=api@internal
        - traefik.http.services.traefik.loadbalancer.server.port=8080
    command:
      - --providers.docker
      - --providers.docker.constraints=Label(`traefik.constraint-label`, `public`)
      - --providers.docker.exposedbydefault=false
      #
      - --providers.docker.swarmMode=true
      #
      - --entrypoints.web.address=:80
      - --entrypoints.web.http.redirections.entryPoint.to=websecure
      - --entrypoints.websecure.address=:443
      #
      - --certificatesresolvers.le.acme.email=${EMAIL?Variable not set}
      - --certificatesresolvers.le.acme.storage=/certificates/acme.json
      - --certificatesresolvers.le.acme.tlschallenge=true
      #
      - --accesslog
      - --log
      - --api
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - traefik-certificates:/certificates
    networks:
      - public
  api:
    deploy:
      labels:
      - traefik.enable=true
      - traefik.docker.network=public
      #
      - traefik.http.routers.api.tls=true
      - traefik.http.routers.api.tls.certresolver=le
      #
      - traefik.http.routers.api.entrypoints=websecure
      - traefik.http.routers.api.rule=Host(`${DOMAIN?Variable not set}`) && PathPrefix(`/api`)
      - traefik.http.routers.api.middlewares=api-stripprefix
      - traefik.http.middlewares.api-stripprefix.stripprefix.prefixes=/api
      - traefik.http.services.api-service.loadbalancer.server.port=3000
    environment:
      - API_ENV_FILE=/run/secrets/api_env
    secrets:
      - api_env
    networks:
      - public
  postgres:
    environment:
      - POSTGRES_USER_FILE=/run/secrets/psql_user
      - POSTGRES_PASSWORD_FILE=/run/secrets/psql_password
    secrets:
      - psql_user
      - psql_password
    networks:
      - public
networks:
  public:
    external: true
volumes:
  traefik-certificates:
secrets:
  api_env:
    external: true
  #
  psql_user:
    external: true
  psql_password:
    external: true
I'm deploying my stack in the following way:
$ docker stack deploy --compose-file docker-compose.yml -c docker-compose.staging.yml --with-registry-auth tma1
And here are some details about my cluster
$ docker stack ls
NAME      SERVICES   ORCHESTRATOR
tma1      3          Swarm
$ docker stack ps tma1
ID             NAME              IMAGE                      NODE      DESIRED STATE   CURRENT STATE            ERROR     PORTS
nve5grdy466q   tma1_api.1        secretexpress/api:latest   tma-2     Running         Running 22 minutes ago
r9lfawkkjbv6   tma1_postgres.1   postgres:latest            tma-4     Running         Running 22 minutes ago
q8n3o1wncjcn   tma1_traefik.1    traefik:2.3                tma-2     Running         Running 22 minutes ago
root@tma-2:~# docker node ls
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
wivscyn7mbqlb2svv39ho1zsi *   tma-2      Ready     Active         Leader           20.10.2
o2pq2zdoevbgp3b8hkvhlbw8a     tma-3      Ready     Active                          20.10.2
zyp4nwh3ym8xydlmla1uy6ewo     tma-4      Ready     Active         Reachable        20.10.2
root@tma-2:~# docker node ps
ID             NAME             IMAGE                      NODE      DESIRED STATE   CURRENT STATE            ERROR     PORTS
nve5grdy466q   tma1_api.1       secretexpress/api:latest   tma-2     Running         Running 27 minutes ago
q8n3o1wncjcn   tma1_traefik.1   traefik:2.3                tma-2     Running         Running 27 minutes ago
The Traefik logs make no mention whatsoever of the api service:
root@tma-2:~# docker service logs -f a75am0o0pne0
tma1_traefik.1.q8n3o1wncjcn@tma-2    | time="2021-02-02T08:32:18Z" level=info msg="Configuration loaded from flags."
tma1_traefik.1.q8n3o1wncjcn@tma-2    | 10.0.0.2 - - [02/Feb/2021:08:32:24 +0000] "GET / HTTP/2.0" 302 34 "-" "-" 1 "traefik@docker" "-" 0ms
tma1_traefik.1.q8n3o1wncjcn@tma-2    | 10.0.0.2 - - [02/Feb/2021:08:32:24 +0000] "GET /dashboard/ HTTP/2.0" 304 0 "-" "-" 2 "traefik@docker" "-" 0ms
tma1_traefik.1.q8n3o1wncjcn@tma-2    | 10.0.0.2 - - [02/Feb/2021:08:32:25 +0000] "GET /api/version HTTP/2.0" 200 85 "-" "-" 3 "traefik@docker" "-" 0ms
tma1_traefik.1.q8n3o1wncjcn@tma-2    | 10.0.0.2 - - [02/Feb/2021:08:32:25 +0000] "GET /api/version HTTP/2.0" 200 85 "-" "-" 4 "traefik@docker" "-" 0ms
Last but not least, the overlay network the stack is attached to is the public network:
root@tma-2:~# docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
eb9307d02fe7   bridge            bridge    local
4b68edde0727   docker_gwbridge   bridge    local
fd35d8cf012f   host              host      local
xi7i24kk6rfa   ingress           overlay   swarm
2ecb3b601edd   none              null      local
odng3c8kllyw   public            overlay   swarm
Now for the million-dollar question: what am I doing wrong?
Edit:
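I think the api service might not be attached to the network, since only postgres and traefik show up under Containers (although, for an overlay network, docker network inspect only lists the containers running on the node where it is executed):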
$ docker network inspect public
[
    {
        "Name": "public",
        "Id": "igo934icpt7v1ga90nfwq1k0b",
        "Created": "2021-02-02T19:38:55.101260242Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.9.0/24",
                    "Gateway": "10.0.9.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "af963274da2aeb99b9603d22c501a15b8bb02a7568c524a5dfc52bbbcf150290": {
                "Name": "tma1_postgres.1.tadfw3v5mewi16m67dm0d14vy",
                "EndpointID": "0af39804821b873fbadbc36d9cd3a6c9a287e9780090057b93681b8629db23f5",
                "MacAddress": "02:42:0a:00:09:0c",
                "IPv4Address": "10.0.9.12/24",
                "IPv6Address": ""
            },
            "b6521b845b7ee5c77750a36ca10193060958b66154ed3115a13e1de9eec9bcc0": {
                "Name": "tma1_traefik.1.nqlieh2e8f72rq7dos1kud9nv",
                "EndpointID": "a6f9bc32fd1bef628ae81fef3abaff7f84e1b9581c73b430d5d367c2c84d2f7f",
                "MacAddress": "02:42:0a:00:09:0f",
                "IPv4Address": "10.0.9.15/24",
                "IPv6Address": ""
            },
            "lb-public": {
                "Name": "public-endpoint",
                "EndpointID": "068ca9a1e63b8f1d236cc505a84bdcd70dee0236634ab23a0e2556a44b659982",
                "MacAddress": "02:42:0a:00:09:0d",
                "IPv4Address": "10.0.9.13/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4105"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "95da2126a35e",
                "IP": "10.116.0.3"
            },
            {
                "Name": "1064ccfe47e7",
                "IP": "10.116.0.2"
            }
        ]
    }
]
Edit 2:
It turns out that you need to create an attachable overlay network. Quoting the Docker docs:
To create an overlay network which can be used by swarm services or standalone containers to communicate with other standalone containers running on other Docker daemons, add the --attachable flag:
$ docker network create -d overlay --attachable public
At least now I got the containers on the same network:
$ docker network inspect public
[
    {
        "Name": "public",
        "Id": "yq9lzu9qfabh8w9e9apqdkg27",
        "Created": "2021-02-02T19:55:01.222053595Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.14.0/24",
                    "Gateway": "10.0.14.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "049e0b8c7151f59b0f7edf9e649f23a94c8c68898b7237a9c8b5906f28a18046": {
                "Name": "tma1_traefik.1.l5ptux6kys1lual26s42e2uul",
                "EndpointID": "ccbc46084f6282f06af1d7cf511adff4a9436c98c51422a1a61cda7ebf71d97a",
                "MacAddress": "02:42:0a:00:0e:11",
                "IPv4Address": "10.0.14.17/24",
                "IPv6Address": ""
            },
            "13436984d76ca15ef7265bee0d69af893ef90850d7c02562345ca9bfad88bd62": {
                "Name": "tma1_api.1.z05pn34czvdypfqgpdsjty8ax",
                "EndpointID": "7ba4b3edb2342e01debf05553c297ee8a2811e6c0657cab31f389124afaf395f",
                "MacAddress": "02:42:0a:00:0e:0b",
                "IPv4Address": "10.0.14.11/24",
                "IPv6Address": ""
            },
            "lb-public": {
                "Name": "public-endpoint",
                "EndpointID": "032b8316b147313172adc49ae462894eb4ad144c7074202a3c6e28ddc48c6e9d",
                "MacAddress": "02:42:0a:00:0e:0c",
                "IPv4Address": "10.0.14.12/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4110"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "fbe7045c9292",
                "IP": "10.116.0.4"
            },
            {
                "Name": "95da2126a35e",
                "IP": "10.116.0.3"
            }
        ]
    }
]
Edit 3: No, that's not it
Edit 4:
It turns out the culprits that kept Traefik from picking up the services in my configuration were the following label:
- traefik.constraint-label=public
And the following command:
- --providers.docker.constraints=Label(`traefik.constraint-label`, `public`)
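A likely explanation, going by the compose file above: the provider constraint makes Traefik ignore every service that does not carry the traefik.constraint-label=public label, and only the traefik service had it. An untested alternative to removing the constraint would be to keep it and add the label to the api service as well, roughly like this:

```yaml
# Sketch only: keeps --providers.docker.constraints on the Traefik side
# and tags the api service so that it passes the filter.
services:
  api:
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        # Assumption: this label was the missing piece for discovery.
        - traefik.constraint-label=public
```

Every other service that Traefik should route to would need the same label.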
I also decided to turn off TLS in the deployed stack, as it was causing problems of its own, pending further investigation. This meant removing the following commands from the Traefik service:
- --entrypoints.web.http.redirections.entryPoint.to=websecure
- --entrypoints.websecure.address=:443
- --certificatesresolvers.le.acme.email=${EMAIL?Variable not set}
- --certificatesresolvers.le.acme.storage=/certificates/acme.json
- --certificatesresolvers.le.acme.tlschallenge=true
It also meant removing the TLS labels from the other services:
- traefik.http.routers.api.tls=true
- traefik.http.routers.api.tls.certresolver=le
Last but not least, there was a route conflict, which was also causing issues. As I learned the hard way, a router on another container cannot match the /api path prefix on the same domain as the Traefik router, because the Traefik dashboard serves its own API under /api on that domain.
This rule:
- traefik.http.routers.api.rule=Host(`${DOMAIN?Variable not set}`) && PathPrefix(`/api`)
Became:
- traefik.http.routers.api.rule=Host(`api.something`)
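Putting the changes from this edit together, the api service's deploy section might end up looking like the sketch below. Two things here are assumptions rather than something stated above: the router is moved to the web entrypoint (websecure no longer exists once TLS is removed), and the strip-prefix middleware is dropped because the rule no longer matches on /api.

```yaml
# Sketch of the api service after the changes described in Edit 4.
services:
  api:
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        # Assumption: with websecure removed, the router listens on web.
        - traefik.http.routers.api.entrypoints=web
        - traefik.http.routers.api.rule=Host(`api.something`)
        - traefik.http.services.api-service.loadbalancer.server.port=3000
    networks:
      - public
```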