Traefik v2.3 not finding any services in Docker Swarm Stack

Hello!

I know this question has been asked many times before, but it keeps creeping back up because every case has its own nuances.

Before you embark on this painful journey, here is the checklist I assembled the hard way:

  1. Are the nodes on the same network? Check with:
     $ docker network inspect <network-name>
  2. Is your overlay network (the one shared by Traefik and the other containers) an "attachable" network? Try using an attachable network.
  3. Disable TLS for all routes and in the static configuration, and see if that works. Start with the most basic non-TLS example.
  4. Are you using the same domain on several routers? Try a different domain and see if that works.
  5. Do you have any labels or constraints that restrict where Traefik is spun up, or which services it picks up? Remove them and see if that helps.

Basically, start with the simplest working example possible, then add features back one by one and see which one causes trouble.
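For reference, here is a sketch of what "the simplest working example" could look like as a stack file: Traefik plus a single test service, with no TLS and no constraints. It assumes the external public overlay network already exists, uses the traefik/whoami test image, and whoami.example.com is a placeholder host name that should resolve to the cluster; none of these come from my actual setup.

```yaml
version: "3.1"

services:

  traefik:
    image: traefik:2.3
    ports:
      - "80:80"
    command:
      - --providers.docker
      - --providers.docker.swarmMode=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --accesslog
    volumes:
      # Traefik reads the service labels from the Swarm API through the Docker socket
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      placement:
        constraints:
          # The Swarm API is only served by manager nodes
          - node.role == manager
    networks:
      - public

  whoami:
    image: traefik/whoami
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        - traefik.http.routers.whoami.entrypoints=web
        # Placeholder host name, replace with a record pointing at the cluster
        - traefik.http.routers.whoami.rule=Host(`whoami.example.com`)
        - traefik.http.services.whoami.loadbalancer.server.port=80
    networks:
      - public

networks:
  public:
    external: true
```

Deploy it with docker stack deploy -c <file> <stack-name>. Once a request to the placeholder host returns the whoami output, add TLS, constraints and the real services back one at a time.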

tl;dr;

Traefik not discovering any services in a 3-node Swarm Stack cluster


The 3-node cluster is made up of 3 DigitalOcean Droplets. Each Droplet has a DNS A record (all with the same host name) pointing to its public IP.

Aim

What I'm trying to do is get the example at dockerswarm.rocks/traefik working in a 3-node Swarm Stack cluster.

Current status:

/api/entrypoints

[
  {
    address: ':80',
    transport: {
      lifeCycle: { graceTimeOut: 10000000000 },
      respondingTimeouts: { idleTimeout: 180000000000 },
    },
    forwardedHeaders: {},
    http: {},
    name: 'web',
  },
  {
    address: ':443',
    transport: {
      lifeCycle: { graceTimeOut: 10000000000 },
      respondingTimeouts: { idleTimeout: 180000000000 },
    },
    forwardedHeaders: {},
    http: {},
    name: 'websecure',
  },
];

/api/routers

[
  {
    entryPoints: ['websecure'],
    service: 'api@internal',
    rule: 'Host(`${DOMAIN}?Variable not set`)',
    tls: { certResolver: 'le' },
    status: 'enabled',
    using: ['websecure'],
    name: 'traefik@docker',
    provider: 'docker',
  },
];

/api/services

[
  {
    status: 'enabled',
    usedBy: ['traefik@docker'],
    name: 'api@internal',
    provider: 'internal',
  },
  { status: 'enabled', name: 'dashboard@internal', provider: 'internal' },
  { status: 'enabled', name: 'noop@internal', provider: 'internal' },
  {
    loadBalancer: {
      servers: [{ url: 'http://10.0.16.4:8080' }],
      passHostHeader: true,
    },
    status: 'enabled',
    name: 'traefik@docker',
    provider: 'docker',
    type: 'loadbalancer',
  },
];

Configuration files

docker-compose.yml

version: "3.1"

services:

  traefik:
    image: traefik:2.3

  api:
    image: secretexpress/api

  postgres:
    image: postgres

docker-compose-staging.yml

version: "3.1"

services:

  traefik:
    ports:
      - 80:80
      - 443:443
    deploy:
      placement:
        constraints:
          - node.labels.traefik-certificates == true
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        - traefik.constraint-label=public
        - traefik.http.routers.traefik.tls=true
        - traefik.http.routers.traefik.tls.certresolver=le
        - traefik.http.routers.traefik.entrypoints=websecure
        - traefik.http.routers.traefik.rule=Host(`${DOMAIN?Variable not set}`)
        - traefik.http.routers.traefik.service=api@internal
        - traefik.http.services.traefik.loadbalancer.server.port=8080
    command:
      - --providers.docker
      - --providers.docker.constraints=Label(`traefik.constraint-label`, `public`)
      - --providers.docker.exposedbydefault=false
      #
      - --providers.docker.swarmMode=true
      #
      - --entrypoints.web.address=:80
      - --entrypoints.web.http.redirections.entryPoint.to=websecure
      - --entrypoints.websecure.address=:443
      #
      - --certificatesresolvers.le.acme.email=${EMAIL?Variable not set}
      - --certificatesresolvers.le.acme.storage=/certificates/acme.json
      - --certificatesresolvers.le.acme.tlschallenge=true
      #
      - --accesslog
      - --log
      - --api
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - traefik-certificates:/certificates
    networks:
      - public

  api:
    deploy:
      labels:
      - traefik.enable=true
      - traefik.docker.network=public
      #
      - traefik.http.routers.api.tls=true
      - traefik.http.routers.api.tls.certresolver=le
      #
      - traefik.http.routers.api.entrypoints=websecure
      - traefik.http.routers.api.rule=Host(`${DOMAIN?Variable not set}`) && PathPrefix(`/api`)
      - traefik.http.routers.api.middlewares=api-stripprefix
      - traefik.http.middlewares.api-stripprefix.stripprefix.prefixes=/api
      - traefik.http.services.api-service.loadbalancer.server.port=3000
    environment:
      - API_ENV_FILE=/run/secrets/api_env
    secrets:
      - api_env
    networks:
      - public

  postgres:
    environment:
      - POSTGRES_USER_FILE=/run/secrets/psql_user
      - POSTGRES_PASSWORD_FILE=/run/secrets/psql_password
    secrets:
      - psql_user
      - psql_password
    networks:
      - public

networks:
  public:
    external: true

volumes:
  traefik-certificates:

secrets:
  api_env:
    external: true
  #
  psql_user:
    external: true
  psql_password:
    external: true

I'm deploying my stack in the following way:

$ docker stack deploy --compose-file docker-compose.yml -c docker-compose.staging.yml --with-registry-auth tma1

And here are some details about my cluster:

$ docker stack ls
NAME      SERVICES   ORCHESTRATOR
tma1      3          Swarm

$ docker stack ps tma1
ID             NAME              IMAGE                      NODE      DESIRED STATE   CURRENT STATE            ERROR     PORTS
nve5grdy466q   tma1_api.1        secretexpress/api:latest   tma-2     Running         Running 22 minutes ago
r9lfawkkjbv6   tma1_postgres.1   postgres:latest            tma-4     Running         Running 22 minutes ago
q8n3o1wncjcn   tma1_traefik.1    traefik:2.3                tma-2     Running         Running 22 minutes ago

root@tma-2:~# docker node ls
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
wivscyn7mbqlb2svv39ho1zsi *   tma-2      Ready     Active         Leader           20.10.2
o2pq2zdoevbgp3b8hkvhlbw8a     tma-3      Ready     Active                          20.10.2
zyp4nwh3ym8xydlmla1uy6ewo     tma-4      Ready     Active         Reachable        20.10.2

root@tma-2:~# docker node ps
ID             NAME             IMAGE                      NODE      DESIRED STATE   CURRENT STATE            ERROR     PORTS
nve5grdy466q   tma1_api.1       secretexpress/api:latest   tma-2     Running         Running 27 minutes ago
q8n3o1wncjcn   tma1_traefik.1   traefik:2.3                tma-2     Running         Running 27 minutes ago

When viewing the Traefik instance logs, there is no mention whatsoever of the api service:

root@tma-2:~# docker service logs -f a75am0o0pne0
tma1_traefik.1.q8n3o1wncjcn@tma-2    | time="2021-02-02T08:32:18Z" level=info msg="Configuration loaded from flags."
tma1_traefik.1.q8n3o1wncjcn@tma-2    | 10.0.0.2 - - [02/Feb/2021:08:32:24 +0000] "GET / HTTP/2.0" 302 34 "-" "-" 1 "traefik@docker" "-" 0ms
tma1_traefik.1.q8n3o1wncjcn@tma-2    | 10.0.0.2 - - [02/Feb/2021:08:32:24 +0000] "GET /dashboard/ HTTP/2.0" 304 0 "-" "-" 2 "traefik@docker" "-" 0ms
tma1_traefik.1.q8n3o1wncjcn@tma-2    | 10.0.0.2 - - [02/Feb/2021:08:32:25 +0000] "GET /api/version HTTP/2.0" 200 85 "-" "-" 3 "traefik@docker" "-" 0ms
tma1_traefik.1.q8n3o1wncjcn@tma-2    | 10.0.0.2 - - [02/Feb/2021:08:32:25 +0000] "GET /api/version HTTP/2.0" 200 85 "-" "-" 4 "traefik@docker" "-" 0ms

Last but not least, the stack's services are attached to the public overlay network:

root@tma-2:~# docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
eb9307d02fe7   bridge            bridge    local
4b68edde0727   docker_gwbridge   bridge    local
fd35d8cf012f   host              host      local
xi7i24kk6rfa   ingress           overlay   swarm
2ecb3b601edd   none              null      local
odng3c8kllyw   public            overlay   swarm

Now for the million dollar question. What am I doing wrong?

Edit:

I think the api service might not be attached to the network (only postgres and traefik show up under Containers):

$ docker network inspect public
[
    {
        "Name": "public",
        "Id": "igo934icpt7v1ga90nfwq1k0b",
        "Created": "2021-02-02T19:38:55.101260242Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.9.0/24",
                    "Gateway": "10.0.9.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "af963274da2aeb99b9603d22c501a15b8bb02a7568c524a5dfc52bbbcf150290": {
                "Name": "tma1_postgres.1.tadfw3v5mewi16m67dm0d14vy",
                "EndpointID": "0af39804821b873fbadbc36d9cd3a6c9a287e9780090057b93681b8629db23f5",
                "MacAddress": "02:42:0a:00:09:0c",
                "IPv4Address": "10.0.9.12/24",
                "IPv6Address": ""
            },
            "b6521b845b7ee5c77750a36ca10193060958b66154ed3115a13e1de9eec9bcc0": {
                "Name": "tma1_traefik.1.nqlieh2e8f72rq7dos1kud9nv",
                "EndpointID": "a6f9bc32fd1bef628ae81fef3abaff7f84e1b9581c73b430d5d367c2c84d2f7f",
                "MacAddress": "02:42:0a:00:09:0f",
                "IPv4Address": "10.0.9.15/24",
                "IPv6Address": ""
            },
            "lb-public": {
                "Name": "public-endpoint",
                "EndpointID": "068ca9a1e63b8f1d236cc505a84bdcd70dee0236634ab23a0e2556a44b659982",
                "MacAddress": "02:42:0a:00:09:0d",
                "IPv4Address": "10.0.9.13/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4105"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "95da2126a35e",
                "IP": "10.116.0.3"
            },
            {
                "Name": "1064ccfe47e7",
                "IP": "10.116.0.2"
            }
        ]
    }
]

Edit 2:

It turns out that you need to create an attachable overlay network. Quote:

To create an overlay network which can be used by swarm services or standalone containers to communicate with other standalone containers running on other Docker daemons, add the --attachable flag:

$ docker network create -d overlay --attachable public

At least now the containers are on the same network:

$ docker network inspect public
[
    {
        "Name": "public",
        "Id": "yq9lzu9qfabh8w9e9apqdkg27",
        "Created": "2021-02-02T19:55:01.222053595Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.14.0/24",
                    "Gateway": "10.0.14.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "049e0b8c7151f59b0f7edf9e649f23a94c8c68898b7237a9c8b5906f28a18046": {
                "Name": "tma1_traefik.1.l5ptux6kys1lual26s42e2uul",
                "EndpointID": "ccbc46084f6282f06af1d7cf511adff4a9436c98c51422a1a61cda7ebf71d97a",
                "MacAddress": "02:42:0a:00:0e:11",
                "IPv4Address": "10.0.14.17/24",
                "IPv6Address": ""
            },
            "13436984d76ca15ef7265bee0d69af893ef90850d7c02562345ca9bfad88bd62": {
                "Name": "tma1_api.1.z05pn34czvdypfqgpdsjty8ax",
                "EndpointID": "7ba4b3edb2342e01debf05553c297ee8a2811e6c0657cab31f389124afaf395f",
                "MacAddress": "02:42:0a:00:0e:0b",
                "IPv4Address": "10.0.14.11/24",
                "IPv6Address": ""
            },
            "lb-public": {
                "Name": "public-endpoint",
                "EndpointID": "032b8316b147313172adc49ae462894eb4ad144c7074202a3c6e28ddc48c6e9d",
                "MacAddress": "02:42:0a:00:0e:0c",
                "IPv4Address": "10.0.14.12/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4110"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "fbe7045c9292",
                "IP": "10.116.0.4"
            },
            {
                "Name": "95da2126a35e",
                "IP": "10.116.0.3"
            }
        ]
    }
]

Edit 3: No, that wasn't it.
Edit 4:

It turns out that what stopped Traefik from picking up the services in my configuration was the following label:

- traefik.constraint-label=public

combined with the following flag:

- --providers.docker.constraints=Label(`traefik.constraint-label`, `public`)
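Why that combination mattered: with --providers.docker.constraints set, Traefik only creates routes for services whose labels match the expression. In my staging file only the traefik service carried traefik.constraint-label=public; the api service did not, so Traefik silently skipped it, which fits the /api/routers output above listing only traefik@docker. Instead of removing the constraint, the other option (and, as far as I can tell, what the dockerswarm.rocks example intends) is to add the label to every service Traefik should expose. A sketch of the relevant part of the api service:

```yaml
  api:
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        # Must match the provider flag:
        # --providers.docker.constraints=Label(`traefik.constraint-label`, `public`)
        - traefik.constraint-label=public
        # ...followed by the existing traefik.http.* labels, unchanged
```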

I also decided to turn off TLS in the deployed Docker stack, since it was also causing bugs, pending further investigation. This means removing the following flags from the Traefik service's command:

- --entrypoints.web.http.redirections.entryPoint.to=websecure
- --entrypoints.websecure.address=:443
- --certificatesresolvers.le.acme.email=${EMAIL?Variable not set}
- --certificatesresolvers.le.acme.storage=/certificates/acme.json
- --certificatesresolvers.le.acme.tlschallenge=true
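After removing the constraint and the TLS flags, the Traefik command section ends up roughly like this. It is reconstructed from the lists above, so treat it as a sketch rather than a copy of my file.

```yaml
  traefik:
    command:
      - --providers.docker
      # No constraints flag: every service with traefik.enable=true is considered
      - --providers.docker.exposedbydefault=false
      #
      - --providers.docker.swarmMode=true
      #
      # Only the plain HTTP entrypoint remains
      - --entrypoints.web.address=:80
      #
      - --accesslog
      - --log
      - --api
```

The traefik service's own dashboard router labels (entrypoints=websecure, tls.certresolver=le) would presumably need the same treatment, since the websecure entrypoint and the le resolver no longer exist; that part is an assumption and is not shown above.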

I also removed the TLS options from the other containers:

- traefik.http.routers.api.tls=true
- traefik.http.routers.api.tls.certresolver=le

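Last, but not least, the route path conflict, which was also causing issues. You cannot (and I learned this the hard way) match the /api path prefix on another container while using the same domain as the Traefik router. The Traefik dashboard fetches its own data from /api/... on that host (see the GET /api/version requests in the logs above), so the two routers end up competing for the same requests.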

This

- traefik.http.routers.api.rule=Host(`${DOMAIN?Variable not set}`) && PathPrefix(`/api`)

was changed to this:

- traefik.http.routers.api.rule=Host(`api.something`)
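Putting the pieces together, the api labels end up roughly as below. Two parts of this are assumptions rather than things listed above: the router is moved to the web entrypoint (websecure no longer exists once its flag is removed), and the stripprefix middleware is dropped because the rule no longer matches on /api. api.something is the same placeholder host as above.

```yaml
  api:
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        # Assumption: plain HTTP entrypoint, since websecure was removed
        - traefik.http.routers.api.entrypoints=web
        # Own host name, so it no longer competes with the dashboard's /api paths
        - traefik.http.routers.api.rule=Host(`api.something`)
        - traefik.http.services.api-service.loadbalancer.server.port=3000
```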