Hello!
I think this question has been asked many times before, but it keeps creeping back up because every case has its own nuances.
Before you embark on this painful journey, let me give you the checklist that I assembled, the hard way:
- Are nodes on the same network?
$ docker network inspect `network-name`
- Is your overlay network, the one you use between Traefik and the other containers, an 'attachable' network? Go and try using an attachable network
- Disable TLS for all the routes and from the static configurations. See if that works. Start with the most basic non-TLS example
- Are you using the same domain on N routers? Try using a different domain, see if that works
- Do you have any labels that restrict where Traefik is being spun up? Remove those and see if that helps
Basically, build the simplest working example first, then add pieces of functionality one by one and see which one causes trouble.
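As a starting point for that "simplest working example", here is a minimal non-TLS sketch for a Swarm stack. It is an illustration only, not a tested configuration: the `whoami` service, the `traefik/whoami` image, the host name `whoami.example.com` and the debug log flag are my assumptions, and the overlay network `public` is assumed to exist already.

```yaml
# Minimal non-TLS sketch for Docker Swarm (illustrative, not a tested file).
# Assumptions: the overlay network "public" already exists, and
# whoami.example.com is a placeholder host name.
version: "3.1"
services:
  traefik:
    image: traefik:2.3
    ports:
      - 80:80
    command:
      - --providers.docker
      - --providers.docker.swarmMode=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --log.level=DEBUG
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          # The Swarm provider reads service data from a manager's Docker socket.
          - node.role == manager
    networks:
      - public
  whoami:
    # Hypothetical test backend; this image serves HTTP on port 80.
    image: traefik/whoami
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        - traefik.http.routers.whoami.entrypoints=web
        - traefik.http.routers.whoami.rule=Host(`whoami.example.com`)
        - traefik.http.services.whoami.loadbalancer.server.port=80
    networks:
      - public
networks:
  public:
    external: true
```

If the `whoami` router shows up under /api/routers with this stack, add TLS, provider constraints and your real services back one at a time.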
tl;dr;
Traefik not discovering any services in a 3 node Swarm Stack cluster
The 3 node cluster is made up of 3 Droplets on DigitalOcean. Each Droplet has a DNS A record (same host name) pointing to its public IP.
Aim
What I'm trying to do is to get the example at dockerswarm.rocks/traefik working in a 3 node Swarm Stack Cluster
Current status:
/api/entrypoints
[
{
address: ':80',
transport: {
lifeCycle: { graceTimeOut: 10000000000 },
respondingTimeouts: { idleTimeout: 180000000000 },
},
forwardedHeaders: {},
http: {},
name: 'web',
},
{
address: ':443',
transport: {
lifeCycle: { graceTimeOut: 10000000000 },
respondingTimeouts: { idleTimeout: 180000000000 },
},
forwardedHeaders: {},
http: {},
name: 'websecure',
},
];
/api/routers
[
{
entryPoints: ['websecure'],
service: 'api@internal',
rule: 'Host(`${DOMAIN}?Variable not set`)',
tls: { certResolver: 'le' },
status: 'enabled',
using: ['websecure'],
name: 'traefik@docker',
provider: 'docker',
},
];
/api/services
[
{
status: 'enabled',
usedBy: ['traefik@docker'],
name: 'api@internal',
provider: 'internal',
},
{ status: 'enabled', name: 'dashboard@internal', provider: 'internal' },
{ status: 'enabled', name: 'noop@internal', provider: 'internal' },
{
loadBalancer: {
servers: [{ url: 'http://10.0.16.4:8080' }],
passHostHeader: true,
},
status: 'enabled',
name: 'traefik@docker',
provider: 'docker',
type: 'loadbalancer',
},
];
Configuration files
docker-compose.yml
version: "3.1"
services:
  traefik:
    image: traefik:2.3
  api:
    image: secretexpress/api
  postgres:
    image: postgres
docker-compose-staging.yml
version: "3.1"
services:
  traefik:
    ports:
      - 80:80
      - 443:443
    deploy:
      placement:
        constraints:
          - node.labels.traefik-certificates == true
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        - traefik.constraint-label=public
        - traefik.http.routers.traefik.tls=true
        - traefik.http.routers.traefik.tls.certresolver=le
        - traefik.http.routers.traefik.entrypoints=websecure
        - traefik.http.routers.traefik.rule=Host(`${DOMAIN?Variable not set}`)
        - traefik.http.routers.traefik.service=api@internal
        - traefik.http.services.traefik.loadbalancer.server.port=8080
    command:
      - --providers.docker
      - --providers.docker.constraints=Label(`traefik.constraint-label`, `public`)
      - --providers.docker.exposedbydefault=false
      #
      - --providers.docker.swarmMode=true
      #
      - --entrypoints.web.address=:80
      - --entrypoints.web.http.redirections.entryPoint.to=websecure
      - --entrypoints.websecure.address=:443
      #
      - --certificatesresolvers.le.acme.email=${EMAIL?Variable not set}
      - --certificatesresolvers.le.acme.storage=/certificates/acme.json
      - --certificatesresolvers.le.acme.tlschallenge=true
      #
      - --accesslog
      - --log
      - --api
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - traefik-certificates:/certificates
    networks:
      - public
  api:
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        #
        - traefik.http.routers.api.tls=true
        - traefik.http.routers.api.tls.certresolver=le
        #
        - traefik.http.routers.api.entrypoints=websecure
        - traefik.http.routers.api.rule=Host(`${DOMAIN?Variable not set}`) && PathPrefix(`/api`)
        - traefik.http.routers.api.middlewares=api-stripprefix
        - traefik.http.middlewares.api-stripprefix.stripprefix.prefixes=/api
        - traefik.http.services.api-service.loadbalancer.server.port=3000
    environment:
      - API_ENV_FILE=/run/secrets/api_env
    secrets:
      - api_env
    networks:
      - public
  postgres:
    environment:
      - POSTGRES_USER_FILE=/run/secrets/psql_user
      - POSTGRES_PASSWORD_FILE=/run/secrets/psql_password
    secrets:
      - psql_user
      - psql_password
    networks:
      - public
networks:
  public:
    external: true
volumes:
  traefik-certificates:
secrets:
  api_env:
    external: true
  #
  psql_user:
    external: true
  psql_password:
    external: true
I'm deploying my stack in the following way:
$ docker stack deploy --compose-file docker-compose.yml -c docker-compose.staging.yml --with-registry-auth tma1
And here are some details about my cluster
$ docker stack ls
NAME SERVICES ORCHESTRATOR
tma1 3 Swarm
$ docker stack ps tma1
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
nve5grdy466q tma1_api.1 secretexpress/api:latest tma-2 Running Running 22 minutes ago
r9lfawkkjbv6 tma1_postgres.1 postgres:latest tma-4 Running Running 22 minutes ago
q8n3o1wncjcn tma1_traefik.1 traefik:2.3 tma-2 Running Running 22 minutes ago
root@tma-2:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
wivscyn7mbqlb2svv39ho1zsi * tma-2 Ready Active Leader 20.10.2
o2pq2zdoevbgp3b8hkvhlbw8a tma-3 Ready Active 20.10.2
zyp4nwh3ym8xydlmla1uy6ewo tma-4 Ready Active Reachable 20.10.2
root@tma-2:~# docker node ps
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
nve5grdy466q tma1_api.1 secretexpress/api:latest tma-2 Running Running 27 minutes ago
q8n3o1wncjcn tma1_traefik.1 traefik:2.3 tma-2 Running Running 27 minutes ago
When viewing the Traefik instance logs, there is no mention whatsoever of the api service
root@tma-2:~# docker service logs -f a75am0o0pne0
tma1_traefik.1.q8n3o1wncjcn@tma-2 | time="2021-02-02T08:32:18Z" level=info msg="Configuration loaded from flags."
tma1_traefik.1.q8n3o1wncjcn@tma-2 | 10.0.0.2 - - [02/Feb/2021:08:32:24 +0000] "GET / HTTP/2.0" 302 34 "-" "-" 1 "traefik@docker" "-" 0ms
tma1_traefik.1.q8n3o1wncjcn@tma-2 | 10.0.0.2 - - [02/Feb/2021:08:32:24 +0000] "GET /dashboard/ HTTP/2.0" 304 0 "-" "-" 2 "traefik@docker" "-" 0ms
tma1_traefik.1.q8n3o1wncjcn@tma-2 | 10.0.0.2 - - [02/Feb/2021:08:32:25 +0000] "GET /api/version HTTP/2.0" 200 85 "-" "-" 3 "traefik@docker" "-" 0ms
tma1_traefik.1.q8n3o1wncjcn@tma-2 | 10.0.0.2 - - [02/Feb/2021:08:32:25 +0000] "GET /api/version HTTP/2.0" 200 85 "-" "-" 4 "traefik@docker" "-" 0ms
Last but not least, the network on which the Swarm runs is the public network
root@tma-2:~# docker network ls
NETWORK ID NAME DRIVER SCOPE
eb9307d02fe7 bridge bridge local
4b68edde0727 docker_gwbridge bridge local
fd35d8cf012f host host local
xi7i24kk6rfa ingress overlay swarm
2ecb3b601edd none null local
odng3c8kllyw public overlay swarm
Now for the million dollar question. What am I doing wrong?
Edit:
I think the service might not be added to the network
$ docker network inspect public
[
{
"Name": "public",
"Id": "igo934icpt7v1ga90nfwq1k0b",
"Created": "2021-02-02T19:38:55.101260242Z",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.9.0/24",
"Gateway": "10.0.9.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"af963274da2aeb99b9603d22c501a15b8bb02a7568c524a5dfc52bbbcf150290": {
"Name": "tma1_postgres.1.tadfw3v5mewi16m67dm0d14vy",
"EndpointID": "0af39804821b873fbadbc36d9cd3a6c9a287e9780090057b93681b8629db23f5",
"MacAddress": "02:42:0a:00:09:0c",
"IPv4Address": "10.0.9.12/24",
"IPv6Address": ""
},
"b6521b845b7ee5c77750a36ca10193060958b66154ed3115a13e1de9eec9bcc0": {
"Name": "tma1_traefik.1.nqlieh2e8f72rq7dos1kud9nv",
"EndpointID": "a6f9bc32fd1bef628ae81fef3abaff7f84e1b9581c73b430d5d367c2c84d2f7f",
"MacAddress": "02:42:0a:00:09:0f",
"IPv4Address": "10.0.9.15/24",
"IPv6Address": ""
},
"lb-public": {
"Name": "public-endpoint",
"EndpointID": "068ca9a1e63b8f1d236cc505a84bdcd70dee0236634ab23a0e2556a44b659982",
"MacAddress": "02:42:0a:00:09:0d",
"IPv4Address": "10.0.9.13/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4105"
},
"Labels": {},
"Peers": [
{
"Name": "95da2126a35e",
"IP": "10.116.0.3"
},
{
"Name": "1064ccfe47e7",
"IP": "10.116.0.2"
}
]
}
]
Edit 2:
Turns out that you need to create an attachable overlay network. Quote:
To create an overlay network which can be used by swarm services or standalone containers to communicate with other standalone containers running on other Docker daemons, add the --attachable flag:
$ docker network create -d overlay --attachable public
At least now I got the containers on the same network:
$ docker network inspect public
[
{
"Name": "public",
"Id": "yq9lzu9qfabh8w9e9apqdkg27",
"Created": "2021-02-02T19:55:01.222053595Z",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.14.0/24",
"Gateway": "10.0.14.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"049e0b8c7151f59b0f7edf9e649f23a94c8c68898b7237a9c8b5906f28a18046": {
"Name": "tma1_traefik.1.l5ptux6kys1lual26s42e2uul",
"EndpointID": "ccbc46084f6282f06af1d7cf511adff4a9436c98c51422a1a61cda7ebf71d97a",
"MacAddress": "02:42:0a:00:0e:11",
"IPv4Address": "10.0.14.17/24",
"IPv6Address": ""
},
"13436984d76ca15ef7265bee0d69af893ef90850d7c02562345ca9bfad88bd62": {
"Name": "tma1_api.1.z05pn34czvdypfqgpdsjty8ax",
"EndpointID": "7ba4b3edb2342e01debf05553c297ee8a2811e6c0657cab31f389124afaf395f",
"MacAddress": "02:42:0a:00:0e:0b",
"IPv4Address": "10.0.14.11/24",
"IPv6Address": ""
},
"lb-public": {
"Name": "public-endpoint",
"EndpointID": "032b8316b147313172adc49ae462894eb4ad144c7074202a3c6e28ddc48c6e9d",
"MacAddress": "02:42:0a:00:0e:0c",
"IPv4Address": "10.0.14.12/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4110"
},
"Labels": {},
"Peers": [
{
"Name": "fbe7045c9292",
"IP": "10.116.0.4"
},
{
"Name": "95da2126a35e",
"IP": "10.116.0.3"
}
]
}
]
Edit 3: No, that's not it
Edit 4:
Turns out the culprits, which caused Traefik in my configuration to not pick up the services, were the following labels:
- traefik.constraint-label=public
And the following commands:
- --providers.docker.constraints=Label(`traefik.constraint-label`, `public`)
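A likely explanation can be read off the compose file above: with that constraint, Traefik only considers services that carry the label traefik.constraint-label=public, and the api service never declared it. If you would rather keep the constraint than remove it, one possible alternative is to add the label to every service Traefik should route to. This is a sketch that has not been tested against this stack, and it shows only the relevant keys:

```yaml
# Sketch: keep the provider constraint and tag the api service so it matches.
services:
  traefik:
    command:
      - --providers.docker
      - --providers.docker.swarmMode=true
      - --providers.docker.constraints=Label(`traefik.constraint-label`, `public`)
  api:
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        # Without this label the constraint above filters the service out.
        - traefik.constraint-label=public
```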
I also decided to turn off TLS in the deployed Docker stack as it was also causing bugs, pending further investigation. This translates into the following commands being removed from the Traefik container:
- --entrypoints.web.http.redirections.entryPoint.to=websecure
- --entrypoints.websecure.address=:443
- --certificatesresolvers.le.acme.email=${EMAIL?Variable not set}
- --certificatesresolvers.le.acme.storage=/certificates/acme.json
- --certificatesresolvers.le.acme.tlschallenge=true
This also means that I removed the TLS options from the other containers:
- traefik.http.routers.api.tls=true
- traefik.http.routers.api.tls.certresolver=le
Last but not least, the route path conflict, which was also causing issues. You cannot (and I learned it the hard way) have a router on another container match the path prefix /api on the same domain as the one used in the Traefik router configuration, presumably because the dashboard's own API calls (such as /api/version in the access log above) go to /api on that domain.
This:
- traefik.http.routers.api.rule=Host(`${DOMAIN?Variable not set}`) && PathPrefix(`/api`)
Got turned into this:
- traefik.http.routers.api.rule=Host(`api.something`)
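Putting the changes from Edit 4 together, the affected parts of the staging override might end up looking roughly like the sketch below. It is a reconstruction, not the exact file that was deployed. In particular, switching both routers to the `web` entrypoint and dropping port 443 and the TLS labels on the Traefik router are assumptions that follow from removing the `websecure` entrypoint and the `le` resolver. Placement, secrets, postgres and volumes are left out.

```yaml
# Sketch of the staging override after the Edit 4 changes (reconstruction).
version: "3.1"
services:
  traefik:
    ports:
      - 80:80
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        # Assumption: the dashboard router moves to the plain "web" entrypoint,
        # because "websecure" and the "le" resolver no longer exist.
        - traefik.http.routers.traefik.entrypoints=web
        - traefik.http.routers.traefik.rule=Host(`${DOMAIN?Variable not set}`)
        - traefik.http.routers.traefik.service=api@internal
        - traefik.http.services.traefik.loadbalancer.server.port=8080
    command:
      - --providers.docker
      - --providers.docker.exposedbydefault=false
      - --providers.docker.swarmMode=true
      - --entrypoints.web.address=:80
      - --accesslog
      - --log
      - --api
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - public
  api:
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=public
        # Assumption: same entrypoint switch as for the dashboard router.
        - traefik.http.routers.api.entrypoints=web
        - traefik.http.routers.api.rule=Host(`api.something`)
        - traefik.http.routers.api.middlewares=api-stripprefix
        - traefik.http.middlewares.api-stripprefix.stripprefix.prefixes=/api
        - traefik.http.services.api-service.loadbalancer.server.port=3000
    networks:
      - public
networks:
  public:
    external: true
```

After redeploying, /api/routers should list an `api@docker` router next to `traefik@docker` if the service was picked up.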