I have a Docker Swarm running with one manager and two worker nodes, and I have run into a very strange situation. Here is my Traefik config:
version: "3.7"
networks:
  cluster_network:
    driver: overlay
    ipam:
      driver: default
      config:
        - subnet: 192.168.99.0/24
services:
  traefik:
    image: traefik:v2.3.6
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [ node.role == manager ]
      labels:
        - traefik.enable=true
        - traefik.http.routers.traefik_http.rule=Host(`dev.xxx.com`)
        - traefik.http.routers.traefik_http.service=api@internal
        - traefik.http.routers.traefik_http.tls.certresolver=letsencryptresolver
        - traefik.http.routers.traefik_http.tls=true
        - traefik.http.routers.traefik_http.entrypoints=webs
        - traefik.http.routers.traefik_http.middlewares=auth
        - traefik.http.services.admin.loadbalancer.server.port=8080
        - traefik.http.middlewares.auth.basicauth.removeheader=true
        - traefik.http.middlewares.auth.basicauth.users=admin:xxx
        - traefik.docker.network=ks_cluster_network
    command: >
      --providers.docker
      --providers.docker.endpoint=unix:///var/run/docker.sock
      --providers.docker.exposedbydefault=false
      --providers.docker.swarmmode=true
      --providers.docker.network=ks_cluster_network
      --entryPoints.web.address=:80
      --entryPoints.webs.address=:443
      --accesslog
      --log.level=DEBUG
      --api=true
      --tracing=false
      --api.dashboard=true
      --api.insecure=true
      --tracing.serviceName=admin
      --serverstransport.insecureskipverify=true
      --certificatesresolvers.letsencryptresolver.acme.httpchallenge=true
      --certificatesresolvers.letsencryptresolver.acme.httpchallenge.entryPoint=web
      --certificatesresolvers.letsencryptresolver.acme.email=admin@xxx.com
      --certificatesresolvers.letsencryptresolver.acme.storage=/letsencrypt/acme.json
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - acme:/letsencrypt
Then I deploy Traefik as a separate stack with docker stack deploy -c ./traefik.yml ks.
Then I create another stack for the services:
networks:
  ks_cluster_network:
    external: true
services:
  whoami:
    image: containous/whoami:latest
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [ node.hostname == node1 ]
      labels:
        - traefik.enable=true
        - traefik.backend=whoami
        - traefik.http.routers.whoami.rule=Host(`whoami.xxx.com`)
        - traefik.http.routers.whoami.entrypoints=webs
        - traefik.http.routers.whoami.tls.certresolver=letsencryptresolver
        - traefik.http.routers.whoami.tls=true
        - traefik.docker.network=ks_cluster_network
        - traefik.http.services.whoami.loadbalancer.server.port=80
    networks:
      - ks_cluster_network
  erp:
    image: monogramm/docker-dolibarr
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [ node.hostname == node1 ]
      labels:
        - traefik.enable=true
        - traefik.backend=erp
        - traefik.http.routers.erp.rule=Host(`erp.xxx.com`)
        - traefik.http.routers.erp.entrypoints=webs
        - traefik.http.routers.erp.tls.certresolver=letsencryptresolver
        - traefik.http.routers.erp.tls=true
        - traefik.docker.network=ks_cluster_network
        - traefik.http.services.erp.loadbalancer.server.port=80
    networks:
      - ks_cluster_network
The whoami service runs perfectly well at "https://whoami.xxx.com", but erp at "https://erp.xxx.com" doesn't work.
Running curl from inside the Traefik container shows the following (192.168.99.13 is the IP of the erp container):
docker exec -it 95bb curl -v http://192.168.99.13:80
* Trying 192.168.99.13:80...
* TCP_NODELAY set
* Connected to 192.168.99.13 (192.168.99.13) port 80 (#0)
> GET / HTTP/1.1
> Host: 192.168.99.13
> User-Agent: curl/7.67.0
> Accept: */*
>
(after a while)
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
But the network is reachable:
docker exec -it 95bb ping 192.168.99.13
PING 192.168.99.13:80 (192.168.99.13): 56 data bytes
64 bytes from 192.168.99.13: seq=0 ttl=64 time=70.196 ms
64 bytes from 192.168.99.13: seq=1 ttl=64 time=31.833 ms
64 bytes from 192.168.99.13: seq=2 ttl=64 time=30.375 ms
64 bytes from 192.168.99.13: seq=3 ttl=64 time=69.206 ms
The only Traefik log entry related to erp is a closed request, which appears after I terminate the request:
"GET / HTTP/2.0" 499 21 "-" "-" 1229 "erp@docker" "http://192.168.99.13:80" 36890ms
But if I run erp on node2, it works perfectly fine. In summary:
- the Traefik dashboard shows everything is fine;
- if erp is deployed on node1, it hangs with no error message and can't be accessed;
- if erp is deployed on node2, it works fine.
I cleared out all iptables rules and restarted Docker, and also recreated the Docker Swarm cluster. The situation is still the same.
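For reference, this is how I read the access-log entry quoted above, assuming Traefik's default common log format (the variable names are just my own). As far as I understand, 499 is the status Traefik records when the client closes the connection before a response arrives, and the trailing value is how long the request stayed open. A small sketch that pulls the fields apart:

```shell
# The access-log entry from above, stored verbatim.
line='"GET / HTTP/2.0" 499 21 "-" "-" 1229 "erp@docker" "http://192.168.99.13:80" 36890ms'

# Splitting on double quotes: field 3 holds " 499 21 " (status and size),
# field 8 the router name, field 10 the backend URL, field 11 the duration.
status=$(printf '%s\n' "$line" | awk -F'"' '{ split($3, f, " "); print f[1] }')
router=$(printf '%s\n' "$line" | awk -F'"' '{ print $8 }')
backend=$(printf '%s\n' "$line" | awk -F'"' '{ print $10 }')
duration=$(printf '%s\n' "$line" | awk -F'"' '{ gsub(/ /, "", $11); print $11 }')

echo "status=$status router=$router backend=$backend duration=$duration"
```

So Traefik did pick the erp router and the right backend address; it simply never received a response from it before I gave up.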
What could be wrong?
Thanks,