Services time out when deployed on a swarm worker node

Sandroggy · September 25, 2023, 9:23am

I've been using Traefik for over a year and it works great, but I'm having trouble making it run in swarm mode.

My setup

I have a Docker swarm with 1 manager and 1 worker. Both are on an overlay network called "traefik_net".
Traefik runs on the manager node and my web apps run on the worker.

The issue

Service discovery works perfectly fine. The only issue is that I cannot access my services when they are deployed on the worker node; if I deploy them on the manager node I can access them.

If I try to access my web app, it loads for a few seconds and then my browser returns a 400 error.

What I've tried

From within the Traefik container I can run wget against a service running on the worker node, so I assume the swarm network is working.
Running my apps on the manager node works well.
I looked at the Traefik log in DEBUG mode; nothing is logged when I try to access my service.
In the dashboard my services have the correct IP:PORT in the 'Servers' section.
Since it works when my services run on the manager, I assume my docker compose is correct.

Docker compose

Traefik

version: '3.8'

services:
  traefik:
    image: traefik:latest
    restart: always
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"

    environment:
      - OVH_APPLICATION_KEY=x
      - OVH_ENDPOINT=ovh-eu
      - OVH_CONSUMER_KEY=x
      - OVH_APPLICATION_SECRET=x

    command:
      - --api.insecure=true
      - --api.dashboard=true
      - --api.debug=true
      - --log.level=DEBUG
      - --providers.docker=true
      - --providers.docker.swarmMode=true
      - --providers.docker.network=traefik_net
      - --providers.docker.exposedByDefault=false
      - --entrypoints.http.address=:80
      - --entrypoints.https.address=:443
      - --entrypoints.http.http.redirections.entrypoint.to=https
      - --entrypoints.http.http.redirections.entrypoint.scheme=https
      - --providers.docker.network=traefik_net
      - --certificatesresolvers.sslresolver.acme.dnschallenge=true
      - --certificatesresolvers.sslresolver.acme.dnschallenge.provider=ovh
      - --certificatesresolvers.sslresolver.acme.email=x
      - --certificatesresolvers.sslresolver.acme.storage=/letsencrypt/acme.json

    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.dashboard.entrypoints=https"
        - "traefik.http.routers.dashboard.tls.certresolver=sslresolver"
        - "traefik.http.routers.dashboard.rule=Host(`MY_DOMAIN`)"
        - "traefik.http.services.dumy.loadbalancer.server.port=9999"
        - "traefik.http.routers.dashboard.service=api@internal"
        - "traefik.http.routers.dashboard.middlewares=auth"

    networks:
      - traefik_net

    extra_hosts:
      - "host.docker.internal:host-gateway"

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - .:/traefik
      - ./letsencrypt:/letsencrypt

networks:
  traefik_net:
    external: true

Web app

version: '3.8'

services:
  foxtv:
    env_file: .env
    image:  "registry.gitlab.com/..."
    command: npm run start
    deploy:
      replicas: 1
      labels:
        - traefik.enable=true
        - traefik.http.routers.foxtv.rule=Host(`MY_DOMAIN`)
        - traefik.http.routers.foxtv.tls.certresolver=sslresolver
        - traefik.http.routers.foxtv.service=foxtv
        - traefik.http.services.foxtv.loadbalancer.server.port=3010
    networks:
      - traefik_net

Thank you in advance for your help!

bluepuma77 · September 25, 2023, 8:04pm

Are you using the Docker Swarm Overlay Network over a VLAN/VSWITCH? Then make sure that you have the right MTU set.

It’s tricky to detect, as TCP packets (and http requests) below ~1400 bytes usually work, larger ones fail.
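One way to probe for this kind of MTU problem is to ping with fragmentation prohibited, since an ordinary large ping can be fragmented and still succeed. The sketch below assumes Linux with iputils ping (the `-M do` option) and IPv4; `WORKER_IP` and the function names are placeholders for illustration, not part of the original setup.

```shell
#!/bin/sh
# ICMP echo payload size that yields an IPv4 packet of exactly the given MTU:
# 20 bytes IPv4 header + 8 bytes ICMP header are subtracted.
ping_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Send 3 pings with the "don't fragment" bit set, sized to the MTU under test.
# $1 = peer address (placeholder), $2 = MTU to probe
probe_mtu() {
  ping -M do -c 3 -s "$(ping_payload_for_mtu "$2")" "$1"
}

# Example usage (WORKER_IP is a placeholder for the other node's address):
#   probe_mtu "$WORKER_IP" 1500
#   probe_mtu "$WORKER_IP" 1400
```

If the larger probe fails while the smaller one gets replies, the usable path MTU lies somewhere between the two values.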

Sandroggy · September 26, 2023, 8:43am

Thanks for the lead, but unfortunately that doesn't seem to be the source of my issue.
I tested pinging between the nodes with a large packet size and it works. I can also access my service if it doesn't go through Traefik, so I think the network side is OK.

I'm out of ideas on where to look.

bluepuma77 · September 26, 2023, 9:24am

How do you know the service is on worker? You set replicas: 1, but no constraint.
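Without a placement constraint, the node a task lands on can be checked from a manager with `docker service ps`. This is a minimal sketch; the function name and the service name `mystack_foxtv` are hypothetical (services deployed with `docker stack deploy` are named `<stack>_<service>`).

```shell
#!/bin/sh
# Print name, node and current state of the running tasks of a service.
# $1 = service name as listed by `docker service ls`
service_nodes() {
  docker service ps "$1" \
    --filter desired-state=running \
    --format '{{.Name}} {{.Node}} {{.CurrentState}}'
}

# Example usage (run on a manager node; the service name is a placeholder):
#   service_nodes mystack_foxtv
```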

You use host.docker.internal, which is only supported by Docker Desktop.

You use --api.insecure=true; make sure to remove it before going to production, because it bypasses any auth middlewares. You also assign middlewares=auth, which does not seem to be defined anywhere.

Note that you can assign the certresolver globally on entrypoint. Maybe check your config against simple Traefik example.

Last note: you use MY_DOMAIN for both the Traefik dashboard and your service. Of course they must be different.

Sandroggy · September 26, 2023, 1:58pm

Thanks for the feedback. I was checking where my services were running with the docker command, but adding a constraint is a good idea!

MY_DOMAIN is of course set to the real domains in my docker compose, and I can confirm that it works, because if I force my service onto the manager node everything works.

With this new compose it works if I use node.role == manager, and I get an HTTP ERROR 400 from Chrome after ~1 min when using node.role == worker.

New docker compose

Web app

version: '3.8'

services:
  foxtv:
    env_file: .env
    image:  "registry.gitlab.com/..."
    command: npm run start -- --filter=foxtv
    deploy:
      replicas: 1
      labels:
        - traefik.enable=true
        - traefik.http.routers.foxtv.rule=Host(`tv.web02...`)
        - traefik.http.routers.foxtv.service=foxtv
        - traefik.http.services.foxtv.loadbalancer.server.port=3010
        - traefik.http.routers.foxtv.tls.certresolver=sslresolver
      placement:
        constraints:
          - node.role == worker
    networks:
      - traefik_net

bluepuma77 · September 26, 2023, 2:21pm

So Traefik forward works when the target service runs on manager, but not on worker?

Is traefik_net a Docker Swarm overlay network?

Have you tried with a simple whoami service instead?

Here is the simple Swarm Traefik example.

Sandroggy · September 26, 2023, 2:50pm

So Traefik forward works when the target service runs on manager, but not on worker? YES

Is traefik_net a Docker Swarm overlay network? YES
I can confirm that the network works because I can run wget from the Traefik container (docker exec) against the service's IP on the swarm network. This IP is the same as the one shown in the Traefik dashboard (HTTP > Services > Servers), so Traefik detects the correct IP and uses the correct port.

Have you tried with a simple whoami service instead? Not yet.

bluepuma77 · September 26, 2023, 3:51pm

But is the service running OK on the worker node? Did you log in to your private registry on the worker node so the image can be pulled?

Sandroggy · September 26, 2023, 5:13pm

Yes, the service is working fine. I can access it if I use the IP of the worker node directly.

bluepuma77 · September 26, 2023, 6:13pm

How is that possible? You don't have a port exposed in your service/container.

Sandroggy · September 26, 2023, 6:32pm

You're right, for this test I had to expose a port. I should have mentioned it in my previous post.

bluepuma77 · September 27, 2023, 5:44am

What infra & distribution are you running on?

Try whoami service with /data?size=10000. (Doc)
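The whoami test can be scripted by requesting growing payloads through Traefik and comparing what arrives. This is a rough sketch, assuming a whoami service is already routed by Traefik; the host `whoami.example.com` and the function names are hypothetical.

```shell
#!/bin/sh
# Build the whoami data URL for a given base URL and payload size in bytes.
data_url() {
  printf '%s/data?size=%s\n' "$1" "$2"
}

# Download the payload, discard the body, and print the number of bytes received.
# --max-time keeps a stalled transfer from hanging forever.
fetch_size() {
  curl -sS --max-time 10 -o /dev/null -w '%{size_download}\n' "$(data_url "$1" "$2")"
}

# Example usage (the host name is a placeholder):
#   for size in 100 1000 1400 2000 10000; do
#     fetch_size "https://whoami.example.com" "$size"
#   done
```

If small sizes come back complete while larger ones stall or time out, that pattern points toward an MTU problem rather than a routing or label problem.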

Sandroggy · September 27, 2023, 7:13am

It's 2 Ubuntu VMs on one Proxmox server; they communicate over a WireGuard VPN (the swarm network uses this VPN's interface).

I will try a simple whoami service.

bluepuma77 · September 27, 2023, 11:24am

When using a VPN, usually the MTU of TCP packets needs to be reduced. Make sure your Docker Swarm overlay network has a MTU that fits inside the VPN MTU.

You can try this (taken from ChatGPT):

Check current MTU:

docker network inspect my_overlay_network

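Set the MTU (there is no docker network update command; the MTU is fixed when the network is created, so remove the services attached to the network, then remove and recreate it):

docker network rm my_overlay_network
docker network create --driver overlay --opt com.docker.network.driver.mtu=1400 my_overlay_network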
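As a rough guide for choosing the value: Docker overlay networks encapsulate traffic in VXLAN, which adds 50 bytes per packet over IPv4, so the overlay MTU should be at least 50 bytes below the MTU of the interface carrying the swarm traffic. The sketch below assumes an unencrypted overlay over IPv4 and a Linux host; the interface name `wg0` and the function names are assumptions for illustration.

```shell
#!/bin/sh
# VXLAN over IPv4 overhead in bytes:
# 14 (inner Ethernet) + 8 (VXLAN) + 8 (UDP) + 20 (outer IPv4) = 50
VXLAN_OVERHEAD=50

# Largest overlay MTU that still fits inside the given underlay MTU.
overlay_mtu() {
  echo $(( $1 - VXLAN_OVERHEAD ))
}

# Read the MTU of a network interface from sysfs (Linux only).
iface_mtu() {
  cat "/sys/class/net/$1/mtu"
}

# Example usage, assuming the WireGuard interface is called wg0:
#   overlay_mtu "$(iface_mtu wg0)"
```

With an underlay MTU of 1420, this gives 1370. An encrypted overlay adds further overhead, so a lower value would be needed there.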

Sandroggy · September 28, 2023, 8:49am

A lower MTU fixed my issue!

Thanks a lot for the time you took to help me!
