Docker Swarm multi-manager with Traefik

Hi all,
I have a 5-node swarm (3 managers and 2 workers). The managers share a VIP address managed by keepalived: it sits on the first manager and, if that node goes down, moves to the second, and so on. All entrypoint handling is done by Traefik.
I'm quite new to this kind of infrastructure, so my question may be a silly one.
If I schedule a service (let's say Grafana) on the first manager, everything works fine; if I schedule Grafana on the second or third manager, I get a 504 Gateway Timeout error.
Could this be a Traefik misconfiguration?
If I go directly to the node's IP I can reach Grafana, so it seems to be a Traefik issue.
There is no firewall between the nodes, and at the moment I don't need TLS/HTTPS/certs.

Docker Compose file for Grafana:

version: '3'
services:
  prometheus:
    image: prom/prometheus:latest
    deploy:
      labels:
        - "traefik.enable=false"
    networks:
      - monitoring
    ports:
      - 9090:9090
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/usr/share/prometheus/console_libraries
      - --web.console.templates=/usr/share/prometheus/consoles
    volumes:
      - /mnt/docker_data/volumes/monitoring/prometheus:/etc/prometheus/
      - /mnt/docker_data/volumes/monitoring/prometheus_data:/prometheus
  grafana:
    image: grafana/grafana
    ports:
      - 3000:3000
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=xxxxxxxxxxxxxxxxxxxx
      - GF_SERVER_DOMAIN=monitoring.example.local
    depends_on:
      - prometheus
    volumes:
      - /mnt/docker_data/volumes/monitoring/grafana-storage:/var/lib/grafana
    networks:
      - traefik_public
      - monitoring
    deploy:
      placement:
        constraints: [node.role == manager]
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=traefik_public"
        - "traefik.http.routers.grafana_ict.rule=Host(`monitoring.example.local`)"
        - "traefik.port=3000"
        - "traefik.http.routers.grafana_ict.service=grafana_ict"
        - "traefik.http.routers.grafana_ict.entrypoints=http"
        - "traefik.http.services.grafana_ict.loadbalancer.server.port=3000"

networks:
  traefik_public:
    external: true
  monitoring:
    external: true

If I inspect the container, I see the IP assigned to it on traefik_public, and it is the same one the Traefik dashboard shows as the load-balanced IP/port.
The Traefik log reports only:

time="2023-03-24T11:29:23Z" level=debug msg="'504 Gateway Timeout' caused by: dial tcp 10.0.5.141:3000: i/o timeout"

10.0.5.141 is the IP assigned to the container on master2.

I expected Traefik to route traffic even when the service is not running on the first manager.
I can't figure out whether this is an error in my configuration or whether Traefik doesn't work this way.
Thanks in advance.

Tell Traefik which docker.network to use to forward requests to the service. Your service is attached to several networks, but Traefik probably shares only one of them with it. You can set that globally in providers.docker or per service with labels.
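
For the global variant, a minimal sketch of the static configuration could look like the following. It assumes Traefik v2.x running against Swarm and a static config file named traefik.yml (both are my assumptions, adjust to your setup); the network name traefik_public is taken from your compose file.

```yaml
# traefik.yml (static configuration) - sketch only, assuming Traefik v2.x
providers:
  docker:
    swarmMode: true          # read labels from Swarm services (deploy.labels)
    exposedByDefault: false  # route only services that set traefik.enable=true
    network: traefik_public  # default network Traefik uses to reach the containers
```

The per-service equivalent is the traefik.docker.network=traefik_public label under deploy.labels, which overrides the global default for that service.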

I already specified it on the Grafana service; Prometheus doesn't need to be exposed via Traefik for now. Sadly, it still doesn't work.

Did you define it as an overlay network? If you used compose to create the network, did you give it an explicit name (otherwise compose may prefix the name with the stack name)?
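
To illustrate what I mean, here is a hedged sketch of a network definition in the stack file that owns the network (I am assuming that would be the Traefik stack; the name field needs compose file format 3.5 or later). Other stacks would then reference it with external: true, as yours already do.

```yaml
# Fragment of the stack file that creates the shared network (assumed layout)
version: '3.8'

networks:
  traefik_public:
    name: traefik_public  # fixed name, so the stack name is not prepended
    driver: overlay       # overlay networks span all swarm nodes
    attachable: true      # optional: lets standalone containers join for debugging
```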

PS: we run multi-manager Docker Swarm with Traefik and Grafana and it works like a charm, so it is possible.

Yes, it is an overlay network created with the docker CLI. I've been fighting this issue for months.

Some progress: Grafana now works as intended, and I just need to connect Prometheus. I don't want Prometheus to answer on the http entrypoint at http://monitoring.example.local; I want to force it onto http://monitoring.example.local:9090. How can I do that? I tried with:

version: '3'
services:
  prometheus:
    image: prom/prometheus:latest
    deploy:
      labels:
        - "traefik.enable=false"
        - "traefik.docker.network=traefik_public"
        - "traefik.http.routers.prometheus_ict.rule=Host(`monitoring-example.local`)"
        - "traefik.http.services.prometheus_ict.loadbalancer.server.port=9090"
        - "traefik.http.routers.prometheus_ict.service=prometheus_ict"
        - "traefik.port=9090"
    networks:
      - monitoring
      - traefik_public
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/usr/share/prometheus/console_libraries
      - --web.console.templates=/usr/share/prometheus/consoles
    volumes:
      - /mnt/docker_data/volumes/monitoring/prometheus:/etc/prometheus/
      - /mnt/docker_data/volumes/monitoring/prometheus_data:/prometheus
  grafana:
    image: grafana/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=xxxxxxxxxxxxxxxxxxxx
      - GF_SERVER_DOMAIN=monitoring-example.local
    depends_on:
      - prometheus
    volumes:
      - /mnt/docker_data/volumes/monitoring/grafana-storage:/var/lib/grafana
    networks:
      - traefik_public
      - monitoring
    deploy:
      placement:
        constraints: [node.role == manager]
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=traefik_public"
        - "traefik.http.routers.grafana_ict.rule=Host(`monitoring.example.local`)"
        - "traefik.port=3000"
        - "traefik.http.routers.grafana_ict.service=grafana_ict"
        - "traefik.http.routers.grafana_ict.entrypoints=http"
        - "traefik.http.services.grafana_ict.loadbalancer.server.port=3000"

networks:
  traefik_public:
    external: true
  monitoring:
    external: true

But everything stopped working: I get a 504 from Grafana on monitoring.example.local (even though its configuration is untouched) and a 404 on monitoring.example.local:9090.
I know the issue is in the Traefik configuration, but I don't know how to solve it.

You are using Host(`monitoring.example.local`) and Host(`monitoring-example.local`), is that on purpose?

You want Grafana to use Traefik to connect to Prometheus? On external (Traefik) port 9090? Then you need to enable a Traefik entrypoint on port 9090 and let the Prometheus router use it.
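
A rough sketch of that approach, assuming Traefik v2.x configured through CLI flags. The entrypoint name prometheus, the image tag, and the layout of the traefik service are my assumptions, not taken from your setup; the http entrypoint, the hostname, and the router and service names reuse the ones from your compose file.

```yaml
# Traefik stack (assumed layout): add a second entrypoint for Prometheus
version: '3.8'
services:
  traefik:
    image: traefik:v2.9
    command:
      - --providers.docker.swarmMode=true
      - --providers.docker.network=traefik_public
      - --entrypoints.http.address=:80
      - --entrypoints.prometheus.address=:9090  # hypothetical entrypoint name
    ports:
      - 80:80
      - 9090:9090  # must not be published by any other service
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - traefik_public
    deploy:
      placement:
        constraints: [node.role == manager]

networks:
  traefik_public:
    external: true
---
# Monitoring stack: labels for the prometheus service only
services:
  prometheus:
    deploy:
      labels:
        - "traefik.enable=true"  # with false, Traefik ignores the service entirely
        - "traefik.docker.network=traefik_public"
        - "traefik.http.routers.prometheus_ict.rule=Host(`monitoring.example.local`)"
        - "traefik.http.routers.prometheus_ict.entrypoints=prometheus"
        - "traefik.http.routers.prometheus_ict.service=prometheus_ict"
        - "traefik.http.services.prometheus_ict.loadbalancer.server.port=9090"
```

With this layout Prometheus itself should not publish 9090 under ports, because the swarm ingress can publish a given port for only one service.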

Or you can let Grafana connect to Prometheus over the Docker network monitoring (without Traefik); Docker DNS should resolve the service name prometheus to its internal IP.
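
If you go that way, the data source can be provisioned from a file instead of being created by hand in the UI. This is only a sketch: the file path and the data source name are my assumptions, and it relies on both services sharing the monitoring network as in your compose file.

```yaml
# Assumed path inside the Grafana container:
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus            # display name, chosen for illustration
    type: prometheus
    access: proxy               # the Grafana backend makes the request, not the browser
    url: http://prometheus:9090 # service name resolved by Docker DNS
    isDefault: true
```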

The host is meant to be the same; I made a mistake when redacting the original hostname in the compose file.
I solved the issue using the internal "monitoring" network. The desired behaviour would be:
monitoring.example.com - grafana
monitoring.example.com:9090 - prometheus
I will add an entrypoint and start testing, thank you :)