Traefik on the manager node cannot discover service containers running on other nodes

traefik-stack.yml

version: '3.3'

services:
  traefik:
    image: traefik:v3.0
    container_name: traefik
    restart: unless-stopped
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.swarm=true"
      - "--providers.docker.network=dms-overlay-network"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
    ports:
      - "9080:80"
      - "9880:8080"
      - "9443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./traefik.yml:/traefik.yml
      - ./certs/certfile.crt:/certs/certfile.crt
      - ./certs/keyfile.key:/certs/keyfile.key
    networks:
      - dms-overlay-network
    deploy:
      placement:
        constraints:
          - node.role == manager

networks:
  dms-overlay-network:
    external: true

traefik.yml

entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"

# Docker configuration backend
providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    network: "dms-overlay-network"
    watch: true
    defaultRule: "Host(`{{ trimPrefix `/` .Name }}.demo.com`)"

# API and Dashboard settings
api:
  insecure: true
  dashboard: true

# Log settings
log:
  level: DEBUG 
accessLog:
  format: json 
  filters:
    statusCodes: "400-599"  
  fields:
    defaultMode: keep
    names:
      "ClientUsername": drop
      "ClientHost": keep

tls:
  stores:
    default:
      defaultCertificate:
        certFile: /certs/certfile.crt
        keyFile: /certs/keyfile.key

http:
  routers:
    dashboard:
      rule: "Host(`dashboard.demo.com`)"
      service: "api@internal"
      entryPoints:
        - "websecure"
      tls:
        certFile: /certs/certfile.crt
        keyFile: /certs/keyfile.key
      middlewares:
        - auth

  middlewares:
    auth:
      basicAuth:
        users:
          - "admin:haitai@123"

model-server.yml

version: '3.3'

services:
  model-server:
    image: model_run_sklearn:1.0
    user: "8888:981"
    environment:
      - ORG_CODE=100001
      - IMAGE_TYPE=model_run_sklearn:1.0
    volumes:
      - /home/dms_data/run/base_file/config.yaml:/dms/conf/config.yaml
      - /home/dms_data/run/api/run_file/e5205e88889c49c4815c27f8def82d7f/hander.py:/work_dir/actuator/hander.py
      - /home/dms_data/run/task/model/d5f1fe3f843f4deebc46a5a1b63d0223.model:/work_dir/actuator/run.model
    networks:
      - dms-overlay-network
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.model-servers.rule=PathPrefix(`/model-servers`)"
      - "traefik.http.services.model-servers.loadbalancer.server.port=5000"
      - "traefik.http.routers.model-servers.entrypoints=websecure"
      - "traefik.http.routers.model-servers.tls=true"
    deploy:
      replicas: 3

networks:
  dms-overlay-network:
    external: true

After deploying model-server.yml with the configuration above, the Servers list in the dashboard shows only the IP of the container running on the manager node. The IPs of the containers running on the other two nodes are not listed. The dms-overlay-network network ID is the same on all three nodes, and the three containers can reach each other.

Note that you are mixing multiple things:

  1. Traefik static config (entrypoints, providers, api, log, certresolver) can go in traefik.yml or in command:, but not both. Pick one (doc).

  2. Traefik dynamic config (tls, http and tcp routers, middlewares and services) is better kept in a separate file, which you load with providers.file. Alternatively, define routers/middlewares/services in labels via providers.docker.
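If you would rather keep the static config in a file than in command:, a minimal sketch could look like the one below. It only restates options already used in this thread. The assumptions are that the file is mounted at /etc/traefik/traefik.yml (one of the default locations Traefik checks) and that the service has no command: flags at all.

```yaml
# traefik.yml: static configuration only (sketch, not a tested setup).
# Assumes no --flags are passed in command:, since the two sources
# are not meant to be combined.
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"

providers:
  docker:
    exposedByDefault: false
    network: dms-overlay-network

api:
  insecure: true
  debug: true

log:
  level: DEBUG

accessLog:
  format: json
  filters:
    statusCodes: "400-599"
```

The tls and http sections from the original traefik.yml are dynamic config, so they are deliberately left out of this file.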

traefik-stack.yml

version: '3.3'

services:
  traefik:
    image: traefik:v3.0
    container_name: traefik
    restart: unless-stopped
    command:
      - --api.insecure=true
      - --api.debug=true
      - --providers.docker=true
      - --providers.docker.exposedByDefault=false
      - --providers.docker.network=dms-overlay-network
      - --log.level=DEBUG
      - --accesslog=true
      - --accesslog.filters.statuscodes=400-599
      - --accesslog.format=json
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
    ports:
      - "9080:80"
      - "9880:8080"
      - "9443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./certs/certfile.crt:/certs/certfile.crt
      - ./certs/keyfile.key:/certs/keyfile.key
    networks:
      - dms-overlay-network
    deploy:
      placement:
        constraints:
          - node.role == manager

networks:
  dms-overlay-network:
    external: true

model-server.yml

version: '3.3'

services:
  model-server:
    image: model_run_sklearn:1.0
    user: "8888:981"
    environment:
      - ORG_CODE=100001
      - IMAGE_TYPE=model_run_sklearn:1.0
    volumes:
      - /home/dms_data/run/base_file/config.yaml:/dms/conf/config.yaml
      - /home/dms_data/run/api/run_file/e5205e88889c49c4815c27f8def82d7f/hander.py:/work_dir/actuator/hander.py
      - /home/dms_data/run/task/model/d5f1fe3f843f4deebc46a5a1b63d0223.model:/work_dir/actuator/run.model
    networks:
      - dms-overlay-network
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.model-servers.rule=PathPrefix(`/model-servers`)"
      - "traefik.http.services.model-servers.loadbalancer.server.port=5000"
      - "traefik.http.routers.model-servers.entrypoints=web"
    deploy:
      replicas: 3

networks:
  dms-overlay-network:
    external: true

Servers (as shown in the Traefik dashboard):

Status   URL
---      http://10.0.1.224:5000

[root@liuwenhua tmp]# docker service ls
ID             NAME                 MODE         REPLICAS   IMAGE                                         PORTS
6jbzx7zov6du   model_model-server   replicated   3/3        docker.haitai.com/dms/model_run_sklearn:1.0   
wcs28kd9a0ns   stack_traefik        replicated   1/1        docker.haitai.com/dms/traefik:v3.0            *:9080->80/tcp, *:9443->443/tcp, *:9880->8080/tcp
[root@liuwenhua tmp]# docker service ps model_model-server
ID             NAME                   IMAGE                                         NODE                             DESIRED STATE   CURRENT STATE            ERROR     PORTS
g7vd590i7o2n   model_model-server.1   docker.haitai.com/dms/model_run_sklearn:1.0   zhoujiancheng.haitaichina.net    Running         Running 10 minutes ago             
y71ce93qrd1r   model_model-server.2   docker.haitai.com/dms/model_run_sklearn:1.0   zuojunxiang-nb.haitaichina.net   Running         Running 10 minutes ago             
wk9nrndtt78d   model_model-server.3   docker.haitai.com/dms/model_run_sklearn:1.0   liuwenhua.haitaichina.net        Running         Running 10 minutes ago 

After I switched to starting Traefik without the static configuration file, the situation remained the same: only the IP of the container running on the manager node appears in the dashboard's Servers list.

Config seems ok.

Do you deploy with docker stack deploy?

What does the Traefik debug log tell you?

To use your own TLS certs, you need a dynamic config file with a tls section, loaded via providers.file in the static config. Then enable TLS on the entrypoint or on the router (without a certresolver).
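As a rough sketch of that setup: the file name /dynamic.yml and its mount are assumptions made for illustration, while the certificate paths are the ones from the original traefik.yml. The two YAML documents below stand for two separate files.

```yaml
# Document 1: additions to the traefik service in traefik-stack.yml (fragment).
# /dynamic.yml is a hypothetical path chosen for this example.
services:
  traefik:
    command:
      - --providers.file.filename=/dynamic.yml
    volumes:
      - ./dynamic.yml:/dynamic.yml
      - ./certs/certfile.crt:/certs/certfile.crt
      - ./certs/keyfile.key:/certs/keyfile.key
---
# Document 2: dynamic.yml, holding only dynamic configuration.
tls:
  stores:
    default:
      defaultCertificate:
        certFile: /certs/certfile.crt
        keyFile: /certs/keyfile.key
```

A router then opts in with a label such as traefik.http.routers.model-servers.tls=true, as in the first model-server.yml, and no certresolver is set.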

version: '3.3'

services:
  traefik:
    image: docker.haitai.com/dms/traefik:v3.0
    container_name: traefik
    restart: unless-stopped
    command:
      - --providers.swarm=true
      - --providers.swarm.endpoint=unix:///var/run/docker.sock
      - --api.insecure=true
      - --api.debug=true
      - --log.level=DEBUG
      - --accesslog=true
      - --accesslog.filters.statuscodes=400-599
      - --accesslog.format=json
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
    ports:
      - "9080:80"
      - "9880:8080"
      - "9443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - dms-overlay-network
    deploy:
      placement:
        constraints:
          - node.role == manager

networks:
  dms-overlay-network:
    external: true

version: '3.3'

services:
  model-server:
    image: docker.haitai.com/dms/model_run_sklearn:1.0
    user: "8888:981"
    environment:
      - ORG_CODE=100001
      - IMAGE_TYPE=model_run_sklearn:1.0
    volumes:
      - /home/dms_data/run/base_file/config.yaml:/dms/conf/config.yaml
      - /home/dms_data/run/api/run_file/e5205e88889c49c4815c27f8def82d7f/hander.py:/work_dir/actuator/hander.py
      - /home/dms_data/run/task/model/d5f1fe3f843f4deebc46a5a1b63d0223.model:/work_dir/actuator/run.model
    networks:
      - dms-overlay-network
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.model-servers.rule=PathPrefix(`/model-servers`)"
      - "traefik.http.services.model-servers.loadbalancer.server.port=5000"
      - "traefik.http.routers.model-servers.entrypoints=web"
    deploy:
      replicas: 3

networks:
  dms-overlay-network:
    external: true

stack_traefik.1.dk04ah1atmy3@liuwenhua.haitaichina.net | 2024-06-17T12:33:45Z ERR github.com/traefik/traefik/v3/pkg/provider/docker/config.go:81 > error="service "stack-traefik" error: port is missing" container=stack-traefik-dk04ah1atmy3wy791thuxxvzn providerName=swarm
stack_traefik.1.dk04ah1atmy3@liuwenhua.haitaichina.net | 2024-06-17T12:33:45Z ERR github.com/traefik/traefik/v3/pkg/provider/docker/config.go:81 > error="service "model-model-server" error: port is missing" container=model-model-server-p3wvigshoyjl5vmq7hgj4broz providerName=swarm
stack_traefik.1.dk04ah1atmy3@liuwenhua.haitaichina.net | 2024-06-17T12:33:45Z ERR github.com/traefik/traefik/v3/pkg/provider/docker/config.go:81 > error="service "model-model-server" error: port is missing" container=model-model-server-p6ll1be0w6ekjg1kmnci4t4ju providerName=swarm
stack_traefik.1.dk04ah1atmy3@liuwenhua.haitaichina.net | 2024-06-17T12:33:45Z ERR github.com/traefik/traefik/v3/pkg/provider/docker/config.go:81 > error="service "model-model-server" error: port is missing" container=model-model-server-q00eg65553v6fj7eoxd72xwh8 providerName=swarm

Why does this configuration report that the port is missing?

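When using Swarm, labels need to go inside the deploy: section. Labels at the service's top level are set on the containers, while the Swarm provider reads the labels of the service itself, so it never sees the loadbalancer.server.port label and reports that the port is missing.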
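Applied to the model-server.yml from this thread, that change might look like the sketch below. The environment and volumes entries are omitted for brevity, and the labels are the same ones as before, just moved under deploy:.

```yaml
version: '3.3'

services:
  model-server:
    image: docker.haitai.com/dms/model_run_sklearn:1.0
    user: "8888:981"
    networks:
      - dms-overlay-network
    deploy:
      replicas: 3
      # Service-level labels: these are what the Swarm provider reads.
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.model-servers.rule=PathPrefix(`/model-servers`)"
        - "traefik.http.services.model-servers.loadbalancer.server.port=5000"
        - "traefik.http.routers.model-servers.entrypoints=web"

networks:
  dms-overlay-network:
    external: true
```

The log also shows the same error for stack-traefik itself. Setting exposedByDefault to false on the swarm provider would presumably silence that one, though this thread does not confirm it.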

The adjustment has been successful. Thank you very much.