Noob question: Traefik cannot connect to backend services in Docker Swarm

I have an annoying problem that is perhaps related to something I do not understand.
I have a simple Docker Swarm installation: one manager, two workers.
I want to set up Traefik on our manager to listen on port 443.
I don't use Let's Encrypt (or ACME in general), because I already have signed certificates and, of course, the CA certificate. But that is not relevant at the moment, I think.
I want to set up a simple REST service that is load-balanced by Traefik.

I prepared a docker-compose file:

version: '3'

services:
  traefik:
    image: traefik:v2.6
#    network_mode: "host"
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=true"
      - "--entrypoints.websecure.address=:443"
      - "--providers.docker.swarmMode=true"
      - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
      - "--entrypoints.web.http.redirections.entryPoint.scheme=https"
      - "--log.level=DEBUG"
      - "--providers.file.directory=/config/"
      - "--providers.file.watch=true"
    secrets:
      - ca_cert.crt
      - traefik_cert.key
      - traefik_cert.crt
      - basic-auth
    configs:
      - source: tls_config
        target: /config/traefik.yml
    network_mode: "host"

    ports:
      - "443:443"
      - 80:8080
    networks:
      - traefik-net
    volumes:
      - "traefik-config:/config/"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "traefik-certs:/run/secrets/"
    deploy:
        placement:
          constraints:
            - node.role == manager

volumes:
  traefik-certs:
  traefik-config:

secrets:
  basic-auth:
    file: ./secrets/htpasswd
  ca_cert.crt:
    file: /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
  traefik_cert.crt:
    file: ./secrets/myorg.net
  traefik_cert.key:
    file: ./secrets/myorg.net

configs:
  tls_config:
    file: $PWD/config/traefik.yml


networks:
  traefik-net:
    driver: overlay
    attachable: true

My idea is to create an internal overlay network that I attach my services to, so that Traefik updates its configuration dynamically as services appear.
My Traefik configuration looks like this:

api:
  dashboard: true
  insecure: true
tls:
  certificates:
    certFile: /run/secrets/traefik_cert.crt
    keyFile: /run/secrets/traefik_cert.key
    stores:
      - default
tls:
  options:
    default:
      clientCAFiles:
        - /run/secrets/ca_cert.crt
      sniStrict: true

log:
    level: DEBUG

providers:
  docker:
    swarmMode: true
    watch: true
    exposedByDefault: true
#    network: traefik_traefik-net

tls:
  stores:
    default:
      defaultCertificate:
        certFile: /run/secrets/traefik_cert.crt
        keyFile: /run/secrets/traefik_cert.key

certificatesResolvers:
  certres:
    acme: false
    httpChallenge:
      entryPoint: http
    tlsChallenge:
      entryPoint: websecure

#    defaultChallenge: "tls-sni-01"

I create a stack:

docker stack deploy --with-registry-auth --compose-file docker-compose.yml traefik

So I created a service:

  cpl-test:
    image: cye.myorg.net:5000/cpl-backend:1582
    deploy:
      placement:
          constraints:
            - node.role == worker

      replicas: 2
      labels:
        - traefik.http.routers.cpl-test.rule=Host(`test-cpl.myorg.net`)
        - traefik.http.routers.cpl-test.entrypoints=websecure
        - traefik.http.routers.cpl-test.tls=true
        - traefik.http.routers.cpl-test.service=cpl-test
        - traefik.http.services.cpl-test.loadbalancer.server.port=8091
    networks:
        - traefik_traefik-net
    environment:
      - "SPRING_PROFILES_ACTIVE=test"
    ports:
      - 8087:8091
    volumes:
      - /opt/cplPortal/configuration/test:/config
      - /opt/cplPortal/logs:/logs
networks:
  traefik_traefik-net:
    external: true

My services seem to start OK, and I can see something like this in the logs:

 time="2023-03-08T15:26:58Z" level=debug msg="Configuration received from provider docker: {\"http\":{\"routers\":{\"cpl-test\":{\"entryPoints\":[\"websecure\"],\"service\":\"cpl-test\",\"rule\":\"Host(`test-cpl.myorg.net`)\",\"tls\":{}}},\"services\":{\"cpl-test\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.0.88:8091\"},{\"url\":\"http://10.0.0.89:8091\"}],\"passHostHeader\":true}}}},\"tcp\":{},\"udp\":{}}" providerName=docker

And here we get to the part I don't fully understand:
It seems the hosts 10.0.0.88 and 10.0.0.89 are not reachable by Traefik (neither on port 8091 nor on port 8087). I ran docker exec -it dockerImage sh
and then checked both hosts with netcat: neither is reachable.
I suppose I'm missing something, but I have not found out what.
By the way, if I SSH to the hosts where my workers run, I can connect to localhost:8087 without problems (that is the port published by the service).

Things I already tried:
traefik.http.services.cpl-test.loadbalancer.server.port: I changed it back and forth between 8087 and 8091 (I assume it should be 8091).

My question is, what am I doing wrong?

First, I would make the config easier by giving your Docker network a fixed name:

networks:
  proxy:
    name: proxy

That way the name does not change when you use it from different compose files or stacks (otherwise it gets prefixed with the stack name, as in traefik_traefik-net), and it stays short and easy to reference.
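
As a sketch of the other side, assuming the backend lives in a second stack file and the network is named proxy as in the snippet above, the backend stack can refer to the network as external, so it neither re-creates it nor prefixes it with its own stack name:

```yaml
# Backend stack file (sketch): joins the "proxy" network created by the
# Traefik stack instead of defining a network of its own.
version: "3.8"
services:
  cpl-test:
    image: cye.myorg.net:5000/cpl-backend:1582
    networks:
      - proxy
networks:
  proxy:
    external: true   # must already exist, e.g. created by the Traefik stack
```

Deploy the Traefik stack first: docker stack deploy refuses a stack whose external network does not exist yet.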

Second, you have static configuration both in the Traefik command and in your Traefik static config file (usually traefik.yml, with entrypoints, log, certificatesResolvers). You can only use one of them.
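
A minimal sketch of keeping all static configuration in the file, assuming it is mounted at /etc/traefik/traefik.yml (one of the locations Traefik v2 searches by default) and that the Traefik service in the compose file has no command: section at all. The dynamic-config directory /config/dynamic and the network name proxy are assumptions for illustration:

```yaml
# /etc/traefik/traefik.yml: the ONLY source of static configuration.
api:
  dashboard: true
  insecure: true              # dashboard on :8080, fine for testing only

log:
  level: DEBUG

providers:
  docker:
    swarmMode: true
    exposedByDefault: true
    network: proxy            # assumed network name
  file:
    directory: /config/dynamic   # hypothetical directory for dynamic files
    watch: true

# entryPoints (web, websecure) also belong in this file.
```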

Third, you have tls in your static config. TLS certificate files need to be declared in a dynamic config file, which is loaded via providers.file in the static config.
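
For example, the TLS part could move into its own dynamic file, here assumed to be /config/dynamic/tls.yml inside the directory watched by providers.file. The sketch follows Traefik v2 syntax, where certificates is a list and CA files for client certificates go under clientAuth instead of the clientCAFiles key. Whether you want client certificate checking at all is an assumption; drop the clientAuth block if not:

```yaml
# /config/dynamic/tls.yml (dynamic configuration, loaded via providers.file)
tls:
  certificates:                       # a list in Traefik v2
    - certFile: /run/secrets/traefik_cert.crt
      keyFile: /run/secrets/traefik_cert.key
  stores:
    default:
      defaultCertificate:
        certFile: /run/secrets/traefik_cert.crt
        keyFile: /run/secrets/traefik_cert.key
  options:
    default:
      sniStrict: true
      clientAuth:                     # optional: only if clients present certificates
        caFiles:
          - /run/secrets/ca_cert.crt
        clientAuthType: VerifyClientCertIfGiven   # assumption; RequireAndVerifyClientCert enforces mTLS
```

Everything sits under a single tls: key here; repeating tls: at the top level, as in the original file, is a duplicate YAML key.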

Fourth, you have a redirect on the web entrypoint, but web is never declared.
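
Declaring both entrypoints in the static configuration could look like this (a sketch in the file format, reusing the entrypoint names web and websecure from the original command; the equivalent flags are listed as comments):

```yaml
# Static configuration: declare "web" before attaching the redirect to it.
entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"
# Same thing as command-line flags, if static config stays in "command:":
#   --entrypoints.web.address=:80
#   --entrypoints.web.http.redirections.entryPoint.to=websecure
#   --entrypoints.web.http.redirections.entryPoint.scheme=https
#   --entrypoints.websecure.address=:443
```

With web listening on :80 inside the container, the compose file would publish "80:80" rather than "80:8080" (container port 8080 is the dashboard).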

I tried to simplify the configuration and came up with something like this:

version: "3.8"
services:
  traefik:
    image: "traefik:v2.9"
    command:
      - --entrypoints.web.address=:80
      - --providers.docker
      - --providers.docker.swarmMode=true
      - "--log.level=DEBUG"
    ports:
      - "80:80"
      - "8080:8080"
    networks:
      - proxy
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    deploy:
        placement:
          constraints:
            - node.role == manager
  cpl-test:
    image: cye.myorg.net:5000/cpl-backend:1582
    deploy:
      replicas: 2
      labels:
        - traefik.enable=true
        - traefik.http.routers.cpl-test.rule=Host(`test-cpl.myorg.net`)

        - traefik.http.services.cpl-test.loadbalancer.server.port=8091
    networks:
        - proxy
    environment:
      - "SPRING_PROFILES_ACTIVE=test"
    ports:
      - 8087:8091
    volumes:
      - /opt/cplPortal/configuration/test:/config
      - /opt/cplPortal/logs:/logs
networks:
  proxy:
    name: proxy
    driver: overlay
    attachable: true

I see my endpoints being detected:

traefik_traefik.1.wcvduw21a2j4@si0vm08509 | time="2023-03-09T09:58:15Z" level=debug msg="Configuration received: {\"http\":{\"routers\":{\"cpl-test\":{\"service\":\"cpl-test\",\"rule\":\"Host(`test-cpl.myorg.net`)\"}},\"services\":{\"cpl-test\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://10.0.0.81:8091\"},{\"url\":\"http://10.0.0.78:8091\"}],\"passHostHeader\":true}}}},\"tcp\":{},\"udp\":{}}" providerName=docker

Now I connect to the Docker container that runs Traefik and check the connection:
/ # nc -w2 -vv 10.0.0.78 8091
nc: 10.0.0.78 (10.0.0.78:8091): Operation timed out
/ # nc -w2 -vv 10.0.0.81 8091
nc: 10.0.0.81 (10.0.0.81:8091): Operation timed out
sent 0, rcvd 0
/ # nc -w2 -vv 10.0.0.81 8087
nc: 10.0.0.81 (10.0.0.81:8087): Operation timed out
sent 0, rcvd 0
I have the firewalls disabled, so they should not get in the way.

Is your backend an HTTP service? If yes, can you try wget? Also try ping.

Are you connecting your nodes over a vSwitch? Then you should look into setting a smaller MTU for your Docker overlay network.
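
If that applies, a smaller MTU can be set on the overlay network through driver_opts in the compose file. This is a sketch: 1400 is only an example value (it has to leave room for the VXLAN encapsulation overhead on the underlying path), and it assumes your Docker version honours com.docker.network.driver.mtu for overlay networks:

```yaml
networks:
  proxy:
    name: proxy
    driver: overlay
    attachable: true
    driver_opts:
      com.docker.network.driver.mtu: "1400"   # example value only
```

Options of an existing network cannot be changed, so the network has to be removed (after removing the stacks that use it) and created again.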

I tried that. What seems odd is that pings work (I can see them with tcpdump on the destination host), but I cannot see the TCP connections at all. I'm fairly sure it is not the firewall...

Is 8091 the right internal port?
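
For reference, the port in the label should be the one the application listens on inside the container (8091 here), because Traefik connects to the task IPs on the overlay network, as the http://10.0.0.81:8091 URLs in the log show. The published port 8087 only goes through the routing mesh on the nodes and is not needed by Traefik. A trimmed sketch of the service, assuming the network is named proxy and defined as before:

```yaml
services:
  cpl-test:
    image: cye.myorg.net:5000/cpl-backend:1582
    networks:
      - proxy
    # No "ports:" needed for Traefik; keep "8087:8091" only for direct access.
    deploy:
      replicas: 2
      labels:
        - traefik.http.routers.cpl-test.rule=Host(`test-cpl.myorg.net`)
        # container-side port the app listens on, not the published 8087
        - traefik.http.services.cpl-test.loadbalancer.server.port=8091
```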

I think there is something wrong with my Docker installation. Honestly, I cannot get any service to communicate over an overlay network: I can ping one container from another, but I cannot open a TCP connection.