Load balancing containers on separate Docker hosts

Good Morning everybody,
I'm trying something not clearly documented here.
I have a Docker application deployed on 2 separate Docker hosts, each with its own Traefik v2.2 instance.
No swarm, no kubernetes, no other orchestration tool.
This is because I need to reconfigure an existing legacy application that sends requests to each Docker host via an external load balancer.
On each Docker host, the applications respond to URLs like container_name.host1.domain.com or container_name.host2.domain.com via Traefik, as usual.
What I'm trying to accomplish here is to insert another level of load balancing, so I can redirect, balance or stop traffic to one of the containers if needed.
I'm working with dynamic Traefik configuration in a file, with a sample app that consists only of an nginx container on both servers:

version: "3.5"

services:
  test:
    image: nginx
    volumes:
      - ./templates:/etc/nginx/templates
      - ./site/:/var/site/:rw
    expose:
      - "80"
    networks:
      - public
    labels:
      # Only the service is declared here; routers live in the file provider config.
      - traefik.enable=true
      - traefik.docker.network=public
      - traefik.http.services.test.loadbalancer.server.port=80

networks:
  public:
    external: true

In this compose file I don't specify any other Traefik labels, leaving the rest of the configuration to a file provider config:

http:
  routers:
    test:
      entryPoints: [web]
      service: test-cluster@file
      rule: Host(`cluster.host1.domain.com`)

    test1:
      entryPoints: [web]
      service: test@docker
      rule: Host(`container_name.host1.domain.com`)

  services:
    test-cluster:
      loadBalancer:
        healthCheck:
          path: /index.html
          interval: "10s"
          timeout: "3s"
        servers:
        - url: http://container_name.host1.domain.com
        - url: http://container_name.host2.domain.com

In this config, the test router should send traffic to a load-balanced list of servers, called 'test-cluster', that includes the containers running on both Docker hosts.
The test1 router just calls the service 'test@docker' created by the 'local' docker-compose.
On the other Docker host, this router calls its own local service.
All names are defined in DNS.
All routers and services are shown as correct in the Traefik dashboard on both servers.

Each container works when called with its own URL, but calling the "cluster" URL does not.
With curl or a browser, the request just hangs.

In the Traefik log I cannot see any messages indicating an error, just the initial 'the service does not exists' while the compose is restarting or while the servers are being checked by the loadBalancer.

Does anybody have a suggestion on how to do this, or a more efficient way to debug it, since the Traefik logs are not helpful here?

Thanks a lot

Hi @enrido

This looks like a symptom of the client not connecting to the Traefik instance. In general, the file provider configuration looks good. If the issue were in the router/service configuration, I would expect to see some kind of HTTP response code (404, 502, etc.).

I use --log.level=DEBUG and --accesslog when investigating.

Yes, debug logging and the access log are already enabled in the Traefik static config.
Filtering the log lines with jq during a test, the only messages are these:

"Refreshing health check for backend: test-cluster@file"
"vulcand/oxy/roundrobin/rr: begin ServeHttp on request"
"vulcand/oxy/roundrobin/rr: completed ServeHttp on request"
"vulcand/oxy/roundrobin/rr: Forwarding this request to URL"

What response does curl ultimately give if left running?

Thank you for your reply,
It just hangs until I break it with Ctrl+C.
The call seems to start some kind of loop that I have to interrupt.
Sorry I wasn't clear about that.

The only other message in traefik.log is:

'499 Client Closed Request' caused by: context canceled

That appears after I kill curl, obviously.
In the few seconds before the kill, it opens hundreds of requests, roughly estimated this way:

tail -n 2000 /opt/docker/volumes/traefik/_data/logs/traefik.log | jq '.msg' | grep -i 499 | wc -l
737
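
A variation on the estimate above, as a minimal sketch: it compares how many requests were cancelled with how many were forwarded in the same slice of the log. It assumes one log entry per line and matches the raw text instead of going through jq. The function name count_msg is made up for illustration; the log path is the one from the command above.

```shell
#!/bin/sh
# count_msg LOGFILE PATTERN [LINES]
# Count the entries among the last LINES lines (default 2000) of LOGFILE
# that contain PATTERN as a fixed string. Assumes one log entry per line.
count_msg() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # function usable under "set -e"; the 0 is still printed.
    tail -n "${3:-2000}" "$1" | grep -c -F -- "$2" || true
}

# Usage (path taken from the command above):
#   LOG=/opt/docker/volumes/traefik/_data/logs/traefik.log
#   count_msg "$LOG" '499 Client Closed Request'
#   count_msg "$LOG" 'Forwarding this request to URL'
```

If a single curl call produces hundreds of 'Forwarding this request to URL' entries, the request is probably being forwarded again and again instead of reaching nginx.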

Interesting, have you tried curl in verbose mode?

Yes, but it gets stuck after the 'Accept:' header:

curl -v http://cluster.xxx.yyy.com
* Rebuilt URL to: http://cluster.xxx.yyy.com/
*   Trying 192.168.10.172...
* TCP_NODELAY set
* Connected to cluster.xxx.yyy.com (192.168.10.172) port 80 (#0)
> GET / HTTP/1.1
> Host: cluster.xxx.yyy.com
> User-Agent: curl/7.58.0
> Accept: */*
> 
^C
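
A hang like this can be bounded and narrowed down with a timed probe that sets the Host header explicitly. Traefik passes the client's Host header on to the backend by default (the loadBalancer passHostHeader option), so the request that reaches the backend Traefik may still carry the "cluster" name. Whether that is what happens here is an assumption, not something confirmed in this thread. The sketch below uses a hypothetical helper name, probe_backend, and a CURL variable that exists only so the command can be swapped out for a dry run.

```shell
#!/bin/sh
# CURL can be overridden for a dry run; it defaults to the real curl binary.
CURL="${CURL:-curl}"

# probe_backend URL HOST_HEADER [TIMEOUT_SECONDS]
# Prints the HTTP status code of the response. curl reports 000 when no
# response arrives within the timeout (default 5 seconds).
probe_backend() {
    "$CURL" -sS -o /dev/null -w '%{http_code}\n' \
        --max-time "${3:-5}" -H "Host: $2" "$1"
}

# Usage, with the names from the file provider config:
#   # the backend as a normal client calls it
#   probe_backend http://container_name.host1.domain.com/ container_name.host1.domain.com
#   # the same backend, but with the Host header of the cluster router
#   probe_backend http://container_name.host1.domain.com/ cluster.host1.domain.com
```

If the first call returns 200 and the second one times out, the backend Traefik is routing on the forwarded Host header, which would match the cluster router again.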

I'd be tempted to run a container to test resolution and connection from a container environment to the backend.

Also are you seeing the health checks coming in on the services?
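
A container-side check along those lines might look like the sketch below. It assumes Traefik is attached to the 'public' network from the compose file, and it uses the busybox and curlimages/curl images purely as convenient throwaway tools; neither is part of the original setup. The function names and the DOCKER and NETWORK variables are hypothetical.

```shell
#!/bin/sh
# DOCKER and NETWORK can be overridden; NETWORK defaults to the external
# network named in the compose file.
DOCKER="${DOCKER:-docker}"
NETWORK="${NETWORK:-public}"

# resolve_in_container NAME
# DNS lookup from inside a throwaway container on the same network.
resolve_in_container() {
    "$DOCKER" run --rm --network "$NETWORK" busybox nslookup "$1"
}

# fetch_in_container URL
# Prints the HTTP status code seen from inside a throwaway container,
# giving up after 5 seconds.
fetch_in_container() {
    "$DOCKER" run --rm --network "$NETWORK" curlimages/curl \
        -sS -o /dev/null -w '%{http_code}\n' --max-time 5 "$1"
}

# Usage, with the backend and health check path from the file provider config:
#   resolve_in_container container_name.host2.domain.com
#   fetch_in_container http://container_name.host2.domain.com/index.html
```

Comparing the result with the same request run from the workstation shows whether name resolution or connectivity differs inside the container network.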

Yes, I can see this message in traefik.log:

Refreshing health check for backend: test-cluster@file

All URLs are resolved via DNS in the same way on the Docker hosts and on the client workstation from which I run curl.