Requests getting routed to unhealthy backend

Hi, I created a GitHub issue (Requests getting routed to unhealthy backend, Issue #8570 in traefik/traefik), which was closed by a bot. Does anyone know whether this is a bug or whether I'm just doing it wrong?

Welcome!

  • [X] Yes, I've searched similar issues on GitHub and didn't find any.
  • [X] Yes, I've searched similar issues on the Traefik community forum and didn't find any.

What did you do?

I have a service behind Traefik that I deploy blue/green for zero-downtime deployments (ZDD). Discovery is via the Docker provider, and each deployment uses router and service labels namespaced by color (green or blue).

Suppose the green container is running. Its labels look like this:

"traefik.docker.network": "traefik_default",
"traefik.enable": "true",
"traefik.http.routers.my_service_green.priority": "1636587340",
"traefik.http.routers.my_service_green.rule": "HostRegexp(`{var:.*}`) || Host(`green`)",
"traefik.http.routers.my_service_green.tls": "true",
"traefik.http.services.my_service_green.loadbalancer.healthcheck.interval": "5s",
"traefik.http.services.my_service_green.loadbalancer.healthcheck.path": "/ping",
"traefik.http.services.my_service_green.loadbalancer.healthcheck.port": "5000",
"traefik.http.services.my_service_green.loadbalancer.healthcheck.timeout": "4s",
"traefik.http.services.my_service_green.loadbalancer.server.port": "5000",
"traefik.http.services.my_service_green.loadbalancer.server.scheme": "http"

The next time we deploy the app, the new container gets different labels. The new blue container has the following labels:

"traefik.docker.network": "traefik_default",
"traefik.enable": "true",
"traefik.http.routers.my_service_blue.priority": "1636595625",
"traefik.http.routers.my_service_blue.rule": "HostRegexp(`{var:.*}`) || Host(`blue`)",
"traefik.http.routers.my_service_blue.tls": "true",
"traefik.http.services.my_service_blue.loadbalancer.healthcheck.interval": "5s",
"traefik.http.services.my_service_blue.loadbalancer.healthcheck.path": "/ping",
"traefik.http.services.my_service_blue.loadbalancer.healthcheck.port": "5000",
"traefik.http.services.my_service_blue.loadbalancer.healthcheck.timeout": "4s",
"traefik.http.services.my_service_blue.loadbalancer.server.port": "5000",
"traefik.http.services.my_service_blue.loadbalancer.server.scheme": "http"

Note that both routers match HostRegexp(`{var:.*}`), and each additionally matches a Host header of either `blue` or `green`, so that I can query each one directly to check that it is up. They also use the router priority to make sure traffic starts flowing to the one with the higher priority.
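A minimal sketch of how the two label sets above could be generated per deployment. `deployment_labels` is a hypothetical helper, not part of Traefik or Docker, and using the deploy time (Unix timestamp) as the priority is an assumption based on how the values above look.

```python
import time
from typing import Dict, Optional


def deployment_labels(color: str, priority: Optional[int] = None) -> Dict[str, str]:
    """Build the Traefik labels for one color of a blue/green deployment."""
    if priority is None:
        # Assumption: a later deploy should outrank an earlier one,
        # so the current Unix time is used as the router priority.
        priority = int(time.time())
    name = f"my_service_{color}"
    router = f"traefik.http.routers.{name}"
    lb = f"traefik.http.services.{name}.loadbalancer"
    return {
        "traefik.docker.network": "traefik_default",
        "traefik.enable": "true",
        f"{router}.priority": str(priority),
        # Doubled braces produce the literal {var:.*} in the rule.
        f"{router}.rule": f"HostRegexp(`{{var:.*}}`) || Host(`{color}`)",
        f"{router}.tls": "true",
        f"{lb}.healthcheck.interval": "5s",
        f"{lb}.healthcheck.path": "/ping",
        f"{lb}.healthcheck.port": "5000",
        f"{lb}.healthcheck.timeout": "4s",
        f"{lb}.server.port": "5000",
        f"{lb}.server.scheme": "http",
    }


# Reproduce the two label sets shown above with their fixed priorities.
green = deployment_labels("green", priority=1636587340)
blue = deployment_labels("blue", priority=1636595625)
for key, value in blue.items():
    print(f'"{key}": "{value}",')
```

Only the color and the priority differ between the two sets; everything else is identical.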

What did you see instead?

I expect requests to be routed only to the backend that

  1. has a passing health check, and
  2. has the highest priority among the healthy ones.

So in my example above, I expect requests to go to my_service_green even though it has the lower priority, because my_service_blue isn't healthy yet. As soon as my_service_blue is healthy, all requests should be routed there. However, this is not what I'm seeing. Even though my_service_blue is correctly marked as unhealthy:

time="2021-11-10T23:35:43Z" level=warning msg="Health check failed, removing from server list. Backend: \"my_service_blue@docker\" URL: \"http://172.18.0.4:5000\" Weight: 1 Reason: HTTP request failed: Get \"http://172.18.0.4:5000/ping\": dial tcp 172.18.0.4:5000: connect: connection refused"

I can see requests already being routed to that instance:

XXX.XXX.62.57 - - [10/Nov/2021:23:35:43 +0000] "GET /ping HTTP/1.1" 503 19 "-" "python-requests/2.25.1" 28850 "my_service_blue@docker" "-" 0ms
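A small sketch that splits the access log line above into named fields. It assumes the line follows Traefik's default Common Log Format layout (client, timestamp, request line, status, size, referer, user agent, request count, router name, server URL, duration); the group names in the regex are illustrative labels, not official field names.

```python
import re

# Assumed layout of one access log line in the default (CLF-style) format.
ACCESS_LOG = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<request_count>\d+) "(?P<router>[^"]*)" "(?P<server_url>[^"]*)" '
    r'(?P<duration_ms>\d+)ms'
)

line = (
    'XXX.XXX.62.57 - - [10/Nov/2021:23:35:43 +0000] "GET /ping HTTP/1.1" '
    '503 19 "-" "python-requests/2.25.1" 28850 "my_service_blue@docker" "-" 0ms'
)

match = ACCESS_LOG.fullmatch(line)
if match is None:
    raise ValueError("line does not follow the assumed access log layout")

fields = match.groupdict()
for key in ("status", "size", "router", "server_url", "duration_ms"):
    print(f"{key}: {fields[key]}")
```

If the layout assumption holds, the request was handled by the router my_service_blue@docker, answered with a 503, and has "-" in the server URL position.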

What version of Traefik are you using?

traefik version
Version:      2.5.2
Codename:     livarot
Go version:   go1.17
Built:        2021-09-02T15:07:43Z
OS/Arch:      linux/amd64

What is your environment & configuration?

api:
  dashboard: true
  insecure: true

entryPoints:
  web-secure:
    address: ':443'

tls:
  stores:
    default:
      defaultCertificate:
        certFile: /etc/ssl/certs/public.crt
        keyFile: /etc/ssl/private/private.key

providers:
  docker:
    network: traefik_default
    exposedByDefault: false
  file:
    directory: /etc/traefik/
    watch: true

serversTransport:
  insecureSkipVerify: true
  maxIdleConnsPerHost: -1

log:
  level: DEBUG

accessLog:
  fields:
    headers:
      names:
        User-Agent: keep
        Referer: keep

If applicable, please paste the log output in DEBUG level

Please see the GitHub issue (Requests getting routed to unhealthy backend, Issue #8570 in traefik/traefik) for the full log, as it is too big for this post.

Hello @Tobias,

Thanks for your interest in Traefik!

One thing that may help with your configuration concerns the router rules.
With a rule like "HostRegexp(`{var:.*}`) || Host(`green`)", the first matcher, HostRegexp, matches every request.
This means that any other matcher OR-connected with it is never evaluated.
So, in your case, Host(`blue`) and Host(`green`) are not evaluated, which could explain the routing issue you are facing.
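The effect can be illustrated with a simplified model of rule matching and priority ordering. This is a sketch, not Traefik's implementation: `host`, `host_regexp`, `any_of`, and `pick_router` are hypothetical helpers, and the model assumes that routers are tried in descending priority order and that backend health plays no part in choosing a router.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

Matcher = Callable[[str], bool]  # takes the Host header value


def host(name: str) -> Matcher:
    # Models Host(`name`): exact, case-insensitive host match.
    return lambda h: h.lower() == name.lower()


def host_regexp(pattern: str) -> Matcher:
    # Models HostRegexp(`{var:pattern}`): the whole host must match.
    compiled = re.compile(pattern)
    return lambda h: compiled.fullmatch(h) is not None


def any_of(*matchers: Matcher) -> Matcher:
    # Models ||: any() stops at the first matcher that returns True.
    return lambda h: any(m(h) for m in matchers)


@dataclass
class Router:
    name: str
    priority: int
    rule: Matcher


def pick_router(routers: List[Router], host_header: str) -> Optional[str]:
    # Highest priority first; the first router whose rule matches wins.
    for router in sorted(routers, key=lambda r: r.priority, reverse=True):
        if router.rule(host_header):
            return router.name
    return None


green = Router("my_service_green", 1636587340,
               any_of(host_regexp(r".*"), host("green")))
blue = Router("my_service_blue", 1636595625,
              any_of(host_regexp(r".*"), host("blue")))

# Every host, including "green", resolves to my_service_blue.
for h in ("example.com", "green", "blue"):
    print(f"Host: {h} -> {pick_router([green, blue], h)}")
```

In this model the catch-all makes both rules match every request, so the higher-priority router always wins, and a request sent with the Host header `green` never reaches my_service_green.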

Hope it helps!