Hi, I created this GH issue (Requests getting routed to unhealthy backend · Issue #8570 · traefik/traefik · GitHub), which got closed by a bot. Does anyone have a clue whether this is a bug or whether I'm just doing it wrong?
Welcome!
- [X] Yes, I've searched similar issues on GitHub and didn't find any.
- [X] Yes, I've searched similar issues on the Traefik community forum and didn't find any.
What did you do?
I have a service behind Traefik that I deploy in a blue/green fashion for zero-downtime deployments. Discovery is via the Docker provider, and I do ZDD via green/blue-namespaced service/router labels. Suppose our green container is running; its labels look like this:
```json
"traefik.docker.network": "traefik_default",
"traefik.enable": "true",
"traefik.http.routers.my_service_green.priority": "1636587340",
"traefik.http.routers.my_service_green.rule": "HostRegexp(`{var:.*}`) || Host(`green`)",
"traefik.http.routers.my_service_green.tls": "true",
"traefik.http.services.my_service_green.loadbalancer.healthcheck.interval": "5s",
"traefik.http.services.my_service_green.loadbalancer.healthcheck.path": "/ping",
"traefik.http.services.my_service_green.loadbalancer.healthcheck.port": "5000",
"traefik.http.services.my_service_green.loadbalancer.healthcheck.timeout": "4s",
"traefik.http.services.my_service_green.loadbalancer.server.port": "5000",
"traefik.http.services.my_service_green.loadbalancer.server.scheme": "http"
```
The next time we deploy the app, the labels on the new container will be different; the new blue container will have the following labels:
```json
"traefik.docker.network": "traefik_default",
"traefik.enable": "true",
"traefik.http.routers.my_service_blue.priority": "1636595625",
"traefik.http.routers.my_service_blue.rule": "HostRegexp(`{var:.*}`) || Host(`blue`)",
"traefik.http.routers.my_service_blue.tls": "true",
"traefik.http.services.my_service_blue.loadbalancer.healthcheck.interval": "5s",
"traefik.http.services.my_service_blue.loadbalancer.healthcheck.path": "/ping",
"traefik.http.services.my_service_blue.loadbalancer.healthcheck.port": "5000",
"traefik.http.services.my_service_blue.loadbalancer.healthcheck.timeout": "4s",
"traefik.http.services.my_service_blue.loadbalancer.server.port": "5000",
"traefik.http.services.my_service_blue.loadbalancer.server.scheme": "http"
```
Note that both routers match ``HostRegexp(`{var:.*}`)``, and in addition each matches a `Host` header of `blue` or `green`, so I can query each container directly to know that it's up. They also use the router `priority` to make sure traffic starts flowing to the one with the higher priority.
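As an aside, this is how the direct probing works: you hit Traefik but override the `Host` header so only the color-specific part of the rule matches. A minimal sketch (the base URL here is a placeholder, not from my actual setup):

```python
# Hypothetical sketch: probe one color directly through Traefik by
# overriding the Host header so only `Host(`blue`)` / `Host(`green`)`
# matches, bypassing the catch-all HostRegexp preference.
import urllib.request


def build_probe(base_url: str, color: str, path: str = "/ping") -> urllib.request.Request:
    """Build a request that matches only the color-specific router rule."""
    req = urllib.request.Request(base_url + path)
    req.add_header("Host", color)  # e.g. "blue" -> routed to my_service_blue
    return req


req = build_probe("https://traefik.example.com", "blue")
print(req.get_header("Host"))  # → blue
```

Sending such a request (e.g. with `urllib.request.urlopen(req)`) against the real entrypoint tells you whether the new color answers on `/ping` before it gets live traffic.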
What did you see instead?
What I expect is that requests will only be routed to the backend that
- has a succeeding health check, and
- has the highest priority among the healthy backends.

So in my example above, I expect requests to go to `my_service_green`, even though it has the lower priority, because `my_service_blue` isn't healthy yet. As soon as `my_service_blue` is healthy, all requests should get routed there. This is not what I'm seeing, however. Even though `my_service_blue` is correctly marked as unhealthy:
```
time="2021-11-10T23:35:43Z" level=warning msg="Health check failed, removing from server list. Backend: \"my_service_blue@docker\" URL: \"http://172.18.0.4:5000\" Weight: 1 Reason: HTTP request failed: Get \"http://172.18.0.4:5000/ping\": dial tcp 172.18.0.4:5000: connect: connection refused"
```
I can see requests already being routed to that instance:
```
XXX.XXX.62.57 - - [10/Nov/2021:23:35:43 +0000] "GET /ping HTTP/1.1" 503 19 "-" "python-requests/2.25.1" 28850 "my_service_blue@docker" "-" 0ms
```
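To make the expectation concrete, here is a toy model (not Traefik internals) of the selection behavior I expected: among backends with a passing health check, the one with the highest router priority wins.

```python
# Toy model of the routing I expected (hypothetical, not Traefik code):
# filter out unhealthy backends first, then pick the highest priority.
def pick_backend(routers):
    """Return the name of the highest-priority healthy backend, or None."""
    healthy = [r for r in routers if r["healthy"]]
    if not healthy:
        return None
    return max(healthy, key=lambda r: r["priority"])["name"]


routers = [
    {"name": "my_service_green", "priority": 1636587340, "healthy": True},
    {"name": "my_service_blue", "priority": 1636595625, "healthy": False},
]
print(pick_backend(routers))  # → my_service_green
```

What actually happens instead is that the higher-priority router wins the rule match first, and requests land on `my_service_blue` even while its health check is still failing.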
What version of Traefik are you using?
```
$ traefik version
Version:      2.5.2
Codename:     livarot
Go version:   go1.17
Built:        2021-09-02T15:07:43Z
OS/Arch:      linux/amd64
```
What is your environment & configuration?
```yaml
api:
  dashboard: true
  insecure: true

entryPoints:
  web-secure:
    address: ':443'

tls:
  stores:
    default:
      defaultCertificate:
        certFile: /etc/ssl/certs/public.crt
        keyFile: /etc/ssl/private/private.key

providers:
  docker:
    network: traefik_default
    exposedByDefault: false
  file:
    directory: /etc/traefik/
    watch: true

serversTransport:
  insecureSkipVerify: true
  maxIdleConnsPerHost: -1

log:
  level: DEBUG

accessLog:
  fields:
    headers:
      names:
        User-Agent: keep
        Referer: keep
```
If applicable, please paste the log output in DEBUG level
Please see Requests getting routed to unhealthy backend · Issue #8570 · traefik/traefik · GitHub for the full log, as it's too big for this post.