Healthchecks do not take effect

I am trying to configure a Ceph dashboard behind Traefik, but I am having trouble getting the service health checks to work.

Here is my scenario:

There are 3 Ceph managers available to serve the dashboard on port 7000.
However, the dashboard is only ever served by one of the 3 managers.
The other 2 managers reply with status 500 and a corresponding error message:

{"status": "500 Internal Server Error", "detail": "Keep on looking", "request_id": "efc1d3ba-66cc-4138-9d11-8bf8fe51af34"}                                                                                                                                                                                                                                                                                                                                                                                                       

Only the active manager replies with status 200.

Previously, we used Traefik v1, where we were able to set up a backend healthcheck so that traffic always went to the active manager.

Traefik v1 config
# Ceph UI
[frontends.ceph]
backend = "ceph"
[frontends.ceph.routes.route1]
rule = "Host:ceph.example.com"
[backends.ceph.healthcheck]
interval = "10s"
path = "/"
[backends.ceph.loadbalancer]
method = "drr"
[backends.ceph.servers.server1]
url = "http://192.168.1.1:7000"
[backends.ceph.servers.server2]
url = "http://192.168.1.2:7000"
[backends.ceph.servers.server3]
url = "http://192.168.1.3:7000"

After migrating to Traefik v3, the healthcheck does not seem to work correctly anymore.

Traefik v3 config
http:
  routers:
    # Ceph UI
    ceph:
      service: ceph@file
      rule: Host(`ceph.example.com`)

  services:
    ceph:
      loadBalancer:
        servers:
          - url: http://192.168.1.1:7000
          - url: http://192.168.1.2:7000
          - url: http://192.168.1.3:7000
        healthCheck:
          path: /
          interval: 10s
          timeout: 3s
          status: 200

In the Traefik dashboard, all 3 servers show as healthy for about 10s; then the 2 inactive managers are shown as unhealthy for about 3s, after which they appear healthy again.

When I try to reach the Ceph dashboard through Traefik, I randomly get either the HTML page or a 500 response.

Is this a bug or did I miss something in the Traefik configuration?

A simple example works for me.

docker-compose.yml:

services:
  traefik:
    image: traefik:v3
    container_name: traefik
    ports:
      - 80:80
    networks:
      - proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./dynamic.yml:/dynamic/dynamic.yml
    command:
      - --api.dashboard=true
      - --log.level=DEBUG
      - --accesslog=true
      - --providers.docker.network=proxy
      - --providers.docker.exposedByDefault=false
      - --providers.file.directory=/dynamic
      - --providers.file.watch=true
      - --entrypoints.web.address=:80
    labels:
      - traefik.enable=true
      - traefik.http.routers.mytraefik.rule=Host(`traefik.example.com`)
      - traefik.http.routers.mytraefik.service=api@internal

  whoami:
    image: traefik/whoami:v1.10
    container_name: whoami
    networks:
      - proxy
    labels:
      - traefik.enable=true
      - traefik.http.routers.mywhoami.rule=Host(`whoami.example.com`)
      - traefik.http.services.mywhoami.loadbalancer.server.port=80

  service_ok_1:
    container_name: service_ok_1
    image: hashicorp/http-echo
    command: ["-text=OK1", "-status-code=200"]
    networks:
      - proxy

  service_ok_2:
    container_name: service_ok_2
    image: hashicorp/http-echo
    command: ["-text=OK2", "-status-code=200"]
    networks:
      - proxy

  service_fail_1:
    container_name: service_fail_1
    image: hashicorp/http-echo
    command: ["-text=FAIL1", "-status-code=500"]
    networks:
      - proxy

  service_fail_2:
    container_name: service_fail_2
    image: hashicorp/http-echo
    command: ["-text=FAIL2", "-status-code=500"]
    networks:
      - proxy

networks:
  proxy:
    name: proxy
    #external: true

dynamic.yml:

http:
  routers:
    healthy:
      service: healthy
      rule: Host(`healthy.example.com`)

  services:
    healthy:
      loadBalancer:
        servers:
          - url: http://service_ok_1:5678
          - url: http://service_ok_2:5678
          - url: http://service_fail_1:5678
          - url: http://service_fail_2:5678
        healthCheck:
          path: /
          interval: 10s
          timeout: 3s
          status: 200
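
If you want the health check attached to a Docker-provider service instead of the file provider, the same options can be set as labels. Here is a minimal sketch for the whoami service above (label names follow Traefik's dynamic configuration reference; this assumes whoami answers / with a 200):

  whoami:
    image: traefik/whoami:v1.10
    networks:
      - proxy
    labels:
      - traefik.enable=true
      - traefik.http.routers.mywhoami.rule=Host(`whoami.example.com`)
      - traefik.http.services.mywhoami.loadbalancer.server.port=80
      # same options as the file-provider healthCheck block
      - traefik.http.services.mywhoami.loadbalancer.healthcheck.path=/
      - traefik.http.services.mywhoami.loadbalancer.healthcheck.interval=10s
      - traefik.http.services.mywhoami.loadbalancer.healthcheck.timeout=3s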

Hi @bluepuma77,

I played with your example configuration and found that my issue seems to be related to the swarm provider, so I created GitHub issue #11604.

A possible workaround is to set --providers.swarm.network=ingress.
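
For anyone else hitting this: the flag goes into Traefik's static configuration. A minimal sketch, modelled on the command section of the compose example above and assuming the swarm provider is enabled there (adjust the endpoint and the other flags to your setup):

    command:
      - --providers.swarm.endpoint=unix:///var/run/docker.sock
      - --providers.swarm.exposedByDefault=false
      # workaround: pin the network Traefik uses to reach swarm services
      - --providers.swarm.network=ingress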

Thank you for your help!