Service Health Checks

Hi,

I read that health checks are not available with KubernetesIngress and that a LivenessProbe should be used instead. However, I ran into the following issue and now have some doubts about relying on a LivenessProbe.

Node C on the cluster crashed, restarted and came back up almost properly. Pods, Services and cluster communication worked again, but due to networking configuration issues the pods on Node C were not reachable from other nodes. As a result, Traefik on Node A could not reach Node C, yet the pods on Node C reported everything as OK because the local liveness probe passed, so no alerts were generated.

After reading the documentation I figured out that traefik_service_server_up is not something I can use, so I decided to change the LivenessProbe to test the external Traefik endpoint instead.
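
Roughly, the new probe looks like this (hostname, path and port below are only placeholders, not my real values):

```yaml
# Sketch of the changed probe: the container only counts as "live" if a
# request through the external Traefik endpoint succeeds, so a failure
# anywhere along that path (DNS, load balancer, TLS, Traefik) restarts it.
livenessProbe:
  httpGet:
    host: app.example.com   # placeholder external hostname routed through Traefik
    path: /healthz          # placeholder health path
    port: 443
    scheme: HTTPS
  periodSeconds: 10
  failureThreshold: 3
```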

Now I worry that if for some reason the external probe fails while the pod itself is perfectly fine and the failure lies somewhere else, recycling the pods is a waste of resources. Imagine recycling a huge number of pods only because DNS, the load balancer or TLS is not working properly.

Hello @cinatic,

Traefik uses the Kubernetes API to build the list of service endpoints it forwards traffic to. Kubernetes only keeps a pod's address in a Service's endpoints while the pod passes its readiness check (a failing liveness probe causes the container to be restarted, which in turn takes the pod out of readiness). Note that liveness and readiness are not the same.

Therefore the endpoints reported to Traefik are already passing the Kubernetes health checks, and duplicating that functionality within Traefik would be meaningless, as every endpoint has already passed.
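
As a rough illustration (container name, image and paths below are just examples), the readiness probe is what decides whether a pod's address shows up in the endpoints Traefik receives, while the liveness probe only decides whether the kubelet restarts the container:

```yaml
# Example pod spec fragment, not a recommended configuration.
containers:
  - name: app                  # example container name
    image: example/app:1.0     # placeholder image
    ports:
      - containerPort: 8080
    readinessProbe:            # failing => pod removed from Service endpoints
      httpGet:
        path: /ready           # example readiness path
        port: 8080
      periodSeconds: 5
    livenessProbe:             # failing => kubelet restarts the container
      httpGet:
        path: /healthz         # example liveness path
        port: 8080
      periodSeconds: 10
```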

Traefik, like most applications, relies on the underlying networking of the orchestration platform. It cannot detect networking misconfiguration or issues, nor is it designed to. If your cluster has internal networking problems after a node restart, resolving those should be your first priority.

For more information on liveness and readiness probes, the Kubernetes documentation has a great description: (Pod Lifecycle | Kubernetes)