The Traefik health check is not available for the kubernetesCRD and
kubernetesIngress providers because Kubernetes already has its own health check mechanism.
Unhealthy pods are removed by Kubernetes (cf. the liveness documentation).
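For context, the health check being discussed here is the per-service load-balancer check Traefik offers with other providers. A minimal sketch with the file provider might look like this (the service name, server URLs and the /health path are placeholders, not anything from this thread):

```yaml
# Traefik dynamic configuration (file provider), not the Kubernetes providers.
http:
  services:
    whoami:
      loadBalancer:
        healthCheck:
          path: /health     # endpoint Traefik polls on each server
          interval: "10s"   # how often to poll
          timeout: "3s"     # mark the server down if it does not answer in time
        servers:
          - url: "http://10.42.0.10:80"
          - url: "http://10.42.0.11:80"
```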
But I have a scenario: when a K8s node becomes unhealthy, Kubernetes only marks it NotReady after the node-monitor-grace-period (default 40s). That means it takes at least 40s before the pods running on that node are removed from the service's endpoints, and during that window some requests to the service will fail.
If the health check mechanism of the Traefik service were available, this problem could be avoided by letting Traefik health-check the backends itself.
However, the current official documentation says that health checks cannot be configured for the Kubernetes providers.
As mentioned above, when a node is not ready it can take about 40s for the endpoint change to be noticed. If Traefik supported health checks for Kubernetes services (e.g. health-check-whoami), the failed endpoints could be removed from the load-balancing list in time.
I understand that you are asking about a health check that verifies the condition of the service the network traffic is forwarded to.
For that use case, I would suggest adding liveness and readiness probes on the pod directly; if the pod is not ready yet, traffic will not be routed to the 'unhealthy' pod.
There are two types of health checks:
Readiness
This probe lets Kubernetes know when your app (a service) is ready to serve traffic. Kubernetes makes sure the readiness check passes, meaning the service is healthy, before sending traffic to the pod. If the probe starts to fail, traffic is no longer sent to that pod.
Liveness
This probe checks whether your app is alive or dead. If the app is alive, Kubernetes takes no action; if it is dead, Kubernetes removes the failed pod and starts a new instance to replace it.
So those health checks should be configured at the pod level, and as long as a pod is healthy, traffic will be routed to it correctly; a minimal probe configuration is sketched below.
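For illustration, a pod spec with both probes could look like this sketch (the image, port and /health path are assumptions; adjust them to your application):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: whoami
spec:
  containers:
    - name: whoami
      image: traefik/whoami   # placeholder image
      ports:
        - containerPort: 80
      # Readiness: the pod is only listed in the Service endpoints while this passes.
      readinessProbe:
        httpGet:
          path: /health
          port: 80
        periodSeconds: 5
        failureThreshold: 2
      # Liveness: the kubelet restarts the container if this keeps failing.
      livenessProbe:
        httpGet:
          path: /health
          port: 80
        initialDelaySeconds: 10
        periodSeconds: 10
```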
The probes are a way to check the health of the pod, but I have an extreme scenario here: the probes are executed by the kubelet, so when the whole node goes down, nothing reports the failure until K8s marks the node NotReady. During that period (default 40s), the K8s endpoints still include the pods on that node.
Apart from changing the node-monitor-grace-period configuration, I don't know of any other way to detect that the pods are unavailable more quickly and remove them from the endpoints.
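For reference, on a kubeadm-style control plane that setting usually lives in the kube-controller-manager static pod manifest; the excerpt below is only a sketch (the file path, image tag and the 20s value are assumptions, and lowering the value makes it more likely that nodes get marked NotReady on short network blips):

```yaml
# Excerpt from /etc/kubernetes/manifests/kube-controller-manager.yaml (kubeadm layout).
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
    - name: kube-controller-manager
      image: registry.k8s.io/kube-controller-manager:v1.28.0   # illustrative tag
      command:
        - kube-controller-manager
        # ... existing flags ...
        # Mark unresponsive nodes NotReady sooner than the 40s default so their
        # pods are dropped from the endpoints earlier.
        - --node-monitor-grace-period=20s
```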
We have a 2-node cluster with all our services/pods running on both nodes and a Traefik ingress controller on each node. If we take one node down (for a reboot), the second node still blindly routes traffic to the missing node. There seems to be no automatic detection that those pods are down, so exactly 50% of the ingress requests on the healthy node fail until the first node recovers.
Traefik uses the Kubernetes endpoints to route network traffic directly to the pods. An endpoint is only created once the target application is healthy, based on the Kubernetes health check probes.
The (application) pod should have health check probes (liveness, readiness) configured so Kubernetes can determine whether the application is healthy and ready to accept incoming traffic.
Can you please check how the health check probes are configured on your application?