Why the Traefik health check is not available for kubernetesCRD and kubernetesIngress providers

According to the health check documentation:

The Traefik health check is not available for kubernetesCRD and 
kubernetesIngress providers because Kubernetes already has a health check mechanism. 
Unhealthy pods will be removed by Kubernetes (cf. the liveness documentation).

But I have a scenario: when a K8s node goes down, K8s only marks it NotReady after the node-monitor-grace-period (default 40s), so it takes at least 40s before the pods running on that node are removed from the service endpoints. During that window, some requests to the service fail to get a response.
If Traefik's own service health check mechanism were available for these providers, this problem could be avoided.

Hi @fei
When you deploy Traefik on a Kubernetes cluster, you can use liveness and readiness probes, which are native Kubernetes features.

Here is an example:

    readinessProbe:
      failureThreshold: 1
      httpGet:
        path: /ping
        port: 9000
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 2
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /ping
        port: 9000
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 2

Make sure that the following static configuration is added to the Traefik instance:

    - --entryPoints.traefik.address=:9000/tcp
    - --entryPoints.web.address=:8000/tcp
    - --entryPoints.websecure.address=:8443/tcp
    - --api.dashboard=true
    - --ping=true
    - --providers.kubernetescrd
    - --providers.kubernetesingress
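
For context, here is a minimal sketch of how those arguments and the probes above could fit together in a Traefik Deployment (the image tag, labels, and ServiceAccount name are assumptions, and the required RBAC objects are omitted):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: traefik
    spec:
      selector:
        matchLabels:
          app: traefik
      template:
        metadata:
          labels:
            app: traefik
        spec:
          serviceAccountName: traefik   # assumes a ServiceAccount with the Traefik RBAC rules
          containers:
            - name: traefik
              image: traefik:v2.10      # hypothetical version tag
              args:
                - --entryPoints.traefik.address=:9000/tcp
                - --entryPoints.web.address=:8000/tcp
                - --entryPoints.websecure.address=:8443/tcp
                - --api.dashboard=true
                - --ping=true
                - --providers.kubernetescrd
                - --providers.kubernetesingress
              ports:
                - name: traefik
                  containerPort: 9000
                - name: web
                  containerPort: 8000
                - name: websecure
                  containerPort: 8443
              readinessProbe:
                httpGet:
                  path: /ping
                  port: 9000
              livenessProbe:
                httpGet:
                  path: /ping
                  port: 9000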

You can also deploy Traefik using the official Helm chart, and those health checks will be configured automatically.
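
For example, a basic installation could look roughly like this (the repository URL may differ depending on the chart version you use):

    helm repo add traefik https://traefik.github.io/charts
    helm repo update
    helm install traefik traefik/traefik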

Thank you very much for your reply.

In addition to the health check of Traefik itself, I would also like Traefik to be able to perform health checks on the services it routes to in K8s, for example:

    apiVersion: traefik.containo.us/v1alpha1
    kind: IngressRoute
    metadata:
      name: health-check-whoami-ingressroute
    spec:
      entryPoints:
        - web
      routes:
      - match: Host(`health-check-whoami.traefik.demo.com`) && PathPrefix(`/notls`)
        kind: Rule
        services:
        - name: health-check-whoami
          port: 80
          healthCheck:
            path: /notls
            interval: "10s"
            timeout: "5s"

However, the current official documentation says that configuring health checks this way is not supported for the Kubernetes providers.
As mentioned above, when a node goes down it can take about 40s before the endpoint change is noticed. If Traefik supported health checks for K8s services (e.g. health-check-whoami), the failed endpoints could be removed from the load-balancing pool in a timely manner.
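
For comparison, the file provider does support this kind of active health check; roughly, the dynamic configuration would look like the sketch below (the server URL is just a placeholder):

    http:
      services:
        health-check-whoami:
          loadBalancer:
            healthCheck:
              path: /notls
              interval: "10s"
              timeout: "5s"
            servers:
              # placeholder pod address; with the Kubernetes providers these
              # server entries are derived from the Endpoints object instead
              - url: "http://10.42.0.10:80"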


Hey @fei

I understand that you are asking about a health check that verifies the condition of the service to which the network traffic is routed.

For that use case, I would suggest adding liveness and readiness probes on the pod directly; if the pod is not ready yet, network traffic will not be routed to the 'unhealthy' pod.

There are two types of health checks:

  1. Readiness

That probe is designed to let Kubernetes know when your app (a service) is ready to serve traffic. Kubernetes makes sure that the readiness check passes, meaning the service is healthy, before sending traffic to the pod. If the probe starts to fail, network traffic will no longer be sent to that pod.

  2. Liveness

That probe checks whether your app is alive or dead. If the app is alive, Kubernetes takes no action; if the pod is dead, Kubernetes removes the failed pod and starts a new instance to replace it.

So the health checks mentioned above should be configured at the pod level; if a pod is healthy, network traffic will be routed to it correctly.
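
For example, for the health-check-whoami service from your example, the probes on the application pod could look roughly like this (a sketch; the image, labels, and threshold values are assumptions):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: health-check-whoami
    spec:
      selector:
        matchLabels:
          app: health-check-whoami
      template:
        metadata:
          labels:
            app: health-check-whoami
        spec:
          containers:
            - name: whoami
              image: traefik/whoami     # assumed demo image serving HTTP on port 80
              ports:
                - containerPort: 80
              readinessProbe:
                httpGet:
                  path: /
                  port: 80
                periodSeconds: 5
                failureThreshold: 1
              livenessProbe:
                httpGet:
                  path: /
                  port: 80
                periodSeconds: 10
                failureThreshold: 3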

Hope that helps.

@jakubhajek Thank you for your reply.

Probes are a way to check the health of a pod, but I have an extreme scenario here. Because probes are executed by the kubelet, when a node is shut down there is a window until K8s marks the node NotReady (default 40s) during which the K8s endpoints will continue to include the pods on that node.
Apart from modifying the node-monitor-grace-period setting, I don't know of any other way to detect more quickly that a pod is unavailable and remove it from the endpoints.
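
(The setting I am referring to is the kube-controller-manager flag shown below; lowering it makes node failures noticed sooner, at the cost of more node-status churn.)

    # kube-controller-manager flag, default 40s
    --node-monitor-grace-period=20s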

We have a 2-node cluster with all our services/pods running on both nodes and a Traefik ingress controller on each node as well. If we take one node down (for a reboot), the second node still blindly routes traffic to the node that is gone. There seems to be no "automatic" detection that the pods are down: exactly 50% of ingress requests on the healthy node fail until the first node recovers.

Hello @cawoodm

Traefik uses Kubernetes Endpoints to route network traffic directly to the pods. An endpoint is only added once the target application is healthy according to its Kubernetes health check probes.

The (application) pod should have health check probes (liveness, readiness) configured to determine whether the application is healthy and ready to accept incoming network traffic.

Can you please check how the health check probes are configured on your application?
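
For example, using the service name from the earlier example, you could verify this with:

    # list the pod IPs currently considered ready for the Service
    kubectl get endpoints health-check-whoami

    # show the probes configured on one of the backing pods
    kubectl describe pod <pod-name> | grep -E 'Liveness|Readiness'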