How do I set up a healthcheck that connects to the same node and runs through Traefik?
My use case is that I had a healthcheck set up for a service, and it was passing because inside the service the webserver was reachable.
However, when one particular node was serving the request it would return 502 Bad Gateway. Restarting the service solved the problem.
Now if I just do a simple http check with the domain name included, I would get any one of the load-balanced nodes randomly, right? How can I hit the node that this particular container is running on?
I already have a healthcheck for Traefik, and I have a healthcheck for my service. Both healthchecks pass.
The problem is that I can still get a 502 Bad Gateway when actually trying to access the site.
I guess it has something to do with that container not being routed properly by either Traefik or Swarm. I don't know why this happens sometimes, but restarting the container solves it.
So, I need to be able to check, from the container of the service on each node, if that specific container is reachable on that node.
Thanks again for the reply. The service health checks are a start, but they're not quite what I'm looking for. They will remove the node from load balancing, but they will not restart the container. I am looking for a way to use the Swarm service health checks, as Swarm will restart containers that are not working. Taking them out of the rotation is just a temporary measure; it doesn't heal the service.
So, to re-phrase my question: If I was one of those containers, how can I wget an external url that is sure to route back to this same node?
It seems that your question is specifically about Docker Swarm.
If you have a container with a custom app, you need to prepare the healthcheck accordingly. For example, if the app listens on port 8080, you can use wget inside the container to check whether the app responds with status 200.
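A minimal sketch of such a container-level healthcheck in a Compose/stack file, assuming the app listens on port 8080 and that wget is available inside the image. The service name `app`, the image name, the path `/`, and the timing values are placeholders, not taken from your setup:

```yaml
services:
  app:
    image: example/app:latest   # hypothetical image name
    healthcheck:
      # wget --spider only probes the URL; it exits non-zero when the
      # connection fails or the server answers with an error status.
      test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:8080/"]
      interval: 30s      # how often the check runs
      timeout: 5s        # a single check running longer than this counts as a failure
      retries: 3         # consecutive failures before the container is marked unhealthy
      start_period: 10s  # grace period while the app boots
```

When this container is marked unhealthy, Swarm replaces the task. Note that the check only proves the app answers from inside its own container, which is why it can pass while requests through the proxy still fail.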
You can refer to my example configuration for the Traefik healthcheck. Traefik has a built-in endpoint that reports whether Traefik itself is healthy.
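That configuration is not reproduced here. A minimal sketch of the idea, assuming Traefik v2 with its ping endpoint enabled, might look like the following; the image tag and timing values are assumptions:

```yaml
services:
  traefik:
    image: traefik:v2.10   # assumed version tag
    command:
      - --ping=true        # enables the built-in /ping endpoint
    healthcheck:
      # "traefik healthcheck" calls the ping endpoint and exits 0 when healthy
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 10s
      timeout: 2s
      retries: 3
```

This only tells you that the Traefik process itself is responding; it says nothing about whether a given backend container is reachable through it.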
Traefik also has a feature that checks the health of a service and removes unhealthy containers from the load balancer: Services - Traefik
Again, it relies on a specific endpoint exposed by your app (/healthz).
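As a rough sketch, the load balancer health check can be configured with labels on the Swarm service. This assumes Traefik v2 with the Docker provider in Swarm mode and an app that exposes `/healthz` on port 8080; the router/service name `app`, the image, and the host name are hypothetical:

```yaml
services:
  app:
    image: example/app:latest   # hypothetical image name
    deploy:
      labels:                   # in Swarm mode, Traefik reads labels from deploy.labels
        - "traefik.enable=true"
        - "traefik.http.routers.app.rule=Host(`app.example.com`)"
        - "traefik.http.services.app.loadbalancer.server.port=8080"
        # Traefik polls this path on every container behind the service
        - "traefik.http.services.app.loadbalancer.healthcheck.path=/healthz"
        - "traefik.http.services.app.loadbalancer.healthcheck.interval=10s"
        - "traefik.http.services.app.loadbalancer.healthcheck.timeout=3s"
```

A container that fails this check is only taken out of rotation; nothing here restarts it.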
Regarding your question on Stack Overflow, "I would like Traefik to restart unhealthy services": Traefik will not restart unhealthy services. It is the responsibility of the cluster orchestration tool to make sure that your services are up and running and ready to accept incoming requests.
On Kubernetes, you can have two types of health checks: readiness and liveness.
A readiness probe lets Kubernetes know when your application is ready to accept incoming traffic. Kubernetes makes sure that the readiness probe passes before sending requests through a Service to a pod. If the readiness probe fails, Kubernetes stops sending traffic to the pod until the probe passes again.
A liveness probe lets Kubernetes know whether your application is alive or dead. If the application is not alive, Kubernetes kills the container and restarts it according to the pod's restart policy.
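For illustration, both probes can be declared on the same container. This is a sketch only, assuming the app serves `/healthz` on port 8080; the pod name, image, and timing values are made up:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app                     # hypothetical name
spec:
  containers:
    - name: app
      image: example/app:latest # hypothetical image name
      ports:
        - containerPort: 8080
      readinessProbe:           # failing: pod is removed from Service endpoints
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
      livenessProbe:            # failing: the container is restarted
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20
```

In practice the two probes often point at different endpoints, so that a temporarily busy app is taken out of rotation without being restarted.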
Traefik sends requests through endpoints that are exposed thanks to the Kubernetes service.
Regarding Docker Swarm, you can also define a healthcheck at the container level and use a health check at the service level, as described in the documentation. You can also use order: start-first, as we described in that thread.
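A sketch of how these pieces might be combined in a stack file, assuming a Compose file format of 3.4 or later (needed for `order`). The service name, image, replica count, and the choice of `failure_action` are assumptions, not a recommendation from the documentation:

```yaml
version: "3.8"
services:
  app:
    image: example/app:latest   # hypothetical image name
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:8080/"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      replicas: 2
      update_config:
        order: start-first      # start the new task before stopping the old one
        failure_action: rollback
      restart_policy:
        condition: any          # restart the task whenever its container exits
```

With `start-first`, the old task keeps serving until the new one has started, and when a healthcheck is defined the new task must first report healthy.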
Thank you Jakub for that detailed response. I shall investigate further then how to solve my bad response error by looking into these different healthcheck possibilities.