Thanks again for the reply. The service health checks are a start, but they're not quite what I'm looking for. They will remove the node from load balancing, but they will not restart the container. I am looking for a way to use the Swarm service health checks, as those will restart containers that are not working. Taking them out of the rotation is just a temporary measure; it doesn't heal the service.
So, to rephrase my question: if I were one of those containers, how could I wget an external URL that is sure to route back to this same node?
It seems that your question is specifically related to Docker Swarm.
If you have a container with a custom app, you need to prepare the healthcheck accordingly. For example, if the app opens port 8080, you can still use wget inside the container to check whether the app responds with status 200.
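To illustrate the idea, here is a minimal sketch of the same kind of check written in Python instead of wget, for images that ship without wget. The function names, the default timeout, and the URL in the usage note are my own assumptions, not something from your setup. Docker treats exit code 0 from a healthcheck command as healthy and 1 as unhealthy, so the helper maps the HTTP result onto those two values.

```python
import http.client
import urllib.error
import urllib.request

# Talk to the app directly and ignore any proxy settings from the environment,
# since the check is meant to stay inside the container.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True only if the URL answers with HTTP 200 within the timeout."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, 4xx/5xx (HTTPError is a URLError
        # subclass), a broken response, or a malformed URL: all unhealthy.
        return False


def healthcheck_exit_code(url: str, timeout: float = 3.0) -> int:
    """Map the check onto the exit codes Docker expects: 0 healthy, 1 unhealthy."""
    return 0 if is_healthy(url, timeout) else 1
```

A script used as the container healthcheck command could end with something like sys.exit(healthcheck_exit_code("http://127.0.0.1:8080/")), where the port and path are placeholders for whatever your app actually exposes.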
You can refer to my example configuration for the Traefik healthcheck. Traefik has a built-in endpoint that reports the condition of the application.
Traefik also has a feature to validate the condition of a service and remove unhealthy containers from the load balancer: Services - Traefik
Again, it relies on a specific endpoint created in your app (/healthz).
Referring to your question on Stack Overflow, "I would like Traefik to restart unhealthy services": Traefik will not restart unhealthy services. It is the responsibility of the cluster orchestration tool to make sure that your services are up and running and ready to accept incoming requests.
On Kubernetes, you can have two types of health checks: readiness and liveness.
A readiness probe lets Kubernetes know when your application is ready to accept incoming traffic. Kubernetes makes sure that the readiness probe passes before sending requests through a service to a pod. If the readiness probe fails, Kubernetes stops sending traffic to the pod until the probe passes again.
A liveness probe lets Kubernetes know whether your application is alive or dead. If the application is not alive, Kubernetes will kill the container and start a new instance of it.
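To make the difference between the two probes concrete, here is a rough sketch of an app that exposes one endpoint per probe. The paths /livez and /readyz, the ready flag, and the start_probe_server helper are assumptions I made up for illustration; use whatever paths your probes are configured to call. The liveness endpoint answers 200 as long as the process can serve requests, while the readiness endpoint answers 503 until the app declares itself ready.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set by the app once its startup work is finished (hypothetical flag).
ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/livez":
            # Liveness: the process is up and able to answer at all.
            status = 200
        elif self.path == "/readyz":
            # Readiness: only accept traffic after startup has completed.
            status = 200 if ready.is_set() else 503
        else:
            status = 404
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Keep probe requests out of the log output.
        pass


def start_probe_server(port: int = 8080, host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Serve the probe endpoints on a background thread and return the server."""
    server = ThreadingHTTPServer((host, port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The app would call start_probe_server() early and ready.set() once it can really handle requests. A failing /readyz only takes the instance out of rotation, while a failing /livez is what gets it restarted.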
Traefik sends requests through endpoints that are exposed thanks to the Kubernetes service.
Regarding Docker Swarm, you can create a healthcheck at the container level and also use a health check at the service level, as described in the documentation. You can also use order: start-first, as we described in that thread.