I noticed this recently while working on a zero-downtime deployment approach with Traefik. I don't know whether this is the intended behaviour for Traefik, or whether this issue has already been reported, but I thought I'd raise it just in case. Hope it helps! If you need any more information, please do let me know.
Expected Behaviour
Traefik health checks for services do not cause persistent HTTP connections between Traefik and the backend servers to be closed.
Actual Behaviour
Traefik health checks for services cause persistent HTTP connections between Traefik and the backend servers to be closed.
Steps to Reproduce
I was able to reproduce this using the "basic-example" from the quickstart guide.
- Download the "basic-example" `docker-compose.yml` file from the Traefik GitHub repo.
- Uncomment the DEBUG log level line in the `command` section.
- Start the services using `docker-compose`.
- Start a shell in the traefik container.
- In the shell, check the established connections involving port 80 (e.g. `netstat -an | grep ":80 " | grep "ESTABLISHED"`) and confirm that there aren't any.
- Hit http://whoami.localhost in the browser (once).
- Back in the shell, check the established connections involving port 80. Confirm that there's a connection from the host to the traefik container, and another from the traefik container to the whoami container.
- Continue to check until both connections have been closed; this should take 90 to 120 seconds.
- Close the shell and stop the services using `docker-compose`.
- Modify the docker-compose file to add a simple healthcheck for the whoami service, e.g. `"traefik.http.services.whoami.loadbalancer.healthcheck.path=/"`. It's best to allow the default interval to be used, which IIRC is 30 seconds.
- Start the services using `docker-compose`.
- Start a shell in the traefik container.
- In the shell, check the established connections involving port 80 and confirm that there aren't any.
- Monitor the logs, and wait until a health check refresh entry ("Refreshing health check for backend: whoami@docker") appears.
- Hit http://whoami.localhost in the browser (once).
- Back in the shell, check the established connections involving port 80. Confirm that there's a connection from the host to the traefik container, and another from the traefik container to the whoami container.
- Check a few more times over a few seconds to confirm that both connections remain.
- Wait for the next health check refresh.
- Check the connections again, and note that there's only one: the one from the host to the traefik container. The connection from the traefik container to the whoami container has been closed.
Of particular note here is that the default healthcheck interval is 30 seconds, which is significantly shorter than the time the persistent connection between the traefik container and the whoami container remained open (~90 seconds) when the healthcheck was not enabled. In other words, with a healthcheck enabled, the backend connection is closed at the next health check, well before it would otherwise have timed out.
Notes
I did some research, and it seems that this may be caused by the health check code closing the response body without first reading it to the end. As per the Go documentation for the `Do` method of `net/http`'s `Client`:
If the Body is not both read to EOF and closed, the Client's underlying RoundTripper (typically Transport) may not be able to re-use a persistent TCP connection to the server for a subsequent "keep-alive" request.
I wrote a tiny client/server setup to check this, and confirmed that the connection persists and is reused when the client reads the entire response body before closing it, and that the connection is closed when the client simply closes the response body without reading it first.