We are running 3 Traefik instances on bare metal servers behind a provider's load balancer. We are trying to stick with "Zero Downtime Deployments", but the load balancer does not allow draining a target, which makes it complicated to update a Traefik node without interrupting requests.
Is it possible to drain Traefik itself? By that I mean Traefik would stop accepting new connections, while ongoing connections are still handled until they are closed by the client or the server. The load balancer would then stop routing new requests to the node, and after some time we could upgrade the node without interrupting any requests.
Hi @bluepuma77
Unfortunately, it's not possible at the moment. There is an open issue tracking that request here, so feel free to join the discussion and share your suggestions on the design and on how to avoid the problems highlighted in the thread.
@douglasdtm My intention is different from the issue.
We can drain services/containers on every node by just shutting them down: when the app receives the shutdown signal, it closes the listening port but continues to process ongoing requests to the end (for up to 10 seconds), then shuts down completely. Traefik stops routing new requests to those services/containers because it cannot connect anymore, but ongoing requests are not interrupted.
What I want is to drain my Traefik instance completely so that I can update the Traefik host. My Traefik instances sit behind a load balancer. I want the same behaviour as described above: Traefik should stop accepting new connections, finish ongoing requests and then stop. Afterwards I can update and reboot the host without interrupting any ongoing requests through Traefik (zero downtime).
Side note: if people have problems with "zero downtime" updates of their services, they should probably look at their "update-delay" between containers. I think the Traefik Docker Swarm service discovery polls every 15 seconds by default (swarmModeRefreshSeconds), so they should probably leave at least 30 seconds between updates.
We mostly use these settings for services with a few containers:
docker service update \
--update-order stop-first \
--stop-grace-period 10s \
--update-delay 30s \
--update-parallelism 1 \
--update-failure-action rollback \
--rollback-order stop-first
...
I still think both use cases are related or dependent, since we need to support draining at the service level before we can provide a global drain feature or API endpoint.
Thank you for the side note, I think it's valuable info for anyone getting here!