Draining Traefik itself, not accepting new connections?

bluepuma77 · August 26, 2022, 8:19am

We are running 3 Traefik instances on bare metal servers behind a provider's load balancer. We are trying to stick with "Zero Downtime Deployments", but the load balancer does not allow draining a target, which makes it complicated to update a Traefik node without interrupting requests.

Is it possible to drain Traefik itself? That is, Traefik would stop accepting new connections, while ongoing connections would still be handled until they are closed by the client or server. The load balancer would then stop routing new requests to the node, and after some time we could upgrade the node without interrupting any requests.

douglasdtm · August 30, 2022, 6:20pm

Hi @bluepuma77

Unfortunately, it's not possible at the moment. There is an open issue tracking that request here, so feel free to join the discussion and share your suggestions on the design and on how to avoid the problems highlighted in the thread.

bluepuma77 · August 31, 2022, 9:31am

@douglasdtm My intention is different from the issue.

We can drain services/containers on every node by just shutting them down: when the app receives the shutdown signal, it closes the listening port but continues to process ongoing requests until they finish (for up to 10 seconds), then shuts down completely. Traefik then stops routing new requests to those services/containers, since it can no longer connect, but ongoing requests are not interrupted.

What I want is to drain my Traefik instance completely so that I can update the Traefik host. My Traefik instances sit behind a load balancer. I want the same behaviour as described above: Traefik should stop accepting new connections, finish ongoing requests, and then stop. Afterwards I can update and reboot the host without interrupting any ongoing requests through Traefik (zero downtime).

Side note: if people have problems with "zero downtime" updates of their services, they should probably look at the "update-delay" between containers. I think the Traefik Docker Swarm service discovery polls every 15 seconds by default (swarmModeRefreshSeconds), so they should probably leave at least 30 seconds between updates.

We mostly use these settings for services with only a few containers:

docker service update \
  --update-order stop-first \
  --stop-grace-period 10s \
  --update-delay 30s \
  --update-parallelism 1 \
  --update-failure-action rollback \
  --rollback-order stop-first \
  ...

douglasdtm · August 31, 2022, 6:17pm

I still think both use cases are related or dependent, since we need to support draining at the service level before we can provide a global drain feature or API endpoint.

Thank you for the side note, I think it's valuable info for anyone getting here!

