Hello everyone,
We have been using Traefik for quite some time now in our production deployment of https://orderlion.com.
We run two identical Docker containers with our webapp/webserver, deployed behind a Traefik load balancer.
The issue we regularly run into is the following:
- Due to heavy traffic or possibly a bug, instance #1 experiences very high load over an extended period of time
- the Traefik health check detects this, marks #1 as unhealthy, and routes traffic to instance #2
- we also have a Docker healthcheck in place (with a longer timeout)
- if the CPU load stays high, Docker also marks the container as unhealthy and restarts container #1
- since #1 is now offline, ALL traffic is rerouted to #2, so ALL users are now on #2
- after #1 has restarted successfully, ALL users stay on #2 because of sticky sessions
- #1 now sits almost idle while #2 handles the entire load, making our app very slow
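The failover sequence above can be reproduced with a toy model of cookie-based sticky sessions. This is a minimal Python sketch, not Traefik's actual implementation: `StickyBalancer` and `simulate` are hypothetical names, and the model assumes that a client whose pinned server is unhealthy gets re-pinned to a healthy server and keeps that new pin afterwards, as described in the list.

```python
import itertools


class StickyBalancer:
    """Hypothetical toy model: round-robin plus a sticky cookie per client."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rr = itertools.cycle(self.servers)

    def _next_healthy(self):
        # Round-robin over all servers, skipping the unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._rr)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy server")

    def route(self, cookie):
        # Honour the sticky cookie as long as its server is healthy.
        if cookie in self.healthy:
            return cookie
        # Assumption: otherwise pick a new server, which becomes the new cookie.
        return self._next_healthy()


def simulate(num_users=100):
    lb = StickyBalancer(["#1", "#2"])
    cookies = {user: None for user in range(num_users)}  # no cookie yet

    def serve_all():
        # Every user sends one request and stores the server it was routed to.
        for user in cookies:
            cookies[user] = lb.route(cookies[user])
        counts = {s: 0 for s in lb.servers}
        for server in cookies.values():
            counts[server] += 1
        return counts

    before = serve_all()       # both healthy: users are spread evenly
    lb.healthy.discard("#1")   # #1 marked unhealthy / restarting
    during = serve_all()       # everyone is moved to #2
    lb.healthy.add("#1")       # #1 is back online
    after = serve_all()        # sticky cookies keep everyone on #2
    return before, during, after
```

Calling `simulate()` returns `({'#1': 50, '#2': 50}, {'#1': 0, '#2': 100}, {'#1': 0, '#2': 100})`: once every cookie points at #2, nothing in this model ever moves a user back to #1, which matches the imbalance we observe.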
I hope this makes sense. How can we "force" Traefik to move some of the users back from #2 to #1 once #1 is back online, so that the load evens out?
Tech:
- Ubuntu 20.04
- up-to-date Docker
- Traefik 2.9.6
Thank you, and best regards,
Patrick