Load balancing active WebSocket connections in Kubernetes

Hello community!

While working on a project, we ran into the problem of properly load balancing a large number of open WebSocket connections in our Kubernetes cluster, which has Traefik in front. Imagine 4 running instances of a WebSocket backend, with clients connecting to one of them through a load balancer. In the other direction, a separate client needs to trigger messages over these active client connections through the same backend. After setting everything up, two challenges appeared:

  • Keep a reliable connection from the WebSocket client to the backend, so that its WebSocket requests always reach the correct backend instance. Sticky sessions solved this for us.

  • Send requests from a different client to the same WebSocket backend in order to trigger actions over the existing WebSocket connection of an active client device. We see two possible solutions:

  1. Implement the pub/sub pattern: the backend creates a topic for each WebSocket client and subscribes to it, so that it can handle messages published on that topic and send them to the connected client.

  2. Implement a Traefik middleware/plugin that forwards requests directly to the Kubernetes instance holding the existing WebSocket connection, based on a session token. This would be a kind of load balancing mechanism for persistent connections. It requires a mechanism that stores a session token for each new client connection in a key-value store, together with the instance and port where the connection was opened. Based on the session token, the correct instance can then be looked up and the request forwarded to the instance that holds the existing WebSocket connection.
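
To make the first option more concrete, here is a minimal in-memory sketch in Python of what we mean by one topic per client. It is not our actual code: `Broker`, `BackendInstance`, and `topic_for` are hypothetical names, the broker is a plain dictionary standing in for a real shared message broker, and the `sent` list stands in for writes to real WebSocket connections.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Handler = Callable[[str], None]


class Broker:
    """In-memory stand-in for a shared message broker (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver message to all subscribers; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


def topic_for(client_id: str) -> str:
    # Assumed naming scheme: one topic per connected WebSocket client.
    return f"ws-client.{client_id}"


class BackendInstance:
    """One of the backend instances holding open client connections."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        # Records (client_id, message) instead of writing to a real socket.
        self.sent: List[Tuple[str, str]] = []

    def on_client_connected(self, client_id: str) -> None:
        # Subscribe to this client's topic when its connection is opened.
        def forward(message: str) -> None:
            self.sent.append((client_id, message))

        self.broker.subscribe(topic_for(client_id), forward)


broker = Broker()
backend_a = BackendInstance("backend-a", broker)
backend_b = BackendInstance("backend-b", broker)
backend_a.on_client_connected("device-1")
backend_b.on_client_connected("device-2")

# The triggering client only needs the client id, not the instance.
delivered = broker.publish(topic_for("device-2"), "reboot")
```

With this approach the publisher never needs to know which instance holds the connection; only the instance that subscribed to the topic receives the message.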

The question now is: is it possible to implement the second solution with what Traefik currently offers? If yes, can this be achieved using the plugin and middleware mechanism?
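
To clarify what we have in mind for the second solution, here is a rough Python sketch of the lookup logic only. It does not use any Traefik API: `SessionRegistry`, `Endpoint`, and `route` are hypothetical names, the dictionary stands in for the key-value store, and the address is made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Endpoint:
    """Instance and port where a client connection was opened."""
    host: str
    port: int


class SessionRegistry:
    """Dictionary standing in for an external key-value store (assumption)."""

    def __init__(self) -> None:
        self._by_token: Dict[str, Endpoint] = {}

    def register(self, token: str, endpoint: Endpoint) -> None:
        # Called when a client opens a new WebSocket connection.
        self._by_token[token] = endpoint

    def unregister(self, token: str) -> None:
        # Called when the connection is closed.
        self._by_token.pop(token, None)

    def lookup(self, token: str) -> Optional[Endpoint]:
        return self._by_token.get(token)


def route(registry: SessionRegistry, token: str) -> Optional[str]:
    """Return the target URL for a request carrying this session token."""
    endpoint = registry.lookup(token)
    if endpoint is None:
        return None  # no open connection known for this token
    return f"http://{endpoint.host}:{endpoint.port}"


registry = SessionRegistry()
registry.register("token-abc", Endpoint("10.0.0.12", 8080))
target = route(registry, "token-abc")
```

The part we are unsure about is whether a middleware or plugin could perform the equivalent of `route` and pick the target instance itself.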

Thanks in advance for your help!

Best regards