Scaling socket.io servers on Docker Swarm with Traefik

sowmenappd · November 23, 2022, 4:56am

I'm looking for advice or accurate directions on configuring my socket.io servers, which run behind a Traefik load balancer. My infrastructure is hosted on AWS: one EC2 instance is the Swarm manager node and runs the Traefik service, and the other EC2 instances are worker nodes running the socket.io Node.js app.

Currently all network traffic is routed through the manager node (which hosts the Traefik service), so all socket.io connections pass through it as well on their way to the socket.io service replicas on the worker nodes (the other EC2 instances). Traefik runs only on that single manager node. My concerns are:

If I expect a lot of websocket users/connections (millions of connections), would this configuration scale?
I'm not exactly sure, but would having only a single manager node cause a network throughput bottleneck for the websocket connections and make the websocket service unstable?

Looking forward to some answers. Thanks!

bluepuma77 · November 23, 2022, 8:51am

In general you can scale Traefik and your app horizontally, meaning you can just add more servers. Docker Swarm is happy with 3+ manager nodes (fault tolerance) and can handle 1000+ worker nodes.

The real challenge from my point of view is how you handle your single point of failure, that is, the single node that holds the public IP address. What happens if the node, Docker or Traefik crashes?

One solution: we use Docker Swarm, Traefik and 100+ services with different sub-domains. We use an extra load-balancer in front of the whole setup, assuming that our provider can handle the first node better than we can. I am sure AWS provides similar services.

There are many more things to consider. Open connections per node is one, but also be aware that a Node.js app usually runs single-threaded. If you have multiple CPUs or threads available on your server, your app cannot necessarily use them. Either you enable your app for multi-threading or you create as many app containers as you have CPUs/threads available on the node. But of course it all depends on CPU, RAM and networking usage. It's all about the balance, and about monitoring too, to understand what's happening and where the bottlenecks are.
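
To make the "one container per CPU/thread" idea a bit more concrete, here is a rough back-of-the-envelope sketch in TypeScript. Everything in it is an assumption for illustration: the `WorkerNode` shape, the helper names, the node sizes, and especially the 50,000 connections per replica, which you would have to replace with a figure measured for your own app.

```typescript
// All names and numbers here are illustrative assumptions, not measurements.
interface WorkerNode {
  name: string;
  cpuThreads: number; // hardware threads available on the node
}

// One single-threaded Node.js container per hardware thread, minus any
// threads kept free for the OS, Docker, etc. Never returns less than 1.
function replicasForNode(node: WorkerNode, reservedThreads = 0): number {
  return Math.max(1, Math.floor(node.cpuThreads) - reservedThreads);
}

// Replicas required to hold `targetConnections` if one replica can hold
// `connectionsPerReplica` (an assumed figure; measure it for your app).
function replicasNeeded(targetConnections: number, connectionsPerReplica: number): number {
  if (connectionsPerReplica <= 0) {
    throw new RangeError("connectionsPerReplica must be > 0");
  }
  return Math.ceil(targetConnections / connectionsPerReplica);
}

// Compare what the worker nodes can host with what the target load needs.
function planCluster(
  nodes: WorkerNode[],
  targetConnections: number,
  connectionsPerReplica: number,
  reservedThreads = 0,
): { available: number; needed: number; enough: boolean } {
  const available = nodes.reduce(
    (sum, node) => sum + replicasForNode(node, reservedThreads),
    0,
  );
  const needed = replicasNeeded(targetConnections, connectionsPerReplica);
  return { available, needed, enough: available >= needed };
}

// Example: two hypothetical workers, 1 million target connections,
// an assumed 50,000 connections per replica, 1 thread reserved per node.
const plan = planCluster(
  [
    { name: "worker-1", cpuThreads: 4 },
    { name: "worker-2", cpuThreads: 8 },
  ],
  1000000,
  50000,
  1,
);
console.log(plan);
```

With these made-up inputs the two workers can host 10 replicas while the target needs 20, so you would add worker nodes and then raise the replica count, for example with `docker service scale`. This only covers CPU threads and an assumed connection limit. RAM, file descriptor limits and network throughput still need to be monitored separately.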

system · March 20, 2023, 8:25am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
