Documentation for scaling up Traefik deployments

I have a single-node frontend Kubernetes cluster (k3s) running Traefik, which I use to proxy requests to various services running on other Kubernetes clusters (also k3s/Traefik). Everything is configured: middlewares redirect HTTP to HTTPS, Traefik handles all the TLS certs, and other security settings are in place (I've even set custom default certs and TLSOptions that only accept TLS 1.2 or higher across the board). Traefik is freakin' amazing!

However, I am having trouble scaling this solution up, mostly on the first frontend server. If I drop in Apache 2.4 instead, I have been able to configure it to handle significantly more load (as far as I know, the backends are not being overwhelmed in any way). With Traefik under that load, though, I see "This site cannot be reached" errors.

I love how easy Traefik is to configure (especially in k8s, really great), but I am having a hard time finding documentation on how to scale it up. My guess is that something in the host OS configuration (Debian 11 on the frontend) is limiting what Traefik can do. Does anyone know of good resources that explain what needs to be tuned? I couldn't find anything in the docs or blog that addressed this, but maybe I'm just not using the right keywords?
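As a first check of the host-OS guess, here is a minimal Python sketch (assuming Python 3.9+ on the Debian 11 frontend) that prints the open-file limit of the current process plus a few kernel tunables that often cap concurrent connections on Linux. The specific tunables chosen are my assumption, not something taken from the Traefik docs, and the script reports the limits of the process running it, not of Traefik itself; for the Traefik process, `cat /proc/<pid>/limits` shows the limits it actually runs with.

```python
import resource
from pathlib import Path
from typing import Dict, Optional

# Kernel tunables that commonly cap concurrent connections on Linux.
# This selection is illustrative (an assumption), not an exhaustive list.
SYSCTL_PATHS = {
    "net.core.somaxconn": "/proc/sys/net/core/somaxconn",
    "net.ipv4.ip_local_port_range": "/proc/sys/net/ipv4/ip_local_port_range",
    "fs.file-max": "/proc/sys/fs/file-max",
}


def read_sysctl(path: str) -> Optional[str]:
    """Return the value of a /proc/sys entry, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def connection_limits() -> Dict[str, object]:
    """Collect this process's open-file limit and a few kernel tunables."""
    # RLIMIT_NOFILE bounds how many sockets/files one process may hold open.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    limits: Dict[str, object] = {"nofile_soft": soft, "nofile_hard": hard}
    for name, path in SYSCTL_PATHS.items():
        limits[name] = read_sysctl(path)
    return limits


if __name__ == "__main__":
    for key, value in connection_limits().items():
        print(f"{key}: {value}")
```

If the open-file limit or `somaxconn` turns out to be low, that would be one place to start comparing against the Apache setup, though it is only one possible cause.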

I also realize that having multiple frontend servers would help, but shouldn't there be a way to out-compete Apache in this setup? Am I just asking too much of Traefik in this scenario? Why is Apache scaling better at the moment? I found documentation on tuning Apache performance (number of workers, connections, etc.), but I can't find equivalent guides for Traefik. What do I need to adjust to make Traefik perform better?

Any pointers/references/help would be appreciated.
