I'm exploring Traefik to provide ingress for a densely packed Kubernetes cluster that hosts hundreds of thousands of websites.
Are there any case studies from companies that are operating at a similar scale? What are the factors I would need to consider when operating Traefik at this scale?
For context, here is the rough scale that I'm designing for:
- 200,000 HTTP routes, matching on Host
- Backed by ~1,000 Services
- Route configuration changes a few times every hour on average
- TLS is terminated by the CDN in front of this layer, so Traefik only needs a single TLS certificate to secure the connection from the CDN
To support this, I think we'd have ~1,000 Ingress resources (one per Service), each with ~200 routes on average. In normal operation, one Ingress resource would change once every several minutes, but during maintenance operations the rate of change could be much higher.
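
To make the layout concrete, here is a minimal Python sketch of how one such Ingress might be generated: a single `networking.k8s.io/v1` Ingress per Service, with one Host rule per website. The helper name `build_ingress`, the hostnames, the Service name, the port, and the `traefik` ingress class name are all placeholders I made up for illustration, not our real configuration.

```python
import json


def build_ingress(service_name, hosts, port=80, namespace="default",
                  ingress_class="traefik"):
    """Build one networking.k8s.io/v1 Ingress with one Host rule per hostname.

    All defaults here (port 80, "default" namespace, "traefik" class name)
    are assumptions for illustration.
    """
    rules = [
        {
            "host": host,
            "http": {
                "paths": [
                    {
                        "path": "/",
                        "pathType": "Prefix",
                        "backend": {
                            "service": {
                                "name": service_name,
                                "port": {"number": port},
                            }
                        },
                    }
                ]
            },
        }
        for host in hosts
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": service_name, "namespace": namespace},
        "spec": {"ingressClassName": ingress_class, "rules": rules},
    }


# Hypothetical example: one Service fronting 200 websites.
hosts = [f"site-{i}.example.com" for i in range(200)]
ingress = build_ingress("svc-0001", hosts)

# Rough totals for the design described above.
routes_per_ingress = len(ingress["spec"]["rules"])
total_routes = 1000 * routes_per_ingress

# Serialized size of one manifest, as a rough feel for per-object weight.
manifest_bytes = len(json.dumps(ingress))
```

With these numbers, `routes_per_ingress` is 200 and `total_routes` is 200,000, which matches the scale listed above. Every change to one Ingress would touch an object carrying ~200 rules, which is part of why I'm asking about update behaviour at this size.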