Slow HTTPS performance

Hi all,

I've deployed Traefik v2 on a GCE Kubernetes cluster and ran ab for raw performance benchmarks. The results were disappointing, and I've been hunting down where the performance drop occurs. After some research, I suspect that TLS handling is the problem.

Setup

Kubernetes cluster on Google Cloud: 2 nodes, each with 2 vCPUs and 7 GB of memory. Traefik sits behind a LoadBalancer Service, DNS points at the load balancer's public IP address, and all services can be reached. Traefik is configured roughly following https://docs.traefik.io/v2.0/user-guides/crd-acme/

The relevant deployed configurations can be found on GitHub.

For this app in particular, we have client -> LoadBalancer -> Traefik (2 replicas) -> nginx (2 replicas) -> django app (3 replicas)

Testing performed

All testing is done with Apache Benchmark (ab), using 1000 requests at a concurrency of 100, against an endpoint that returns a small amount of JSON and doesn't hit any database (so IO is minimal).
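
For anyone who wants to reproduce this kind of run without ab, here is a minimal Python sketch of the same idea: a fixed number of requests with a fixed number in flight, each on a brand-new connection (like ab without -k), so over https:// every request in this sketch performs its own TLS handshake. The names `fetch_once` and `run_benchmark` are made up for illustration. Certificate verification is switched off for https URLs on the assumption that you're hitting localhost with a certificate issued for the real domain; note also that http.client sends the URL's hostname as the TLS SNI name, not the value of the Host header.

```python
import http.client
import ssl
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Optional
from urllib.parse import urlsplit

# Benchmark-only assumption: skip certificate checks, because the certificate is
# issued for the public domain rather than for "localhost".
_INSECURE_CTX = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
_INSECURE_CTX.check_hostname = False
_INSECURE_CTX.verify_mode = ssl.CERT_NONE


def fetch_once(url: str, host_header: Optional[str] = None) -> int:
    """Send one GET over a brand-new connection (no keep-alive) and return the status."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    if parts.scheme == "https":
        # A fresh connection per request means a full TLS handshake per request.
        conn = http.client.HTTPSConnection(
            parts.hostname, parts.port, timeout=10, context=_INSECURE_CTX)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=10)
    try:
        # An explicit Host header replaces the one http.client would add itself.
        headers = {"Host": host_header} if host_header else {}
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the body so the request is fully complete
        return resp.status
    finally:
        conn.close()


def run_benchmark(url: str, requests: int = 1000, concurrency: int = 100,
                  host_header: Optional[str] = None) -> dict:
    """Issue `requests` GETs with at most `concurrency` in flight; report throughput."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(lambda _: fetch_once(url, host_header),
                                 range(requests)))
    elapsed = time.perf_counter() - start
    return {
        "completed": len(statuses),
        "non_2xx": sum(1 for s in statuses if not 200 <= s < 300),
        "seconds": elapsed,
        "req_per_sec": len(statuses) / elapsed,
    }
```

Something like `run_benchmark("https://localhost/", host_header="app.example.com")` would mirror test 5 below, where `app.example.com` is a placeholder for the real Host. Redirects (as in test 4) count as non-2xx here. Python threads are limited by the GIL, so this won't push as hard as a native tool; treat its numbers as relative rather than absolute.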

  1. Testing the container itself: from within the django pod, running ab against the pod itself or against the service (spreading over all replicas) yielded 500-700 req/s.
  2. Testing the nginx reverse proxy: running ab from an nginx pod against itself or against the service (spreading over 2 replicas) yielded around 500 req/s.
  3. Testing from the Traefik pod directly: running ab from a Traefik pod against the nginx service still yielded around 500 req/s.
  4. Testing Traefik over HTTP: hitting http://localhost from a Traefik pod returns (as expected) redirects to HTTPS, at around 12000 req/s.
  5. Testing Traefik over HTTPS: from the Traefik pod itself, hitting https://localhost (with the correct Host: ... header) drops to about 50 req/s.
  6. Testing from outside the cluster (my laptop as the client): still about 50 req/s over HTTPS.
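
To put these numbers side by side, here is a small Python sketch that applies Little's law (requests in flight = throughput times average latency) to each measurement, assuming ab really keeps all 100 requests in flight for the whole run. The dictionary just restates the figures from the list above (600 is an assumed midpoint of the 500-700 range), and the helper names are illustrative.

```python
from typing import Dict, Tuple

# Throughput figures restated from the test list; 600 is an assumed midpoint of 500-700.
MEASURED_RPS: Dict[str, float] = {
    "django pod / service": 600.0,
    "nginx pod / service": 500.0,
    "Traefik pod -> nginx service": 500.0,
    "Traefik http://localhost (redirects only)": 12000.0,
    "Traefik https://localhost": 50.0,
    "laptop -> LB -> Traefik (HTTPS)": 50.0,
}

CONCURRENCY = 100  # ab -c 100


def mean_latency_seconds(rps: float, concurrency: int = CONCURRENCY) -> float:
    """Little's law L = X * W rearranged to W = L / X (assumes a steady state)."""
    return concurrency / rps


def slowest_measurement(measurements: Dict[str, float]) -> Tuple[str, float]:
    """Return the first entry with the lowest throughput."""
    name = min(measurements, key=lambda k: measurements[k])
    return name, measurements[name]


def report(measurements: Dict[str, float]) -> None:
    """Print throughput and implied mean latency for each measurement."""
    for name, rps in measurements.items():
        ms = mean_latency_seconds(rps) * 1000
        print(f"{name:45s} {rps:8.0f} req/s  ~{ms:6.0f} ms per request")
    name, rps = slowest_measurement(measurements)
    print(f"slowest: {name} ({rps:.0f} req/s)")


report(MEASURED_RPS)
```

At -c 100, 50 req/s works out to about 2 s per request on average, against about 0.2 s at 500 req/s, so under this load each HTTPS request takes roughly ten times as long as the plain HTTP paths.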

Parameters changed

I've tried more and fewer replicas for every component, and assigned higher resource requests/limits to the Traefik, nginx and django deployments, but none of that has a noticeable impact: performance stays around 50 req/s.

Now what?

I'm out of ideas on what can be tweaked or tested to try to improve on this. If you need any more information, please let me know.