Heavy performance impact when using traefik on kubernetes

Hey,
we are running Traefik 2.5.4 (Helm chart version 10.6.2) as the ingress controller on our Kubernetes cluster. We use Traefik's custom CRDs to define IngressRoutes. We noticed that our Docker pushes to a registry (Harbor) behind Traefik were really slow, so we ran tests with plain web traffic in a few different scenarios to isolate the problem. For the tests we used whoami (its /bench endpoint) as the server and wrk as the client.

Problem description and setup

Our traefik configuration has the following overrides:

deployment:
  initContainers:
  - command:
    - sh
    - -c
    - chmod -Rv 600 /data/*
    image: busybox:1.31.1
    name: volume-permissions
    volumeMounts:
    - mountPath: /data
      name: data
env:
- name: CF_DNS_API_TOKEN
  valueFrom:
    secretKeyRef:
      key: token
      name: cloudflare-apitoken
globalArguments:
- --certificatesresolvers.le.acme.caserver=https://acme-v02.api.letsencrypt.org/directory
- --certificatesresolvers.le.acme.dnschallenge=true
- --certificatesresolvers.le.acme.dnschallenge.provider=cloudflare
- --certificatesresolvers.le.acme.storage=/data/acme.json
- --certificatesresolvers.le.acme.dnschallenge.resolvers=1.1.1.1:53,8.8.8.8:53
ingressRoute:
  dashboard:
    enabled: false
logs:
  general:
    level: INFO
persistence:
  enabled: true
  storageClass: longhorn
ports:
  web-int:
    expose: false
    port: 9080
    protocol: TCP
  websecure-int:
    expose: false
    port: 9443
    protocol: TCP
    tls:
      certResolver: ""
      domains: []
      enabled: false
      options: ""
providers:
  kubernetesCRD:
    allowCrossNamespace: true
service:
  externalIPs:
  - X.X.X.X

Nothing special apart from adding Let's Encrypt, and the following tests were performed without TLS to rule that out. We also tested the network between the agents that host the pods, and it was not a bottleneck. For all tests we made sure that the server and client pods ran on different nodes.

The three pods involved in the tests are:

The wrk pod, which is the client running the tests:

apiVersion: v1
kind: Pod
metadata:
  name: potts
  namespace: test
spec:
  containers:
    - name: web
      image: skandyla/wrk
      command:
        - sleep 
        - "400000000"

The whoami server pod, which handles the incoming requests on /bench:

apiVersion: v1
kind: Pod
metadata:
  name: whoami
  namespace: test
  labels:
    role: whoami
spec:
  containers:
    - name: whoami
      image: containous/whoami
      ports:
        - name: web
          containerPort: 80
          protocol: TCP

A service which can expose the whoami server:

apiVersion: v1
kind: Service
metadata:
  name: my-whoami
  namespace: test
spec:
  selector:
    role: whoami
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80

Tests

Baseline

To get a baseline, we let the two pods talk directly to each other over the whoami pod's cluster-internal IP, without a Service or Traefik in between. This is as fast as it could theoretically get:
wrk -t20 -c1000 -d60s -H "Host: doesnt.matter" --latency http://(internal pod IP of whoami):80/bench

Running 1m test @ http://10.12.162.41:80/bench
  20 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.36ms    6.20ms 237.66ms   98.87%
    Req/Sec     9.71k     1.53k   28.32k    82.03%
  Latency Distribution
     50%    4.89ms
     75%    5.50ms
     90%    6.22ms
     99%   12.08ms
  8865524 requests in 1.00m, 1.04GB read
  Socket errors: connect 0, read 2869, write 1715713, timeout 0
Requests/sec: 147525.20
Transfer/sec:     17.73MB

We ran the test a few times and got values of around 16-17 MB/s every time.

Using a cluster service

This is more of a supplementary test. Here we used the cluster-internal DNS (the Service name), i.e. the same Service that the IngressRoute uses in the Traefik test.
wrk -t20 -c1000 -d60s -H "Host: doesnt.matter" --latency http://(whoami-service):80/bench

Running 1m test @ http://my-whoami.test:80/bench
  20 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.69ms    7.58ms 872.81ms   99.12%
    Req/Sec     9.29k     1.47k   47.03k    88.16%
  Latency Distribution
     50%    5.19ms
     75%    5.71ms
     90%    6.49ms
     99%   12.43ms
  6960258 requests in 1.00m, 836.37MB read
  Socket errors: connect 0, read 3643, write 177908, timeout 0
Requests/sec: 115845.62
Transfer/sec:     13.92MB

Performance declined somewhat, with values around 13 MB/s across multiple runs.

Using traefik

We added an IngressRoute that routes to that Service and used the IP of the control plane so the traffic would be routed through Traefik.
wrk -t20 -c1000 -d60s -H "Host: whoami.test" --latency http://(IP of control plane):80/bench

Running 1m test @ http://whoami.test:80/bench
  20 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    72.13ms   46.64ms 269.60ms   63.64%
    Req/Sec   709.05     84.82     1.59k    70.17%
  Latency Distribution
     50%   72.93ms
     75%  104.90ms
     90%  136.15ms
     99%  175.82ms
  847338 requests in 1.00m, 82.42MB read
Requests/sec:  14104.06
Transfer/sec:      1.37MB

We did expect some drop in performance, but throughput falls by roughly 10x when the traffic goes through Traefik (about 14,100 requests/sec versus about 147,500 in the baseline).
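The IngressRoute used for this test was not posted. For readers reproducing the setup, a minimal sketch of what such a route could look like for Traefik 2.5 is below. The route name is hypothetical, the `web` entry point is assumed to be the chart's default plain-HTTP entry point, and the Service name must match the `metadata.name` of the Service manifest above.

```yaml
apiVersion: traefik.containo.us/v1alpha1   # CRD group used by Traefik v2.x
kind: IngressRoute
metadata:
  name: whoami            # hypothetical name
  namespace: test
spec:
  entryPoints:
    - web                 # assumed: the chart's default plain-HTTP entry point (no TLS, as in the tests)
  routes:
    - match: Host(`whoami.test`)   # matches the Host header wrk sends
      kind: Rule
      services:
        - name: my-whoami          # must match metadata.name of the Service above
          port: 80
```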

Conclusion

So we are not sure at the moment how to tackle this. Is this expected, did we misconfigure something, or do we need additional configuration to make this path fast? Any ideas are welcome 🙂
Cheers


Hey Tom,

Thanks for reaching out! Our engineering team is looking at this and someone will be getting back to you soon.

Best,
Tiffany
Traefik OSS Community Manager

Hello @tom-mayer

Thanks a lot for using Traefik.

We appreciate the effort you took to run those benchmarks and post the results here. We tried to reproduce a similar scenario on our side, and our results were quite close to yours. However, we exposed the pod using a standard Kubernetes Service of type LoadBalancer and ran the benchmark against the new endpoint that was created.

Based on the Helm values you shared, we assume that you use a LoadBalancer Service, so there is another layer before requests hit Traefik.

Would you please do the same and compare the results of hitting a test application through the Ingress (Layer 7) and then through the LoadBalancer Service (Layer 4)?
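As a sketch of the Layer 4 path, the whoami pod from the original post could be exposed directly with a Service of type LoadBalancer. This assumes the cluster has a LoadBalancer implementation (a cloud provider, MetalLB, or similar); the Service name and the external port 8080 are hypothetical choices, the port picked only to avoid clashing with Traefik's 80/443.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: whoami-lb        # hypothetical name
  namespace: test
spec:
  type: LoadBalancer     # Layer 4: traffic reaches the pod without passing through Traefik
  selector:
    role: whoami         # same label as the whoami pod above
  ports:
    - protocol: TCP
      port: 8080         # hypothetical external port
      targetPort: 80     # whoami listens on 80
```

wrk would then target the external IP assigned to `whoami-lb` on port 8080, with the same thread and connection settings as the Traefik test.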

Kind Regards,
Jakub

Hello @jakubhajek,

Thanks for taking the time to reproduce that. We actually tried to detach the entire case from the complexity of Kubernetes, its Service LB and all those bits.

To do that, we reproduced the setup in Docker with a docker-compose stack. Please see EugenMayer/lb-benchmarks.

There you can see that, in the whoami/wrk setup, Traefik falls far behind all the other LBs: nginx, HAProxy, Contour. Traefik seems to play in a different league (in the bad sense) here.

Interestingly, in the MinIO/warp-based setup, Traefik performs quite solidly and is in the top tier.
If you run those, let me know your results.

That said, I do not think this is related to Kubernetes, the provider network, or anything else. Here we have a local setup with no overlay network; granted, there is still iptables/NAT on a bridge. But the results look very similar, so it does indeed look like an issue within Traefik.

Do you have any other interpretations?

Hey tom-mayer and EugenMayer,

I just wanted to let you know that while we have been slow to respond due to the holidays, we are taking this seriously and our engineering team is discussing this tomorrow. We will get back to you with more info afterwards.

Best,
Tiffany

Hi, everyone. While we wait for engineering to investigate this further, I want to chime in with some thoughts about performance that may be skewing your results.

Pod-to-Pod and Pod-to-Service traffic is handled at the kernel level via iptables, so it's going to be very fast. Communication through Traefik (or any other application) is going to be CPU bound, and if you're also slamming the system with benchmark tests, that's going to affect the performance of the ingress controller.
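One way to reduce CPU starvation of Traefik while a benchmark runs on the same cluster is to give its pod explicit resource requests through the chart's `resources` value. The numbers below are arbitrary placeholders, not recommendations; size them to your nodes.

```yaml
# Hypothetical additions to the Helm values shown at the top of the thread.
resources:
  requests:
    cpu: "2"          # placeholder: reserve CPU for Traefik so the scheduler accounts for it
    memory: 256Mi     # placeholder
  limits:
    memory: 512Mi     # placeholder; no CPU limit set, so Traefik is not CPU-throttled mid-benchmark
```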

I recommend you run the wrk container from outside of the cluster to remove that piece from the CPU's workload. The image uses wrk as its entrypoint, so you can simply run something like:

docker run -it --rm skandyla/wrk -t20 -c1000 -d60s -H "Host: whoami.test" --latency http://demo-b.home.monach.us/bench

Then I recommend that you run the Traefik Pod and the whoami Pod on different nodes, and that the nodes are sized appropriately for the work that you're throwing at them. If your CPU goes to 100% across all cores on either node, then you don't really have a fair test. Use bigger nodes, or lower the concurrency and threads on the wrk workload.
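To keep the whoami pod off any node that runs Traefik, a pod anti-affinity rule can be added to the whoami manifest from the original post. This sketch assumes the Traefik pods carry the label `app.kubernetes.io/name: traefik` (check with `kubectl get pods --show-labels`) and run in a namespace called `traefik`; the namespace name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: whoami
  namespace: test
  labels:
    role: whoami
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: traefik   # assumed label on the Traefik pods
          namespaces:
            - traefik                           # hypothetical namespace of the Traefik release
          topologyKey: kubernetes.io/hostname   # "different node" = different hostname label
  containers:
    - name: whoami
      image: containous/whoami
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
```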

Finally, I recommend that you have enough Pods behind Traefik to serve all the requests being thrown at it. In my tests, it took 10 replicas of the whoami Pod to handle the parameters in @tom-mayer's examples.
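Running several whoami replicas is simplest with a Deployment instead of the standalone Pod. The sketch below reuses the `role: whoami` label, so the existing Service picks up all replicas without changes; the replica count of 10 mirrors the test described above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
  namespace: test
spec:
  replicas: 10                # the count that handled the load in the test above
  selector:
    matchLabels:
      role: whoami
  template:
    metadata:
      labels:
        role: whoami          # same label the existing Service selects on
    spec:
      containers:
        - name: whoami
          image: containous/whoami
          ports:
            - name: web
              containerPort: 80
              protocol: TCP
```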

For the tests run by @EugenMayer, it seems that the only place Traefik did poorly was in the test against the whoami container. My tests showed the whoami container to be a poor choice for benchmarking. I was able to get 5x the throughput by swapping the target container for a stock nginx container. The whoami container seems to get saturated and start throwing errors, which skew the test results.

You don't see these errors when you're using Traefik, but if you repeat the same test behind ingress-nginx, you'll see the errors appear in the wrk output:

  Non-2xx or 3xx responses: 116142

Furthermore, where I needed 10 replicas of whoami to handle the load with minimal errors, I only needed 1 replica of nginx to run the test. That implies that whoami is a poor choice for benchmarking.

Would you be willing to re-run your tests with nginx as the target workload, hitting / with the wrk workload instead of /bench?
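For the Kubernetes variant, a stock nginx backend could be deployed next to whoami roughly as follows. All names are hypothetical and the image tag is an arbitrary choice. An IngressRoute with its own Host rule would then point at the `nginx-bench` Service, and wrk would hit `/` instead of `/bench`.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-bench           # hypothetical name
  namespace: test
spec:
  replicas: 1                 # a single replica sufficed in the test described above
  selector:
    matchLabels:
      role: nginx-bench
  template:
    metadata:
      labels:
        role: nginx-bench
    spec:
      containers:
        - name: nginx
          image: nginx:1.21   # stock image; serves its default page at /
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-bench
  namespace: test
spec:
  selector:
    role: nginx-bench
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
```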

Hello Everyone!

Thanks once again for spending time and publishing your results.

Just to keep you informed: we are running our own benchmarks and will publish the results as soon as possible.

Thank you,

@adriandotgoins I'm not sure why you bring up the pod-to-pod topic here, since we removed that layer entirely; that is why we started testing on a plain Docker engine. I would suggest we keep k8s out of the picture entirely, as long as we can reproduce the issue with a far simpler setup, which we did.

whoami
I cannot deny that whoami might become saturated at some point. But it also cannot be denied that, if it were the bottleneck, it would be one for all the tested load balancers, which is not the case (not by far).
So in this particular case, whoami might have reduced envoy's overall result from a potential 15 MB/s to 12 MB/s or so, but it cannot have capped Traefik at 1.5 MB/s while all the others reached 12 MB/s; that does not make sense to me. Do you agree?

We did not pick the whoami/wrk combination out of the blue; we actually used the one published by Traefik in
"Benchmarks - Træfik | Site | v1.4". Yes, that is Traefik 1.x, but that should not make it invalid all of a sudden, right?

re-run tests
Right now I'm not convinced we should just pick "some other benchmark scenario" because this particular one does not perform well with Traefik (for the reasons above).

whoami would limit all the load balancers the same way, so if whoami's maximum throughput in this setup were 1.5 MB/s, envoy could not reach 12 MB/s.

If you can explain why this still makes sense, or what I might be overlooking, I'm happy to give it a spin. Maybe just open a PR against the repo I linked with nginx as the backend, so I can run it directly: add an additional scenario by copying the whoami one and replacing the backend service with nginx.

Thank you all for actually really looking into this and trying to find the cause!

The OP reported the issue on Kubernetes, and although you both seem to share the same last name, I wouldn't presume that you are the same person or even related. Kubernetes doesn't add overhead or complexity - once things are running, they're running in the context of the CRI - Docker or containerd or whatever else. Running it on Docker doesn't make it any more or less complex or change how it performs. You can shut down the entire Kubernetes control plane, and you'll be left with standalone containers and iptables rules, just like you have with Docker.

Your argument about other engines reporting equal throughput with whoami is valid, although we don't know how those other engines respond if whoami is saturated. Changing the backend isn't really "another benchmark scenario." It's still an HTTP GET, but you're changing the engine processing that request. As long as there's another piece behind Traefik, you can't say with 100% certainty that the problem is with Traefik. I find it curious that the only place where Traefik performed worse was with the whoami container, so it's a valid troubleshooting exercise to try it with something else.

The engineering department is investigating, and they'll have more information for you soon. If you decide to re-run the tests with a different backend, please share your results.


The way that everyone on the thread has explained thought processes and reasoning is really helpful for learning as well. Thanks for sharing all of these findings, and I will continue to watch this thread.
