High CPU usage by traefik

Hey, we recently upgraded from Traefik v1 to v2 and are seeing very high CPU usage, especially when there are more WebSocket requests than usual. Even under normal load, Traefik sometimes takes up 50-80% of the CPU. We can't really go back to v1 since we need some of the new features.
I was wondering whether I'm doing something wrong or missing an important config option. Here's the setup:

Traefik main node

  traefik:
    image: traefik:v2.1.2
    deploy:
      labels:
        - traefik.enable=true
        - traefik.service=traefik
        - traefik.http.services.traefik.loadbalancer.server.port=8080
        - traefik.http.routers.traefik_api.rule=Host(`status.domain`)
        - traefik.http.routers.traefik_api.entrypoints=https
        - traefik.http.routers.traefik_api.service=api@internal
        - traefik.http.routers.traefik_api.tls=true
        - traefik.http.routers.traefik_api.middlewares=global@file
        # HTTP to HTTPS redirection
        - traefik.http.routers.http_catchall.rule=HostRegexp(`{any:.+}`)
        - traefik.http.routers.http_catchall.entrypoints=http
        - traefik.http.routers.http_catchall.middlewares=https_redirect,global@file
        - traefik.http.middlewares.https_redirect.redirectscheme.scheme=https
        - traefik.http.middlewares.https_redirect.redirectscheme.permanent=true

Frontend service (all other services look similar, just with a different name and port, and without the redirect stuff)

      labels:
        - traefik.enable=true
        - traefik.service=ui
        - traefik.http.services.ui.loadbalancer.server.port=3000
        - traefik.http.routers.ui.rule=Host(`domain`, `sub1.domain`, `sub2.domain`)
        - traefik.http.routers.ui.entrypoints=https
        - traefik.http.routers.ui.tls=true
        - traefik.http.routers.ui.middlewares=global@file,sub1-redirect@docker,sub2-redirect@docker
        - traefik.http.middlewares.sub1-redirect.redirectregex.regex=^https://sub1\.domain/(.*)
        - traefik.http.middlewares.sub1-redirect.redirectregex.replacement=https://domain/section1/$${1}
        - traefik.http.middlewares.sub1-redirect.redirectregex.permanent=true
        - traefik.http.middlewares.sub2-redirect.redirectregex.regex=^https://sub2\.domain/(.*)
        - traefik.http.middlewares.sub2-redirect.redirectregex.replacement=https://domain/section2/$${1}
        - traefik.http.middlewares.sub2-redirect.redirectregex.permanent=true

This is the static config

# Enable API
[api]
dashboard = true

# Enable docker provider
[providers.docker]
endpoint = "unix://var/run/docker.sock"
swarmMode = true
exposedByDefault = false # So the other unrelated services don't get discovered
network = "domain_app"

# Dynamic config file provider
[providers.file]
filename = "/etc/traefik/dynamic.toml"

# EntryPoints
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.https]
address = ":443"

# Enable debug logs
[log]
level = "DEBUG"

And this is the dynamic file provider

# TLS config
[[tls.certificates]]
certFile = "/certs/live/domain/fullchain.pem"
keyFile = "/certs/live/domain/privkey.pem"

# Global middleware
[http.middlewares]
[http.middlewares.global.chain]
middlewares = ["global-compress", "global-retry"]
[http.middlewares.global-retry.retry]
attempts = 5
[http.middlewares.global-compress.compress]

Hello @Nakroma,

Could you enable the debug flag in the API (https://docs.traefik.io/v2.1/operations/api/#debug) and attach the CPU and heap pprof traces to this issue?
(More info on how to do that here: https://github.com/containous/traefik/issues/2673#issuecomment-373022820)
That would allow us to see what is causing your CPU usage.
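
For reference, here is a minimal sketch of what enabling that flag might look like in the static config posted above. It assumes the same TOML file format used in this thread and only adds the `debug` option described in the linked docs. With the `traefik_api` router pointing at `api@internal`, the profiling endpoints would presumably be reachable under `/debug/pprof/` on the `status.domain` host, but that path is an assumption based on Go's standard pprof handlers, so check the docs for your version.

```toml
# Static configuration: extends the existing [api] section
[api]
dashboard = true
debug = true  # exposes additional debugging and profiling endpoints under /debug/
```

Since this exposes internal runtime data, it is probably best to turn the flag off again once the profiles have been collected.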


Here are the files captured under heavy load: pprof.zip
Note that the profile was only taken over 15 seconds instead of the usual 30, because Cloudflare would time out the request otherwise.