High CPU usage in double traefik setup

Hey,
I noticed very high CPU usage under moderate load on my double Traefik setup.
The two Traefik instances are connected via WireGuard, and the public-facing VPS Traefik sits at 50% CPU usage on a machine with a 2-core Intel Xeon Gold and 4 GB RAM.
I don't see this on my homelab instance, which maxes out at 5% CPU usage on a machine with a 4-core Intel 14900K and 4 GB RAM.

Normal browsing is fine, but when I start streaming videos, e.g. from Immich, the CPU usage spikes, and playing two videos at the same time isn't even possible.

My config files:
VPS:
docker-compose.yml:

services:
  traefik:
    image: traefik:latest
    container_name: traefik-vps
    restart: unless-stopped
    security_opt:
      - no-new-privileges:true
    networks:
      vpsnet:
    ports:
      - 80:80
      - 443:443
    environment:
      - TZ="Europe/Berlin"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik.yml:/traefik.yml:ro
      - ./acme/acme.json:/acme/acme.json
      - ./config.yml:/config.yml:ro
      - ./logs:/var/log/traefik
      - ./certs/:/etc/traefik/certs:ro

networks:
  vpsnet:
    name: vpsnet
    external: true

config.yml:

---
tcp:

  routers:

    tunnel-domain1: 
      rule: "HostSNI(`domain1.com`) || HostSNIRegexp(`^.+\\.domain1\\.com$`)"
      tls: 
        passthrough: true
      service: tunnel-svc-opnsense
      entryPoints:  
        - http
        - https

    tunnel-domain2: 
      rule: "HostSNI(`domain2.com`) || HostSNIRegexp(`^.+\\.domain2\\.com$`)"
      tls: 
        passthrough: true
      service: tunnel-svc-opnsense
      entryPoints:  
        - http
        - https

  services:

    tunnel-svc-opnsense:
      loadBalancer:
        proxyProtocol: true
        servers:
          - address: "192.168.10.21:443" # WG Client IP - Homelab Traefik IP

traefik.yml:

### Traefik-vps static configuration file ###
api:
  debug: true
log:
  level: "ERROR"
  filePath: "/var/log/traefik/traefik.log"
accessLog:
  filePath: "/var/log/traefik/access.log"
entryPoints:
  # Wireguard Tunnel 
  http:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
          permanent: true

  https:
    address: ":443"
    proxyProtocol:
      trustedIPs:
        - "10.0.0.0/24" # WG IPs 

providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    network: "vpsnet"
  file:
    filename: /config.yml

Homelab:
docker-compose.yml:

services:
  traefik:
    image: traefik:latest
    container_name: traefik-1001
    restart: unless-stopped
    security_opt:
      - no-new-privileges:true
    networks:
      frontnet:
    ports:
      - 80:80
      - 443:443
      - 8080:8080
    environment:
      - TZ="Europe/Berlin"

    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./config-files/traefik.yml:/etc/traefik/traefik.yml:ro
      - ./config-files/middlewares.yml:/etc/traefik/middlewares.yml:ro
      - ./config-files/tls.yml:/etc/traefik/tls.yml:ro
      - ./config-files/config.yml:/etc/traefik/config.yml:ro
      - ./logs:/var/log/traefik
      - ./certs/:/etc/traefik/certs:ro
      - ./acme/acme.json:/acme/acme.json      

networks:
  frontnet:
    name: frontnet
    external: true

config.yml:

---
# Traefik-1001 dynamic configuration file
http:
  routers:
    immich-rtr: 
      rule: "Host(`photos.domain1.de`)"
      entryPoints: 
        - "https"
      priority: 10
      service: immich-svc  
      middlewares:
        - immich-header-mdw

  services:
    immich-svc:
      loadBalancer:
        servers:
          - url: "http://192.168.10.106:2283"
        passHostHeader: true

traefik.yml:

### Traefik-1001 static configuration file ###

api:
  dashboard: true
  debug: true
  insecure: false

log:
  level: "DEBUG"
  filePath: "/var/log/traefik/traefik.log"

accessLog:
  filePath: "/var/log/traefik/access.log"

entryPoints:
  http:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
          permanent: true

                  
  https:
    address: ":443"
    asDefault: true
    forwardedHeaders:
      trustedIPs:
        - "10.0.0.1/32"
        - "138.201.90.200/32"
    proxyProtocol:
      trustedIPs:
        - "10.0.0.0/24" # WireGuard Subnet
    http:
      tls:
        options: modern-tls
        certResolver: "myresolver"
      middlewares:
        - crowdsec


serversTransport:
  insecureSkipVerify: false

providers:

  file:
    directory: /etc/traefik
    watch: true


experimental:
  plugins:
    bouncer:
      moduleName: "github.com/maxlerebourg/crowdsec-bouncer-traefik-plugin"
      version: "v1.3.5"

certificatesResolvers:
  myresolver:
    acme:
      email: REDACTED@REDACTED.com
      storage: /acme/acme.json
      # caServer: https://acme-staging-v02.api.letsencrypt.org/directory
      # keyType: 'EC384'
      tlsChallenge: {}

VPS Wireguard:

[Interface]
PrivateKey = REDACTED
Address = 10.0.0.1/24
ListenPort = 51820
PostUp = sysctl net.ipv4.ip_forward=1
PostUp = iptables -A FORWARD -i eth0 -o %i -j ACCEPT
PostUp = iptables -A FORWARD -i %i -j ACCEPT
PostUp = iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = sysctl net.ipv4.ip_forward=0
PostDown = iptables -D FORWARD -i eth0 -o %i -j ACCEPT
PostDown = iptables -D FORWARD -i %i -j ACCEPT
PostDown = iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE

[Peer]
PublicKey = REDACTED
AllowedIPs = 10.0.0.0/24, 192.168.10.20/32, 192.168.10.21/32

Peer OPNSENSE

Logs show nothing special.
Homelab traefik.log:

...
2025-01-16T13:31:57Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:207 > Service selected by WRR: 73d51c23f233b85b
2025-01-16T13:31:57Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:207 > Service selected by WRR: 73d51c23f233b85b
2025-01-16T13:31:57Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:207 > Service selected by WRR: 73d51c23f233b85b
2025-01-16T13:31:59Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:207 > Service selected by WRR: d70fffcd3a1800d2
2025-01-16T13:32:01Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:207 > Service selected by WRR: d70fffcd3a1800d2
...

VPS traefik.log:

...
2025-01-16T13:27:27Z ERR Error while handling TCP connection error="readfrom tcp 172.18.0.2:42428->192.168.10.21:443: read tcp 172.18.0.2:443->128.199.10.234:54488: read: connection reset by peer"
2025-01-16T13:27:27Z ERR Error while handling TCP connection error="readfrom tcp 172.18.0.2:42440->192.168.10.21:443: read tcp 172.18.0.2:443->128.199.10.234:54504: read: connection reset by peer"
2025-01-16T13:27:29Z ERR Error while handling TCP connection error="readfrom tcp 172.18.0.2:34552->192.168.10.21:443: read tcp 172.18.0.2:443->128.199.10.234:54542: read: connection reset by peer"
2025-01-16T13:27:30Z ERR Error while handling TCP connection error="readfrom tcp 172.18.0.2:34564->192.168.10.21:443: read tcp 172.18.0.2:443->128.199.10.234:54550: read: connection reset by peer"
2025-01-16T13:27:30Z ERR Error while handling TCP connection error="readfrom tcp 172.18.0.2:34572->192.168.10.21:443: read tcp 172.18.0.2:443->128.199.10.234:54560: read: connection reset by peer"
...

The IPs in the errors are most likely random access attempts(?). They are neither my public IP nor the IP of my VPS.

The Docker container stats on the VPS show 50% CPU usage,
but the Hetzner VPS console shows more than 100% CPU usage for the machine.

The htop stats on the VPS are closer to the Docker container stats: ~50% for Traefik and 0.X% for the other services.

Does anyone have an idea what could be wrong? What can I try apart from upgrading the VPS? Two cores should be more than capable of just routing traffic, shouldn't they?

Thanks in advance for any hints.

PS: and here are the stats of my homelab Proxmox Traefik VM:

Enable and check Traefik debug log (doc) and Traefik access log in JSON format (doc).
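A minimal sketch of what that suggestion could look like in the VPS static configuration, reusing the log paths already shown in this thread. `log.level` and `accessLog.format` are standard Traefik static options; everything else is copied from the original file.

```yaml
# VPS static configuration (traefik.yml), logging section only
log:
  level: "DEBUG"    # verbose output; switch back to "ERROR" after troubleshooting
  filePath: "/var/log/traefik/traefik.log"
accessLog:
  filePath: "/var/log/traefik/access.log"
  format: json      # write each access-log entry as a JSON object
```

Debug logging is itself fairly chatty, so it is probably best left on only while investigating.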

Do you have more Internet traffic on the server than on your home lab? We run Traefik on multiple Hetzner dedicated and cloud servers and it's working fine. Note that sometimes the CPU 100% is only 1 full CPU utilization, depending on the tool.

I uploaded the redacted debug logs of my VPS and homelab Traefik to GitHub, as they didn't fit into this reply.

I have no clue where the problem could be, but it's good to hear that Hetzner servers work fine with Traefik. :)
Another user on Reddit suggested that the problem might be that I'm sending the traffic of a TCP router on the VPS through the UDP-based WireGuard tunnel.
Could that be the source of the problem?

I'm also routing to port 443 of my homelab - could that cause a certificate problem? I saw a lot of "IP REDACTED-MY-PUBLIC-IP is not in trusted IPs list, ignoring ProxyProtocol Headers and bypass connection"-lines in the traefik-vps.log file.
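That log line appears to come from the `proxyProtocol.trustedIPs` check on the entry point: a connection from an address outside the list is still accepted, but any PROXY protocol header on it is ignored. Assuming clients reach the VPS directly and nothing in front of it sends PROXY protocol headers (an assumption about this setup, not something confirmed in the thread), the VPS `https` entry point could be sketched without that block. This would only silence the message; it is not a confirmed fix for the CPU usage.

```yaml
# VPS static configuration (sketch), assuming clients connect to the VPS directly
entryPoints:
  https:
    address: ":443"
    # No proxyProtocol block here: ordinary clients do not send PROXY protocol
    # headers, so there is nothing to trust on the public side.
```

The homelab entry point would keep its own `proxyProtocol.trustedIPs` entry, since the VPS TCP service forwards connections with the header (`proxyProtocol: true`).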

Thanks for your help. Much appreciated :)

There are many error lines in the VPS log. Maybe continuously trying to connect to the wrong IP is what creates the high CPU usage. Is the config correct?

I'm not sure if my config is correct. I managed to make it work, but something is definitely off. The main goal of the VPS was to get DDoS protection and to hide my public IP for my web-hosted services, as I didn't want to use Cloudflare anymore.

I tried my best to set it up so that every logged IP is correct, but that took a lot of trial and error.
To some degree it works, but I'm not sure which part is wrong.

Here is the relevant part of the entrypoints if that helps:

Homelab:

entryPoints:
  http:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
          permanent: true
  https:
    address: ":443"
    asDefault: true
    forwardedHeaders:
      trustedIPs:
        - "10.0.0.1/32" # Trusting the VPS wireguard IP
        - "VPS-PUBLIC-IP/32" # Trusting the static public IP of the VPS
    proxyProtocol:
      trustedIPs:
        - "10.0.0.0/24" # Trusting the whole Wireguard net
        - "172.16.0.0/12" # Trusting the Error IP area
    http:
      tls:
        options: modern-tls
        certResolver: "myresolver"
      middlewares:
        - crowdsec

VPS:

entryPoints:
  http:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
          permanent: true
  https:
    address: ":443"
    proxyProtocol:
      trustedIPs:
        - "10.0.0.0/24" # Trusting the whole Wireguard net 

I tried to include the local IP range 172.16.0.0/12 in the proxyProtocol trustedIPs, but this error message still appears:

Error while terminating TCP connection error="close tcp 172.18.0.2:41992->192.168.10.21:443: use of closed network connection"

or

Error while setting TCP connection deadline error="set tcp 172.18.0.2:39362: use of closed network connection"

I also added a rule on my firewall's WireGuard interface that allows the VPS WG IP (10.0.0.1) to reach Traefik, and all pings went through without loss.


And this one from the official OPNsense tutorial:

Could additional firewall rules solve that problem?

I just noticed that the IP that throws all these errors is also included in the "WhoAmI" docker container output of one of my domains if that helps:


The red bars redact:

  • my homelab public IP (X-Real-IP and X-Forwarded-For) and
  • the domain name (X-Forwarded-Host and Host).

Edit:
172.18.0.2 is the Docker bridge IP of the VPS Traefik.

If you just care about DDoS, have you thought about simply placing a Hetzner Cloud Loadbalancer in front? It works with dedicated and cloud servers.

Thanks for the tip about the Hetzner Cloud Loadbalancer, but that doesn't really fit my intended purpose and also means additional costs.

I'm only a student and want to learn as much as possible, but I'm a bit stuck at the moment. Once I have solved this problem, I also want to host some applications on the cloud server, for example uptime-kuma or searxng.

Is my config okay for this purpose, especially the ProxyProtocol part?
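For reference, a hedged sketch of how the VPS dynamic configuration could be tightened. It is not a confirmed fix and rests on two assumptions that are not stated in the thread: plain HTTP on port 80 carries no SNI, so a `HostSNI` rule with TLS passthrough only takes effect on the `https` entry point; and `proxyProtocol: true` is assumed to be equivalent to spelling out the PROXY protocol version (version 2 is the documented default). Only one of the two routers is shown.

```yaml
# VPS dynamic configuration (sketch, assumptions noted in the comments)
tcp:
  routers:
    tunnel-domain1:
      rule: "HostSNI(`domain1.com`) || HostSNIRegexp(`^.+\\.domain1\\.com$`)"
      entryPoints:
        - https              # SNI only exists on TLS connections
      tls:
        passthrough: true    # TLS is terminated by the homelab Traefik
      service: tunnel-svc-opnsense

  services:
    tunnel-svc-opnsense:
      loadBalancer:
        proxyProtocol:
          version: 2         # assumed explicit form of `proxyProtocol: true`
        servers:
          - address: "192.168.10.21:443"  # homelab Traefik, reached through WireGuard
```

With this layout, port 80 on the VPS is handled only by the existing HTTP-to-HTTPS redirection, and the receiving homelab entry point still has to trust the sender's address in its `proxyProtocol.trustedIPs`.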