I'm running into issues with my Kubernetes cluster and Traefik v2.6.7. I deployed Traefik using the traefik/traefik Helm chart, and everything appears to be functional.
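For reference, the install was roughly the following, with the entry point definitions shown below passed in through a values file (the repo URL, namespace, and file name here are illustrative of my setup rather than exact):

helm repo add traefik https://traefik.github.io/charts
helm repo update
helm install traefik traefik/traefik --namespace kube-system --values values.yaml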
Each node exposes entry points with the following definitions:
udp-dns:
  port: 5053
  expose: true
  exposedPort: 53
  protocol: UDP
tcp-dns:
  port: 5054
  expose: true
  exposedPort: 53
  protocol: TCP
udp-dhcp:
  port: 6067
  expose: true
  exposedPort: 67
  protocol: UDP
postgres-tcp:
  port: 5432
  expose: true
  exposedPort: 5432
  protocol: TCP
docker-tcp:
  port: 5001
  expose: true
  exposedPort: 5000
  protocol: TCP
mongo-tcp:
  port: 27017
  expose: true
  exposedPort: 27017
  protocol: TCP
These are in addition to the standard entry points on ports 9000, 443, and 80.
Inside the cluster I run a private Docker registry that's reachable through the 5000 entry point, along with Pi-hole DNS, Postgres, and Mongo.
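For context, the registry is wired to its entry point with an IngressRouteTCP along these lines (the service name and namespace in this sketch are illustrative, not my exact manifest):

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: docker-registry
  namespace: registry
spec:
  entryPoints:
    - docker-tcp
  routes:
    # plain (non-TLS) TCP routes must match all SNI values
    - match: HostSNI(`*`)
      services:
        - name: docker-registry
          port: 5000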
Each node can ping external IPs and domains without trouble, as well as the internal IP and domain I've set for the main node in the cluster, i.e. I can run ping 192.168.86.x and ping foo.bar without issues. Likewise, curl requests from each node to external domains and IPs return data as expected. However, when I make curl requests to the internal domain OR the internal IP of the node the main Traefik pod is running on, every request stalls for a long time in the connect phase:
ubuntu@pi-4:~$ curl -w "@curl-format.txt" -o /dev/null -s "foo.bar:5000/v2/_catalog"
     time_namelookup:  0.005189s
        time_connect:  64.073052s
     time_appconnect:  0.000000s
    time_pretransfer:  64.073306s
       time_redirect:  0.000000s
  time_starttransfer:  64.087641s
                    ----------
          time_total:  64.087965s
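For reference, curl-format.txt is the standard curl timing template, reconstructed here from the output above:

     time_namelookup:  %{time_namelookup}s\n
        time_connect:  %{time_connect}s\n
     time_appconnect:  %{time_appconnect}s\n
    time_pretransfer:  %{time_pretransfer}s\n
       time_redirect:  %{time_redirect}s\n
  time_starttransfer:  %{time_starttransfer}s\n
                    ----------\n
          time_total:  %{time_total}s\n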
On its own the delay wouldn't matter, since the requests do eventually complete, but it leads to problems spinning up new pods from internally hosted resources, because Kubernetes times out much sooner than that.
Outside of the cluster, I have no issues making these requests. Likewise, if I make the same request on the node that Traefik is running on, I have no trouble connecting to the resource. Removing some of the additional entry points, especially the ones associated with Pi-hole, doesn't appear to solve the problem, nor does creating replicas of the Traefik pod on each node (a sketch of what I tried is below). I'll add the kubectl describe node output for the non-working node at the bottom, and would appreciate any help or tips anyone can provide. Thank you!
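For reference, this is roughly how I created those per-node replicas, by switching the chart's workload kind (deployment.kind is a traefik/traefik chart value; the namespace and values file name are illustrative):

# values.yaml (excerpt): run one Traefik pod on every node instead of a single Deployment
deployment:
  kind: DaemonSet

helm upgrade traefik traefik/traefik --namespace kube-system --values values.yaml

Here is the describe output for the non-working node: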
Name: pi-4
Roles: <none>
Labels: beta.kubernetes.io/arch=arm64
beta.kubernetes.io/instance-type=k3s
beta.kubernetes.io/os=linux
egress.k3s.io/cluster=true
kubernetes.io/arch=arm64
kubernetes.io/hostname=pi-4
kubernetes.io/os=linux
node.kubernetes.io/instance-type=k3s
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"62:57:59:07:96:5d"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.86.171
k3s.io/hostname: pi-4
k3s.io/internal-ip: 192.168.86.171
k3s.io/node-args: ["agent","--kubelet-arg","runtime-request-timeout=15m"]
k3s.io/node-config-hash: XXXXXXXXXXXXXXXXXXXXXXXXX
k3s.io/node-env:
{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/88a03ff6265bd26f61b13d1c2c6b1e5d85e4a589f76ffd3ab292814246f23360","K3S_TOKEN":"********","K3S_U...
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 08 Dec 2022 18:28:17 -0500
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: pi-4
AcquireTime: <unset>
RenewTime: Mon, 12 Dec 2022 22:07:27 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Mon, 12 Dec 2022 22:06:34 -0500 Mon, 12 Dec 2022 21:35:49 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 12 Dec 2022 22:06:34 -0500 Mon, 12 Dec 2022 21:35:49 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 12 Dec 2022 22:06:34 -0500 Mon, 12 Dec 2022 21:35:49 -0500 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 12 Dec 2022 22:06:34 -0500 Mon, 12 Dec 2022 21:41:03 -0500 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 192.168.86.171
Hostname: pi-4
Capacity:
cpu: 4
ephemeral-storage: 122755656Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
hugepages-32Mi: 0
hugepages-64Ki: 0
memory: 7993604Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 119416702064
hugepages-1Gi: 0
hugepages-2Mi: 0
hugepages-32Mi: 0
hugepages-64Ki: 0
memory: 7993604Ki
pods: 110
System Info:
Machine ID: 1b6d104ba836404f9bf37f2b4c201c69
System UUID: 1b6d104ba836404f9bf37f2b4c201c69
Boot ID: 92d93b85-9849-492c-8840-38be8481fa8b
Kernel Version: 5.19.0-1009-raspi
OS Image: Ubuntu 22.10
Operating System: linux
Architecture: arm64
Container Runtime Version: containerd://1.6.8-k3s1
Kubelet Version: v1.25.4+k3s1
Kube-Proxy Version: v1.25.4+k3s1
PodCIDR: 10.42.2.0/24
PodCIDRs: 10.42.2.0/24
ProviderID: k3s://pi-4
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system svclb-traefik-beb295ff-hpmmw 0 (0%) 0 (0%) 0 (0%) 0 (0%) 38m
horizons horizons-api-6d6dc5cf47-kjz6j 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4h11m
gitlab-agent water-bowl-camera-gitlab-agent-6d48d89479-9vkjj 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d5h
gitlab-agent horizons-api-gitlab-agent-7dc4948467-ftrc7 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d5h
gitlab-agent horizons-client-gitlab-agent-85686c7847-46wxb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d5h
gitlab-agent horizons-orrery-gitlab-agent-559b8fbb7f-q24cn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d5h
gitlab-runner water-bowl-camera-gitlab-runner-6b55cc585b-tjs2c 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d5h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
hugepages-32Mi 0 (0%) 0 (0%)
hugepages-64Ki 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 26m kube-proxy
Normal Starting 31m kube-proxy
Warning InvalidDiskCapacity 31m kubelet invalid capacity 0 on image filesystem
Normal Starting 31m kubelet Starting kubelet.
Normal NodeHasNoDiskPressure 31m (x2 over 31m) kubelet Node pi-4 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 31m (x2 over 31m) kubelet Node pi-4 status is now: NodeHasSufficientPID
Normal NodeNotReady 31m kubelet Node pi-4 status is now: NodeNotReady
Normal NodeAllocatableEnforced 31m kubelet Updated Node Allocatable limit across pods
Normal NodeReady 31m kubelet Node pi-4 status is now: NodeReady
Normal NodeHasSufficientMemory 31m (x2 over 31m) kubelet Node pi-4 status is now: NodeHasSufficientMemory
Normal Starting 26m kubelet Starting kubelet.
Warning InvalidDiskCapacity 26m kubelet invalid capacity 0 on image filesystem
Normal NodeHasSufficientMemory 26m kubelet Node pi-4 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 26m kubelet Node pi-4 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 26m kubelet Node pi-4 status is now: NodeHasSufficientPID
Normal NodeNotReady 26m kubelet Node pi-4 status is now: NodeNotReady
Normal NodeAllocatableEnforced 26m kubelet Updated Node Allocatable limit across pods
Normal NodeReady 26m kubelet Node pi-4 status is now: NodeReady