Hello,
I'm running AWS EKS 1.32, IPv6. I'm deploying the Traefik ingress controller there using the official Helm chart, version 35.3.0. The values are the defaults, except for the service annotation used to try NLB or ELB. On top of that: external-dns, cert-manager, and a test service with a test.company.cloud ingress.
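For reference, my values override is minimal; it looks roughly like this (a reconstruction from memory, so treat the exact annotation as an approximation):

```yaml
# Sketch of my Helm values override -- everything else is chart defaults.
service:
  annotations:
    # Request an NLB instead of a classic ELB; removing this annotation
    # is how I fall back to the classic ELB mentioned below.
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
```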
My problem is that I can't reach the URL. An nmap scan shows ports 80 and 443 as closed.
Troubleshooting-wise, the LoadBalancer service seems fine:
traefik LoadBalancer fdcb:________::bd1b a2e1f______1d2-651198947.eu-north-1.elb.amazonaws.com 80:32113/TCP,443:32390/TCP 65m
The NLB (or ELB, when I remove the annotation) shows up, but its health checks are failing, as shown in the screenshot below.
About the EKS worker (there's only one for this POC): the security group properly allows both NodePorts (32XXX) in. But once connected to the worker, I can't see the NodePorts listening:
[root@ip-10-22-43-240 ~]# ss -tlnp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 127.0.0.1:10248 0.0.0.0:* users:(("kubelet",pid=3009,fd=11))
LISTEN 0 4096 127.0.0.1:61679 0.0.0.0:* users:(("aws-k8s-agent",pid=3694,fd=11))
LISTEN 0 4096 127.0.0.1:45481 0.0.0.0:* users:(("containerd",pid=2974,fd=12))
LISTEN 0 4096 169.254.170.23:80 0.0.0.0:* users:(("eks-pod-identit",pid=3485,fd=7))
LISTEN 0 4096 127.0.0.1:2703 0.0.0.0:* users:(("eks-pod-identit",pid=3485,fd=8))
LISTEN 0 4096 127.0.0.1:50052 0.0.0.0:* users:(("controller",pid=3800,fd=10))
LISTEN 0 4096 127.0.0.1:50051 0.0.0.0:* users:(("aws-k8s-agent",pid=3694,fd=10))
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=2871,fd=3))
LISTEN 0 4096 *:61680 *:* users:(("controller",pid=3800,fd=11))
LISTEN 0 4096 *:61678 *:* users:(("aws-k8s-agent",pid=3694,fd=12))
LISTEN 0 4096 *:10256 *:* users:(("kube-proxy",pid=3454,fd=11))
LISTEN 0 4096 *:10249 *:* users:(("kube-proxy",pid=3454,fd=13))
LISTEN 0 4096 *:10250 *:* users:(("kubelet",pid=3009,fd=13))
LISTEN 0 4096 *:8163 *:* users:(("controller",pid=3800,fd=7))
LISTEN 0 4096 *:8162 *:* users:(("controller",pid=3800,fd=9))
LISTEN 0 4096 [fd00:ec2::23]:80 [::]:* users:(("eks-pod-identit",pid=3485,fd=6))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=2871,fd=4))
Looking at the Traefik pod logs (format set to JSON, level DEBUG), I see enough mentions of my test to conclude that Traefik is taking the ingress into account:
traefik-mhvqc traefik {"level":"debug","providerName":"kubernetes","config":{"http":{"routers":{"test-test-service-test-private-ingress-company-cloud":{"service":"test-test-service-80","rule":"Host(`test.company.cloud`) \u0026\u0026 PathPrefix(`/`)"}},"services":{"test-test-service-80":{"loadBalancer":{"servers":[{"url":"http://[2a05:_______:1096::3]:80"}],"strategy":"wrr","passHostHeader":true,"responseForwarding":{"flushInterval":"100ms"}}}}},"tcp":{},"udp":{},"tls":{}},"time":"2025-05-27T15:23:23Z","caller":"github.com/traefik/traefik/v3/pkg/server/configurationwatcher.go:227","message":"Configuration received"}
traefik-mhvqc traefik {"level":"debug","routerName":"test-test-service-test-private-ingress-company-cloud","entryPointName":["metrics","web","websecure"],"time":"2025-05-27T15:23:24Z","caller":"github.com/traefik/traefik/v3/pkg/server/aggregator.go:52","message":"No entryPoint defined for this router, using the default one(s) instead"}
traefik-mhvqc traefik {"level":"debug","time":"2025-05-27T15:23:24Z","caller":"github.com/traefik/traefik/v3/pkg/tls/certificate.go:132","message":"Adding certificate for domain(s) test.company.cloud"}
traefik-mhvqc traefik {"level":"debug","entryPointName":"metrics","routerName":"test-test-service-test-private-ingress-company-cloud@kubernetes","serviceName":"test-test-service-80@kubernetes","time":"2025-05-27T15:23:24Z","caller":"github.com/traefik/traefik/v3/pkg/server/service/service.go:320","message":"Creating load-balancer"}
traefik-mhvqc traefik {"level":"debug","entryPointName":"metrics","routerName":"test-test-service-test-private-ingress-company-cloud@kubernetes","serviceName":"test-test-service-80@kubernetes","serverIndex":0,"URL":"http://[2a05:_______:1096::3]:80","time":"2025-05-27T15:23:24Z","caller":"github.com/traefik/traefik/v3/pkg/server/service/service.go:363","message":"Creating server"}
traefik-mhvqc traefik {"level":"debug","entryPointName":"metrics","routerName":"test-test-service-test-private-ingress-company-cloud@kubernetes","serviceName":"test-test-service-80@kubernetes","middlewareName":"metrics-service","middlewareType":"Metrics","time":"2025-05-27T15:23:24Z","caller":"github.com/traefik/traefik/v3/pkg/middlewares/metrics/metrics.go:82","message":"Creating middleware"}
traefik-mhvqc traefik {"level":"debug","entryPointName":"metrics","routerName":"test-test-service-test-private-ingress-company-cloud@kubernetes","serviceName":"test-test-service-80@kubernetes","middlewareName":"metrics-service","time":"2025-05-27T15:23:24Z","caller":"github.com/traefik/traefik/v3/pkg/middlewares/observability/middleware.go:33","message":"Adding tracing to middleware"}
traefik-mhvqc traefik {"level":"debug","entryPointName":"websecure","time":"2025-05-27T15:23:24Z","caller":"github.com/traefik/traefik/v3/pkg/server/router/tcp/manager.go:237","message":"Adding route for test.company.cloud with TLS options default"}
Next, long story short, I've toggled all the seemingly relevant chart options on and off (ipFamilyPolicy: PreferDualStack, ipFamilies: IPv4/IPv6, ...), with no change. To validate that the cluster itself is functional, I removed the Traefik ingress controller and tried the AWS Load Balancer Controller instead, which worked (though I don't want to use it, because Ingress there means ALB, and ALB means $$$).
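For completeness, these are the kinds of value combinations I cycled through on the Traefik service (a sketch, not an exhaustive list; the exact keys are from memory):

```yaml
# Sketch of the IP-family settings I toggled in the Traefik chart values.
service:
  ipFamilyPolicy: PreferDualStack   # also tried SingleStack and RequireDualStack
  ipFamilies:
    - IPv6                          # also tried IPv4, and IPv4 + IPv6 together
```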
I'm at a loss here. What can I try to narrow down the root cause? Which setting should I add or remove? Thanks for any help.