Good afternoon,
I'm deploying Traefik 2.9.6 from the official Helm chart (traefik/traefik-helm-chart on GitHub) and running into an issue related to load balancer target group registration. I'm using the annotation service.beta.kubernetes.io/aws-load-balancer-type: nlb to deploy this as a Network Load Balancer.
In order to preserve the original client IP, I understand that I need to set externalTrafficPolicy: Local on the service spec. I'm setting this from values.yaml and this works: the NLB is configured, the target is registered, traffic hits my service, and I'm able to successfully whitelist based on client IP.
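For reference, a minimal sketch of the relevant part of values.yaml. It assumes the chart exposes service.annotations and service.spec keys, which may differ between chart versions, so treat it as an illustration rather than the exact file in use:

```yaml
# values.yaml (sketch): key names assume the traefik-helm-chart layout
service:
  type: LoadBalancer
  annotations:
    # Ask the AWS cloud provider for a Network Load Balancer
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  spec:
    # Preserve the client source IP; traffic is only sent to nodes
    # that run a Traefik pod locally
    externalTrafficPolicy: Local
```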
However, if I terminate the EC2 instance (EKS Managed Node Group), the new instance that my Auto Scaling Group creates to replace it is never registered as a target in the NLB.
With externalTrafficPolicy: Cluster this appears to work as expected and the new node is registered in the target group as soon as it comes up, but then the whitelisting doesn't work.
I've looked through "Troubleshoot unhealthy targets for NLB in Amazon EKS" and checked that kube-proxy has been patched with the --hostname-override flag, so that's not the issue.
FWIW I don't think this is a Traefik issue exactly, as I saw this same behavior with ingress-nginx. I've opened a ticket with AWS support, but I'm trying to cast a wide net as this issue is a blocker for my progress on this configuration.
Any idea what might cause this behavior?
Edit with additional detail: when switching between Local and Cluster, the target groups are recreated and have different health checks configured. Cluster gives a TCP health check configured for "Traffic port", while Local configures an HTTP health check for /healthz. Both of these seem to succeed when changing the externalTrafficPolicy back and forth.
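The difference in health checks matches how Kubernetes handles the Local policy, as far as I understand it: for a LoadBalancer service with externalTrafficPolicy: Local, a healthCheckNodePort is allocated, and kube-proxy answers /healthz on that port with success only on nodes that have a local endpoint for the service. A sketch of what the resulting Service might look like follows; the name, selector, and port numbers are hypothetical and not taken from my cluster:

```yaml
# Illustrative Service under the Local policy; all names and
# port values here are made up for the example.
apiVersion: v1
kind: Service
metadata:
  name: traefik
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  # Allocated by Kubernetes when the policy is Local; the HTTP
  # /healthz health check is served by kube-proxy on this node port.
  healthCheckNodePort: 32000
  selector:
    app.kubernetes.io/name: traefik
  ports:
    - name: web
      port: 80
      targetPort: web
    - name: websecure
      port: 443
      targetPort: websecure
```

With Cluster, no healthCheckNodePort is set, which would be consistent with the target group falling back to a TCP check on the traffic port.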