Hey!
We've been using Traefik for around two years now and it does a great job.
We've had one minor issue for at least a year that we've been looking the other way on, but I figured I'd check whether there's a thread or link I've missed, or any debugging guidance.
When we restart the deployment (or update the Helm release, which rolls the deployment), we get a blip of 502s as the Traefik pod traffic is routing through gets shut down.
We've tried running multiple replicas as well as a single replica (where the old pod stays up until the new one is ready), and we've added minReadySeconds to the pod so the new one has to be alive for ~30 seconds before it even starts to try and kill the old pod, just in case. But as soon as the old pod is removed, the first refresh of an app gets a 502 back from CloudFront, and the second refresh (immediately after the first failure) is fine again.
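For context, the relevant bits of our Deployment look roughly like this (a sketch; the preStop sleep and the exact numbers are illustrative, not our literal values):

```yaml
# Deployment fragment (sketch): keep an old pod serving until the new one
# has been Ready for a while, and delay SIGTERM so in-flight requests drain.
spec:
  minReadySeconds: 30          # new pod must be Ready ~30s before the old one is killed
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # never drop below the desired replica count
      maxSurge: 1
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: traefik
          lifecycle:
            preStop:
              exec:
                # hypothetical: give the ALB time to deregister the target
                # before Traefik receives SIGTERM
                command: ['sh', '-c', 'sleep 15']
```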
It's not a huge issue, but it causes false positives in our canaries, and I'm sure users notice from time to time.
Below is all the context I can think of to help picture our setup.
We have Traefik installed via Helm, running version 34.1.0 of the Helm chart, so docker.io/traefik:v3.3.2. It's running in AWS, and the flow is:
app domain (app.dev.DOMAIN) -> CloudFront -> Traefik load balancer DNS (alb.dev.DOMAIN) -> Kube Ingress -> Traefik routers kick in here on the app domain (app.dev.DOMAIN)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/backend-protocol: HTTPS
    alb.ingress.kubernetes.io/certificate-arn: >-
      arn:aws:acm:ap-region:12345:certificate/12345
    alb.ingress.kubernetes.io/group.name: traefik-alb-external
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443, "HTTP":80}]'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=4000
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/shield-advanced-protection: 'true'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/tags: environment=development
    alb.ingress.kubernetes.io/target-type: ip
    external-dns.alpha.kubernetes.io/hostname: alb.dev.DOMAIN
    external-dns.alpha.kubernetes.io/ingress-hostname-source: annotation-only
    external-dns.alpha.kubernetes.io/type: public
    kubernetes.io/ingress.class: default
    meta.helm.sh/release-name: traefik
    meta.helm.sh/release-namespace: traefik
  creationTimestamp: '2024-02-13T06:57:01Z'
  finalizers:
    - group.ingress.k8s.aws/traefik-alb-external
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    argocd.argoproj.io/instance: traefik-development
  name: traefik-external
  namespace: traefik
  resourceVersion: '584241109'
  uid: ea56d5d7-85fd-4884-96f5-7383e2c1740f
spec:
  rules:
    - http:
        paths:
          - backend:
              service:
                name: traefik
                port:
                  name: websecure
            path: /*
            pathType: ImplementationSpecific
status:
  loadBalancer:
    ingress:
      - hostname: >-
          k8s-traefikalbexterna-1234.ap-region.elb.amazonaws.com
In case it matters: we manually set which ELB is used, since we have two. One directs traffic to a subnet only reachable from our VPN and cluster subnets, and the one above is used for public traffic.
- args:
    - '--api.insecure=true'
    - '--serverstransport.insecureskipverify=true'
    - '--entryPoints.metrics.address=:9100/tcp'
    - '--entryPoints.traefik.address=:8080/tcp'
    - '--entryPoints.web.address=:8000/tcp'
    - '--entryPoints.websecure.address=:8443/tcp'
    - '--api.dashboard=true'
    - '--ping=true'
    - '--metrics.prometheus=true'
    - '--metrics.prometheus.entrypoint=metrics'
    - '--providers.kubernetescrd'
    - '--providers.kubernetescrd.ingressClass=traefik'
    - '--providers.kubernetescrd.allowCrossNamespace=true'
    - '--providers.kubernetescrd.allowExternalNameServices=true'
    - '--providers.kubernetescrd.allowEmptyServices=false'
    - '--providers.kubernetesingress'
    - '--providers.kubernetesingress.allowExternalNameServices=true'
    - '--providers.kubernetesingress.allowEmptyServices=false'
    - '--providers.kubernetesingress.ingressendpoint.publishedservice=traefik/traefik'
    - '--providers.kubernetesingress.ingressClass=traefik'
    - '--entryPoints.websecure.http.tls=true'
    - '--log.level=ERROR'
    - '--accesslog=true'
    - '--accesslog.fields.defaultmode=keep'
    - '--accesslog.fields.headers.defaultmode=drop'
    - '--providers.file.filename=/config/dynamic.yaml'
    - '--providers.kubernetesingress.ingressEndpoint.hostname=alb.dev.DOMAIN'
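One thing worth flagging from the args above: we are not passing any of the entrypoint lifecycle options, so shutdown draining is whatever Traefik's defaults are. If it's relevant, I understand from the static configuration docs they would look something like this (values here are made up, not something we run):

```yaml
# Sketch only: entrypoint lifecycle flags we are NOT currently setting.
# requestAcceptGraceTimeout: keep accepting new requests for a while after SIGTERM
# graceTimeOut: how long active requests get to finish before hard shutdown
- '--entryPoints.websecure.transport.lifeCycle.requestAcceptGraceTimeout=15s'
- '--entryPoints.websecure.transport.lifeCycle.graceTimeOut=30s'
```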
dynamic.yaml: |
  http:
    routers:
      catchall:
        # attached only to the websecure entryPoint
        entryPoints:
          - 'websecure'
        # catchall rule
        rule: 'PathPrefix(`/`)'
        service: unavailable
        # lowest possible priority
        # evaluated when no other router matches
        priority: 1
    services:
      # service that will always answer a 503 Service Unavailable response
      unavailable:
        loadBalancer:
          servers: {}
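In case it helps anyone reproduce this: to figure out which hop is actually producing the 502, we've been poking at something like the script below during a rollout (a rough sketch; the URL, interval, and hop detection via the `Server` header are assumptions, since CloudFront and the ALB usually stamp `Server: CloudFront` and `Server: awselb/...` respectively on their own error pages):

```python
# probe.py - hammer an endpoint during a rollout and log every non-2xx blip,
# guessing which hop answered from the Server response header.
import time
import urllib.error
import urllib.request


def classify(status: int, server: str) -> str:
    """Guess which hop produced a response from its status and Server header."""
    if 200 <= status < 300:
        return "ok"
    if "cloudfront" in server.lower():
        return f"{status} from CloudFront (origin likely failed)"
    if "awselb" in server.lower():
        return f"{status} from the ALB (no healthy target?)"
    return f"{status} from origin/Traefik"


def probe(url: str, interval: float = 0.5, duration: float = 120.0) -> None:
    """Poll url every `interval` seconds for `duration` seconds, printing failures."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                verdict = classify(resp.status, resp.headers.get("Server", ""))
        except urllib.error.HTTPError as e:
            verdict = classify(e.code, e.headers.get("Server", ""))
        except OSError as e:
            verdict = f"connection error: {e}"
        if verdict != "ok":
            print(time.strftime("%H:%M:%S"), verdict)
        time.sleep(interval)


if __name__ == "__main__":
    # placeholder URL - point this at an app behind the stack
    probe("https://app.dev.DOMAIN/")
```

Running this while rolling the Traefik deployment shows whether the 502 carries CloudFront's or the ALB's error page, which narrows down where the dead connection is being held.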
Any help would be appreciated.
Thanks!