When using the Kubernetes rollout strategy, either with Helm or on its own, I get a small "blip" of 502 Bad Gateway responses. We really need zero-downtime deployments. We have tried the following:
- Thinking Traefik needs time to pick up the newly rolled-out pods, we lengthened the "pauseTime" to give Traefik more time to recognize them
- Using a higher maxSurge
Is there something we are missing here? Our test rig sends a request every 0.5 seconds while we run a helm upgrade/rollout. That's how we're seeing the blip of 502s. I have some other thoughts:
- Maybe a pod is being taken down in the middle of a request?
- Traefik looks at the Service, so perhaps something is off at the Kubernetes level: the current Service may not know that those pods are being taken out of rotation. Do I need to deploy a new Service?
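
For reference, here is a minimal sketch of the kind of test rig described above: it sends a request at a fixed interval and records every response that is not a 200. This is an illustration only, not our exact rig. The function names `fetch_status` and `probe` and the URL in the usage note are hypothetical.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=2.0):
    """Return the HTTP status code for a GET on url, or None if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 502 arrive as HTTPError; keep the code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def probe(url, count, interval=0.5, fetch=fetch_status, sleep=time.sleep):
    """Send `count` requests, `interval` seconds apart.

    Returns a list of (request_index, status) for every non-200 result.
    `fetch` and `sleep` are injectable so the loop can be exercised offline.
    """
    failures = []
    for i in range(count):
        status = fetch(url)
        if status != 200:
            failures.append((i, status))
        sleep(interval)
    return failures
```

Something like `probe("http://localhost:8080/", count=120)` (a placeholder URL) started just before the helm upgrade would cover about a minute of the rollout, and the returned list shows which requests hit the blip and with which status.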