Hello everybody,
For the past few days I've been trying to find the cause of intermittent 503s from my application. I've checked the graceful shutdown of my Go application and it works fine: we don't get 503s even during deployments or pod rotations. But intermittently, when a pod goes down because of the cluster autoscaler, we do see 503s.
We're using the following query to monitor 503s:
sum(label_replace(increase(traefik_router_requests_total{kubernetes_namespace="traefik-v2",job=~".*", code=~"5.3", router=~"deploymentName(-[a-f0-9]+@kubernetescrd)"}[5m]), "router_name", "$1", "router", "([a-zA-Z0-9-]+)(-[a-f0-9]+@kubernetescrd)")) by (router, code, router_name)
Is it possible to get the name of the pod that is returning the 503s, either from the metric above or from some other Traefik metric?
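
In case it helps with reading the query, here is a minimal Go sketch of what I understand the label_replace step to do: it strips the "-<hex>@kubernetescrd" suffix from the router label and keeps the rest as router_name. The sample router value and the routerName helper are made up for illustration. As far as I know, Prometheus anchors the regex at both ends, so the sketch anchors it explicitly.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the label_replace call, wrapped in ^(?:...)$ to mimic
// the full-string match that Prometheus applies to label_replace regexes.
var routerRe = regexp.MustCompile(`^(?:([a-zA-Z0-9-]+)(-[a-f0-9]+@kubernetescrd))$`)

// routerName is a hypothetical helper that returns the first capture group
// ("$1" in the query). It returns "" when the router label does not match,
// which corresponds to label_replace leaving the series without router_name.
func routerName(router string) string {
	m := routerRe.FindStringSubmatch(router)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Made-up router label value in the shape the query expects.
	fmt.Println(routerName("deploymentName-0a1b2c@kubernetescrd"))
}
```

So router_name only identifies the router, not the pod behind it.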