Getting 503 when Deployment scales up

Hi,

I’m using Traefik to load-balance HTTP requests to ClickHouse, which is scaled up and down with a Kubernetes Deployment (stateless ClickHouse, querying only S3).

I’m seeing odd behavior that I can’t explain: when the Deployment is scaled up (by KEDA), I get 503s (a few thousand) for periods of up to 8 minutes, well after the scale-up has completed.

From my research, ClickHouse is not designed to return 503 on the /query endpoint I use, so it shouldn’t be the source of the 503s.

Worth noting that the 503s are not immediate: some requests get a 503 only after running for a few minutes, which makes this even more surprising.

A 503 usually means there are no backends available. I wish Traefik exported a metric showing how many backends it sees on its end.
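Without a dedicated metric, one way to see how many backends Traefik currently has is to poll Traefik's own API (the one behind the dashboard). This is a minimal sketch, assuming the API is enabled (e.g. with `--api.insecure=true`, which serves it on port 8080) and that `/api/http/services/{name}` returns a `serverStatus` map of server URL to `UP`/`DOWN` plus a `loadBalancer.servers` list, as in the Traefik v2 versions I know of. Check the field names against your Traefik version; the API address and service name used below are hypothetical.

```python
import json
import time
import urllib.request
from collections import Counter


def server_status_counts(service: dict) -> Counter:
    """Count servers by status ("UP"/"DOWN") for one Traefik service entry.

    Assumption: the entry has a "serverStatus" map; if it is missing or empty,
    fall back to counting the configured loadBalancer servers.
    """
    status = service.get("serverStatus")
    if status:
        return Counter(status.values())
    servers = service.get("loadBalancer", {}).get("servers", [])
    return Counter({"CONFIGURED": len(servers)})


def fetch_service(api_base: str, name: str) -> dict:
    """GET one service from the Traefik API and decode the JSON body."""
    url = f"{api_base}/api/http/services/{name}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def poll(api_base: str, name: str, interval: float = 5.0, rounds: int = 60) -> None:
    """Print a timestamped server count every `interval` seconds, `rounds` times."""
    for _ in range(rounds):
        counts = server_status_counts(fetch_service(api_base, name))
        print(time.strftime("%H:%M:%S"), dict(counts))
        time.sleep(interval)
```

For example, `poll("http://localhost:8080", "clickhouse@kubernetescrd")` (hypothetical address and service name) run during a KEDA scale-up would show whether Traefik's view of the ClickHouse pods drops to zero when the 503s start.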

Has anyone experienced this, or something similar?

Enable the Traefik dashboard and the Traefik access log in JSON format for more details.
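Once the JSON access log is on (e.g. `--accesslog=true --accesslog.format=json`), a small script can tell whether the 503s come from Traefik itself or from ClickHouse, and which ones arrived only after minutes. This is a sketch, assuming the log fields `DownstreamStatus`, `OriginStatus`, `ServiceName`, `ServiceAddr`, `StartUTC` and `Duration` (in nanoseconds), and assuming that a 503 Traefik generates on its own (e.g. no available server) has `OriginStatus` 0 or missing, while one forwarded from a backend carries that backend's status. Verify both assumptions against a few of your own log lines first.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON log objects, skipping blank or non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            yield obj


def classify_503s(lines: Iterable[str]) -> Counter:
    """Tally 503s as ("traefik", service), ("backend", addr) or ("other", status)."""
    tally: Counter = Counter()
    for e in entries(lines):
        if e.get("DownstreamStatus") != 503:
            continue
        origin = e.get("OriginStatus") or 0
        if origin == 0:
            key = ("traefik", e.get("ServiceName", "?"))  # no backend response
        elif origin == 503:
            key = ("backend", e.get("ServiceAddr", "?"))  # backend returned 503
        else:
            key = ("other", str(origin))
        tally[key] += 1
    return tally


def slow_503s(lines: Iterable[str], min_seconds: float = 60.0) -> List[Tuple[str, float, str]]:
    """Return (StartUTC, seconds, ServiceAddr) for 503s lasting >= min_seconds."""
    out = []
    for e in entries(lines):
        if e.get("DownstreamStatus") != 503:
            continue
        seconds = e.get("Duration", 0) / 1e9  # Duration assumed to be nanoseconds
        if seconds >= min_seconds:
            out.append((e.get("StartUTC", ""), seconds, e.get("ServiceAddr", "")))
    return out
```

Read the log into a list first (e.g. `lines = open("access.log").readlines()`, with a hypothetical file name) so both functions can scan it. Mostly `("traefik", ...)` keys would point at Traefik losing its backends; mostly `("backend", ...)` keys would point back at ClickHouse.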