Hi,
I’m using Traefik to load balance HTTP requests to ClickHouse, which is scaled up and down as a Deployment (stateless ClickHouse, only querying S3).
I’m noticing odd behavior that I can’t explain: when the Deployment is scaled up (using KEDA), I get 503s (a few thousand of them) for periods of up to 8 minutes, well after the scale-up has completed.
From my research, ClickHouse is not designed to return 503 on the /query endpoint I use, so the 503s can’t be coming from ClickHouse itself.
It’s worth noting that the 503s are not immediate: we can get a 503 on a request that has already been running for a few minutes, which makes it even more surprising.
A 503 usually means there are no backends available. I wish I could see a metric from Traefik showing how many backends it sees on its end.
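To make concrete what I would like to see, here is a rough sketch of the kind of check I have in mind. It assumes the Traefik API is enabled, that it is reachable at a hypothetical address (localhost:8080 here), and that /api/http/services returns a "serverStatus" map of server URL to "UP"/"DOWN" for each service. I have not verified any of this against my Traefik version, so treat the field names as assumptions.

```python
import json
from urllib.request import urlopen

# Assumption: the Traefik API is enabled and reachable at this hypothetical address.
TRAEFIK_API = "http://localhost:8080/api/http/services"


def count_backends(services):
    """Return {service_name: (servers_up, servers_total)} for a list of services."""
    counts = {}
    for svc in services:
        # Assumption: each service carries a "serverStatus" map of URL -> "UP"/"DOWN".
        status = svc.get("serverStatus") or {}
        up = sum(1 for state in status.values() if state == "UP")
        counts[svc.get("name", "?")] = (up, len(status))
    return counts


def fetch_services(url=TRAEFIK_API):
    """Fetch and decode the list of HTTP services from the Traefik API."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for name, (up, total) in count_backends(fetch_services()).items():
        print(f"{name}: {up}/{total} servers up")
```

Running something like this in a loop during a scale-up would show whether Traefik’s view of the backends lags behind the Deployment.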
I was wondering if anyone has experienced this or something similar?