gRPC proxying issues only when there are multiple service instances

gautamg · February 20, 2023, 5:18am

Hi,
I'm running a gRPC service in Nomad (on Docker) with Consul for discovery. I'm using Traefik to proxy requests to the service over h2c (Traefik itself is behind an AWS NLB that performs TLS termination, so what reaches Traefik is unencrypted HTTP/2).

When running just one instance of the service, everything works great: no issues with unary RPCs, streaming, etc. However, as soon as there are multiple instances (say, n=3), clients see only 1/n of requests succeed; the rest eventually time out and get a 504 from Traefik, as if the upstream connection timed out.
From what I can tell, the successful requests are going to the same service instance every time. If I restart the gRPC client (causing it to re-connect to Traefik), it'll be a different instance that receives the 1/3 successful requests.

It seems like the gRPC clients are opening an HTTP/2 "session" of some sort to one of the 3 service instances, and then subsequent RPC requests are being round-robined by Traefik such that the other two instances never receive or respond to them. Without knowing much about HTTP/2, I'd still expect a client with a persistent, kept-alive HTTP/2 connection to continue talking to the same backend until that connection closes.
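
To make the pattern concrete, here is a toy Python model of what I think I'm seeing. It is only a sketch of my guess, not Traefik's actual behavior or code: `simulate`, `n_instances`, and `responsive` are names made up for illustration, and it assumes strict round-robin where exactly one instance ever answers.

```python
from itertools import cycle


def simulate(n_instances: int, responsive: int, n_requests: int) -> list[bool]:
    """Send n_requests round-robin across n_instances.

    Assumption: only the instance with index `responsive` ever answers;
    requests routed to any other instance time out (the 504 case).
    """
    backends = cycle(range(n_instances))
    return [next(backends) == responsive for _ in range(n_requests)]


# Three instances, of which only instance 1 responds for this client.
results = simulate(n_instances=3, responsive=1, n_requests=9)
success_rate = sum(results) / len(results)

print(results)       # True marks a request that succeeded
print(success_rate)  # fraction of successful requests
```

Under those assumptions the success rate comes out to 1/n, which matches what the clients report. Restarting the client would correspond to picking a different `responsive` index.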

Any workarounds here?
