Traefik has trouble with server-streaming gRPC

Julien · April 29, 2020, 12:43pm

Hi, community,

First of all, thank you to all Traefik devs, it's really pleasant to work with Traefik.
Let me give a bit of context:

Traefik versions tested: 2.0, 2.1, 2.2
We use Traefik as a Kubernetes Ingress controller
Our Kubernetes flavor is AWS EKS version 1.14

Now the issue we are facing:
Traefik has trouble handling multiple consecutive server-streaming gRPC calls in a single channel. More precisely, when the client cancels a call and starts a new one, Traefik sometimes fails to propagate the cancel to the backend, leaving the call open on the backend and running forever.
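To make the symptom concrete, here is a toy model in plain Python asyncio. It is not real gRPC or Traefik code, and it says nothing about why the cancel is lost; the names `Backend`, `call_via_proxy`, and `leaked_calls` are hypothetical and exist only for illustration. It shows what we expect (a cancelled call is closed on the backend) versus what we observe (the call stays open when the cancel is not forwarded).

```python
import asyncio


class Backend:
    """Toy stand-in for a gRPC server; counts open server-streaming calls."""

    def __init__(self):
        self.open_calls = 0

    async def stream(self, out: asyncio.Queue):
        self.open_calls += 1
        try:
            n = 0
            while True:  # a server-streaming call sends until it is cancelled
                await out.put(n)
                n += 1
                await asyncio.sleep(0.001)
        finally:
            # Runs only if the cancel actually reaches the backend.
            self.open_calls -= 1


async def call_via_proxy(backend, propagate_cancel):
    """Open one call through a toy proxy, read one message, then cancel."""
    out = asyncio.Queue()
    upstream = asyncio.create_task(backend.stream(out))
    await out.get()  # the client receives the first message
    # The client cancels here. A well-behaved proxy forwards the cancel.
    if propagate_cancel:
        upstream.cancel()
        try:
            await upstream
        except asyncio.CancelledError:
            pass
    return upstream


async def _run(calls, propagate_cancel):
    backend = Backend()
    tasks = [await call_via_proxy(backend, propagate_cancel) for _ in range(calls)]
    leaked = backend.open_calls  # calls still open after every client cancel
    for task in tasks:  # clean up so the event loop can exit
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return leaked


def leaked_calls(calls, propagate_cancel):
    """Return how many backend calls are still open after `calls` cancels."""
    return asyncio.run(_run(calls, propagate_cancel))
```

In this model, `leaked_calls(5, True)` corresponds to the behaviour we see without Traefik (nothing left open), while `leaked_calls(5, False)` corresponds to the behaviour we sometimes see with Traefik in the path.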

To rule out a false positive, we removed Traefik from the route, and the issue disappeared.
We also used the h2c protocol, as described here: https://docs.traefik.io/user-guides/grpc/

If anyone has a starting point for investigating this problem, it would be much appreciated.
I wish you all a very good day.
Best regards,
Julien.

notsureifkevin · April 29, 2020, 2:42pm

Julien

Would you happen to have a sample application and configuration that could be used to replicate this issue? Some items that would be useful for our team to investigate would be:

Client / Server application and deployment manifest
Instructions on how to replicate the issue via the Client
Steps to observe the issue on the server, such as metrics or tracing

I realize that's quite a bit of an ask, but your help is greatly appreciated in helping us track this down.

Julien · May 6, 2020, 7:45am

OK, here we are. One of our developers just created this repo to highlight the issue:

With just a few connections everything works as expected, but when we increase the number of connections we hit this issue with gRPC and Traefik.

We also ran some tests locally, with and without Traefik, and we were not able to reproduce the issue when Traefik is not in the path.

If you have some questions, feel free to ask.

notsureifkevin · May 11, 2020, 2:44pm

@Julien Thank you for the update and reproducible build. I've confirmed what appears to be a bug and opened an issue on the official Traefik repo here: https://github.com/containous/traefik/issues/6791

Julien · May 12, 2020, 4:47am

Good morning @notsureifkevin. Glad to help improve Traefik.
I look forward to deploying a patched version.

By the way, if you need us to test a patch, just let me know.
