10s delays in forwarding requests

Hey,

My team is using traefik:v2.2.1 and we observe unusual behavior in our end-to-end tests. Sometimes requests are not forwarded for 10s.

What we see in logs:

172.23.0.1 - - [09/Mar/2021:16:12:31 +0000] "POST /staff?operationNames=GetChrome HTTP/1.1" 200 379 "-" "-" 2724 "gg@file" "http://gg:5000/" 10046ms

What we observe in the downstream service (gg) is that it is hit with the request exactly 10s later than it should. Not at 16:12:31 but at 16:12:41 . The request processing took a few miliseconds, so I can imagine that the whole roundtrip took 46ms, and the 10000ms is some artificial delay...

Around 5 months ago we observed a very similar behavior but the requests were stuck even way longer. We worked around it by changing log level from DEBUG to ERROR. Reading the logs I even managed to asses that the problem must lie in some code here oxy/rr.go at master · vulcand/oxy · GitHub but I could not find anything obvious to nail the bug. It seemed like due to some kind of deadlock/timeout or a similar mechanism in either traefik or go itself, everything stops for some requests for 10s.

Did any of you observe any similar behavior in the past? Are there any known issues related to DEBUG log-level?

Hey Robert,

Would it possible to upgrade Traefik to the latest version 2.4.7 and validate whether the issue still occurred?

@jakubhajek We are in the process of doing this.

In the meantime, I think we found a potential candidate:

It seems that just too much logs causes these issues in docker-compose, and the problem is not specific to traefik. But it might be increased by using DEBUG log level which is quite verbose indeed.

Upgrading docker-compose to version >= 1.28.5 fixed the issue for us.

2 Likes

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.