Hey,
My team is using traefik:v2.2.1 and we observe unusual behavior in our end-to-end tests. Sometimes requests are not forwarded for 10s.
What we see in logs:
172.23.0.1 - - [09/Mar/2021:16:12:31 +0000] "POST /staff?operationNames=GetChrome HTTP/1.1" 200 379 "-" "-" 2724 "gg@file" "http://gg:5000/" 10046ms
What we observe in the downstream service (gg) is that it is hit with the request exactly 10s later than it should. Not at 16:12:31 but at 16:12:41 . The request processing took a few miliseconds, so I can imagine that the whole roundtrip took 46ms, and the 10000ms is some artificial delay...
Around 5 months ago we observed a very similar behavior but the requests were stuck even way longer. We worked around it by changing log level from DEBUG to ERROR. Reading the logs I even managed to asses that the problem must lie in some code here oxy/rr.go at master · vulcand/oxy · GitHub but I could not find anything obvious to nail the bug. It seemed like due to some kind of deadlock/timeout or a similar mechanism in either traefik or go itself, everything stops for some requests for 10s.
Did any of you observe any similar behavior in the past? Are there any known issues related to DEBUG log-level?