Traefik Stops Processing Requests (Hangs) Leading to OOM

kh019267 · February 2, 2021, 9:28pm

We are periodically experiencing a major service disruption: Traefik does not appear to be resilient to the failure of a downstream application server. We are currently running Traefik version 1.6 within our container platform in AWS. Whenever an AWS EC2 instance fails that does not host Traefik itself (it hosts a container instance Traefik is forwarding to), Traefik stops forwarding all requests to all backend instances, even though only 1 out of 200+ servers failed.

Here are a few observations from these events:

The only error/message Traefik logs when this happens is the following:

2021/02/02 18:10:18 reverseproxy.go:321: httputil: ReverseProxy read error during body copy: http2: server sent GOAWAY and closed the connection; LastStreamID=1999, ErrCode=NO_ERROR, debug=""

Traefik "ping" health checks still return an HTTP 200 status code, so our container platform doesn't automatically restart the Traefik service.
We have a service that rate limits backend member connections (Traefik returns an HTTP 429), and Traefik continues to return HTTP 429s for those requests during the event, which confirms the requests are at least making it to the Traefik frontend.
All other requests just hang. No response code is returned, and the connection isn't refused or reset.
Memory is eventually exhausted as the number of hung connections builds.
Restarting Traefik resolves the issue.

I would appreciate any help figuring out why Traefik hangs and stops routing requests to all other available and healthy instances. Any pointers to remediate the issue or reduce the duration of impact would be much appreciated. Thanks in advance!
