Hello Traefik followers,
I need your collective knowledge today and hope you can help me a bit.
I use Leaflet to display 2D maps in an app. Similar to Google Maps, an image is split into small tiles at several zoom levels. As you zoom in and pan back and forth, more tiles are loaded dynamically. Since we switched to Traefik, this no longer works properly. Everything works initially, but after about 3000 requests, not all tiles are loaded. The Network tab in Chrome DevTools shows several open connections in the "pending" state. After that, nothing really works. If I restart Traefik, it works again immediately.
The setup:
Client > Traefik > NodeJS (working as HTTP to S3 public gateway) > S3 (Minio)
In the traefik log at level DEBUG I see the following error messages:
traefik | time="2023-06-05T15:05:30Z" level=debug msg="Request has been aborted [my.ip.add.res:51420 - /projects/2/maps/tiles/1047679055122BXZQpCoHAdg5i3s8yKx0YLqaS4MDh9nFktIGOuUEm.pdf/3/5/1.png]: net/http: abort Handler" middlewareName=traefik-internal-recovery middlewareType=Recovery
traefik | time="2023-06-05T15:05:31Z" level=debug msg="Request has been aborted [my.ip.add.res:51420 - /projects/2/maps/tiles/1047679055122BXZQpCoHAdg5i3s8yKx0YLqaS4MDh9nFktIGOuUEm.pdf/2/1/0.png]: net/http: abort Handler" middlewareName=traefik-internal-recovery middlewareType=Recovery
traefik | time="2023-06-05T15:05:31Z" level=debug msg="Request has been aborted [my.ip.add.res:51420 - /projects/2/maps/tiles/1047679055122BXZQpCoHAdg5i3s8yKx0YLqaS4MDh9nFktIGOuUEm.pdf/2/0/0.png]: net/http: abort Handler" middlewareName=traefik-internal-recovery middlewareType=Recovery
traefik | time="2023-06-05T15:05:31Z" level=debug msg="Request has been aborted [my.ip.add.res:51420 - /projects/2/maps/tiles/1047679055122BXZQpCoHAdg5i3s8yKx0YLqaS4MDh9nFktIGOuUEm.pdf/2/2/0.png]: net/http: abort Handler" middlewareName=traefik-internal-recovery middlewareType=Recovery
traefik | time="2023-06-05T15:05:31Z" level=debug msg="Request has been aborted [my.ip.add.res:51420 - /projects/2/maps/tiles/1047679055122BXZQpCoHAdg5i3s8yKx0YLqaS4MDh9nFktIGOuUEm.pdf/2/0/1.png]: net/http: abort Handler" middlewareName=traefik-internal-recovery middlewareType=Recovery
My attempts and findings:
- Restarting Traefik solves the problem in the short term
- Access from other computers, or from a Chrome Incognito window, initially continues to work independently
- If I stop Traefik and start nginx, the problem does not occur
- The problem occurs with both the Traefik binary and Docker
- It doesn't matter whether NodeJS is addressed with http or https (with HTTP/1.1 or h2)
- When I open NodeJS directly - without Traefik - the problem does not occur
- If I force Chrome to use HTTP/1.1 instead of h2, the problem does not occur
- I tried Traefik 3.0 beta 2, and the behavior did not change
Do you have an explanation or clue about these error messages?
Just an update: I've removed HTTP/2 (h2) support from the Traefik code, and my self-compiled version works fine. Any ideas?
The source of the problem is that with HTTP/2, the browser can now send all HTTP requests concurrently over a single connection.
In most cases the backend servers cannot handle too many concurrent requests, and this leads to pending requests.
You can adjust the number of concurrent requests to match the backend servers' performance using the http2.maxconcurrentstreams static parameter.
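A minimal sketch of that setting in the static configuration file. The entry point name websecure and the address :443 are assumptions for illustration, and 50 is only an example value to be tuned to what the backend can handle:

```yaml
# Traefik static configuration (e.g. traefik.yml)
entryPoints:
  websecure:                    # assumed entry point name
    address: ":443"             # assumed listen address
    http2:
      # Upper bound on concurrent HTTP/2 streams per client connection.
      # Traefik's documented default is 250; 50 is an example value.
      maxConcurrentStreams: 50
```

With CLI flags the equivalent should be --entryPoints.websecure.http2.maxConcurrentStreams=50. Since this is a static parameter, Traefik has to be restarted for it to take effect.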
I am having a problem similar to the one described, and changing the maximum concurrent streams as suggested by @ldaroczi does not resolve it.
I have a somewhat similar structure for serving tiles:
Client > Traefik > C# .Net > Mapcache > Mapserver
I am using the Docker version of Traefik as a reverse proxy for all my running containers.
Here is a list of my findings besides the ones described in the original post:
- It only occurs in browsers based on Chromium. It shows up in both Chrome and Edge, but I have not been able to reproduce the error state in Firefox.
- It does not occur when using HTTP instead of HTTPS, which suggests it has something to do with TLS.
- According to the Traefik and container logs, the request does get to the container and a response is sent back. However, the browser ignores it and the request stays in the "pending" state.
Any help to resolve this problem would be appreciated.
Although I'm sorry for you, I'm glad I'm not the only one who has this problem. In the meantime I have given up on the analysis; from my point of view the cause is clearly related to HTTP/2. Other causes can be ruled out, because the problem does not occur when we use an NGINX container instead of Traefik.
My workaround, which is currently working very well:
- Download Traefik source code
- Disable HTTP2
- Create your own Traefik binary (really easy)
- Exchange the file in the Docker container image
This has reliably solved my problem for weeks.
BTW: The same problem occurs in Traefik v3.
I suspect your MapServer setup behaves similarly to Leaflet: many requests that are started and may then be canceled on the client side.
I recently ran into an issue that shows up in the same way (requests get through to the API container, but Chrome never receives or accepts the response) and seem to have solved it thanks to this thread. Takeaways from my side:
- You do not need to recompile Traefik as long as all your traffic goes through TLS connections and comes from clients supporting ALPN. Traefik allows you to limit the protocols usable on these connections. See: Traefik TLS Documentation - Traefik
- TLS and HTTP/2 can be hard to tell apart when looking for errors, as both Chrome and Firefox only use HTTP/2 when connecting via HTTPS, so an insecure connection will also not use HTTP/2 in these browsers. See: How To Set Up Nginx with HTTP/2 Support on Ubuntu 20.04 | DigitalOcean (@frevi on this one in particular)
- The maxconcurrentstreams: 50 solution did not work initially, so I pivoted to disabling HTTP/2 for the time being and may investigate further later on.
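For reference, a minimal sketch of the ALPN restriction from the first takeaway. It assumes the dynamic configuration comes from the file provider and that routers use the default TLS options; adjust the option set name if yours differs:

```yaml
# Traefik dynamic configuration (file provider)
tls:
  options:
    default:                # assumed: routers use the default TLS options
      # Only offer HTTP/1.1 during ALPN, so clients never negotiate h2.
      alpnProtocols:
        - http/1.1
```

If certificates are obtained through the TLS-ALPN-01 ACME challenge, acme-tls/1 probably has to stay in the list as well. I have not tested that combination.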