We have multiple containers serving a backend API (Django) for a frontend (Angular).
This setup has been working with Traefik in between for about a month, but early this week we started having a problem.
If I stay on the frontend and do not access the backend services directly, the frontend remains (for the most part) stable.
If I press Ctrl+F5 to refresh the backend (stage-api), I may get a 200 OK after a few seconds. Going back to the frontend (stage), I then see a 404 when refreshing (it flip-flops pretty consistently), though sometimes we see the 404 on all containers. While Chrome is getting 404s, Firefox may work fine for a while, but eventually the issue shows up in either browser.
When we get the 404, the HTTP request does not seem to reach the containers, so we are confident it comes from Traefik, but we see nothing in the Traefik access log that helps.
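To quantify the flip-flopping and check where the 404s come from, a small Python poller can alternate between the two hosts and tally the outcomes. This is a minimal sketch under assumptions: the hostnames in `HOSTS` are hypothetical placeholders, and it assumes that a 404 generated by Traefik itself (no matching router) carries Go's default plain-text body `404 page not found`, whereas a 404 from Django would be an HTML page. Each probe opens a fresh connection, which is not how a browser behaves.

```python
import collections
import http.client
import time

# Hypothetical hostnames: substitute the real stage / stage-api hosts.
HOSTS = ["stage.example.com", "stage-api.example.com"]

# Assumption: a Traefik-generated 404 uses Go's default plain-text body,
# while Django's 404 response is an HTML page.
TRAEFIK_404_BODY = b"404 page not found"


def classify(status: int, body: bytes) -> str:
    """Label a response so that Traefik-generated 404s stand out."""
    if status == 404 and body.strip() == TRAEFIK_404_BODY:
        return "404-traefik"
    return str(status)


def probe(host: str, timeout: float = 5.0) -> str:
    """Send one HTTPS GET on a fresh connection and classify the result."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        return classify(resp.status, resp.read())
    finally:
        conn.close()


def poll(hosts, rounds=20, delay=1.0):
    """Alternate between the hosts and count the outcomes per host."""
    counts = {h: collections.Counter() for h in hosts}
    for _ in range(rounds):
        for h in hosts:
            counts[h][probe(h)] += 1
        time.sleep(delay)
    return counts


if __name__ == "__main__":
    for host, tally in poll(HOSTS).items():
        print(host, dict(tally))
```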
I suspect it has something to do with session management, specifically because of this behavior:
I can have three tabs open accessing different resources behind Traefik: two tabs return 404 on Ctrl+F5 while the third works fine. Closing Chrome completely or switching to Firefox typically alleviates the issue until it pops up again shortly after.
My theory is that Traefik binds the client ip:port to a session with one particular backend server.
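One way to test the ip:port theory is to send the same request repeatedly over a single keep-alive connection (one client ip:port) and compare that with the same number of requests each on a new connection (a new source port every time). If the two modes give systematically different results, connection reuse is implicated. This is a minimal sketch: `HOST` is a hypothetical placeholder, and only HTTP/1.1 is exercised, since Python's `http.client` does not speak HTTP/2, which Chrome may be using. Also, if the server closes the connection, `http.client` silently reconnects, so the "reused" mode only reuses the connection while the server keeps it alive.

```python
import collections
import http.client

# Hypothetical hostname: replace with the real frontend host.
HOST = "stage.example.com"


def summarize(statuses):
    """Count status codes, e.g. [200, 404, 200] -> {200: 2, 404: 1}."""
    return dict(collections.Counter(statuses))


def reused_connection(host, n=20):
    """Send n GETs over ONE keep-alive connection (same client ip:port)."""
    conn = http.client.HTTPSConnection(host, timeout=5)
    statuses = []
    try:
        for _ in range(n):
            conn.request("GET", "/")
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
            statuses.append(resp.status)
    finally:
        conn.close()
    return statuses


def fresh_connections(host, n=20):
    """Send n GETs, each on a NEW connection (new source port each time)."""
    statuses = []
    for _ in range(n):
        conn = http.client.HTTPSConnection(host, timeout=5)
        try:
            conn.request("GET", "/")
            resp = conn.getresponse()
            resp.read()
            statuses.append(resp.status)
        finally:
            conn.close()
    return statuses


if __name__ == "__main__":
    print("reused:", summarize(reused_connection(HOST)))
    print("fresh: ", summarize(fresh_connections(HOST)))
```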
Since everything works intermittently, we're at a loss to understand where the break actually is. Are there any logs that would clearly indicate why Traefik is sending a 404 response? We enabled debug logging but see nothing new besides a generic 404 returned from Traefik.
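For the logging question, Traefik v2 can write its access log as JSON (`--accesslog=true` with `--accesslog.format=json`, alongside `--log.level=DEBUG` for the general log), which is easier to filter than the default text format. The sketch below groups 404 entries by host and router. It assumes the v2 JSON field names `DownstreamStatus`, `RequestHost` and `RouterName` (worth checking against the docs for your version), and treats a missing or empty `RouterName` as a hint, not proof, that no router matched the request. The default log path is a hypothetical placeholder.

```python
import collections
import json
import sys


def traefik_404s(lines):
    """Group 404 entries from a JSON-format Traefik access log.

    Assumed fields: DownstreamStatus, RequestHost, RouterName.
    A missing or empty RouterName is reported as "<no router>".
    """
    groups = collections.Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. mixed-in general log output)
        if entry.get("DownstreamStatus") != 404:
            continue
        router = entry.get("RouterName") or "<no router>"
        groups[(entry.get("RequestHost", "?"), router)] += 1
    return groups


if __name__ == "__main__":
    # Hypothetical default path; pass the real access log path as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "access.log"
    with open(path, encoding="utf-8") as fh:
        for (host, router), count in traefik_404s(fh).most_common():
            print(f"{count:5d}  {host}  {router}")
```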
We downgraded from v2.2.2 to v2.2.1 and everything works fine with the same configuration; the downgrade was the only change.
Consider this post closed, but I'll leave it up in case anyone else runs into the problem.