Websockets throwing 502 after a while when running Gotify

I've been trying to get Websockets to work reliably.

I'm trying to run Gotify, which I have running without any issues behind nginx, but I'd like to move this to traefik.

My current nginx proxy configuration:

server {
     listen 443 ssl http2;
     server_name push.domain.tld;
     location / {
       # We set up the reverse proxy
       proxy_pass         http://gotify;
       proxy_http_version 1.1;

       # Ensuring it can use websockets
       proxy_set_header   Upgrade $http_upgrade;
       proxy_set_header   Connection "upgrade";
       proxy_set_header   X-Real-IP $remote_addr;
       proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
       proxy_set_header   X-Forwarded-Proto http;
       proxy_redirect     http:// $scheme://;

       # The proxy must preserve the host because gotify verifies the host with the origin
       # for WebSocket connections
       proxy_set_header   Host $http_host;

       # These set the timeouts so that the websocket can stay alive
       proxy_connect_timeout   7m;
       proxy_send_timeout      7m;
       proxy_read_timeout      7m;
  }
}

Now, I've been trying to find documentation around websockets for Traefik v2 and have only come across this post with some examples.

As a result, I have compiled the following docker-compose.yml:

version: '3.3'
networks:
  traefik_default:
    external: true
services:
    server:
        networks:
            - 'traefik_default'
        expose:
            - "80"
        container_name: gotify
        volumes:
            - '/home/florian/docker/gotify:/app/data'
        image: gotify/server
        labels:
            - traefik.http.middlewares.redirect-https.redirectScheme.scheme=https
            - traefik.http.middlewares.redirect-https.redirectScheme.permanent=true
            - "traefik.enable=true"
            - "traefik.http.routers.gotify-server.rule=Host(`push.domain.tld`)"
            - "traefik.http.routers.gotify-server.entrypoints=websecure"
            - "traefik.http.routers.gotify-server.tls=true"
            - "traefik.http.routers.gotify-server.tls.certresolver=lewildcardresolver"
            - "traefik.http.routers.gotify-server.middlewares=redirect-https"
            - "traefik.http.routers.gotify-server.service=gotify-server"
            - "traefik.http.services.gotify-server.loadbalancer.passhostheader=true"
            - "traefik.http.services.gotify-server.loadbalancer.server.port=80"

The service is reachable just fine via the browser, but the websocket connection is not reliable. After a while, my client gets a 502 back.

Here's the log from the Gotify Android client showing the 502 error (it's in reverse order):

2020-10-05T12:36:29.242Z INFO: WebSocket: scheduling a restart in 1200 second(s) (via alarm manager)
2020-10-05T12:36:29.217Z ERROR: WebSocket(86): failure StatusCode: 502 Message: Bad Gateway
java.net.ProtocolException: Expected HTTP 101 response but was '502 Bad Gateway'
	at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
	at java.lang.Thread.run(Thread.java:919)

2020-10-05T12:36:29.066Z INFO: WebSocket(86): starting...
2020-10-05T12:36:29.064Z INFO: WebSocket(85): closing existing connection.
2020-10-05T12:25:39.588Z INFO: Entering LogsActivity

It seems like somehow Traefik closes the websocket too early. How can I leave that connection open for longer? Are there any other ideas on why this wouldn't work?

I also ran traefik in debug mode. What's funny is that I don't see any request around the time of the 502 bad gateway error.
Do Websocket errors not get logged in debug mode? Where would I be able to find more details about this 502 error?

Usually this is related to container networking. Traefik cannot connect to the container/service.

I've seen this when Traefik has multiple networks to choose from and the right one is not specified for the service, or when the wrong port is selected (it must be the port the container listens on, not the published port).
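As a rough sketch of what pinning both could look like on the Gotify service from the question: the traefik.docker.network label is an addition that does not appear in the original compose file, and the network name is simply reused from it.

```yaml
services:
  server:
    image: gotify/server
    networks:
      - traefik_default
    labels:
      - "traefik.enable=true"
      # Assumption: tell Traefik which Docker network to use to reach this container
      - "traefik.docker.network=traefik_default"
      # Port the container listens on internally, not a published host port
      - "traefik.http.services.gotify-server.loadbalancer.server.port=80"

networks:
  traefik_default:
    external: true
```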

You should not need the expose entry. Does it conflict with Traefik?

Check the Traefik dashboard and/or the Traefik log too.
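If logging is not switched on yet, something along these lines in the Traefik container's compose service should do it. This is a minimal sketch that assumes Traefik is configured through command arguments; the service name and image tag are placeholders.

```yaml
services:
  traefik:
    image: traefik:v2.3   # assumed tag, use whatever version is already deployed
    command:
      # Verbose Traefik log (routing decisions, provider events)
      - "--log.level=DEBUG"
      # Access log, one line per proxied request including its status code
      - "--accesslog=true"
```

The access log is separate from the debug log, so a 502 that never shows up in the debug output may still appear there.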

It can connect just fine: everything except long-lasting websockets works.

As mentioned above, I don't see anything in the logs relating to websockets, even when running it in debug mode.

I've tested this once more after removing the expose entry, but the same thing happens.

So, after a few days of trying out various configurations, I think I've found the solution.

I think the key flags that were missing were:

      - "--entryPoints.websecure.transport.respondingTimeouts.readTimeout=420"
      - "--entryPoints.websecure.transport.respondingTimeouts.writeTimeout=420"
      - "--entryPoints.websecure.transport.respondingTimeouts.idleTimeout=420"

I've posted a more detailed writeup with the full docker-compose file here.
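For context, a minimal sketch of where these flags would sit in the Traefik container's own compose service. Only the three timeout flags come from this thread; the image tag, provider flag, entry point address, ports and volume are assumptions for illustration. A value without a unit is read as seconds, so 420 corresponds to the 7m used in the nginx configuration.

```yaml
services:
  traefik:
    image: traefik:v2.3   # assumed tag
    command:
      - "--providers.docker=true"                  # assumed provider setup
      - "--entryPoints.websecure.address=:443"     # assumed entry point address
      # Timeouts from this thread, in seconds (420 s = 7 min)
      - "--entryPoints.websecure.transport.respondingTimeouts.readTimeout=420"
      - "--entryPoints.websecure.transport.respondingTimeouts.writeTimeout=420"
      - "--entryPoints.websecure.transport.respondingTimeouts.idleTimeout=420"
    ports:
      - "443:443"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    networks:
      - traefik_default

networks:
  traefik_default:
    external: true
```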