File uploads fail after a certain threshold

Hello. I'm using a DigitalOcean Droplet (VPS) with docker-compose to host a frontend and backend app. These apps work perfectly except for one detail: File uploads.

When I upload files of 20 MB or less from the frontend to the backend, the upload succeeds without issues, but an 80 MB GIF fails. In the latter case, the frontend logs this in the browser: "Failed to load resource: net::ERR_SSL_BAD_RECORD_MAC_ALERT".

If I try the same upload another way (I tried HTTPie), it sometimes succeeds and sometimes fails, seemingly at random. When it fails, HTTPie simply says "Connection closed by remote server".

Both scenarios produce different Traefik logs:
When using fetch in browser: github.com/traefik/traefik/v3/pkg/proxy/httputil/proxy.go:121 > 499 Client Closed Request error="context canceled"
When using HTTPie: 2025-05-23T00:27:05Z DBG github.com/traefik/traefik/v3/pkg/proxy/httputil/proxy.go:121 > 502 Bad Gateway error="readfrom tcp x.x.x.x:48338->x.x.x.x:3001: local error: tls: bad record MAC"

Here's my docker-compose file (removed some irrelevant stuff):

version: '3.8'
services:
  watchtower:
    image: containrrr/watchtower
    command:
      - "--label-enable"
      - "--interval"
      - "30"
      - "--rolling-restart"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /home/user/.docker/config.json:/config.json
  reverse-proxy:
    image: traefik:v3.4.0
    command:
      - "--providers.docker"
      - "--providers.docker.exposedbydefault=false"
      - "--entryPoints.websecure.address=:443"
      - "--certificatesresolvers.myresolver.acme.tlschallenge=true"
      - "--certificatesresolvers.myresolver.acme.email=user@example.net"
      - "--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--log.level=DEBUG"
      - "--accesslog=true"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - letsencrypt:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      my_network:
        aliases:
          - api.example.net
    depends_on:
      - my-backend
      - my-frontend
  my-backend:
    image: ghcr.io/user/backend-image:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.backend.rule=Host(`api.example.net`)"
      - "traefik.http.routers.backend.entrypoints=websecure"
      - "traefik.http.routers.backend.tls.certresolver=myresolver"
      - "com.centurylinklabs.watchtower.enable=true"
      - "traefik.http.services.backend.loadbalancer.server.port=3001"
    deploy:
      mode: replicated
      replicas: 2
    restart: always
    networks:
      - my_network
  my-frontend:
    image: ghcr.io/user/frontend-image:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.frontend.rule=Host(`example.net`)"
      - "traefik.http.routers.frontend.entrypoints=websecure"
      - "traefik.http.routers.frontend.tls.certresolver=myresolver"
      - "com.centurylinklabs.watchtower.enable=true"
      - "traefik.http.services.frontend.loadbalancer.server.port=3000"
    deploy:
      mode: replicated
      replicas: 2
    restart: always
    networks:
      - my_network

volumes:
  letsencrypt:

networks:
  my_network:
    external: true

I've confirmed that both the frontend and backend are accessible and run as intended, and the frontend can make any request to the backend. The only thing I can't do is upload that 80 MB file (or any file around that size). I even tried making a request to a non-existent endpoint in my backend: sometimes it returns 404 (as it should), and other times it fails to respond in the same way as outlined earlier.

I'm pretty stumped as to what to do or try. Any help would be appreciated.

HTTP status 499 means "Client closed request".

So it could be your client/browser running into a timeout, or a proxy or load balancer on the way to Traefik.

If you control both the client and server software, I would switch to partial (chunked) uploads, so that no single connection has to stay open for so long.
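To make the partial-upload idea concrete, here is a minimal TypeScript sketch of what the client side could look like. Everything specific in it is an assumption for illustration: the 5 MiB chunk size, the uploadUrl parameter, and the X-Chunk-Index / X-Chunk-Count headers are hypothetical, and the backend would need its own logic to reassemble the pieces. It would not explain the "bad record MAC" error, it only shortens each individual request.

```typescript
// Byte range of one chunk within the file: [start, end).
interface ChunkRange {
  index: number;
  start: number;
  end: number; // exclusive
}

// Split a file of `totalBytes` into ranges of at most `chunkBytes` each.
function planChunks(totalBytes: number, chunkBytes: number): ChunkRange[] {
  if (!Number.isInteger(totalBytes) || totalBytes < 0) {
    throw new RangeError("totalBytes must be a non-negative integer");
  }
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be a positive integer");
  }
  const ranges: ChunkRange[] = [];
  for (let start = 0, index = 0; start < totalBytes; start += chunkBytes, index++) {
    ranges.push({ index, start, end: Math.min(start + chunkBytes, totalBytes) });
  }
  return ranges;
}

// Send each chunk in its own request, one after another.
// ASSUMPTION: `uploadUrl` and the X-Chunk-* headers are made up for this
// sketch; a real backend defines its own chunk protocol.
async function uploadInChunks(
  file: Blob,
  uploadUrl: string,
  chunkBytes: number = 5 * 1024 * 1024,
): Promise<void> {
  const ranges = planChunks(file.size, chunkBytes);
  for (const { index, start, end } of ranges) {
    const res = await fetch(uploadUrl, {
      method: "POST",
      headers: {
        "Content-Type": "application/octet-stream",
        "X-Chunk-Index": String(index),
        "X-Chunk-Count": String(ranges.length),
      },
      body: file.slice(start, end),
    });
    if (!res.ok) {
      throw new Error(`chunk ${index} failed with HTTP ${res.status}`);
    }
  }
}
```

With these assumed numbers, an 80 MiB file would go out as 16 requests of 5 MiB each, so a failed request only costs a retry of that one chunk rather than the whole upload.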

I haven't modified any of Traefik's timeout settings, yet these requests always fail within 1-5 seconds at most. A successful upload that returns a 200 status code takes around two minutes, and on the occasions when the endpoint works, it handles that duration with no problem.

Also, adding to what I said, a timeout doesn't really explain the "bad record MAC" error (and I'm not sure what that actually means).

To quote a chatbot:

ERR_SSL_BAD_RECORD_MAC_ALERT is an SSL/TLS error indicating that the data integrity check (MAC: Message Authentication Code) during a secure connection failed. This typically means data sent between your browser and the server was corrupted or tampered with.

From this I would assume it’s a network issue, maybe data gets corrupted during transmission.
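To illustrate what a failed MAC check means, here is a small TypeScript sketch for Node.js. It is only an analogy and not how TLS is implemented: it uses a standalone HMAC-SHA256 from node:crypto with a made-up demo key, whereas modern TLS cipher suites use AEAD ciphers and report a failed integrity check with the same "bad record MAC" alert. The point is just that a single flipped bit makes the receiver reject the data.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: compute an HMAC-SHA256 tag over the payload with a shared key.
function tag(key: Buffer, payload: Buffer): Buffer {
  return createHmac("sha256", key).update(payload).digest();
}

// Receiver side: recompute the tag and compare it in constant time.
function verify(key: Buffer, payload: Buffer, received: Buffer): boolean {
  const expected = tag(key, payload);
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// ASSUMPTION: demo key and payload, invented for this illustration.
const key = Buffer.from("demo-key-not-a-real-secret");
const payload = Buffer.from("one record of upload data");
const mac = tag(key, payload);

// Simulate corruption in transit by flipping a single bit in a copy.
const corrupted = Buffer.from(payload);
corrupted[0] ^= 0x01;

console.log(verify(key, payload, mac));   // true: data arrived intact
console.log(verify(key, corrupted, mac)); // false: integrity check fails
```

In a real TLS connection the receiving side does not just return false: it sends the alert and tears the connection down, which would be consistent with the abrupt "Connection closed by remote server" seen from HTTPie.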

In general, increasing the file size by an order of magnitude (like from 8 to 80 MB) might break some systems along the way, either through a timeout or a size limit. A standard web server like Apache or nginx usually has some default size limit, like 10 MB. Traefik has some default timeouts.

Any clue whether there is an option in Traefik causing this issue? I've tried to debug this extensively with AI, but it either suggested options that did nothing to solve the issue or hallucinated non-existent options.

Just tried adding all these options. None of them fixed the issue:

- "--entryPoints.web.transport.respondingTimeouts.readTimeout=0"
- "--entryPoints.web.transport.respondingTimeouts.writeTimeout=0"
- "--entryPoints.web.transport.respondingTimeouts.idleTimeout=600"
- "--entryPoints.web.transport.keepAliveMaxRequests=0"
- "--entryPoints.web.transport.keepAliveMaxTime=0"

I also tried to add these options to the websecure entrypoint and still nothing.

One more thing I want to add is that I'm moving from Render to a VPS. This is relevant because when the frontend and backend were hosted on Render, these uploads worked perfectly fine.