Docker Registry uploads fail with HTTP 499 errors behind Traefik (v3.6) in Docker Swarm: buffering or timeout behavior during large image pushes?

I’m running a private Docker Registry (registry:2) behind Traefik in a Docker Swarm setup. Traefik handles HTTPS via Let’s Encrypt, and the registry itself runs without TLS behind the proxy.

Small Docker images push successfully, confirming Traefik routing and authentication are working. However, larger images fail to upload. Traefik logs show HTTP 499 responses on the PUT /v2/.../blobs/uploads/ endpoint, indicating that the client connection is being closed during the upload.

I’ve tried using Traefik’s buffering middleware to limit request body sizes and buffer large uploads, but it doesn’t seem to resolve the issue. It appears Traefik’s default timeouts or other internal limits are still interfering with large Docker layer uploads.

Traefik

```yaml
services:
  traefik:
    image: traefik:v3.6

    command:
      # Providers
      - "--providers.swarm=true"
      - "--providers.swarm.exposedbydefault=false"
      - "--providers.swarm.network=traefik-net"
      - "--providers.file.directory=/dynamic"
      - "--providers.file.watch=true"
      # EntryPoints
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      # ACME
      - "--certificatesresolvers.letsencrypt.acme.email=mymail1@gmail.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
      # API
      - "--api.dashboard=true"
      - "--api.insecure=false"
      # Logs
      - "--log.level=DEBUG"
      - "--accesslog=true"

    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host

    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /dir/traefik-stack/acme:/letsencrypt
      - /dir/traefik-stack/dynamic:/dynamic:ro

    networks:
      - traefik-net

    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      labels:
        - "traefik.enable=true"

        # Dashboard router
        - "traefik.http.routers.dashboard.rule=Host(`traefik.domain.io`)"
        - "traefik.http.routers.dashboard.entrypoints=websecure"
        - "traefik.http.routers.dashboard.service=api@internal"
        - "traefik.http.routers.dashboard.tls=true"

        # Basic-auth middleware
        - "traefik.http.routers.dashboard.middlewares=dashboard-auth@file,vpn-only@file"

        # Service hint
        - "traefik.http.services.traefik.loadbalancer.server.port=8080"
networks:
  traefik-net:
    external: true
```

Registry

```yaml
services:
  registry:
    image: registry:2
    environment:
      - REGISTRY_AUTH=htpasswd
      - REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd
      - REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm
      - REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY=/var/lib/registry
      - REGISTRY_HTTP_ADDR=0.0.0.0:5000
    volumes:
      - /dir/registry-stack/data:/var/lib/registry
      - /dir/registry-stack/auth:/auth
      - /dir/registry-stack/logs:/var/log/registry
      # Uncomment if using direct SSL:
      # - /home/admin/infra/registry-stack/certs:/certs
    networks:
      - traefik-net
      - registry-net
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      labels:
        # Enable Traefik
        - "traefik.enable=true"
        # HTTP Router
        - "traefik.http.routers.registry.rule=Host(`reg.domain.io`)"
        - "traefik.http.routers.registry.entrypoints=websecure"
        - "traefik.http.routers.registry.tls.certresolver=letsencrypt"
        - "traefik.http.routers.registry.service=registry"
        - "traefik.http.routers.registry.middlewares=limit@file,vpn-only@file"
        # Service
        - "traefik.http.services.registry.loadbalancer.server.port=5000"
        # Redirect HTTP to HTTPS
        - "traefik.http.routers.registry-http.rule=Host(`registry.domain.io`)"
        - "traefik.http.routers.registry-http.entrypoints=web"
        - "traefik.http.routers.registry-http.middlewares=redirect-to-https"
        - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
        - "traefik.http.middlewares.redirect-to-https.redirectscheme.permanent=true"
    secrets:
      - registry-auth

networks:
  traefik-net:
    external: true
  registry-net:
    external: true

secrets:
  registry-auth:
    external: true
```

Middleware

```yaml
http:
  middlewares:
    limit:
      buffering:
        maxRequestBodyBytes: 0
        memRequestBodyBytes: 0
        maxResponseBodyBytes: 0
        memResponseBodyBytes: 0
```

Use three backticks before and after each config block so it is formatted correctly and the spacing is preserved.

Check the Traefik docs for increasing the various entrypoint timeout settings.

Hello again,

Thanks for the quick reply. I know these timeout settings exist on the Traefik entrypoint, but here's the problem: setting all :443 timeouts to 0s does allow large uploads, yet it is risky because a slow or malicious client could hold connections open indefinitely and tie up resources. Unlimited keep-alive can also monopolize Traefik's connection pool.

The buffering middleware alone doesn't help with long upload times, which is why my big Docker pushes kept failing. Since my registry is VPN-only, the risk is lower, but right now these settings apply globally to every service on the entrypoint. I was really hoping for a way to apply them per service rather than globally.

P.S. Adding these flags to the command section works:

```yaml
services:
  traefik:
    image: traefik:v3.6

    command:
      # Providers
      - "--providers.swarm=true"
      - "--providers.swarm.exposedbydefault=false"
      - "--providers.swarm.network=traefik-net"
      - "--providers.file.directory=/dynamic"
      - "--providers.file.watch=true"
      # EntryPoints
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      # Set timeouts to allow big uploads
      - "--entrypoints.websecure.transport.respondingtimeouts.readtimeout=0s"
      - "--entrypoints.websecure.transport.respondingtimeouts.writetimeout=0s"
      - "--entrypoints.websecure.transport.respondingtimeouts.idletimeout=0s"
      - "--entrypoints.websecure.transport.keepalivemaxtime=0s"
      - "--entrypoints.websecure.transport.keepalivemaxrequests=0"
      # ACME
      - "--certificatesresolvers.letsencrypt.acme.email=mymail@gmail.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
      # API
      - "--api.dashboard=true"
      - "--api.insecure=false"
      # Logs
      - "--log.level=DEBUG"
      - "--accesslog=true"
```
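Because `respondingTimeouts` are configured per entrypoint rather than per router, one way to scope relaxed timeouts to the registry alone is to give it a dedicated entrypoint and leave `websecure` at Traefik's defaults. The fragments below are a sketch under assumptions, not a tested setup: the entrypoint name `registry`, port 5443, and the 30m value are all hypothetical placeholders. The two YAML documents correspond to the two existing stack files, and only the changed or added keys are shown.

```yaml
# --- Traefik stack: additions only (hypothetical "registry" entrypoint) ---
services:
  traefik:
    command:
      # ...existing flags unchanged; websecure keeps its default timeouts...
      - "--entrypoints.registry.address=:5443"
      # Placeholder value: long enough to read the largest blob upload
      - "--entrypoints.registry.transport.respondingtimeouts.readtimeout=30m"
    ports:
      # ...existing 80/443 mappings unchanged...
      - target: 5443
        published: 5443
        protocol: tcp
        mode: host
---
# --- Registry stack: move the router to the dedicated entrypoint ---
services:
  registry:
    deploy:
      labels:
        - "traefik.http.routers.registry.entrypoints=registry"
```

With this layout, clients would tag and push images as `reg.domain.io:5443/<image>`. The Let's Encrypt HTTP challenge still runs on the `web` entrypoint, so the existing `letsencrypt` resolver should keep working for the router.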

I had similar timeout issues with nginx, the registry, and large Docker images. I disabled the timeout, measured how long the upload took, added 20%, and set that as the timeout value.
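Applied to the Traefik command above, that approach could look like the fragment below. It assumes a hypothetical measurement: if the largest push took about 25 minutes with timeouts disabled, then 25 min × 1.2 = 30 min. As far as I can tell from the Traefik v3 entrypoint docs, `readTimeout` (default 60s) bounds reading the whole request including the body, which would explain large layer uploads being cut off. `writeTimeout` already defaults to 0s (if you set it, it also has to cover the full upload), and `keepAliveMaxTime` / `keepAliveMaxRequests` default to 0 (no limit), so this sketch only overrides `readtimeout`.

```yaml
services:
  traefik:
    command:
      # ...other flags unchanged...
      # Hypothetical: largest push measured at ~25m with timeouts disabled.
      # 25m * 1.2 = 30m of headroom to read one request (headers + body).
      - "--entrypoints.websecure.transport.respondingtimeouts.readtimeout=30m"
      # writetimeout, idletimeout, keepalivemaxtime and keepalivemaxrequests
      # are left at their defaults instead of being forced to 0s.
```

`idleTimeout` only limits how long an idle keep-alive connection waits for its next request, so leaving it at the default (180s) should not affect a long-running upload.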

