Seemingly random timeouts for some clients

Hello,

I have Traefik set up with a docker-compose.yaml (or rather, docker stack) on an Ubuntu host.

For some users, trying to reach my application (https://my-app.com:443/) results in a timeout on some browsers/devices. (Browsers/devices that I know had this issue before: Samsung Galaxy S23 with Samsung Internet (works now), macOS Safari.)

The only errors I see in the log are TLS handshake errors.

My docker-compose file:

services:
  my-app: # based on python:3.12-slim + uvicorn/fastapi
    image: ghcr.io/me/my-app:${GIT_COMMIT_HASH:-latest}
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.my-app.rule=PathPrefix(`/api`) && (Host(`my-app.com`) || Host(`www.my-app.com`))"
      - "traefik.http.routers.my-app.entrypoints=websecure"
      - "traefik.http.routers.my-app.tls.certresolver=myresolver"
      - "traefik.http.routers.my-app.middlewares=strip-api, daily-ratelimit"
      - "traefik.http.services.my-app.loadbalancer.server.port=8000"
      - "traefik.http.middlewares.strip-api.stripprefix.prefixes=/api"
      - "traefik.http.middlewares.daily-ratelimit.ratelimit.average=20"
      - "traefik.http.middlewares.daily-ratelimit.ratelimit.period=24h"
      - "traefik.http.middlewares.daily-ratelimit.ratelimit.burst=20" 
    environment:
      - OPENAI_API_KEY_FILE=/run/secrets/openai-key
    secrets:
      - regcred
      - openai-key
    deploy:
      update_config:
        order: start-first
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      
  my-app-ui: # nginx serving static files
    image: ghcr.io/me/my-app-ui:${GIT_COMMIT_HASH:-latest}
    depends_on:
      - my-app
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.my-app-ui.rule=PathPrefix(`/`) && (Host(`my-app.com`) || Host(`www.my-app.com`))"
      - "traefik.http.routers.my-app-ui.entrypoints=websecure"
      - "traefik.http.routers.my-app-ui.tls.certresolver=myresolver"
      - "traefik.http.routers.my-app-ui.middlewares=mywwwredirect"
      - "traefik.http.middlewares.mywwwredirect.redirectregex.regex=^https://www\\.(.*)"
      - "traefik.http.middlewares.mywwwredirect.redirectregex.replacement=https://$${1}"
      - "traefik.http.services.my-app-ui.loadbalancer.server.port=80"
    secrets:
      - regcred
    deploy:
      update_config:
        order: start-first
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80"]
      interval: 30s
      timeout: 10s
      retries: 3

  traefik:
    image: traefik:v3.2
    command:
      - "--accesslog=true"
      - "--accesslog.format=json"
      - "--log.level=DEBUG"
      - "--log.format=json"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--providers.docker.watch=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--certificatesresolvers.myresolver.acme.email=dominik@my-app.com"
      - "--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.myresolver.acme.tlschallenge=true"
      - "--entrypoints.websecure.address=:443"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    depends_on:
      - my-app
      - my-app-ui
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "letsencrypt:/letsencrypt"
    deploy:
      update_config:
        order: start-first

volumes:
  letsencrypt:

secrets:
  regcred:
    external: true
  openai-key:
    external: true

Let’s start by sorting things out. Are you using plain Docker (Compose), which supports depends_on, or Docker Swarm, which supports deploy?

Enable the Traefik debug log and the Traefik access log in JSON format, and check both while the failing requests are happening.

I am using docker stack, which uses Docker Swarm behind the scenes.

The Traefik debug log is not written to during these requests. (I am currently seeing the timeouts on a device that worked before.)

Is the access log also written to the container's standard output, or do I need to inspect it separately?

If you use docker stack deploy and Docker Swarm, then you need to use the Traefik v3 providers.swarm instead of providers.docker, and the labels need to be placed within the deploy section of each service.
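
As a rough sketch of that change, trimmed to one service from the original post: the provider flags switch from providers.docker to providers.swarm, and the labels move under deploy. The placement constraint is an assumption on my side (Traefik needs to reach the Swarm API, which is served by manager nodes); the remaining Traefik flags, the middlewares, and the other service would carry over in the same way.

```yaml
services:
  traefik:
    image: traefik:v3.2
    command:
      # Swarm provider instead of providers.docker
      - "--providers.swarm=true"
      - "--providers.swarm.exposedbydefault=false"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.myresolver.acme.email=dominik@my-app.com"
      - "--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.myresolver.acme.tlschallenge=true"
    ports:
      - "443:443"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "letsencrypt:/letsencrypt"
    deploy:
      placement:
        constraints:
          # Assumption: run Traefik on a manager node so it can query the Swarm API
          - node.role == manager

  my-app-ui:
    image: ghcr.io/me/my-app-ui:${GIT_COMMIT_HASH:-latest}
    deploy:
      update_config:
        order: start-first
      # In Swarm mode the labels belong to the service, so they live under deploy
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.my-app-ui.rule=Host(`my-app.com`) || Host(`www.my-app.com`)"
        - "traefik.http.routers.my-app-ui.entrypoints=websecure"
        - "traefik.http.routers.my-app-ui.tls.certresolver=myresolver"
        - "traefik.http.services.my-app-ui.loadbalancer.server.port=80"

volumes:
  letsencrypt:
```

With the Swarm provider, the loadbalancer.server.port label has to be set explicitly, as it already is in the original labels.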

Did you check in the Traefik dashboard whether the target services are recognized? Maybe compare with the simple Traefik Swarm example.

Note that depends_on does not work with Docker Swarm.

See the doc for the Traefik access log: by default, it goes to the container's standard output.
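
For illustration, a minimal sketch of the access log flags: the first two are the ones already in your command list and are enough to get JSON access logs on standard output (readable with docker service logs for the Traefik service). The commented-out line is optional, and the path in it is only an assumption; it would also need a matching volume mount.

```yaml
services:
  traefik:
    image: traefik:v3.2
    command:
      # Access log in JSON; goes to the container's stdout by default
      - "--accesslog=true"
      - "--accesslog.format=json"
      # Optional: write to a file instead of stdout (hypothetical path)
      # - "--accesslog.filepath=/var/log/traefik/access.log"
```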

Thanks, that seems to have worked.

Do you have an idea why the old version worked for most calls but not all?
