Docker Swarm stack deploy options (YAML): attempting to reach 100% uptime, weird issue with certs

Hi, I am working with a Docker stack deployment in a cluster with 3 manager nodes. After a lot of unnecessary pain and suffering, I have the thing working. But I just noticed some things that break Traefik, and I wonder if this can be improved.
This is the deploy part of my docker-stack.yml. Please read the comments, because they contain what I have discovered and some questions I have. Comments in capital letters describe things that break Traefik.

    deploy:
      mode: global
      resources:
        limits:
          cpus: "0.5"
          memory: "256M"
          pids: 100
      restart_policy:
        # I changed it from on-failure, attempting to reach 100% uptime
        condition: any
        delay: 1s
        max_attempts: 3
        window: 90s

      update_config:
        parallelism: 1
        delay: 60s
        failure_action: rollback
        monitor: 120s
        # IF MAX_FAILURE_RATIO: 0.5 IS NOT SET, TRAEFIK JUST BREAKS.
        max_failure_ratio: 0.5
        # START-FIRST MAKES TRAEFIK GENERATE THE LET'S ENCRYPT CERTS AND
        # THEN SERVE THE DEFAULT CERTS.
        order: stop-first

    # This healthcheck only worked with docker compose; in stack (Swarm) mode it breaks everything
    #    healthcheck:
    #      test: ["CMD-SHELL", "curl -f http://localhost:80 || exit 1"]
    #      interval: 30s
    #      timeout: 10s
    #      retries: 3
    #      start_period: 60s
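One possible reason the commented-out healthcheck fails under Swarm: as far as I know, the official Traefik image does not ship `curl`, and Swarm keeps replacing tasks whose healthcheck fails. Traefik has its own `traefik healthcheck` subcommand, and `--ping=true` is already enabled in the static config. A minimal sketch, assuming `/ping` is served on Traefik's default `traefik` entrypoint (port 8080); check the ping documentation for your version before relying on it:

```yaml
    # Sketch only: assumes /ping is served on Traefik's default "traefik"
    # entrypoint (:8080), enabled by --ping=true in the static config.
    healthcheck:
      # Uses the Traefik binary itself, so no curl/wget is needed in the image
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
```

As I understand it, Swarm waits for a task with a healthcheck to report healthy before treating it as running, which gives `update_config.monitor` and the rollback logic a real signal to act on.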

With this setup, if I update my services they break, but only for a few seconds. I wonder if I am missing something to make it better.
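On the restart side: the comment above says `condition: any` was chosen to reach 100% uptime, but `max_attempts: 3` limits that, because Swarm can stop restarting the task once those attempts are used up. According to the Compose `deploy` reference, leaving `max_attempts` out means Swarm never gives up. A sketch of the policy under that reading, keeping the other values from the config above:

```yaml
      restart_policy:
        # Restart whenever the task exits, not only on failure
        condition: any
        delay: 1s
        # max_attempts omitted on purpose: the default is to never give up
        window: 90s
```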
The `order: start-first` setting caused a weird issue. I had Traefik working with stop-first and changed it to start-first; this left Traefik in the pending state without starting, so I reverted to stop-first. Nevertheless, for reasons I don't understand, Traefik generated the Let's Encrypt certs and then served the default certs, even though it was already back on stop-first. I solved the issue by stopping the Traefik stack for a few hours; when I deployed it again, it worked. The acme.json never changed, so it was weird.

Share your full Traefik static and dynamic config, and your full docker-compose.yml.

Check the simple Traefik Swarm example.

This is my full docker-stack.yml; I am not using any other file for Traefik. I deploy it with `docker stack deploy -c docker-stack.yml MYTRAFIK`.

    networks:
      proxy:
        name: proxy
        driver: overlay
        attachable: true
      internal:
        name: internal
        driver: overlay

    secrets:
      cf_tok_110824:
        external: true

    services:
      traefik:
        image: traefik:v3.2.0
        hostname: '{{.Node.Hostname}}'
        ports:
          - target: 80
            published: 80
            protocol: tcp
            mode: host
          - target: 443
            published: 443
            protocol: tcp
            mode: host
          - target: 11111
            published: 11111
            protocol: tcp
            mode: host
          - target: 2222
            published: 2222
            protocol: tcp
            mode: host
          - target: 3333
            published: 3333
            protocol: tcp
            mode: host
          - target: 4444
            published: 4444
            protocol: tcp
            mode: host
          - target: 5555
            published: 5555
            protocol: tcp
            mode: host

        networks:
          - proxy
          - internal

        volumes:
          - /etc/localtime:/etc/localtime:ro
          - /var/run/docker.sock:/var/run/docker.sock:ro
          - /mnt/config/traefik/acme.json:/acme.json
        environment:
          CF_DNS_API_TOKEN_FILE: /run/secrets/cf_tok_110824
          TRAEFIK_DASHBOARD_CREDENTIALS: myuser:mypassword

        secrets:
          - cf_tok_110824

        command:
          - --global.checkNewVersion=false
          - --global.sendAnonymousUsage=false
          - --log.level=DEBUG
          - --api.dashboard=true
          - --api.debug=true
          - --ping=true
          #- --log.filepath=/var/log/traefik.log
          - --accesslog=true
          #- --accesslog.filepath=/var/log/traefik-access.log
          - --providers.swarm.exposedByDefault=false
          - --providers.swarm.network=proxy
          - --providers.swarm.endpoint=unix:///var/run/docker.sock
          - --entrypoints.http.address=:80
          - --entrypoints.https.address=:443
          - --entrypoints.entryone.address=:2222
          - --entrypoints.entrytwo.address=:3333
          - --entrypoints.entrythree.address=:11111
          - --entrypoints.entryfour.address=:4444
          - --entrypoints.entryfive.address=:5555

          - --entrypoints.http.http.redirections.entrypoint.to=https
          - --entrypoints.http.http.redirections.entrypoint.scheme=https
          - --entrypoints.https.asDefault=true
          - --entrypoints.https.http.tls.certresolver=cloudflare
          - --certificatesresolvers.cloudflare.acme.email=my@mail.com
          - --certificatesresolvers.cloudflare.acme.storage=acme.json
          - --certificatesresolvers.cloudflare.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory
          - --certificatesresolvers.cloudflare.acme.dnsChallenge.provider=cloudflare
          - --certificatesresolvers.cloudflare.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53
          - --serversTransport.insecureSkipVerify=true

        deploy:
          mode: global

          placement:
            constraints:
              - node.role==manager

          resources:
            limits:
              cpus: "0.5"
              memory: "256M"
              pids: 100

          restart_policy:
            condition: any
            delay: 1s
            max_attempts: 3
            window: 90s

          update_config:
            parallelism: 1
            delay: 10s
            failure_action: rollback
            monitor: 120s
            max_failure_ratio: 0.5
            order: stop-first

          labels:
            - "traefik.enable=true"
            - "traefik.http.routers.traefik.entrypoints=http"
            - "traefik.http.routers.traefik.rule=Host(`subdomain.domain`)"
            - "traefik.http.middlewares.traefik-auth.basicauth.users=myuser:mypassword"
            - "traefik.http.middlewares.traefik-https-redirect.redirectscheme.scheme=https"
            - "traefik.http.middlewares.sslheader.headers.customrequestheaders.X-Forwarded-Proto=https"
            - "traefik.http.routers.traefik.middlewares=traefik-https-redirect"
            - "traefik.http.routers.traefik-secure.entrypoints=https"
            - "traefik.http.routers.traefik-secure.rule=Host(`subdomain.domain`)"
            - "traefik.http.routers.traefik-secure.middlewares=traefik-auth"
            - "traefik.http.routers.traefik-secure.tls=true"
            - "traefik.http.routers.traefik-secure.tls.certresolver=cloudflare"
            - "traefik.http.routers.traefik-secure.tls.domains[0].main=subdomain.domain"
            - "traefik.http.routers.traefik-secure.tls.domains[0].sans=*.subdomain.domain"
            - "traefik.http.routers.traefik-secure.service=api@internal"
            - "traefik.http.services.api@internal.loadbalancer.server.port=1337"

    #    healthcheck:
    #      test: ["CMD-SHELL", "curl -f http://localhost:80 || exit 1"]
    #      interval: 30s
    #      timeout: 10s
    #      retries: 3
    #      start_period: 60s

And this is how I use Traefik in one of my services:

      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=proxy"
        - "traefik.http.routers.myservice-secure.rule=Host(`myservice.subdomain.domain`) && Path(`/loginpage/`)"
        - "traefik.http.routers.myservice-secure.entrypoints=https"
        - "traefik.http.routers.myservice-secure.service=myservice"
        - "traefik.http.routers.myservice-secure.tls=true"

        - "traefik.http.routers.myservice-secure.middlewares=myservice-auth"

        - "traefik.http.middlewares.myservice-auth.basicauth.users=myserviceuser:passwordforservice"

        - "traefik.http.routers.myservice-other.rule=Host(`myservice.subdomain.domain`) && PathPrefix(`/`)"
        - "traefik.http.routers.myservice-other.entrypoints=https"
        - "traefik.http.routers.myservice-other.service=myservice"
        - "traefik.http.routers.myservice-other.tls=true"

        - "traefik.http.services.myservice.loadbalancer.server.port=5555"
        - "traefik.http.services.myservice.loadbalancer.sticky=true"

Hi @bluepuma77, I uploaded my full Traefik config. Do you have any idea what is going on? I don't want to bother you, but I would really appreciate your help and your comments on it.

start-first does not work when ports are published in host mode. The old task has to stop listening first; only then can another process start listening on the same port.
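Put differently, with `mode: host` each node's single Traefik task binds ports 80/443 on the node itself, so a replacement task on that node cannot bind them until the old one exits, and with `start-first` it stays pending. A sketch of the pairing this implies, using values from the stack file above:

```yaml
    ports:
      # Host-mode publishing binds the port directly on each node,
      # so only one task per node can hold it at any time.
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    deploy:
      mode: global
      update_config:
        parallelism: 1
        # The old task must release the host ports before the new one starts.
        order: stop-first
```

The cost is a short gap on each node while its task is replaced, which fits the few seconds of breakage mentioned earlier; `parallelism: 1` at least limits that gap to one node at a time.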

Which URL does not work? Make sure to set the labels under the deploy section.
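In a stack file, `deploy.labels` become labels on the Swarm service, which is what Traefik's Swarm provider reads; labels at the service's top level end up on the containers instead. A minimal sketch; the image name `myservice-image` is hypothetical, and port 5555 is taken from the labels above:

```yaml
services:
  myservice:
    image: myservice-image   # hypothetical image name
    networks:
      - proxy
    deploy:
      # Service labels: this is where the Swarm provider looks.
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.myservice.rule=Host(`myservice.subdomain.domain`)"
        # Swarm gives Traefik no port information, so the port must be set.
        - "traefik.http.services.myservice.loadbalancer.server.port=5555"

networks:
  proxy:
    # Created by the Traefik stack (name: proxy, attachable: true)
    external: true
```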

Your config does not look clean.

      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=proxy"
        - "traefik.http.routers.myservice-secure.rule=Host(`myservice.subdomain.domain`) && Path(`/loginpage/`)"
        - "traefik.http.routers.myservice-secure.entrypoints=https"
        # ^ not needed, it's the default
        - "traefik.http.routers.myservice-secure.service=myservice"
        - "traefik.http.routers.myservice-secure.tls=true"
        # ^ why override the TLS settings from the entrypoint?
        - "traefik.http.routers.myservice-secure.middlewares=myservice-auth"
        - "traefik.http.middlewares.myservice-auth.basicauth.users=myserviceuser:passwordforservice"

        - "traefik.http.routers.myservice-other.rule=Host(`myservice.subdomain.domain`) && PathPrefix(`/`)"
        # ^ the PathPrefix is useless here
        - "traefik.http.routers.myservice-other.entrypoints=https"
        # ^ not needed
        - "traefik.http.routers.myservice-other.service=myservice"
        - "traefik.http.routers.myservice-other.tls=true"
        # ^ why override the TLS settings from the entrypoint?
        - "traefik.http.services.myservice.loadbalancer.server.port=5555"
        - "traefik.http.services.myservice.loadbalancer.sticky=true"
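Applying the comments above, the service labels could shrink to something like this sketch. It assumes the static config from the stack file (`--entrypoints.https.asDefault=true` and `--entrypoints.https.http.tls.certresolver=cloudflare`), so routers without `entrypoints`/`tls` labels inherit both from the `https` entrypoint. All other labels are kept as posted:

```yaml
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=proxy"
        # No entrypoints/tls labels: inherited from the default https entrypoint
        - "traefik.http.routers.myservice-secure.rule=Host(`myservice.subdomain.domain`) && Path(`/loginpage/`)"
        - "traefik.http.routers.myservice-secure.service=myservice"
        - "traefik.http.routers.myservice-secure.middlewares=myservice-auth"
        - "traefik.http.middlewares.myservice-auth.basicauth.users=myserviceuser:passwordforservice"

        # Host() alone already matches every path, so PathPrefix(`/`) is dropped.
        # Default router priority is the rule length, so the longer
        # /loginpage/ rule above still wins for that path.
        - "traefik.http.routers.myservice-other.rule=Host(`myservice.subdomain.domain`)"
        - "traefik.http.routers.myservice-other.service=myservice"

        - "traefik.http.services.myservice.loadbalancer.server.port=5555"
        - "traefik.http.services.myservice.loadbalancer.sticky=true"
```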

Also not needed: you already have the redirect on the entrypoint.

And the X-Forwarded headers are set automatically.
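Following these last two points, the dashboard labels in the stack file could drop the plain-HTTP `traefik` router with its `traefik-https-redirect` middleware (the `http` entrypoint already redirects to `https`) and the unused `sslheader` middleware (Traefik sets `X-Forwarded-Proto` itself). A sketch that keeps every other label exactly as posted:

```yaml
      labels:
        - "traefik.enable=true"
        # Removed: http router + traefik-https-redirect (entrypoint redirect covers it)
        # Removed: sslheader middleware (X-Forwarded-* headers are added automatically)
        - "traefik.http.middlewares.traefik-auth.basicauth.users=myuser:mypassword"
        - "traefik.http.routers.traefik-secure.entrypoints=https"
        - "traefik.http.routers.traefik-secure.rule=Host(`subdomain.domain`)"
        - "traefik.http.routers.traefik-secure.middlewares=traefik-auth"
        - "traefik.http.routers.traefik-secure.tls=true"
        - "traefik.http.routers.traefik-secure.tls.certresolver=cloudflare"
        - "traefik.http.routers.traefik-secure.tls.domains[0].main=subdomain.domain"
        - "traefik.http.routers.traefik-secure.tls.domains[0].sans=*.subdomain.domain"
        - "traefik.http.routers.traefik-secure.service=api@internal"
        - "traefik.http.services.api@internal.loadbalancer.server.port=1337"
```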

Thanks @bluepuma77, all your advice worked.