Failover service + Docker Swarm

Imagine the following compose file for swarm:

version: '3.8'

networks:
  public:
    external: true

volumes:
  traefik-acme:

configs:
  traefik-failover:
    name: traefik-failover-${COMMIT_SHA}
    file: ./failover.traefik.yml

services:
  traefik:
    image: traefik:2.10.7
    command:
      - --providers.file.directory=/etc/traefik/config
      - --providers.file.watch=true
      - --providers.docker.swarmMode=true
      - --providers.docker.network=public
      - --providers.docker.exposedByDefault=false
      - --api
      - --log.level=ERROR
      - --entrypoints.http.address=:80
      - --entrypoints.https.address=:443
    ports:
      - "80:80"
      - "443:443"
    networks:
      - public
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - traefik-acme:/acme
    configs:
      - source: traefik-failover
        target: /etc/traefik/config/failover.traefik.yml
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=public"

  my-app:
    image: my-frontend
    healthcheck:
      test: [ "CMD", "wget", "--spider", "-q", "http://localhost:5173/container-health" ]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: start-first
      placement:
        constraints: [ node.role == worker ]
      labels:
        - 'traefik.enable=true'
        - 'traefik.http.routers.my-app.rule=Host(`app.${PUBLIC_DOMAIN}`)'
        - 'traefik.http.routers.my-app.service=my-app-failover@file'
        - 'traefik.http.services.my-app.loadbalancer.server.port=5173'
        - 'traefik.http.services.my-app.loadbalancer.healthcheck.path=/health'
        - 'traefik.http.services.my-app.loadbalancer.healthcheck.interval=5s'
        - 'traefik.http.services.my-app.loadbalancer.healthcheck.timeout=5s'
  
  maintenance:
    image: my-maintenance-page-frontend
    deploy:
      mode: replicated
      replicas: 1
      update_config:
        order: start-first
      placement:
        constraints: [ node.role == worker ]
      labels:
        - 'traefik.enable=true'
        - 'traefik.http.services.maintenance.loadbalancer.server.port=5173'

Traefik config (failover.traefik.yml):

http:
  services:
    my-app-failover:
      failover:
        service: my-app@docker
        fallback: maintenance@docker

With this configuration, I can return 503 from the /health route of the my-frontend app and failover works correctly. But when the my-frontend container is dead (/container-health is failing), service discovery reports "The service does not exist" and failover doesn't kick in, probably because this is treated the same as if I had never defined the my-app service in the compose file. I want the maintenance page to be shown in this case too. And I don't want to switch to k8s xD

Why is there no ability to use failover when the service isn't detected?

Have you tried adding a healthcheck to the failover service (doc)?

PS: you set traefik.docker.network=public, but neither target service appears to be attached to that network.

The missing network is just a typo.

The real config works, except for the case when the container is dead :)

I'll check the failover healthcheck ASAP, but I don't think it will help. As I understand it, it still won't work because the Docker service is missing.

The docs say: "If HealthCheck is enabled for a given service, but any of its descendants does not have it enabled, the creation of the service will fail."
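
For reference, here is a minimal sketch of what enabling the healthcheck on the failover service in failover.traefik.yml could look like. Per the sentence quoted above, both child services would then need a healthcheck of their own. The maintenance service in the compose file has none, so the trailing comment is an assumption about what would have to be added, not part of the original setup.

```yaml
http:
  services:
    my-app-failover:
      failover:
        # An empty mapping enables the healthcheck on the failover service itself.
        healthCheck: {}
        service: my-app@docker
        fallback: maintenance@docker

# Assumption: the maintenance service would also need a healthcheck label, e.g.
#   traefik.http.services.maintenance.loadbalancer.healthcheck.path=/
# ("/" is a hypothetical path, not taken from the original setup).
```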

Managed to solve this problem by using the allowEmptyServices setting. This way the service remains registered even though its healthcheck is failing.
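
A sketch of how that option could be switched on, assuming it is passed as a CLI flag next to the other Docker provider options in the compose file above (only the relevant lines are shown):

```yaml
services:
  traefik:
    command:
      - --providers.docker.swarmMode=true
      # Create the load balancer regardless of container health,
      # so it can stay registered while it has no servers.
      - --providers.docker.allowEmptyServices=true
```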


Nope :( That was not the fix; I just got lucky and it looked like the problem had gone away... I had to move the Traefik configuration for the service from the docker compose file to the dynamic config, so it doesn't disappear when the Docker service is unavailable. It's a bit complicated and ugly, but at least it works. Now I wonder: can this problem be solved at the Traefik level (by improving the Docker provider), or is it impossible by design to read labels from a dead service? I'd be grateful if someone could answer this question, but I doubt that anyone other than the Traefik developers can.
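
The exact file was not posted, so the following is only a rough sketch of what such a dynamic config might look like. It assumes that Traefik shares an overlay network with both services, that they are reachable through Swarm DNS under the names my-app and maintenance, and that app.example.com stands in for the real hostname. All three are placeholders, not details from the original setup.

```yaml
http:
  routers:
    my-app:
      # Hypothetical hostname; the compose labels used app.${PUBLIC_DOMAIN}.
      rule: "Host(`app.example.com`)"
      service: my-app-failover

  services:
    my-app-failover:
      failover:
        service: my-app
        fallback: maintenance

    # Defined in the file provider, so it keeps existing
    # even when no my-app task is running.
    my-app:
      loadBalancer:
        healthCheck:
          path: /health
          interval: 5s
          timeout: 5s
        servers:
          # Assumed Swarm DNS name of the service.
          - url: "http://my-app:5173"

    maintenance:
      loadBalancer:
        servers:
          # Assumed Swarm DNS name of the service.
          - url: "http://maintenance:5173"
```

With this layout the routing labels on my-app and maintenance are no longer needed, because the router and both load balancers live in the file provider rather than being derived from Docker service labels.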

Dear @ldez, who would be the best person to ask about this?