Hi,
We are increasingly seeing a weird traefik issue on one of our docker swarm setups. We have 3 identical docker swarms but this issue is only happening on one of them.
Basically, if we restart a Jenkins container or remove and re-deploy a service, traefik always sees the service (it is listed in the api), but routing to it may not work, even though the service is up and available. This is intermittent, and our configs haven't changed in 12+ months.
Removing the service, waiting 1-2 mins and re-deploying usually fixes it.
Has anyone seen this behaviour before? Could it be load related? We have ~30 services (jenkins containers) sitting on a multi-node docker swarm (19.03). They all share the same docker network that traefik is listening on.
I've added chunks of the relevant config below. There don't seem to be any errors in the logs relating to this, so we're a bit stumped. Any insight would be greatly appreciated.
# traefik.yml
entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entrypoint:
          to: websecure
          scheme: https
          permanent: true
  websecure:
    address: ":443"
accessLog:
  filePath: "/etc/traefik/logs/access.log"
  bufferingSize: 100
log:
  filePath: "/etc/traefik/logs/error.log"
  # format: json
  level: ERROR
providers:
  file:
    filename: /etc/traefik/traefik.yml
    watch: true
  docker:
    endpoint: "unix:///var/run/docker.sock"
    swarmMode: true
    swarmModeRefreshSeconds: 30
    watch: true
    constraints: "Label(`traefik.enable`, `true`)"
    exposedByDefault: false
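
Since the log level above is ERROR, routing problems that are not logged as errors would not show up in the log file. A minimal sketch of a temporary change that may surface more detail while reproducing the issue, assuming the same log path as above (the DEBUG level is a suggestion for troubleshooting, not part of the running config):

```yaml
# Sketch only: same log section as above, with the level raised temporarily.
log:
  filePath: "/etc/traefik/logs/error.log"
  # format: json
  level: DEBUG  # was ERROR; debug output is verbose, so revert after testing
```

Debug logging is noisy with ~30 services, so it is probably best enabled only for the window in which a service is removed and re-deployed.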
Traefik service
version: '3.5'
networks:
  traefik:
    external: true
services:
  reverse-proxy:
    image: "traefik:v2.10.1"
    volumes:
      - /traefik/traefik.yml:/etc/traefik/traefik.yml
      - /traefik/ssl:/etc/traefik/ssl
      - /traefik/logs:/etc/traefik/logs
    ports:
      - "80:80" # The HTTP port (redirected to 443)
      - "443:443"
    networks:
      - traefik
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      labels:
        - traefik.enable=true
        - traefik.docker.network=traefik
        - traefik.http.services.dummyservice.loadbalancer.server.port=1111
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.labels.environment == prod
          - node.labels.node.role == manager
And lastly, the services are labelled as follows:
deploy:
  labels:
    - traefik.enable=true
    - traefik.docker.lbswarm=true
    - traefik.http.services.{{ SERVICE_NAME }}_jenkins_ci.loadbalancer.server.port=8080
    - traefik.http.routers.{{ SERVICE_NAME }}_jenkins_ci.rule=Host(`region.location.com`) && PathPrefix(`/jenkins/{{ SERVICE_NAME }}`)
    - traefik.http.routers.{{ SERVICE_NAME }}_jenkins_ci.entrypoints=websecure
    - traefik.http.routers.{{ SERVICE_NAME }}_jenkins_ci.tls=true
    - traefik.http.routers.{{ SERVICE_NAME }}_jenkins_ci.service={{ SERVICE_NAME }}_jenkins_ci
    - traefik.docker.network=traefik