Hi folks,
I'm a Traefik novice. We are seeing cluster degradation on the dashboard (as shown below). We are running Traefik EE on Docker Swarm, deployed with an Ansible playbook. The logs from TEE are being picked up by Graylog.
Q1: Can this be a false positive?
When searching our Graylog logs, I couldn't find any errors in container_name: traefikee_con* that relate to the timeframe in which we were seeing 504 errors.
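One way to cross-check what Graylog shows is to pull the logs straight from Swarm for the window in which the 504s occurred and filter them by level. This is only a sketch: the service name `traefikee_controller` is a guess based on the `traefikee_con*` container prefix, and the `level=` pattern assumes key=value formatted log lines, so it would need adjusting if the logs are emitted as JSON.

```shell
# Keep only lines whose log level is warn(ing), error, or fatal.
# Assumption: key=value formatted logs (e.g. level=error msg="...").
filter_problem_lines() {
  # "|| true" keeps the function from failing when nothing matches.
  grep -iE 'level=(warn|error|fatal)' || true
}

# Example (hypothetical service name, adjust to your stack):
#   docker service logs --since 2h --timestamps traefikee_controller 2>&1 | filter_problem_lines
```

If this comes back empty for the affected window as well, that at least confirms the gap is not on the Graylog ingestion side.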
Q2: If this is not a false positive, how can we recover the cluster?
From my understanding of Traefik (and from reading the docs), there are a few possible solutions:
- Create a new HA cluster with the same configuration and swap over to the new cluster.
- Redeploy the full TEE stack from the ansible playbook.
I don't have instructions for option #1. Can someone help me with that?
For option #2, here is the documentation I have, but I've never run through these steps before. I'm worried that this may take our services offline. Can someone also help me figure out whether these steps are okay?
Steps to Reset Traefik EE Stack
Remove the stack
docker stack rm traefikee
Remove the configs
Two configs will be in the swarm, each prefixed with the environment name:
{{env}}-controller
{{env}}-proxy
Example for test:
docker config rm test-controller test-proxy
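For environments other than test, the same naming rule can be wrapped in a small helper so the names are not typed by hand. This is a minimal sketch; `config_names` is a made-up helper name, and it only assumes the `{{env}}-controller` / `{{env}}-proxy` pattern described above.

```shell
# Print the two Swarm config names for a given environment prefix.
# config_names is a hypothetical helper, not part of Docker or Traefik EE.
config_names() {
  env_name="$1"
  printf '%s-controller %s-proxy\n' "$env_name" "$env_name"
}

# Example: list what exists first, then remove (word splitting is intended):
#   docker config ls --filter name=test-
#   docker config rm $(config_names test)
```

Listing the configs before removing them is a cheap check that the prefix matches what is actually in the swarm.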
Remove or clean the volume
docker run --rm -it --volume vol0:/data bash rm -fr /data/controller-0
Redeploy the controller using Ansible
ansible-playbook -i inventory-test/ stack-traefikee.yaml
Wait until the Let's Encrypt challenge has completed and the controller is up before moving on to the next step.
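The "controller is up" part of this wait could be scripted roughly as follows. It is a sketch under assumptions: the service name `traefikee_controller` is hypothetical, and the check only looks at Swarm's replica count, so it does not confirm that the Let's Encrypt challenge has finished (that still needs a look at the controller logs or the dashboard).

```shell
# Succeeds when a Swarm REPLICAS value such as "3/3" shows all tasks running.
replicas_ready() {
  value="${1%% *}"          # drop suffixes like " (max 1 per node)"
  running="${value%%/*}"
  desired="${value##*/}"
  [ -n "$running" ] && [ "$running" = "$desired" ] && [ "$desired" != "0" ]
}

# Poll a service until all replicas are running, or give up after N tries.
wait_for_service() {
  service="$1"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    # The name filter matches by prefix, so only the first hit is used.
    replicas=$(docker service ls --filter "name=$service" --format '{{.Replicas}}' | head -n 1)
    if replicas_ready "$replicas"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}

# Example (hypothetical service name):
#   wait_for_service traefikee_controller && echo "controller is up"
```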
Redeploy the proxies using Ansible
ansible-playbook -i inventory-test/ stack-traefikee-proxies.yaml