Hi all,
I have a strange issue with renewing certificates. Creating certificates for new domains works, but renewing existing certificates has stopped working and I don't know why. I'm running Traefik 2.10 on a Docker Swarm. We had 3 manager nodes with one Traefik instance running on each, and acme.json is shared via GlusterFS across all 3 nodes. Most of the domains are behind Cloudflare and some are not, but there is no difference between them, except that one container (domain behind Cloudflare) with this additional configuration is renewed regularly:
- "traefik.http.routers.node2.tls.domains[0].main=Redacted.TLD"
- "traefik.http.routers.node2.tls.domains[0].sans=*.Redacted.TLD"
I don't know which specific change led to this issue. The only bigger change we made was updating Docker from 24 to 25, but some certificates were still renewed after that.
I now get the following error:
Error renewing certificate from LE: {Redacted [Redacted]}" error="error: one or more domains had a problem:\n[Redacted] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: During secondary validation: 2606:4700:3032::6815:3bb1: Invalid response from http://Redacted/.well-known/acme-challenge/5Rt5Xm024_p3n9zh9oPoHlQhucm1pcqGgMXwXasYGOE: 404\n" providerName=leresolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory"
And lots of lines like this:
time="2024-04-26T17:54:58Z" level=error msg="Cannot retrieve the ACME challenge for Redacted (token \"MsmFFcds4QXY3KveynCGr-Y6JfDgkQGWNLHpvRivwRc\")" providerName=acme
traefik.yaml
version: "3.3"
services:
  traefik:
    image: "traefik:v2.10"
    command:
      - "--log.level=DEBUG"
      - --api.dashboard=false
      - --entrypoints.websecure.http3
      - --experimental.http3=true
      - --certificatesresolvers.leresolver.acme.caserver=https://acme-v02.api.letsencrypt.org/directory
      - --certificatesresolvers.leresolver.acme.email=Redacted
      - --certificatesresolvers.leresolver.acme.storage=/le/acme.json
      - --certificatesresolvers.leresolver.acme.httpchallenge=true
      - --certificatesresolvers.leresolver.acme.httpchallenge.entrypoint=web
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --providers.docker
      - --providers.docker.exposedbydefault=false
      - --providers.docker.swarmmode=true
      - --providers.docker.network=public
      - --providers.docker.watch
      - --entrypoints.web.proxyProtocol.trustedIPs=Redacted
      - --entrypoints.web.forwardedHeaders.trustedIPs=Redacted
      - --entrypoints.websecure.proxyProtocol.trustedIPs=Redacted
      - --entrypoints.websecure.forwardedHeaders.trustedIPs=Redacted
    ports:
      - "80:80"
      - "443:443/tcp"
      - "443:443/udp"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "/opt/mnt/traefik/acme.json:/le/acme.json"
    networks:
      - public
    deploy:
      mode: global
      placement:
        constraints: [node.role == manager]
      labels:
        - "traefik.enable=true"
networks:
  public:
    external: true
I have now changed the configuration so that Traefik runs on a single node only, but I still get the same error.
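The exact single-node change isn't shown here; a minimal sketch of how the deploy section of traefik.yaml might look, assuming Traefik was switched from `mode: global` (one task per manager) to a single replicated task that is still restricted to manager nodes:

```yaml
deploy:
  # Assumption: one Traefik task in total instead of one per manager node
  mode: replicated
  replicas: 1
  placement:
    constraints: [node.role == manager]
  labels:
    - "traefik.enable=true"
```

With one replica, only a single Traefik process requests certificates and writes /le/acme.json, so concurrent writers on the GlusterFS share are ruled out as a cause.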
Sample config from one container:
deploy:
  labels:
    - 'traefik.enable=true'
    - 'traefik.http.routers.node1.rule=Host(`${DOMAIN1}`) || Host(`${DOMAIN2}`)'
    - "traefik.http.routers.node1.service=node1"
    - "traefik.http.services.node1.loadbalancer.server.port=3000"
    - 'traefik.http.routers.node1.entrypoints=websecure'
    - "traefik.http.middlewares.node1.forwardauth.trustForwardHeader=true"
    - "traefik.http.routers.node1.tls=true"
    - 'traefik.http.routers.node1.tls.certresolver=leresolver'
  placement:
    constraints: [node.role == worker]
  replicas: 4
  mode: replicated
  update_config:
    parallelism: 2
    delay: 10s
    failure_action: rollback
    order: start-first
  restart_policy:
    condition: on-failure
    delay: 5s
    max_attempts: 3
    window: 120s
acme.json is writable, and Traefik is writing new certificates to the file. To make it more complicated, restarting Traefik sometimes gets a certificate renewed that had errors in the log and had not been renewed before the restart.
I hope I have added all the necessary information.
Update:
I've investigated further and checked the certificates in acme.json: it contains new and renewed certificates, but Traefik is not using them, even though Traefik itself wrote them.
Now I have an additional error in the log:
time="2024-04-27T08:10:02Z" level=debug msg="legolog: [INFO] [Redacted.TLD] acme: authorization already valid; skipping challenge"
time="2024-04-27T08:10:02Z" level=debug msg="legolog: [INFO] [*.Redacted.TLD] acme: Could not find solver for: dns-01"
time="2024-04-27T08:10:02Z" level=debug msg="legolog: [INFO] Skipping deactivating of valid auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/redacted"
time="2024-04-27T08:10:02Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/redacted"
time="2024-04-27T08:10:02Z" level=error msg="Unable to obtain ACME certificate for domains \"Redacted.TLD,*.Redacted.TLD\"" rule="Host(`sub.Redacted.TLD`)" ACME CA="https://acme-v02.api.letsencrypt.org/directory" providerName=leresolver.acme routerName=redacted@docker error="unable to generate a certificate for the domains [Redacted.TLD *.Redacted.TLD]: error: one or more domains had a problem:\n[*.Redacted.TLD] [*.Redacted.TLD] acme: could not determine solvers\n"
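The log line `Could not find solver for: dns-01` for `*.Redacted.TLD` fits the configuration above: Let's Encrypt only validates wildcard names through the DNS-01 challenge, while `leresolver` only enables `httpchallenge`. A hedged sketch of an additional resolver using Traefik's DNS challenge with the Cloudflare provider, to be merged into the existing `command:` list of the traefik service; the resolver name `ledns` and the token placeholder are assumptions, not part of the original setup:

```yaml
services:
  traefik:
    command:
      # Hypothetical second resolver "ledns" that solves DNS-01 via the Cloudflare API
      - --certificatesresolvers.ledns.acme.email=Redacted
      - --certificatesresolvers.ledns.acme.storage=/le/acme.json
      - --certificatesresolvers.ledns.acme.dnschallenge=true
      - --certificatesresolvers.ledns.acme.dnschallenge.provider=cloudflare
    environment:
      # Assumption: a Cloudflare API token allowed to edit DNS records of the zone
      - CF_DNS_API_TOKEN=changeme
```

The router that requests the wildcard SAN would then point at it, e.g. `traefik.http.routers.node2.tls.certresolver=ledns`, while routers without wildcard names can keep using `leresolver` and the HTTP challenge.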
Any help is greatly appreciated.
Thanks