For reasons unknown to me, the fix was deleting my ACME DNS (TXT) record, removing my acme.json file, and getting a new certificate from scratch. This was the process:
- Delete the ACME record from DNS. I use Cloudflare and my domain is registered with Porkbun, so I logged into each and removed the ACME DNS record.
- Stop Traefik and delete the acme.json file
- Restart Traefik
- Wait a few minutes for new certificates to be generated.
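Roughly, the steps above can be sketched as a small shell function. This is only a sketch under assumptions: the Compose service is called `traefik`, the `docker compose` plugin is available, and acme.json lives in `./certs` on the host (the `./certs:/certs` volume from my compose file). The names `run`, `reset_acme`, and `DRY_RUN` are made up for illustration and are not Traefik or Docker features. It also moves acme.json to a backup instead of deleting it, which is a cautious variation on what I actually did. The DNS record still has to be removed by hand first.

```shell
#!/bin/sh
# Sketch of the reset steps. Assumptions: Compose service named "traefik",
# acme.json in ./certs on the host. run, reset_acme and DRY_RUN are
# hypothetical helpers, not part of Traefik or Docker.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_acme() {
    acme_file="${1:-./certs/acme.json}"
    # Remove the ACME TXT record from DNS by hand before running this.
    run docker compose stop traefik || return 1
    # Keep a backup rather than deleting outright.
    run mv "$acme_file" "$acme_file.bak" || return 1
    # Traefik requests new certificates after it starts again.
    run docker compose up -d traefik
}

# Usage: preview the commands without touching anything.
# DRY_RUN=1 reset_acme ./certs/acme.json
```

Running it with `DRY_RUN=1` first only prints the three commands, which is a cheap way to check the paths before stopping anything.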
I have no idea what went wrong, but this was the fix. I'm also sharing my configuration and the error messages in case someone else runs into this issue:
docker-compose.yml:
traefik:
  image: "traefik:v2.10.4"
  container_name: "traefik"
  privileged: true
  command:
    - "--log.level=DEBUG"
    - "--api.insecure=true"
    - "--providers.docker=true"
    - "--providers.docker.exposedbydefault=false"
    - "--entrypoints.web.address=:80"
    - "--entrypoints.websecure.address=:443"
    - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
    - "--certificatesresolvers.myresolver.acme.dnschallenge=true"
    - "--certificatesresolvers.myresolver.acme.dnschallenge.provider=cloudflare"
    # - "--certificatesresolvers.myresolver.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
    - "--certificatesresolvers.myresolver.acme.email=[my Cloudflare email]"
    - "--certificatesresolvers.myresolver.acme.storage=/certs/acme.json"
    - "--certificatesresolvers.myresolver.acme.dnschallenge.resolvers=1.1.1.1:53"
  ports:
    - "80:80"
    - "443:443"
    - "8080:8080"
  environment:
    - CF_API_KEY=[my API key]
    - CF_API_EMAIL=[my Cloudflare email]
    - CF_DNS_API_TOKEN=[my API token]
  volumes:
    - "./certs:/certs"
    - "/var/run/docker.sock:/var/run/docker.sock"
  restart: unless-stopped
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.domain.entrypoints=websecure"
    - "traefik.http.routers.domain.rule=Host(`my-domain.com`)"
    - "traefik.http.routers.domain.tls.certresolver=myresolver"
    - "traefik.http.routers.domain.middlewares=domain"
    - "traefik.tls.stores.default.defaultgeneratedcert.resolver=myresolver" # New label #1
    - "traefik.tls.stores.default.defaultgeneratedcert.domain.main=my-domain.com" # New label #2
    # The last two labels were added to troubleshoot this issue - everything worked fine without them beforehand
As mentioned above, this is the message that tipped me off to the error:
level=debug msg="No default certificate, fallback to the internal generated certificate" tlsStoreName=default
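As far as I understand, this debug line only says that no default certificate was configured in the TLS store, so Traefik falls back to its self-signed one (subject `TRAEFIK DEFAULT CERT`) for requests that match no other certificate. To see which certificate is really being served for a hostname, something like the following should work. It is a sketch that assumes `openssl` is installed; `describe_cert` and `served_cert` are helper names I made up, and `my-domain.com` is the placeholder from the compose file.

```shell
#!/bin/sh
# describe_cert and served_cert are hypothetical helper names.

# Read a PEM certificate on stdin and print its subject, issuer and validity dates.
describe_cert() {
    openssl x509 -noout -subject -issuer -dates
}

# Fetch the certificate a server presents for a hostname (sent as SNI) and describe it.
served_cert() {
    host="$1"
    port="${2:-443}"
    openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null |
        describe_cert
}

# Usage:
# served_cert my-domain.com
```

If the subject or issuer shows `TRAEFIK DEFAULT CERT` rather than Let's Encrypt, the ACME certificate is not the one being served for that hostname.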
I could see my ACME certificate was OK by checking with curl:
root@My-SRV:/# curl http://localhost:8282/api/http/routers/[my-service]@docker
{"entryPoints":["websecure"],"service":"[my-service]","rule":"Host(`[service.my-domain.com]`)","tls":{"options":"default","certResolver":"myresolver"},"status":"enabled","using":["websecure"],"name":"[my-service]@docker","provider":"docker"}
# The `certResolver` is the one I set up in the compose file above to use ACME validation
In addition, I could tell the certificate was OK by reading the /certs/acme.json file from inside the container:
/ # cat /certs/acme.json | grep valid
"status": "valid",
I could also see this message repeating for all my subdomains:
level=debug msg="No ACME certificate generation required for domains [\"service.domain.com\"]." rule="Host(`service.domain.com`)" ACME CA="https://acme-v02.api.letsencrypt.org/directory" providerName=myresolver.acme routerName=my-service@docker
All of this eventually led me to conclude that nothing was wrong with Traefik or with my certificate, which is why I tried deleting the DNS record.
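One caveat on the acme.json grep above: as far as I can tell, the `"status": "valid"` line belongs to the ACME account registration rather than to any certificate, so on its own it may not say much about the certificates themselves. A rough way to see which domains actually have a stored certificate is to pull out the `"main"` entries. This sketch assumes the pretty-printed layout Traefik v2 writes, with one key per line; `list_acme_domains` is a hypothetical helper name.

```shell
#!/bin/sh
# list_acme_domains is a hypothetical helper. It assumes each stored
# certificate in acme.json has a line of the form:
#   "main": "example.com",
list_acme_domains() {
    acme_file="${1:-/certs/acme.json}"
    # Print only the value between the quotes that follow "main":
    sed -n 's/^[[:space:]]*"main":[[:space:]]*"\([^"]*\)".*$/\1/p' "$acme_file"
}

# Usage (inside the container):
# list_acme_domains /certs/acme.json
```

An empty result would suggest that no certificate is stored for the resolver, even though the account status reads as valid.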
I hope this helps - will gladly provide more info if needed.