Netlify Renewal suddenly starts failing

Today my TLS certificates expired. This is in the logs:

time="2023-04-26T16:22:42Z" level=error msg="Error renewing certificate from LE: {b49.cloudserver.click []}" ACME CA="https://acme-v02.api.letsencrypt.org/directory" error="error: one or more domains had a problem:\n[b49.cloudserver.click] [b49.cloudserver.click] acme: error presenting token: netlify: failed to create TXT records: fqdn=_acme-challenge.b49.cloudserver.click., authZone=b49.cloudserver.click: invalid status code: 404 Not Found: {\"code\":404,\"message\":\"Not Found\"}\n" providerName=le_main.acme

What I tried:

  • Restart
  • Upgrade to 2.10
  • Use a new Personal Access Token from Netlify

I am still getting the same error and nothing is working because it can't renew.

Did you change any configuration in the last 90 days?

If all routers have Host() with their domain names, you could just switch to tlsChallenge.

Not sure. I definitely added some routers and services, but I do not think the main TLS config has been changed.
The reason I have to use DNS challenges is that the services are running on a private network that cannot be accessed by Let's Encrypt.

Does the explicitly named domain exist and point to the right IP?

On the public internet it is set to the address of my VPN, and in the local network DNS it is set to the local address. Neither of these addresses is routable by LE, which is why I am using DNS challenges.

I just tried generating certs with lego directly and got the same error:

2023/04/26 18:59:49 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/redacted
2023/04/26 18:59:50 Could not obtain certificates:
        error: one or more domains had a problem:
[b49.cloudserver.click] [b49.cloudserver.click] acme: error presenting token: netlify: failed to create TXT records: fqdn=_acme-challenge.b49.cloudserver.click., authZone=b49.cloudserver.click: invalid status code: 404 Not Found: {"code":404,"message":"Not Found"}

EDIT:
I tried with the top-level domain (cloudserver.click) and it works. So maybe something broke on Netlify's side that prevents lego from determining the correct zone for the subdomain?
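
The authZone value in the error is the zone name the client settled on before calling the DNS provider, so it is worth pulling it out and comparing it with the zone that actually exists at the provider. A minimal sketch, assuming the error text has the shape shown above; the helper name extract_zone_fields is made up for illustration and is not part of lego or Traefik:

```python
import re

# Error text copied from the lego output above.
ERROR = (
    "acme: error presenting token: netlify: failed to create TXT records: "
    "fqdn=_acme-challenge.b49.cloudserver.click., authZone=b49.cloudserver.click: "
    'invalid status code: 404 Not Found: {"code":404,"message":"Not Found"}'
)


def extract_zone_fields(message):
    """Return (fqdn, auth_zone) from an error message, or None if absent."""
    match = re.search(r"fqdn=(\S+?),\s*authZone=([^:\s]+)", message)
    if match is None:
        return None
    return match.group(1), match.group(2)


fields = extract_zone_fields(ERROR)
print(fields)
```

Here the extracted zone is b49.cloudserver.click, while the zone that works is cloudserver.click, which would explain a 404 from the provider API.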

EDIT2:

UPDATE:
I have switched my DNS to Cloudflare and I am facing the same issue, but I get a better error now:

time="2023-04-26T21:18:15Z" level=error msg="Unable to obtain ACME certificate for domains \"passwords.b49.cloudserver.click\": unable to generate a certificate for the domains [passwords.b49.cloudserver.click]: error: one or more domains had a problem:\n[passwords.b49.cloudserver.click] [passwords.b49.cloudserver.click] acme: error presenting token: cloudflare: failed to find zone b49.cloudserver.click.: zone could not be found\n" routerName=vaultwarden@docker providerName=le_main.acme ACME CA="https://acme-staging-v02.api.letsencrypt.org/directory" rule="Host(`passwords.b49.cloudserver.click`)"

So for some reason it wants to use b49.cloudserver.click as the zone name for all of my services (the domains all follow the pattern [service].b49.cloudserver.click).

I need to find a way to tell it that cloudserver.click is the actual zone name.

I think I found the reason:

In my local network, b49.cloudserver.click is actually its own zone! It must have stopped working because I switched my local DNS server. Traefik does the zone lookup through my local DNS, which tells it that b49.cloudserver.click has an SOA record.
In the public DNS, however, b49.cloudserver.click is a simple A record. So I just need to find a way to tell Traefik to use a public DNS server when requesting certificates.
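
To make the mismatch concrete, here is a rough sketch of the zone detection as I understand it: the client walks from the challenge name towards the root and takes the first name that answers with an SOA record as the zone. This is a simplification, not lego's actual code. The function find_zone and the two SOA sets are hypothetical stand-ins for real DNS queries against the local and the public resolver:

```python
from typing import Callable, Optional


def find_zone(fqdn: str, has_soa: Callable[[str], bool]) -> Optional[str]:
    """Return the first name, walking towards the root, that has an SOA record."""
    labels = fqdn.rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if has_soa(candidate):
            return candidate
    return None


# Assumed views of the same domain (stand-ins for real SOA lookups).
LOCAL_SOA = {"b49.cloudserver.click", "cloudserver.click"}  # local split-horizon DNS
PUBLIC_SOA = {"cloudserver.click"}                          # public DNS

name = "_acme-challenge.passwords.b49.cloudserver.click."
local_zone = find_zone(name, lambda n: n in LOCAL_SOA)
public_zone = find_zone(name, lambda n: n in PUBLIC_SOA)
print("local resolver: ", local_zone)
print("public resolver:", public_zone)
```

With the local view the walk stops at b49.cloudserver.click, a zone the DNS provider does not know about; with the public view it reaches cloudserver.click. The Traefik ACME documentation lists a resolvers option under dnsChallenge, and lego has a --dns.resolvers flag; pointing either at a public resolver looks like the setting to try here.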