DNS-01 challenge fails

I'm trying to set up Let's Encrypt with a DNS challenge for teale.cloud. On the DNS provider's site I can see that the TXT records are created correctly, and I can retrieve them from the Docker host Traefik runs on if I query the provider's nameservers directly. I've set those nameservers as the resolvers in Traefik.

docker-compose.yml:

version: '3.4'
services:
  reverse-proxy:
    image: traefik:v3.0
    container_name: traefik
    command: 
      - --log.level=DEBUG
      - --api.insecure=true
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --providers.docker.network=blue_traefik
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.myresolver.acme.email=bill@auspilot.net
      - --certificatesresolvers.myresolver.acme.storage=acme.json
      - --certificatesresolvers.myresolver.acme.dnschallenge.provider=bunny
      - --certificatesresolvers.myresolver.acme.dnschallenge.resolvers=91.200.176.1:53,185.85.196.65:53
      - --certificatesresolvers.myresolver.acme.dnschallenge.delaybeforecheck=0
      - --certificatesresolvers.myresolver.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
    environment:
      BUNNY_API_KEY: nope
      BUNNY_TTL: 3600
      BUNNY_PROPAGATION_TIMEOUT: 300
      BUNNY_POLLING_INTERVAL: 5
      LEGO_DISABLE_CNAME_SUPPORT: true
    networks:
      - traefik
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./acme.json:/acme.json
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.traefik.rule=Host(`traefik.teale.cloud`)'
      - 'traefik.http.services.traefik.loadbalancer.server.port=8080'
    restart: unless-stopped
  nginx:
    image: nginxinc/nginx-unprivileged
    container_name: nginx
    restart: unless-stopped
    networks:
      - traefik
    volumes:
      - './web/src:/usr/share/nginx/html'
      - './web/nginx.conf:/etc/nginx/nginx.conf:ro'
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.nginx.rule=Host(`www.teale.cloud`)'
      - 'traefik.http.routers.nginx.tls=true'
      - 'traefik.http.routers.nginx.tls.certresolver=myresolver'
      - 'traefik.http.routers.nginx.middlewares=redirect-web-secure'
      - 'traefik.http.middlewares.redirect-web-secure.redirectscheme.scheme=https'
      - 'traefik.http.services.nginx.loadbalancer.server.port=8080'

AdGuard Home has the following rule applied:
||teale.cloud^$dnstype=A,dnsrewrite=NOERROR;A;192.168.50.4

The effect of the above is to rewrite the A records for the teale.cloud domain locally.
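To make the scope of that rule concrete, here is a minimal sketch of how I understand it to behave. It is a toy model, not AdGuard Home's real matching engine, and the function and constant names are made up for illustration. The point is that only type A queries are rewritten, so TXT lookups for the challenge should pass through untouched.

```python
# Toy model of the AdGuard Home rule above; not AdGuard's real matching engine.
RULE_DOMAIN = "teale.cloud"   # from ||teale.cloud^ (the domain and its subdomains)
RULE_QTYPE = "A"              # from $dnstype=A
REWRITE_TO = "192.168.50.4"   # from dnsrewrite=NOERROR;A;192.168.50.4

def rewritten_answer(qname, qtype):
    """Return the rewritten address, or None if the query goes upstream unchanged."""
    name = qname.rstrip(".").lower()
    in_scope = name == RULE_DOMAIN or name.endswith("." + RULE_DOMAIN)
    if in_scope and qtype.upper() == RULE_QTYPE:
        return REWRITE_TO
    return None

print(rewritten_answer("www.teale.cloud", "A"))                    # 192.168.50.4
print(rewritten_answer("_acme-challenge.www.teale.cloud", "TXT"))  # None
```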

When I start the Traefik container, it attempts the DNS challenge but fails with:

> Unable to obtain ACME certificate for domains error="unable to generate a certificate for the domains [www.teale.cloud]: error: one or more domains had a problem:\n[www.teale.cloud] propagation: time limit exceeded: last error: NS kiki.bunny.net. did not return the expected TXT record [fqdn: teale.cloud., value: nahIllJustReplaceThat]: \n" ACME CA=https://acme-staging-v02.api.letsencrypt.org/directory acmeCA=https://acme-staging-v02.api.letsencrypt.org/directory domains=["www.teale.cloud"] providerName=myresolver.acme routerName=nginx@docker rule=Host(`www.teale.cloud`)

I can run the following on the host during the checks:

$ dig @kiki.bunny.net _acme-challenge.www.teale.cloud TXT

; <<>> DiG 9.18.19-1~deb12u1-Debian <<>> @kiki.bunny.net _acme-challenge.www.teale.cloud TXT
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35863
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_acme-challenge.www.teale.cloud. IN    TXT

;; ANSWER SECTION:
_acme-challenge.www.teale.cloud. 18000 IN CNAME teale.cloud.
_acme-challenge.www.teale.cloud. 3600 IN TXT    "nahIllJustReplaceThat"

;; Query time: 11 msec
;; SERVER: 91.200.176.1#53(kiki.bunny.net) (UDP)
;; WHEN: Fri Dec 15 23:04:59 AWST 2023
;; MSG SIZE  rcvd: 141

I've tried a number of variations on the TTL, polling interval, delay before checking, and propagation timeout, with no change.

Trying to challenge certificate for domain [www.teale.cloud] found in HostSNI rule ACME CA=https://acme-staging-v02.api.letsencrypt.org/directory acmeCA=https://acme-staging-v02.api.letsencrypt.org/directory providerName=myresolver.acme routerName=nginx@docker rule=Host(`www.teale.cloud`)
2023-12-15T15:04:43Z DBG github.com/traefik/traefik/v3/pkg/provider/acme/provider.go:849 > Looking for provided certificate(s) to validate ["www.teale.cloud"]... ACME CA=https://acme-staging-v02.api.letsencrypt.org/directory acmeCA=https://acme-staging-v02.api.letsencrypt.org/directory providerName=myresolver.acme routerName=nginx@docker rule=Host(`www.teale.cloud`)
2023-12-15T15:04:43Z DBG github.com/traefik/traefik/v3/pkg/provider/acme/provider.go:895 > Domains need ACME certificates generation for domains "www.teale.cloud". ACME CA=https://acme-staging-v02.api.letsencrypt.org/directory acmeCA=https://acme-staging-v02.api.letsencrypt.org/directory domains=["www.teale.cloud"] providerName=myresolver.acme routerName=nginx@docker rule=Host(`www.teale.cloud`)
2023-12-15T15:04:43Z DBG github.com/traefik/traefik/v3/pkg/provider/acme/provider.go:623 > Loading ACME certificates [www.teale.cloud]... ACME CA=https://acme-staging-v02.api.letsencrypt.org/directory acmeCA=https://acme-staging-v02.api.letsencrypt.org/directory providerName=myresolver.acme routerName=nginx@docker rule=Host(`www.teale.cloud`)
2023-12-15T15:04:43Z DBG github.com/traefik/traefik/v3/pkg/provider/acme/provider.go:254 > Building ACME client... providerName=myresolver.acme
2023-12-15T15:04:43Z DBG github.com/traefik/traefik/v3/pkg/provider/acme/provider.go:260 > https://acme-staging-v02.api.letsencrypt.org/directory providerName=myresolver.acme
2023-12-15T15:04:44Z DBG github.com/traefik/traefik/v3/pkg/provider/acme/provider.go:296 > Using DNS Challenge provider: bunny providerName=myresolver.acme
2023-12-15T15:04:44Z DBG github.com/go-acme/lego/v4@v4.14.0/log/logger.go:48 > [INFO] [www.teale.cloud] acme: Obtaining bundled SAN certificate lib=lego
2023-12-15T15:04:44Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:188 > Service selected by WRR: 2b135a97ab7da542
2023-12-15T15:04:45Z DBG github.com/go-acme/lego/v4@v4.14.0/log/logger.go:48 > [INFO] [www.teale.cloud] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/10064463224 lib=lego
2023-12-15T15:04:45Z DBG github.com/go-acme/lego/v4@v4.14.0/log/logger.go:48 > [INFO] [www.teale.cloud] acme: Could not find solver for: tls-alpn-01 lib=lego
2023-12-15T15:04:45Z DBG github.com/go-acme/lego/v4@v4.14.0/log/logger.go:48 > [INFO] [www.teale.cloud] acme: Could not find solver for: http-01 lib=lego
2023-12-15T15:04:45Z DBG github.com/go-acme/lego/v4@v4.14.0/log/logger.go:48 > [INFO] [www.teale.cloud] acme: use dns-01 solver lib=lego
2023-12-15T15:04:45Z DBG github.com/go-acme/lego/v4@v4.14.0/log/logger.go:48 > [INFO] [www.teale.cloud] acme: Preparing to solve DNS-01 lib=lego
[INFO] [www.teale.cloud] acme: Trying to solve DNS-01 lib=lego
2023-12-15T15:04:46Z DBG github.com/go-acme/lego/v4@v4.14.0/log/logger.go:48 > [INFO] [www.teale.cloud] acme: Checking DNS record propagation using [91.200.176.1:53 185.85.196.65:53] lib=lego
2023-12-15T15:04:51Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:188 > Service selected by WRR: 72304802201234d5
2023-12-15T15:04:51Z DBG github.com/go-acme/lego/v4@v4.14.0/log/logger.go:48 > [INFO] Wait for propagation [timeout: 5m0s, interval: 5s] lib=lego
[INFO] [www.teale.cloud] acme: Cleaning DNS-01 challenge lib=lego
2023-12-15T15:09:54Z DBG github.com/go-acme/lego/v4@v4.14.0/log/logger.go:48 > [INFO] Deactivating auth: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/10064463224 lib=lego
2023-12-15T15:09:55Z ERR github.com/traefik/traefik/v3/pkg/provider/acme/provider.go:399 > Unable to obtain ACME certificate for domains error="unable to generate a certificate for the domains [www.teale.cloud]: error: one or more domains had a problem:\n[www.teale.cloud] propagation: time limit exceeded: last error: NS kiki.bunny.net. did not return the expected TXT record [fqdn: teale.cloud., value: stillNope]: \n" ACME CA=https://acme-staging-v02.api.letsencrypt.org/directory acmeCA=https://acme-staging-v02.api.letsencrypt.org/directory domains=["www.teale.cloud"] providerName=myresolver.acme routerName=nginx@docker rule=Host(`www.teale.cloud`)
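The `[timeout: 5m0s, interval: 5s]` line in the log matches BUNNY_PROPAGATION_TIMEOUT and BUNNY_POLLING_INTERVAL from the compose file. A rough sketch of such a wait loop follows. It is my own simplification, not lego's code, and `wait_for` and `FakeClock` are hypothetical names. It shows why tuning these values only changes how long the failure takes when the check itself can never pass.

```python
import time

def wait_for(check, timeout, interval, sleep=time.sleep, now=time.monotonic):
    """Call check() every `interval` seconds until it passes or `timeout` elapses.

    Returns the number of attempts on success, raises TimeoutError otherwise.
    """
    deadline = now() + timeout
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if now() >= deadline:
            raise TimeoutError(f"time limit exceeded after {attempts} attempts")
        sleep(interval)

class FakeClock:
    """Stand-in clock so the example runs instantly instead of waiting 5 minutes."""
    def __init__(self):
        self.t = 0.0
    def now(self):
        return self.t
    def sleep(self, seconds):
        self.t += seconds

clock = FakeClock()
try:
    # A check that can never pass, e.g. one that looks for the TXT at the wrong name.
    wait_for(lambda: False, timeout=300, interval=5, sleep=clock.sleep, now=clock.now)
except TimeoutError as err:
    print(err, "at t =", clock.t)
```

A shorter interval or a longer timeout just means more failed attempts before the same error.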

Where should I be looking?

I would assume that LE tries to verify the token itself first, but can’t find it because you manipulate the DNS record for local queries.

Are you sure .resolvers is correct? Usually I would expect it to just work with the defaults.

It didn't work with the defaults, but I can always try removing that line and attempting it again.

As I understand it, LE isn't involved yet: lego, the ACME library in use, doesn't attempt to talk to LE until it is convinced DNS propagation has succeeded.

Edit: Without the resolvers set explicitly, it defaults to 127.0.0.11:53 (Docker's embedded DNS server), which I'm not certain will work.

With LE I meant the acme component in Traefik.

In that case, it should be able to verify it, shouldn't it? Only the A record is manipulated locally. The output of `dig @kiki.bunny.net _acme-challenge.www.teale.cloud TXT` that I posted above was run from the host to demonstrate that the record is visible.

I saw another thread in which someone suggested that a caching DNS server could cause similar behavior, and AdGuard Home does cache. I'll try disabling the cache and see if that makes any difference.

Try removing unnecessary stuff to see if it works. By the way, you can set the HTTP-to-HTTPS redirect and the certresolver globally on the entrypoint; see the simple Traefik example (link).

I will just get rid of that for now.

Omitted from the file are the 20 or so other services currently on HTTP; I didn't want to redirect them all to HTTPS before it's actually working.

No luck with disabling DNS caching in AdGuard Home, but I'll leave it off for now anyway.

Can’t you just disable AdGuard? At least to test.

Disabling AdGuard doesn't seem to have made any difference, although there is at least a new error. It has only shown up once so far, and I'm not sure it's relevant.

2023-12-16T01:08:32Z DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:188 > Service selected by WRR: 72304802201234d5
2023-12-16T01:08:32Z DBG github.com/go-acme/lego/v4@v4.14.0/log/logger.go:48 > [INFO] retry due to: acme: error: 400 :: POST :: https://acme-staging-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:badNonce :: JWS has an invalid anti-replay nonce: "mMpbWOlLL_D1wM-a_deWWrUtuwUCwi0Y5fChSYoJF8_q8bD5Yxs" lib=lego

Let's Debug shows that teale.cloud lacks a CAA response (the query returns SERVFAIL), but I can't see this mentioned anywhere in Traefik's docs. Is that important?

It also shows the same for the _acme-challenge.teale.cloud/TXT as well.

There is only an A record for the apex domain, A records for a handful of cpanel-related domains that can probably be removed, a wildcard CNAME record (CNAME * teale.cloud), an MX record and a handful of TXT records. Should there be others?

I've made progress. So far this appears not to be a fault or bug in either Traefik or lego, but a result of the zone's DNS configuration and perhaps my provider's nameservers.

I found discussions stating that you cannot have both a TXT record and a CNAME record for the same name, and _acme-challenge.teale.cloud would match *.teale.cloud CNAME teale.cloud. I tried removing the CNAME record and, lo and behold, it stopped timing out.
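For anyone hitting the same thing, here is a minimal sketch of what I think was going on. It is a toy model, not lego's actual code: `ANSWERS` mirrors the dig output earlier in the thread, the token is a placeholder, and the assumption that the propagation check follows the CNAME is inferred from the `fqdn: teale.cloud.` in the error message.

```python
# Toy model, not lego's actual code. ANSWERS mirrors the dig output earlier in
# the thread: the nameserver returned the wildcard CNAME and the TXT together.
ANSWERS = {
    "_acme-challenge.www.teale.cloud.": [
        ("CNAME", "teale.cloud."),
        ("TXT", "example-token"),   # placeholder token value
    ],
    "teale.cloud.": [],             # no challenge TXT at the apex
}

def effective_fqdn(name, answers, max_hops=10):
    """Follow CNAMEs from `name` and return the name that ends up being checked."""
    for _ in range(max_hops):
        targets = [value for rtype, value in answers.get(name, []) if rtype == "CNAME"]
        if not targets:
            return name
        name = targets[0]
    raise RuntimeError("CNAME chain too long")

def has_txt(name, value, answers):
    """True if `name` carries a TXT record with exactly `value`."""
    return ("TXT", value) in answers.get(name, [])

def without_cname(answers):
    """Copy of `answers` with every CNAME removed, as after deleting the wildcard."""
    return {n: [r for r in rs if r[0] != "CNAME"] for n, rs in answers.items()}

challenge = "_acme-challenge.www.teale.cloud."

checked = effective_fqdn(challenge, ANSWERS)
print(checked, has_txt(checked, "example-token", ANSWERS))
# teale.cloud. False

fixed = without_cname(ANSWERS)
checked = effective_fqdn(challenge, fixed)
print(checked, has_txt(checked, "example-token", fixed))
# _acme-challenge.www.teale.cloud. True
```

With the wildcard CNAME in the answer, the check ends up looking for the token at the apex, where it never exists; once the CNAME is gone, the check stays on the challenge name and finds the TXT.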

It then went to SERVFAIL when trying to get the CAA record, which was easy enough to work around: just add a CAA record. Following this, I was able to get a (staging) wildcard certificate. Next I'll try adding AdGuard back into the mix to see if this can be replicated through there.
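On the CAA part: as far as I understand RFC 8659, the CA looks for CAA records at the requested name and then walks up towards the root until it finds one, and a lookup failure such as SERVFAIL is treated as an error rather than as "no record". A minimal sketch of which names get queried (the function name is made up for illustration):

```python
def caa_lookup_names(domain):
    """Names a CA would query for CAA, most specific first (RFC 8659 tree climbing)."""
    labels = domain.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

print(caa_lookup_names("www.teale.cloud"))
# ['www.teale.cloud', 'teale.cloud', 'cloud']
```

A record at the apex along the lines of `teale.cloud. IN CAA 0 issue "letsencrypt.org"` would be one way to give the second lookup a clean answer, assuming Let's Encrypt is the only CA that should issue for the zone.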