Has anything broken in Traefik v3 regarding dnsChallenge with Cloudflare?

Hey, I've been running Traefik via Docker since version 2 and am currently on version 3.2.3. My configuration has been so boring that I haven't changed it for months and months.

Out of the blue I'm now receiving errors during certificate renewal such as the following:

2024-12-17T00:53:36-06:00 INF Traefik version 3.2.3 built on 2024-12-16T10:31:50Z version=3.2.3
2024-12-17T00:53:36-06:00 INF
Stats collection is disabled.
Help us improve Traefik by turning this feature on :)
More details on: https://doc.traefik.io/traefik/contributing/data-collection/

2024-12-17T00:53:36-06:00 INF Starting provider aggregator *aggregator.ProviderAggregator
2024-12-17T00:53:36-06:00 INF Starting provider *file.Provider
2024-12-17T00:53:36-06:00 INF Starting provider *traefik.Provider
2024-12-17T00:53:36-06:00 INF Starting provider *docker.Provider
2024-12-17T00:53:36-06:00 INF Starting provider *acme.ChallengeTLSALPN
2024-12-17T00:53:36-06:00 INF Starting provider *acme.Provider
2024-12-17T00:53:36-06:00 INF Testing certificate renew... acmeCA=https://acme-v02.api.letsencrypt.org/directory providerName=le.acme
2024-12-17T00:53:37-06:00 INF Renewing certificate from LE : {Main:whoami.domain.com SANs:[]} acmeCA=https://acme-v02.api.letsencrypt.org/directory providerName=le.acme
2024-12-17T00:53:44-06:00 ERR Error renewing certificate from LE: {whoami.domain.com []} error="error: one or more domains had a problem:\n[whoami.domain.com] [whoami.domain.com] acme: error presenting token: cloudflare: could not find zone for domain \"whoami.domain.com\": [fqdn=_acme-challenge.whoami.domain.com.] unexpected response for '_acme-challenge.whoami.domain.com.' [question='_acme-challenge.whoami.domain.com. IN  SOA', code=SERVFAIL]\n" acmeCA=https://acme-v02.api.letsencrypt.org/directory providerName=le.acme

Sooo -- here is my docker compose file for traefik:

x-healthcheck-parameters: &healthcheck-parameters
  interval: "30s"
  timeout: "3s"
  start_period: "5s"
  retries: 3

x-logging: &log-parameters
  logging:
    driver: "json-file"
    options:
      max-size: "200k"
      max-file: "10"

secrets:
  CF_DNS_API_TOKEN_secret:
    file: /etc/docker/compose/CF_DNS_API_TOKEN.secret

services:
  traefik:
    build:
      context: .
      dockerfile: Dockerfile
    pull_policy: build
    container_name: traefik
    hostname: traefik
    restart: unless-stopped
    secrets:
      - CF_DNS_API_TOKEN_secret
    networks:
      - docker-net
      - docker-api
    ports:
      - 80:80
      - 443:443
      - 8080:8080
      - 8082:8082
      - 3000:3000
    healthcheck:
      test: traefik healthcheck --ping
      <<: *healthcheck-parameters
    <<: *log-parameters
    labels:
      - "com.centurylinklabs.watchtower.enable=false"
      - "traefik.enable=true"
      - "traefik.docker.network=docker-net"
      - "traefik.http.routers.dashboard.rule=Host(`traefik.domain.com`) && (PathPrefix(`/api`) || PathPrefix(`/dashboard`))"
      - "traefik.http.routers.dashboard.tls=true"
      - "traefik.http.routers.dashboard.tls.options=modern@file"
      - "traefik.http.routers.dashboard.tls.certresolver=le"
      - "traefik.http.routers.dashboard.tls.domains[0].main=traefik.domain.com"
      - "traefik.http.routers.dashboard.tls.domains[0].sans=traefik.domain.com"
      - "traefik.http.routers.dashboard.service=api@internal"
      - "traefik.http.routers.dashboard.middlewares=auth"
      - "traefik.http.middlewares.auth.basicauth.usersfile=/etc/traefik/basicauth.pass"
      - "traefik.http.routers.dashboard.entrypoints=web,websecure"
    environment:
      - TZ
      - CLOUDFLARE_EMAIL="<User>@gmail.com"
      - CLOUDFLARE_DNS_API_TOKEN="<token>l"
    volumes:
      - /etc/traefik:/etc/traefik:ro
      - /etc/letsencrypt/certificates:/etc/letsencrypt
      - /etc/ssl/self-signed-certs/ubuntumc.domain.com/client:/etc/ssl/self-signed-certs/ubuntumc.domain.com/client:ro
      - /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt:ro

And here is the relevant portion of my static configuration file:

certificatesResolvers:
  le:
    acme:
      email: <name>@gmail.com
      #Staging Server
#      caServer: https://acme-staging-v02.api.letsencrypt.org/directory
      #Production Server
      caServer: https://acme-v02.api.letsencrypt.org/directory
      storage: /etc/letsencrypt/acme.json
      keyType: 'EC384'
      preferredChain: 'ISRG Root X1'
      dnsChallenge:
        provider: cloudflare
        delayBeforeCheck: 0
        resolvers:
          - "1.1.1.1:53"
          - "9.9.9.9:53"

I went ahead and regenerated the CF_DNS_API_TOKEN (permissions: Zone / Zone / Read and Zone / DNS / Edit), but it made no difference. I'm stumped and not sure what's going on at this point.
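One way to rule out the token itself is Cloudflare's token verification endpoint. A minimal sketch, assuming `curl` is installed; the function name `cf_token_is_active` is made up for illustration, and grepping the JSON body is a shortcut rather than proper JSON parsing:

```shell
# Hypothetical helper: succeeds (exit 0) if Cloudflare reports the API token as active.
# Assumes curl is available; the token is passed as the first argument.
cf_token_is_active() {
  curl -fsS \
    -H "Authorization: Bearer $1" \
    "https://api.cloudflare.com/client/v4/user/tokens/verify" \
    | grep -q '"status" *: *"active"'
}
```

Called as `cf_token_is_active "$(cat /etc/docker/compose/CF_DNS_API_TOKEN.secret)" && echo "token is active"`, assuming the secret file holds nothing but the token. This only confirms the token is valid; it does not prove the token has the right permissions on the zone.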

Maybe check this discussion.

go-acme/lego is the library Traefik uses for Let's Encrypt. I think it can also be used on the command line to test directly.
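A rough sketch of such a command-line test, assuming lego v4 is installed and `CLOUDFLARE_DNS_API_TOKEN` is exported in the environment. The wrapper name `lego_staging_test` and the `./lego-test` directory are made up for illustration, and the flags are worth checking against `lego --help` for your version. It points at the Let's Encrypt staging server so it does not count against production rate limits:

```shell
# Hypothetical wrapper around the lego CLI for a one-off DNS-01 test against
# the Let's Encrypt staging server. Assumes lego v4 is installed and that
# CLOUDFLARE_DNS_API_TOKEN is exported in the environment.
lego_staging_test() {
  test_domain="$1"
  test_email="$2"
  lego \
    --server "https://acme-staging-v02.api.letsencrypt.org/directory" \
    --email "$test_email" \
    --dns cloudflare \
    --dns.resolvers "1.1.1.1:53" \
    --domains "$test_domain" \
    --path ./lego-test \
    --accept-tos \
    run
}
```

For example: `lego_staging_test whoami.domain.com you@example.com`. If this fails with the same "could not find zone" error, the problem is unlikely to be specific to Traefik.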

Well, the problem wasn't Traefik but DNS -- argh.

But I'll give some hints for anyone who runs into a similar problem.

The source of my problem was that all port 53 (DNS) requests were being redirected to my pfSense box, so essentially I was sabotaging myself.

To troubleshoot DNS you will need, at a minimum, dig or drill.
You can troubleshoot like this:

drill _acme-challenge.<domain>.com. TXT @9.9.9.9
drill _acme-challenge.<domain>.com. SOA @1.1.1.1
drill <domain>.com. SOA @1.1.1.1

The IP address in each command is the external DNS resolver. I recommend using exactly the same resolvers that Traefik is configured with in the static configuration:


certificatesResolvers:
  letsencrypt:
    acme:
      email: <mail>@gmail.com
##     Staging Server
#      caServer: https://acme-staging-v02.api.letsencrypt.org/directory
##      Production Server
      caServer: https://acme-v02.api.letsencrypt.org/directory
      storage: /etc/letsencrypt/acme.json
      keyType: 'EC384'
      preferredChain: 'ISRG Root X1'
      dnsChallenge:
        provider: cloudflare
        delayBeforeCheck: 0
        resolvers:
          - "1.1.1.1:53"
          - "9.9.9.9:53"

You need to get a result from the external DNS resolver similar to:

$ drill main.<domain>.com @1.1.1.1 SOA
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 21647
;; flags: qr rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;; main.<domain>.com.	IN	SOA

;; ANSWER SECTION:

;; AUTHORITY SECTION:
<domain>.com.	1800	IN	SOA	connie.ns.cloudflare.com. dns.cloudflare.com. 2359932256 10000 2400 604800 1800

;; ADDITIONAL SECTION:

;; Query time: 35 msec
;; SERVER: 1.1.1.1
;; WHEN: Tue Dec 17 13:28:43 2024
;; MSG SIZE  rcvd: 96
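The manual check above can be scripted so that every configured resolver is tested in one go. This is a sketch that assumes `dig` is installed (the sed pattern also accepts drill's `rcode:` header, but the command itself calls dig). The function name `check_resolvers` is made up for illustration. Treating NOERROR and NXDOMAIN as acceptable and everything else (SERVFAIL, no reply) as a failure is my assumption about what matters here, based on the error in the log:

```shell
# Hypothetical helper: ask each resolver for the SOA of a name and report the response code.
# Usage: check_resolvers NAME RESOLVER_IP [RESOLVER_IP...]
# Returns non-zero if any resolver answers with something other than
# NOERROR/NXDOMAIN (for example SERVFAIL) or does not answer at all.
check_resolvers() {
  cr_name="$1"
  shift
  cr_failed=0
  for cr_resolver in "$@"; do
    # Pull the response code out of the ";; ->>HEADER<<-" line.
    cr_status="$(dig +time=3 +tries=1 "$cr_name" SOA "@${cr_resolver}" 2>/dev/null \
      | sed -n -E 's/.*(status|rcode): ([A-Z]+).*/\2/p' | head -n 1)"
    printf '%s via %s: %s\n' "$cr_name" "$cr_resolver" "${cr_status:-NO_REPLY}"
    case "$cr_status" in
      NOERROR|NXDOMAIN) ;;
      *) cr_failed=1 ;;
    esac
  done
  return "$cr_failed"
}
```

Run it from the machine (or container) that runs Traefik, for example `check_resolvers _acme-challenge.whoami.domain.com. 1.1.1.1 9.9.9.9`, so the queries take the same network path as the ACME client.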

If a firewall blocks DNS (port 53, or 853 for DNS over TLS), or a NAT rule redirects those ports to a local resolver, you need to add exceptions so that the machine running Traefik can reach the external resolvers and your DNS provider (such as Cloudflare). The DNS challenge works by temporarily writing a record to your DNS zone to prove ownership, and Traefik then has to look up that zone and record through the configured resolvers, so they must be reachable from the machine requesting the ACME TLS certificates.
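A quick way to spot this kind of redirection is to send a DNS query to an address where no DNS server can exist. The sketch below assumes `dig` is installed; the function name `dns_is_intercepted` is made up for illustration. If the query is answered anyway, something on the path (such as a NAT rule on the router) is capturing port 53 traffic:

```shell
# Hypothetical helper: detect transparent DNS redirection.
# 192.0.2.1 is in TEST-NET-1 (RFC 5737), reserved for documentation, so no
# real DNS server should answer there. dig exits 0 when it receives a reply
# and non-zero (9) when no server could be reached.
dns_is_intercepted() {
  probe_ip="${1:-192.0.2.1}"
  dig +time=2 +tries=1 example.com A "@${probe_ip}" >/dev/null 2>&1
}
```

For example: `if dns_is_intercepted; then echo "port 53 is being redirected"; fi`. This only probes plain DNS on port 53, and it will not catch a redirect rule that matches specific destination addresses only, so a clean result is a hint rather than proof.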