Issue with acme-challenge using Cloudflare DNS

hello everyone,

since my new workplace is using it and it seems like a good fit for my setup, i wanted to look into traefik.
after reading multiple guides and watching hours of youtube videos i ended up with the following configuration:

docker-compose.yaml

this compose file is deployed as a portainer stack, if that makes any difference

version: "3.5"

services:

  traefik:
    image: "traefik"
    container_name: "traefik"
    restart: unless-stopped
    networks: 
      - web
    ports:
      - 80:80
      - 443:443
      - 8080:8080 
    environment:
      - CF_DNS_API_TOKEN=${CF_API_TOKEN}
      # If you choose to use an API Key instead of a Token, specify your email as well
      #- CF_API_EMAIL=${EMAIL}
      #- CF_API_KEY=${CF_API_KEY}
    command:
      #- "--log.level=DEBUG"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entryPoints.console.address=:8080"
      - "--entrypoints.websecure.address=:443" 
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - /etc/localtime:/etc/localtime:ro
      - /home/docker/deployments/traefik/traefik.yml:/traefik.yml:ro
      - /home/docker/deployments/traefik/acme.json:/acme.json
      - /home/docker/deployments/traefik/config.yml:/config.yml:ro
      - /home/ubuntu/docker/traefik/logs:/var/log/traefik
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.entrypoints=http"
      - "traefik.http.routers.traefik.rule=Host(`traefik.${DOMAIN}`)"
      - "traefik.http.middlewares.traefik-auth.basicauth.users=${TRAEFIK_USER}:${TRAEFIK_PASSWORD_HASH}"
      - "traefik.http.middlewares.traefik-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.middlewares.sslheader.headers.customrequestheaders.X-Forwarded-Proto=https"
      - "traefik.http.routers.traefik.middlewares=traefik-https-redirect"
      - "traefik.http.routers.traefik-secure.entrypoints=https"
      - "traefik.http.routers.traefik-secure.rule=Host(`traefik.${DOMAIN}`)"
      - "traefik.http.routers.traefik-secure.middlewares=traefik-auth"
      - "traefik.http.routers.traefik-secure.tls=true"
      - "traefik.http.routers.traefik-secure.tls.certresolver=cloudflare"
      - "traefik.http.routers.traefik-secure.tls.domains[0].main=${DOMAIN}"
      - "traefik.http.routers.traefik-secure.tls.domains[0].sans=*.${DOMAIN}"
      - "traefik.http.routers.traefik-secure.service=api@internal"

networks:
  web:
    name: web
    external: true

the .env file for the variables: (i included both CF_API_TOKEN and CF_API_KEY for faster testing)

DOMAIN=domain.com
EMAIL=my@email.com
TRAEFIK_USER=admin
TRAEFIK_PASSWORD_HASH=*the hash generated through the apache utils*
CF_API_TOKEN=*the api token for zone read and dns edit*
CF_API_KEY=*the global api key*

traefik.yml

api:
  dashboard: true
  debug: true
entryPoints:
  http:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
  https:
    address: ":443"
serversTransport:
  insecureSkipVerify: true
providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
  file:
    filename: /config.yml
certificatesResolvers:
  cloudflare:
    acme:
      email: my@email.com
      storage: acme.json
      dnsChallenge:
        provider: cloudflare
        #disablePropagationCheck: true # uncomment this if you have issues pulling certificates through cloudflare. setting this flag to true skips waiting for the TXT record to propagate to all authoritative name servers.
        resolvers:
          - "1.1.1.1:53"
          - "1.0.0.1:53"

acme.json & config.yml

both files are empty, i set chmod 600 on acme.json as required

the problem(s)

this is the output when starting the container with this configuration:

time="2024-03-24T18:11:29Z" level=info msg="Configuration loaded from file: /traefik.yml"
time="2024-03-24T18:15:32Z" level=error msg="Unable to obtain ACME certificate for domains \"domain.com,*.domain.com\"" ACME CA="https://acme-v02.api.letsencrypt.org/directory" providerName=cloudflare.acme routerName=traefik-secure@docker error="unable to generate a certificate for the domains [domain.com *.domain.com]: error: one or more domains had a problem:\n[*.domain.com] [*.domain.com] acme: error presenting token: cloudflare: could not find zone for domain \"domain.com\" (_acme-challenge.domain.com.): could not find the start of authority for _acme-challenge.domain.com.: read udp 172.25.0.2:59444->1.0.0.1:53: i/o timeout\n[domain.com] [domain.com] acme: error presenting token: cloudflare: could not find zone for domain \"domain.com\" (_acme-challenge.domain.com.): could not find the start of authority for _acme-challenge.domain.com.: read udp 172.25.0.2:57661->1.0.0.1:53: i/o timeout\n" rule="Host(`traefik.domain.com`)"

i could "resolve" the zone issue by using the traefik:v2.6.1 image, but the "could not find the start of authority" error persists:

time="2024-03-24T18:33:35Z" level=info msg="Configuration loaded from file: /traefik.yml"
time="2024-03-24T18:37:38Z" level=error msg="Unable to obtain ACME certificate for domains \"domain.com,*.domain.com\" : unable to generate a certificate for the domains [domain.com *.domain.com]: error: one or more domains had a problem:\n[*.domain.com] [*.domain.com] acme: error presenting token: cloudflare: could not find the start of authority for _acme-challenge.domain.com.: read udp 172.25.0.2:36668->1.0.0.1:53: i/o timeout\n[domain.com] [domain.com] acme: error presenting token: cloudflare: could not find the start of authority for _acme-challenge.domain.com.: read udp 172.25.0.2:53050->1.0.0.1:53: i/o timeout\n" providerName=cloudflare.acme

i have tried playing around with different dns records (currently 2 A records for * and domain.com which point to my current IP address), enabling/disabling the proxy and enabling/disabling "Always use HTTPS".
nothing should be blocking outgoing traffic. i opened ports 80 and 443 to my docker server running the traefik container.

i have wasted way too much time on this. does anyone have any idea what could be wrong or where i should investigate?

any help is very much appreciated :wink:

That belongs in the top 10 Q&A:

You cannot have Traefik static config in both traefik.yml and command:, pick one (doc).
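
For illustration, a minimal sketch of the compose service if the static config stays in traefik.yml only. It assumes the file is still mounted at /traefik.yml (the log above shows it being loaded from there) and reuses the paths from the posted compose file. Dropping the 8080 port mapping is an assumption, since only the removed console entrypoint used it. The labels and the top-level networks section stay as posted and are left out here for brevity.

```yaml
services:
  traefik:
    image: "traefik"
    container_name: "traefik"
    restart: unless-stopped
    networks:
      - web
    ports:
      - "80:80"
      - "443:443"
    environment:
      - CF_DNS_API_TOKEN=${CF_API_TOKEN}
    # no command: block here. entryPoints (http/https), providers and the
    # cloudflare certificatesResolver all come from the mounted traefik.yml,
    # so the entrypoint names used in the labels match the ones Traefik knows.
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /etc/localtime:/etc/localtime:ro
      - /home/docker/deployments/traefik/traefik.yml:/traefik.yml:ro
      - /home/docker/deployments/traefik/acme.json:/acme.json
      - /home/docker/deployments/traefik/config.yml:/config.yml:ro
```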

i commented out the whole command block. since everything was already set in traefik.yml, i didn't change anything there.

i have tried with the latest version and v2.6.1, with the API token and with API key + email; the results were the same as with the previous compose/config files.
i also removed the network to see if that changes anything, but again, same result.

the cloudflare API token is still set as an environment variable, but i assume this is correct.

/edit:
i just tried calling "dig @1.1.1.1 SOA google.com" and i also get a timeout. is this an issue with my network after all? without the @address i get a result

Running

dig @1.1.1.1 SOA google.com

works for me on host and inside container after installing dnsutils.

So it might be a network issue.
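
To run the same check from inside a container on the same Docker network as Traefik, a throwaway compose service along these lines might help. This is only a sketch: the service name dnscheck, the alpine image and its bind-tools package (which provides dig) are assumptions, not part of the setup above, and installing the package needs working default DNS inside the container.

```yaml
services:
  dnscheck:                # hypothetical one-off service, not part of the original stack
    image: alpine          # assumption: any small image that can install dig would do
    networks:
      - web                # same external network the traefik container is attached to
    command:
      - sh
      - -c
      # one short attempt per resolver listed in traefik.yml
      - "apk add --no-cache bind-tools && dig +time=3 +tries=1 @1.1.1.1 SOA google.com && dig +time=3 +tries=1 @1.0.0.1 SOA google.com"

networks:
  web:
    name: web
    external: true
```

Started with "docker compose run --rm dnscheck", a timeout here would point at the network path to port 53 on those resolvers rather than at the Traefik or Cloudflare configuration.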

yeah i found the issue

my router uses my pihole as its dns server; the pihole uses either cloudflare OR google and was interfering with port 53.

i changed the settings and now it starts up without any error and on the latest version :+1: