DNS Challenge Resolution Issue

We have an issue when trying to use Traefik for automatic certificate issuance.

Error Message

unable to generate a certificate for the domains [portainer.corp.eastx.com]: error: one or more domains had a problem:
[app.sub.domain.com] [app.sub.domain.com] acme: error presenting token: cloudflare: could not find zone for domain \"app.sub.domain.com\": [fqdn=_acme-challenge.app.sub.domain.com.] could not find the start of authority for '_acme-challenge.app.sub.domain.com.': DNS call error: read udp 172.20.0.6:39658->1.1.1.1:53: i/o timeout [ns=1.1.1.1:53, question='com. IN  SOA']
DNS call error: read udp 172.20.0.6:60551->1.0.0.1:53: i/o timeout [ns=1.0.0.1:53, question='com. IN  SOA']

Compose for Reference

traefik:
  container_name: traefik
  image: proget.sub.domain.com/docker-hub/traefik:latest
  ports:
    - 80:80
    - 443:443
    - 8080:8080
  networks:
    - default
  restart: unless-stopped
  command:
    #- "--log.level=DEBUG"
    - "--api.insecure=true"
    - "--providers.docker=true"
    - "--providers.docker.exposedbydefault=false"
    - "--entrypoints.http.address=:80"
    - "--entryPoints.http.http.redirections.entryPoint.to=https"
    - "--entryPoints.http.http.redirections.entryPoint.scheme=https"
    - "--entrypoints.http.http.redirections.entrypoint.priority=10"
    - "--entrypoints.https.address=:443"
    - "--serversTransport.insecureSkipVerify=true"
    - "--certificatesresolvers.cloudflare.acme.dnschallenge=true"
    - "--certificatesresolvers.cloudflare.acme.dnschallenge.provider=cloudflare"
    - "--certificatesresolvers.cloudflare.acme.dnschallenge.resolvers=1.1.1.1:53,1.0.0.1:53"
    - "--certificatesresolvers.cloudflare.acme.storage=/etc/certs/acme.json"
    - "--certificatesresolvers.cloudflare.acme.email=${CLOUDFLARE_EMAIL}"
    - "--certificatesresolvers.cloudflare.acme.caServer=https://acme-v02.api.letsencrypt.org/directory"
    - "--certificatesresolvers.cloudflare.acme.keytype=EC256"
  environment:
    - CLOUDFLARE_DNS_API_TOKEN=${CLOUDFLARE_API_TOKEN}
    - HTTP_PROXY=http://proxy.sub2.domain2.co:3128
    - HTTPS_PROXY=http://proxy.sub2.domain2.co:3128
    - NO_PROXY=127.0.0.1,localhost,.sub.domain.com,.domain2.co,.amazonaws.com,169.254.169.254,${CORE_SUBNET}
  volumes:
    - ./traefik:/etc/traefik/dynamic/
    - ./certs:/etc/certs/
    - /var/run/docker.sock:/var/run/docker.sock:ro

The Problem

The server sits behind a corporate proxy (Squid) and outbound access is restricted, but the correct domains/IPs have been allowed for this process. The corporate proxy only supports TCP traffic. The error read udp 172.20.0.6:39658->1.1.1.1:53: i/o timeout suggests that the DNS lookup is failing over UDP, which is expected as the proxy would not allow this.

Is there any way to force the DNS lookup to happen over TCP?

Have you tried setting dnschallenge.resolvers to your internal DNS server and using a longer delayBeforeCheck?

When the internal DNS does not know the requested entity, it should automatically ask upstream, so that could work.
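
As a rough sketch of that suggestion, the two flags could be set like this in the compose file from the question. The address 10.0.0.53 is a placeholder for the internal DNS server and 30s is an arbitrary example delay; neither value comes from this thread. The option name delaybeforecheck is the one mentioned above, so check the docs for your Traefik version in case it has been renamed.

```yaml
traefik:
  command:
    # Placeholder address: replace with the internal DNS server (assumption, not from the thread).
    - "--certificatesresolvers.cloudflare.acme.dnschallenge.resolvers=10.0.0.53:53"
    # Wait before checking that the TXT record has propagated (example value only).
    - "--certificatesresolvers.cloudflare.acme.dnschallenge.delaybeforecheck=30s"
```

This replaces the existing resolvers line (1.1.1.1:53,1.0.0.1:53) rather than being added alongside it.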

Both domains mentioned in the config above have zones on the internal DNS, so that doesn't seem to work, for other reasons.

Hello,

This message means that lego (the library used by Traefik for the ACME challenge) was not able to find SOA (Start of Authority) records.
Lego checks for SOA records recursively, one label at a time, to find the zone in which to create the TXT record:

  1. app.sub.domain.com.
  2. sub.domain.com.
  3. domain.com.
  4. com.

None of those elements has an SOA record, so lego cannot find the start of the authority.

The problem is inside your local network, it can be related to your corporate proxy, a firewall, a local DNS, etc.

To check the DNS call, you can use dig or drill, e.g.:

drill app.sub.domain.com. SOA
drill sub.domain.com. SOA
...

The rcode should be NOERROR and the Answer Section of the response should not be empty.

The SOA records exist in Active Directory DNS, and my understanding is that this can't be changed. The drill command for SOA on each of these returns:

  1. app.sub.domain.com. > dc01.sub.domain.com
  2. sub.domain.com. > dc01.sub.domain.com
  3. domain.com. > harlee.ns.cloudflare.com. dns.cloudflare.com
  4. com. > a.gtld-servers.net. nstld.verisign-grs.com

I thought that using dnschallenge.resolvers=1.1.1.1:53,1.0.0.1:53 would negate the need to make any additional changes to internal DNS as it would only use these for the lookup?

Additionally, when trying to issue a cert for app.domain.com (meaning the SOA would correctly end up with Cloudflare), we still get the same read udp 172.20.0.6:60551->1.0.0.1:53: i/o timeout error.

If your problem is related to UDP you can try to set the env var LEGO_EXPERIMENTAL_DNS_TCP_ONLY to true.
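
A minimal sketch of where that variable would go, reusing the environment section of the compose file from the question. Only the LEGO_EXPERIMENTAL_DNS_TCP_ONLY line is new. Note that this only changes the transport lego uses for its DNS queries: TCP port 53 to the configured resolvers still has to be reachable from the container.

```yaml
traefik:
  environment:
    - CLOUDFLARE_DNS_API_TOKEN=${CLOUDFLARE_API_TOKEN}
    # Ask lego to send its DNS queries over TCP instead of UDP.
    - LEGO_EXPERIMENTAL_DNS_TCP_ONLY=true
```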

OK, so the issue here is that we are blocking DNS traffic to external DNS servers, and the Squid proxy does not proxy DNS traffic. Is it possible to have ACME within Traefik use DNS over HTTPS for the DNS challenge? I can't find anything in the ACME or Traefik docs.

Currently, DNS over HTTPS is not supported.

OK, looks like a new domain for internal apps and a split zone in Route 53 (R53) it is. Thanks for the assistance.
