ACME DNS Challenge issues

Hi all,

My current setup has OPNsense redirecting all DNS queries on port 53 to AdGuard, which uses Unbound DNS (on OPNsense) as its upstream, and ports 80 and 443 forwarded to my VM running Docker.

I previously had an internal domain that I manually created and issued SSL certificates for, but I now want to use my external domain and have Traefik issue the SSL certificates. I had this setup before using OPNsense and had no issues.

Now when I try to pull down the certificate, I get the following error:

acme: cleaning up failed: cloudflare: could not find the start of authority for _acme-challenge.ragenetwork.me.: read udp 10.80.0.2:56467->1.0.0.1:53: i/o timeout
acme: error presenting token: cloudflare: could not find the start of authority for _acme-challenge.ragenetwork.me.: read udp 10.80.0.2:60737->1.0.0.1:53: i/o timeout

Inside the acme.json file I can see the ACME account etc., but under Certificates it just says "null".

If I try to visit the Traefik dashboard, I can log in and the certificate in the browser says it's a Cloudflare certificate, but if I comment out the acme-staging command, I then get Invalid SSL errors when trying to visit other subdomains.

I have tested DNS inside the Docker container and on the Ubuntu server, and it resolves with no issues. I do have firewall rules that redirect DNS queries to my AdGuard instance running on OPNsense, and also firewall rules that block any DNS queries that try to circumvent the above. I have tried disabling the block rule, which still gives me the same error. I can see the DNS queries show up in the AdGuard log, so I am kinda stumped as to what I am missing here.

Any help would be greatly appreciated!

Docker-Compose for Traefik

# Traefik - Reverse Proxy
  traefik:
    <<: *common-keys-core
    image: traefik:2.7
    container_name: traefik
    hostname: traefik
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    networks:
      traefik:
        ipv4_address: 10.80.0.2
    # dns:
    #   - "10.20.0.1" #internal DNS ip
    # dns_search: ragenetwork.lan #namespace used in internal DNS
    volumes:
      - $DOCKERDIR/traefik/rules:/rules
      - $DOCKERDIR/traefik/acme/acme.json:/acme.json
      - /home/docker/logs:/logs
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /etc/localtime:/etc/localtime:ro
    environment:
      - TZ=$TZ
      - CF_API_EMAIL_FILE=/run/secrets/cf_email
      - CF_API_KEY_FILE=/run/secrets/cf_api_key
      - HTPASSWD_FILE=/run/secrets/htpasswd
      - DOMAINNAME
    secrets:
      - cf_email
      - cf_api_key
      - htpasswd
    command: # CLI arguments
      - --global.checkNewVersion=true
      - --global.sendAnonymousUsage=false
      - --entryPoints.http.address=:80
      - --entryPoints.https.address=:443
      # Allow these IPs to set the X-Forwarded-* headers - Cloudflare IPs: https://www.cloudflare.com/ips/
      - --entrypoints.https.forwardedHeaders.trustedIPs=$CF_IPS,$LOCAL_IPS
      - --entryPoints.traefik.address=:8080
      - --api=true
      # - --api.insecure=true
      - --api.dashboard=true
      # - --serversTransport.insecureSkipVerify=true
      - --log=true
      - --log.filePath=/logs/traefik.log
      - --log.level=DEBUG # (Default: error) DEBUG, INFO, WARN, ERROR, FATAL, PANIC
      - --accessLog=true
      - --accessLog.filePath=/logs/access.log
      - --accessLog.bufferingSize=100 # Configuring a buffer of 100 lines
      - --accessLog.filters.statusCodes=204-299,400-499,500-599
      - --providers.docker=true
      - --providers.docker.endpoint=unix:///var/run/docker.sock
      - --providers.docker.exposedByDefault=false
      - --entrypoints.https.http.tls.options=tls-opts@file
      # Add dns-cloudflare as default certresolver for all services. Also enables TLS and no need to specify on individual services
      - --entrypoints.https.http.tls.certresolver=dns-cloudflare
      - --entrypoints.https.http.tls.domains[0].main=$DOMAINNAME
      - --entrypoints.https.http.tls.domains[0].sans=*.$DOMAINNAME
      - --providers.docker.network=traefik
      - --providers.docker.swarmMode=false
      - --providers.file.directory=/rules 
      # - --providers.file.filename=/path/to/file 
      - --providers.file.watch=true # Only works on top level files in the rules folder
      - --certificatesResolvers.dns-cloudflare.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory # LetsEncrypt Staging Server - uncomment when testing
      - --certificatesResolvers.dns-cloudflare.acme.email=$CLOUDFLARE_EMAIL
      - --certificatesResolvers.dns-cloudflare.acme.storage=/acme.json
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.provider=cloudflare
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.delayBeforeCheck=100 # To delay DNS check and reduce LE hitrate
    labels:
      #- "autoheal=true"
      - "traefik.enable=true"
      # HTTP-to-HTTPS Redirect
      - "traefik.http.routers.http-catchall.entrypoints=http"
      - "traefik.http.routers.http-catchall.rule=HostRegexp(`{host:.+}`)"
      - "traefik.http.routers.http-catchall.middlewares=redirect-to-https"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
      # HTTP Routers
      - "traefik.http.routers.traefik-rtr.entrypoints=https"
      - "traefik.http.routers.traefik-rtr.rule=Host(`traefik.$DOMAINNAME`)"
      - "traefik.http.routers.traefik-rtr.tls=true" # Some people had 404s without this
      - "traefik.http.routers.traefik-rtr.tls.certresolver=dns-cloudflare" 
      - "traefik.http.routers.traefik-rtr.tls.domains[0].main=$DOMAINNAME"
      - "traefik.http.routers.traefik-rtr.tls.domains[0].sans=*.$DOMAINNAME"
      ## Services - API
      - "traefik.http.routers.traefik-rtr.service=api@internal"
      ## Middlewares
      # - "traefik.http.routers.traefik-rtr.middlewares=chain-no-auth@file" # For No Authentication
      - "traefik.http.routers.traefik-rtr.middlewares=chain-basic-auth@file" # For Basic HTTP Authentication
      #- "traefik.http.routers.traefik-rtr.middlewares=chain-authelia@file" # For Authelia Authentication

Anyone have any ideas? I am pretty stumped.

Try using a current Traefik version.

I was using the latest. I only tried 2.7 as per a guide, just to see if it solved anything. On the latest version, I got the same error.

Do you have SSL turned on in Cloudflare? If so, traefik.$DOMAINNAME will show the Cloudflare cert, not yours.

You should also try removing the SSL stuff from your router. It is already present on the entrypoint, so you shouldn't need it in both places. The dashboard should really only need:

    labels:
      - traefik.enable=true
      - traefik.http.routers.dashboard.entrypoints=https
      - traefik.http.routers.dashboard.rule=Host(`traefik.$DOMAINNAME`)
      - traefik.http.routers.dashboard.service=api@internal

Those things don't explain why you can't get an LE cert though. Can you try with the forced redirect through AdGuard off?

I have tried with the forced redirection rule off, and I still get the same error. I did a fresh install of Ubuntu and Docker, which resulted in the same error. I even tried creating a macvlan network for the Traefik container, with the same result :(

This error is a network issue: Traefik (through lego) finds the zone of your domain by sending SOA queries to a DNS server, and those queries are timing out.

So you have to check your network (firewall, local DNS, etc.). You can also use the resolvers option.
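If dig isn't available (it isn't in the Traefik image), the same kind of check can be done with a few lines of Python. This is only a minimal sketch: `build_query` and `probe` are names made up for this example, it speaks just enough DNS to send one SOA query over UDP, and it is not what lego itself runs. A result of "timeout" corresponds to the `read udp ... i/o timeout` seen in the Traefik log.

```python
import socket
import struct

SOA = 6  # DNS record type number for SOA
IN = 1   # DNS class IN


def build_query(name: str, qtype: int = SOA, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet with the 'recursion desired' flag set."""
    # Header: id, flags (RD), 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, IN)


def probe(server: str, name: str, timeout: float = 3.0) -> str:
    """Send one SOA query over UDP port 53 and summarise what came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() so that only replies from this server are accepted.
        sock.connect((server, 53))
        try:
            sock.send(build_query(name))
            reply = sock.recv(4096)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return f"error: {exc}"
    if len(reply) < 12:
        return "short reply"
    rcode = reply[3] & 0x0F                       # 0 = NOERROR, 5 = REFUSED
    answers = struct.unpack("!H", reply[6:8])[0]  # ANCOUNT from the header
    return f"rcode={rcode} answers={answers}"
```

For example, `print(probe("1.0.0.1", "ragenetwork.me"))` run on the Docker host and then inside a container on the same network should show whether UDP port 53 to that resolver gets through at all.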

Ah my bad, I forgot to update my issue >_<

I managed to solve that issue by commenting out this line:

- --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53

That then led me to the issue I am currently having: an i/o timeout when trying to verify the DNS challenge.

time limit exceeded: last error: read udp 10.20.0.10:45018->108.162.192.227:53: i/o timeout\n"

Again, I have disabled the firewall rule that blocks other DNS queries, but what is strange is that I can perform lookups from within the container with no issues. I can also see the DNS queries for the challenge leaving my firewall and coming back, but I keep getting the above error.
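The timeout message names both ends of the failed query, so it can help to pull those out of traefik.log and see exactly which server isn't answering. A rough sketch in Python, where the regular expression is written against the error format quoted in this thread and `udp_timeouts` is just a name chosen for the example:

```python
import re
from typing import List, NamedTuple


class UdpTimeout(NamedTuple):
    source: str  # local address:port the query was sent from
    server: str  # DNS server address:port that never answered


# Matches the "read udp <src>-><dst>: i/o timeout" fragment in the ACME errors.
TIMEOUT = re.compile(
    r"read udp (?P<src>[\d.]+:\d+)->(?P<dst>[\d.]+:\d+): i/o timeout"
)


def udp_timeouts(log_text: str) -> List[UdpTimeout]:
    """Return every (source, server) pair that timed out in the given log text."""
    return [
        UdpTimeout(m.group("src"), m.group("dst"))
        for m in TIMEOUT.finditer(log_text)
    ]
```

Each server address it reports is a candidate for a direct lookup test from the host and from inside the container.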

Did you try a lookup with the specific DNS server from within the container?

dig @108.162.192.227 traefik.io

I tried to run dig within the container, but I got:

/ # dig @108.162.192.227 traefik.io
/bin/sh: dig: not found

But I ran the dig command on my Docker host and got the following:

dig @108.162.192.227 traefik.io

; <<>> DiG 9.18.1-1ubuntu1.3-Ubuntu <<>> @108.162.192.227 traefik.io
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

If it doesn't work from the host, then it will probably not work from within the container, either.

Can you ping the server from your host?

Yeah, I can ping it with no issues:

ping 108.162.192.227
PING 108.162.192.227 (108.162.192.227) 56(84) bytes of data.
64 bytes from 108.162.192.227: icmp_seq=1 ttl=59 time=1.92 ms
64 bytes from 108.162.192.227: icmp_seq=2 ttl=59 time=2.93 ms
64 bytes from 108.162.192.227: icmp_seq=3 ttl=59 time=1.96 ms
64 bytes from 108.162.192.227: icmp_seq=4 ttl=59 time=1.77 ms
^C
--- 108.162.192.227 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 1.772/2.143/2.928/0.458 ms

I tried to run the same dig command from my Mac and got the following:

dig @108.162.192.227 traefik.io
;; reply from unexpected source: 10.30.0.1#53, expected 108.162.192.227#53
;; reply from unexpected source: 10.30.0.1#53, expected 108.162.192.227#53
;; reply from unexpected source: 10.30.0.1#53, expected 108.162.192.227#53

; <<>> DiG 9.10.6 <<>> @108.162.192.227 traefik.io
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

So I am assuming it must be the DNS redirection rules on OPNsense that are causing this?

I have disabled the block rule for other DNS queries, but that doesn't seem to help :(

Edit: I can also ping the address from within the container
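The "reply from unexpected source" lines in the Mac output are worth a closer look: dig sent the query to 108.162.192.227 but the reply arrived from 10.30.0.1, which would be consistent with a redirect rule answering on behalf of the real server (assuming 10.30.0.1 is the firewall's address on that network). A client that only accepts replies from the address it queried will discard such a reply and eventually time out. A small sketch for spotting this in saved dig output, with `find_redirects` being a name invented for the example:

```python
import re
from typing import Set, Tuple

# Matches dig's warning about a reply arriving from a different address.
UNEXPECTED = re.compile(
    r"reply from unexpected source: (?P<actual>[0-9a-fA-F.:]+)#\d+, "
    r"expected (?P<expected>[0-9a-fA-F.:]+)#\d+"
)


def find_redirects(dig_output: str) -> Set[Tuple[str, str]]:
    """Return (expected_server, actual_responder) pairs found in dig output."""
    return {
        (m.group("expected"), m.group("actual"))
        for m in UNEXPECTED.finditer(dig_output)
    }
```

An empty result only means dig did not print that warning; it does not prove that nothing is intercepting the query.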

Then you have a network or firewall issue; the command works fine for me on Debian.

Ok, now I am stumped for reals. I have disabled all DNS forwarding and blocking firewall rules, but I still cannot get this working. I can perform the above dig command, I can see the DNS query and the response, and I can see the ACME TXT record on my Cloudflare dashboard. I even get a response if I do:

dig @1.1.1.1 _acme-challenge.domain.com

But now I am getting the following error within Traefik:

time limit exceeded: last error: NS chance.ns.cloudflare.com. returned REFUSED for ragenetwork.me.\n[ragenetwork.me] time limit exceeded: last error: NS chance.ns.cloudflare.com. returned REFUSED for _acme-challenge.ragenetwork.me.\n" rule="Host(`traefik.ragenetwork.me`)" ACME CA="https://acme-v02.api.letsencrypt.org/directory" 

Hey! I have this exact problem too. I'm also behind OPNsense, and I also tried turning off the forwarding of DNS to the AdGuard plugin, but the problem remains. I figured the problem is in OPNsense because the certificates were issued just fine once I connected to the network in front of OPNsense.
It could also be something in Traefik itself, because I can ping 1.1.1.1 from other containers, while Traefik can ping 1.1.1.1 only if it is connected in front of OPNsense. What is weird is that the traceroute command works in both cases.

Works either way:
docker exec traefik sh -c "traceroute 1.1.1.1"
docker exec qbittorrent sh -c "traceroute 1.1.1.1"

Works either way:
docker exec qbittorrent sh -c "ping 1.1.1.1"

Works only in front of OPNsense:
docker exec traefik sh -c "ping 1.1.1.1"

So the issue is somewhere between Traefik and OPNsense. I came here in hopes of finding some help.

time limit exceeded: last error: read udp 172.20.0.15:57821->108.162.192.101:53: i/o timeout\n" providerName=myresolver.acme ACME CA="https://acme

Also, before trying the DNS challenge I tested my setup with the HTTP challenge, TLS, and open ports, and that worked just fine. This issue appeared only once I introduced the DNS challenge, which I'm resorting to because I'm not willing to expose my services but still want valid certificates.

Any help for this traefik noob is appreciated!!

Dude, I am still in the same boat... I have yet to find anyone who can help. I redirected DNS queries through to Unbound on OPNsense, and I can ping 1.1.1.1 from within Traefik and from my Ubuntu server, but I still can't get the SSL certificate to pull for my local domains. At the moment I am just running without SSL certificates for my local domains, which is annoying me.

Sorry I couldn't be of much assistance.

To find the "zone", SOA calls are performed.
So you have to check if SOA calls work.

 dig @108.162.192.227 TXT _acme-challenge.ragenetwork.me

; <<>> DiG 9.18.1-1ubuntu1.3-Ubuntu <<>> @108.162.192.227 TXT _acme-challenge.ragenetwork.me
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9650
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;_acme-challenge.ragenetwork.me.        IN      TXT

;; AUTHORITY SECTION:
ragenetwork.me.         3461    IN      SOA     chance.ns.cloudflare.com. dns.cloudflare.com. 2304181108 10000 2400 604800 3600

;; Query time: 0 msec
;; SERVER: 108.162.192.227#53(108.162.192.227) (UDP)
;; WHEN: Mon Mar 13 16:24:22 AWST 2023
;; MSG SIZE  rcvd: 123

This is what I am getting, not sure if it's correct or not.

The DNS calls must be SOA calls:

dig @108.162.192.227 SOA _acme-challenge.ragenetwork.me

 dig @108.162.192.227 SOA _acme-challenge.ragenetwork.me

; <<>> DiG 9.18.1-1ubuntu1.3-Ubuntu <<>> @108.162.192.227 SOA _acme-challenge.ragenetwork.me
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56133
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;_acme-challenge.ragenetwork.me.        IN      SOA

;; AUTHORITY SECTION:
ragenetwork.me.         1936    IN      SOA     chance.ns.cloudflare.com. dns.cloudflare.com. 2304181108 10000 2400 604800 3600

;; Query time: 0 msec
;; SERVER: 108.162.192.227#53(108.162.192.227) (UDP)
;; WHEN: Mon Mar 13 16:49:47 AWST 2023
;; MSG SIZE  rcvd: 123

Seems to be pretty much the same output?

The answer section is empty, so try:

dig @108.162.192.227 SOA ragenetwork.me

It's a recursive algorithm that expects a non-empty answer section, so it moves on to the parent domain when the answer is empty.
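To make that concrete, here is a simplified model of the idea in Python. It is a sketch of the behaviour described above, not lego's actual code: the `lookup` callable is a hypothetical stand-in for "send an SOA query and return the SOA records from the ANSWER section", and records that only show up in the AUTHORITY section (as in the dig outputs earlier in this thread) count as an empty answer.

```python
from typing import Callable, List, Optional

# Hypothetical lookup signature: takes a name, returns the SOA records found
# in the ANSWER section (an empty list if that section is empty).
SoaLookup = Callable[[str], List[str]]


def candidate_zones(fqdn: str) -> List[str]:
    """List the name itself and each parent: a.b.c. -> a.b.c., b.c., c."""
    labels = fqdn.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels))]


def find_zone(fqdn: str, lookup: SoaLookup) -> Optional[str]:
    """Return the first candidate whose SOA query has a non-empty answer."""
    for name in candidate_zones(fqdn):
        if lookup(name):  # empty answer section: try the parent next
            return name
    return None
```

With `_acme-challenge.ragenetwork.me.` as input, the candidates are the name itself, then `ragenetwork.me.`, then `me.`, which is why the next thing to test by hand is the SOA query for `ragenetwork.me`.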