Help needed setting up DNS-01 challenge with self-hosted BIND9 server and step-ca ACME server using RFC2136

I have a test environment running in Docker, where I use Traefik as a reverse proxy to manage traffic within my intranet and step-ca as my ACME server. Since this is a test setup, I simply import the root certificate generated by step-ca on all my devices and trust it, which gives me that sweet green lock icon. I'm facing issues with the DNS challenge and wildcard certificate generation, which I have detailed below.

My Traefik docker-compose.yaml

---
services:
  traefik:
    image: docker.io/library/traefik:latest
    container_name: traefik
    restart: unless-stopped
    networks:
      proxy:
        ipv4_address: 172.18.0.10
    ports:
      - 80:80
      - 443:443
    environment:
      # root cert the traefik should trust for acme to work
      - LEGO_CA_CERTIFICATES=/usr/local/share/ca-certificates/root_ca.crt
      # Env variables as per documentation
      - RFC2136_TSIG_KEY=keyname.
      - RFC2136_TSIG_SECRET= < Key generated defined by certbot documentation>
      - RFC2136_TSIG_ALGORITHM=hmac-sha512.
      - RFC2136_NAMESERVER=192.168.1.50
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./data/config/traefik.yml:/traefik.yml:ro
      - ./data/config/config.yml:/config.yml:ro
      - ./data/config/acme.json:/acme.json
      - ./data/logs:/var/log/traefik
      # root certs from step-ca docker container bind
      - ./data/step/certs:/usr/local/share/ca-certificates
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.entrypoints=http"
      - "traefik.http.routers.traefik.rule=Host(`dash.silverdev.fun`)"
      - "traefik.http.middlewares.traefik-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.middlewares.sslheader.headers.customrequestheaders.X-Forwarded-Proto=https"
      - "traefik.http.routers.traefik.middlewares=traefik-https-redirect"
      - "traefik.http.routers.traefik-secure.entrypoints=https"
      - "traefik.http.routers.traefik-secure.rule=Host(`dash.silverdev.fun`)"
      - "traefik.http.routers.traefik-secure.tls=true"
      - "traefik.http.routers.traefik-secure.tls.certresolver=dnsresolver"
      - "traefik.http.routers.traefik-secure.tls.domains[0].main=silverdev.fun"
      - "traefik.http.routers.traefik-secure.tls.domains[0].sans=*.silverdev.fun"
      - "traefik.http.routers.traefik-secure.service=api@internal"


networks:
  proxy:
    external: true
    name: proxy

And my traefik.yaml

api:
  dashboard: true
  debug: true
entryPoints:
  http:
    address: ":80"
  https:
    address: ":443"

serversTransport:
  insecureSkipVerify: true
  
providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
  file:
    filename: /config.yml

certificatesResolvers:
  myresolver:
    acme:
      email: admin@silver1618.fun
      storage: acme.json
# step-ca and traefik are in the same network
      caServer: https://step-ca:9000/acme/acme/directory
      httpChallenge:
        entryPoint: http

  dnsresolver:
    acme:
      email: admin@silver1618.fun
      storage: acme.json
      caServer: https://step-ca:9000/acme/acme/directory
      dnsChallenge:
        provider: rfc2136
        disablePropagationCheck: true
        resolvers:
          - "192.168.1.50:53"    # BIND9 selfhosted DNS server, will resolve *.silverdev.fun
log:
  level: "DEBUG"
  filePath: "/var/log/traefik/traefik.log"
accessLog:
  filePath: "/var/log/traefik/access.log"

While I am able to obtain certificates for all individual *.silverdev.fun internal domains using the HTTP-01 challenge, my goal is to secure a wildcard certificate using the DNS challenge. The most relevant solution I found in the documentation was the RFC2136 provider. Consequently, I set up a BIND9 server and configured all the records as follows:

named.conf

acl home {
    192.168.1.0/24;
    172.18.0.0/24;
};

options {
    version "not currently available";
    forwarders {
        1.1.1.1;
        1.0.0.1;
    };
    allow-query { home; };
};

key "keyname." {
        algorithm hmac-sha512;
        secret "key as generated by tsig-keygen -a hmac-sha512 keyname.";
};

zone "silverdev.fun." IN {
  type master;
  file "/etc/bind/silver-dev.zone";
  update-policy {
    grant keyname. zonesub any;
  };
};

and /etc/bind/silver-dev.zone

$ORIGIN .
$TTL 120        ; 2 minutes
silverdev.fun           IN SOA  ns.silverdev.fun. admin.silverdev.fun. (
                                2024040419 ; serial
                                43200      ; refresh (12 hours)
                                900        ; retry (15 minutes)
                                1814400    ; expire (3 weeks)
                                7200       ; minimum (2 hours)
                                )
                        NS      ns.silverdev.fun.
                        A       192.168.1.19
$ORIGIN silverdev.fun.
*                       CNAME   silverdev.fun.
ns                      A       192.168.1.19

With this configuration, I can successfully obtain a wildcard certificate when I employ certbot in the following manner:

docker run -it --rm --name certbot -e REQUESTS_CA_BUNDLE=/home/root.crt -p 80:80 --network="proxy" \
            -v "./data/etc:/etc/letsencrypt" \
            -v "./data/var:/var/lib/letsencrypt" \
            -v "./data/secrets:/home" \
            -v "./data/certs/root_ca.crt:/home/root.crt" \
            certbot/dns-rfc2136 certonly --dns-rfc2136 --dns-rfc2136-credentials /home/dns.ini \
            --agree-tos --email admin@silverdev.fun \
            --server https://step-ca:9000/acme/acme/directory -d "silverdev.fun" -d "*.silverdev.fun"

However, when I run the Traefik container, I can see in the BIND9 logs that Traefik is sending ACME challenge record updates to the zone. It does not succeed in obtaining the wildcard certificate, but it does manage to obtain the certificate for the base domain silverdev.fun. I need help with this.

Bind9 logs
bind9    | 05-Apr-2024 12:33:09.127 client @0x7f6f8c1a51a8 192.168.1.19#33150/key keyname: updating zone 'silverdev.fun/IN': deleting rrset at 'silverdev.fun' TXT
bind9    | 05-Apr-2024 12:33:09.127 client @0x7f6f8c1a51a8 192.168.1.19#33150/key keyname: updating zone 'silverdev.fun/IN': adding an RR at 'silverdev.fun' TXT "RQ3qmr5eNGhArjvNHNZLD8YQWMpNv08JMP25q3KaBzc"
bind9    | 05-Apr-2024 12:41:02.534 client @0x7f6f8c1a51a8 192.168.1.19#57644/key keyname: updating zone 'silverdev.fun/IN': deleting an RR at silverdev.fun TXT
bind9    | 05-Apr-2024 12:41:02.574 client @0x7f6f8c1a51a8 192.168.1.19#58325/key keyname: updating zone 'silverdev.fun/IN': deleting rrset at 'silverdev.fun' TXT
bind9    | 05-Apr-2024 12:41:02.574 client @0x7f6f8c1a51a8 192.168.1.19#58325/key keyname: updating zone 'silverdev.fun/IN': adding an RR at 'silverdev.fun' TXT "VISe2Q1m1v0FGkRWsORIOw-N41jI884NNXPgtL6Bg7Q"
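One thing that can be read off the BIND9 lines above is the owner name each dynamic update touches. The Python sketch below is only an illustration: the `LOG` sample is copied from the log above, while the regular expression and the `added_records` helper are hypothetical names written for this log format.

```python
import re

# Sample lines copied from the BIND9 log above.
LOG = """\
05-Apr-2024 12:33:09.127 client @0x7f6f8c1a51a8 192.168.1.19#33150/key keyname: updating zone 'silverdev.fun/IN': deleting rrset at 'silverdev.fun' TXT
05-Apr-2024 12:33:09.127 client @0x7f6f8c1a51a8 192.168.1.19#33150/key keyname: updating zone 'silverdev.fun/IN': adding an RR at 'silverdev.fun' TXT "RQ3qmr5eNGhArjvNHNZLD8YQWMpNv08JMP25q3KaBzc"
05-Apr-2024 12:41:02.574 client @0x7f6f8c1a51a8 192.168.1.19#58325/key keyname: updating zone 'silverdev.fun/IN': adding an RR at 'silverdev.fun' TXT "VISe2Q1m1v0FGkRWsORIOw-N41jI884NNXPgtL6Bg7Q"
"""

# Hypothetical pattern for the "adding an RR" lines only.
ADD_RE = re.compile(r"adding an RR at '([^']+)' (\S+)")


def added_records(log_text):
    """Return (owner, type) for every record the log reports as added."""
    return [m.groups() for line in log_text.splitlines() if (m := ADD_RE.search(line))]


for owner, rtype in added_records(LOG):
    at_challenge_name = owner.startswith("_acme-challenge.")
    print(owner, rtype, "written at an _acme-challenge name:", at_challenge_name)
```

In this sample both TXT records are added at `silverdev.fun` itself, not at `_acme-challenge.silverdev.fun`.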

traefik logs
time="2024-04-05T12:33:06+05:30" level=debug msg="https://step-ca:9000/acme/acme/directory" providerName=dnsresolver.acme
time="2024-04-05T12:33:06+05:30" level=debug msg="Using DNS Challenge provider: rfc2136" providerName=dnsresolver.acme
time="2024-04-05T12:33:06+05:30" level=debug msg="legolog: [INFO] [silverdev.fun, *.silverdev.fun] acme: Obtaining bundled SAN certificate"
time="2024-04-05T12:33:09+05:30" level=debug msg="legolog: [INFO] [silverdev.fun] AuthURL: https://step-ca:9000/acme/acme/authz/hYVAdEI2UYwpTA0ISbdfy99tsj079aDI"
time="2024-04-05T12:33:09+05:30" level=debug msg="legolog: [INFO] [*.silverdev.fun] AuthURL: https://step-ca:9000/acme/acme/authz/zoraVKcehBHDVP1yX210YMoRdmtzA4Sg"
time="2024-04-05T12:33:09+05:30" level=debug msg="legolog: [INFO] [silverdev.fun] acme: Could not find solver for: http-01"
time="2024-04-05T12:33:09+05:30" level=debug msg="legolog: [INFO] [silverdev.fun] acme: use dns-01 solver"
time="2024-04-05T12:33:09+05:30" level=debug msg="legolog: [INFO] [*.silverdev.fun] acme: use dns-01 solver"
time="2024-04-05T12:33:09+05:30" level=debug msg="legolog: [INFO] [silverdev.fun] acme: Preparing to solve DNS-01"
time="2024-04-05T12:33:09+05:30" level=debug msg="legolog: [INFO] Found CNAME entry for \"_acme-challenge.silverdev.fun.\": \"silverdev.fun.\""
time="2024-04-05T12:33:09+05:30" level=debug msg="legolog: [INFO] [silverdev.fun] acme: Trying to solve DNS-01"
time="2024-04-05T12:33:09+05:30" level=debug msg="legolog: [INFO] Found CNAME entry for \"_acme-challenge.silverdev.fun.\": \"silverdev.fun.\""
time="2024-04-05T12:33:09+05:30" level=debug msg="legolog: [INFO] [silverdev.fun] acme: Checking DNS record propagation using [192.168.1.50:53]"
time="2024-04-05T12:33:11+05:30" level=debug msg="legolog: [INFO] Wait for propagation [timeout: 1m0s, interval: 2s]"
time="2024-04-05T12:41:02+05:30" level=debug msg="legolog: [INFO] [silverdev.fun] acme: Cleaning DNS-01 challenge"
time="2024-04-05T12:41:02+05:30" level=debug msg="legolog: [INFO] Found CNAME entry for \"_acme-challenge.silverdev.fun.\": \"silverdev.fun.\""
time="2024-04-05T12:41:02+05:30" level=debug msg="legolog: [INFO] [*.silverdev.fun] acme: Preparing to solve DNS-01"
time="2024-04-05T12:41:02+05:30" level=debug msg="legolog: [INFO] Found CNAME entry for \"_acme-challenge.silverdev.fun.\": \"silverdev.fun.\""
time="2024-04-05T12:41:02+05:30" level=debug msg="legolog: [INFO] [*.silverdev.fun] acme: Trying to solve DNS-01"
time="2024-04-05T12:41:02+05:30" level=debug msg="legolog: [INFO] Found CNAME entry for \"_acme-challenge.silverdev.fun.\": \"silverdev.fun.\""
time="2024-04-05T12:41:02+05:30" level=debug msg="legolog: [INFO] [*.silverdev.fun] acme: Checking DNS record propagation using [192.168.1.50:53]"
time="2024-04-05T12:41:04+05:30" level=debug msg="legolog: [INFO] Wait for propagation [timeout: 1m0s, interval: 2s]"
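The "Found CNAME entry" lines above are consistent with the wildcard record in the zone file: `_acme-challenge.silverdev.fun.` has no record of its own, so `* CNAME silverdev.fun.` answers for it, and a client that follows the CNAME ends up writing the TXT record at the zone apex, which matches the BIND9 update log. The Python sketch below only models that lookup. The `ZONE` dictionary, `lookup`, and `txt_owner` are hypothetical helpers, and the matching is a simplification of the wildcard rules in RFC 4592, not how BIND9 or lego are implemented.

```python
# Hypothetical in-memory copy of the records from silver-dev.zone.
ZONE = {
    "silverdev.fun.": [("A", "192.168.1.19")],
    "*.silverdev.fun.": [("CNAME", "silverdev.fun.")],
    "ns.silverdev.fun.": [("A", "192.168.1.19")],
}


def lookup(name, zone=ZONE):
    """Return (owner, records): exact match first, then wildcard ancestors."""
    if name in zone:
        return name, zone[name]
    labels = name.rstrip(".").split(".")
    # Replace leading labels with "*" one level at a time (simplified).
    for i in range(1, len(labels)):
        candidate = "*." + ".".join(labels[i:]) + "."
        if candidate in zone:
            return candidate, zone[candidate]
    return None, []


def txt_owner(challenge_name, zone=ZONE):
    """Name where a CNAME-following client would place the challenge TXT."""
    _, records = lookup(challenge_name, zone)
    for rtype, target in records:
        if rtype == "CNAME":
            return target
    return challenge_name


print(txt_owner("_acme-challenge.silverdev.fun."))  # prints: silverdev.fun.
```

In this model, any explicit record at `_acme-challenge.silverdev.fun.` takes precedence over the wildcard, so `txt_owner` would then return the challenge name itself.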

Maybe try go-acme (lego) on the CLI first (rfc2136); it is the library Traefik uses for Let's Encrypt TLS.

I tried go-acme to generate the cert and it fails too. I tried the base domain first, which also failed.

Steps I followed

RFC2136_TSIG_KEY=keyname. \
RFC2136_TSIG_SECRET=<key> \
RFC2136_TSIG_ALGORITHM=hmac-sha512. \
RFC2136_NAMESERVER=64.227.153.101 \
lego --server https://step-ca:9000/acme/acme/directory --accept-tos --email admin@silverdev.fun --dns rfc2136 --dns.resolvers 64.227.153.101 --domains silverdev.fun run

I had to create a separate lego container in which I trusted my self-hosted step-ca's root.crt.

The container Dockerfile
# Start from a base image
FROM ubuntu:jammy

# Install necessary packages
RUN apt-get update && \
    apt-get install -y ca-certificates wget

# Copy the step-ca root certificate(s) and add them to the system trust store
COPY ./certs/ /usr/local/share/ca-certificates/
RUN update-ca-certificates

# Change working directory
WORKDIR /home

# Download lego, extract it, move the binary to /usr/bin, and clean up in one layer
RUN wget https://github.com/go-acme/lego/releases/download/v4.16.1/lego_v4.16.1_linux_amd64.tar.gz && \
    tar -xzvf lego_v4.16.1_linux_amd64.tar.gz && \
    mv /home/lego /usr/bin && \
    rm -rf /home/*

go-acme logs

!!!! HEADS UP !!!!

Your account credentials have been saved in your Let's Encrypt
configuration directory at "/home/.lego/accounts".

You should make a secure backup of this folder now. This
configuration directory will also contain certificates and
private keys obtained from Let's Encrypt so making regular
backups of this folder is ideal.
2024/04/06 15:30:58 [INFO] [silverdev.fun] acme: Obtaining bundled SAN certificate
2024/04/06 15:30:58 [INFO] [silverdev.fun] AuthURL: https://step-ca:9000/acme/acme/authz/N7WkbniwZKpvyafibrmk2qWfs8DfUZF7
2024/04/06 15:30:58 [INFO] [silverdev.fun] acme: Could not find solver for: http-01
2024/04/06 15:30:58 [INFO] [silverdev.fun] acme: use dns-01 solver
2024/04/06 15:30:58 [INFO] [silverdev.fun] acme: Preparing to solve DNS-01
2024/04/06 15:30:58 [INFO] Found CNAME entry for "_acme-challenge.silverdev.fun.": "silverdev.fun."
2024/04/06 15:30:58 [INFO] [silverdev.fun] acme: Trying to solve DNS-01
2024/04/06 15:30:58 [INFO] Found CNAME entry for "_acme-challenge.silverdev.fun.": "silverdev.fun."
2024/04/06 15:30:58 [INFO] [silverdev.fun] acme: Checking DNS record propagation. [nameservers=64.227.153.101:53]
2024/04/06 15:31:01 [INFO] Wait for propagation [timeout: 1m0s, interval: 2s]
2024/04/06 15:31:01 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:03 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:05 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:07 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:09 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:11 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:13 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:15 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:17 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:19 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:21 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:23 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:25 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:27 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:29 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:31 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:33 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:35 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:37 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:39 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:41 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:43 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:45 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:47 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:49 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:52 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:54 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:56 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:31:58 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:32:00 [INFO] [silverdev.fun] acme: Waiting for DNS record propagation.
2024/04/06 15:32:02 [INFO] [silverdev.fun] acme: Cleaning DNS-01 challenge
2024/04/06 15:32:02 [INFO] Found CNAME entry for "_acme-challenge.silverdev.fun.": "silverdev.fun."
2024/04/06 15:32:02 [INFO] Deactivating auth: https://step-ca:9000/acme/acme/authz/N7WkbniwZKpvyafibrmk2qWfs8DfUZF7
2024/04/06 15:32:02 [INFO] Unable to deactivate the authorization: https://step-ca:9000/acme/acme/authz/N7WkbniwZKpvyafibrmk2qWfs8DfUZF7
2024/04/06 15:32:02 Could not obtain certificates:
        error: one or more domains had a problem:
[silverdev.fun] propagation: time limit exceeded: last error: DNS call error: read udp 172.18.0.3:50812->192.168.1.19:53: read: connection refused [ns=ns.silverdev.fun.:53, question='silverdev.fun. IN  TXT']
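The final error suggests that the propagation check queried `ns.silverdev.fun.` at 192.168.1.19, which is the address the zone's `ns` A record advertises, rather than the server given in `RFC2136_NAMESERVER`. The Python sketch below pulls both addresses out of the error text and compares them. The regular expressions and the `queried_nameserver` helper are assumptions written for this one log line and are not part of lego.

```python
import re

# Error text copied from the lego output above.
ERROR = (
    "[silverdev.fun] propagation: time limit exceeded: last error: DNS call error: "
    "read udp 172.18.0.3:50812->192.168.1.19:53: read: connection refused "
    "[ns=ns.silverdev.fun.:53, question='silverdev.fun. IN  TXT']"
)


def queried_nameserver(error_text):
    """Extract (ns_host, ip) that the failed UDP query was sent to."""
    ip = re.search(r"->(\d+\.\d+\.\d+\.\d+):53", error_text)
    ns = re.search(r"\[ns=([^,\]]+?):53", error_text)
    return (ns.group(1) if ns else None, ip.group(1) if ip else None)


configured = "64.227.153.101"  # RFC2136_NAMESERVER from the lego command above
ns_host, ns_ip = queried_nameserver(ERROR)
print(ns_host, ns_ip, "matches configured server:", ns_ip == configured)
```

If the two addresses differ, as they do here, one thing worth checking is whether the `ns` A record in the zone points at the host that actually runs BIND9.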

And lastly, my bind9 logs

bind9    | 06-Apr-2024 20:09:25.415 checkhints: b.root-servers.net/A (170.247.170.2) missing from hints
bind9    | 06-Apr-2024 20:09:25.415 checkhints: b.root-servers.net/A (199.9.14.201) extra record in hints
bind9    | 06-Apr-2024 20:09:25.415 checkhints: b.root-servers.net/AAAA (2801:1b8:10::b) missing from hints
bind9    | 06-Apr-2024 20:09:25.415 checkhints: b.root-servers.net/AAAA (2001:500:200::b) extra record in hints
bind9    | 06-Apr-2024 20:10:12.736 client @0x7f50541a3f18 <my-ip>#46986/key keyname: updating zone 'silverdev.fun/IN': deleting rrset at 'silverdev.fun' TXT
bind9    | 06-Apr-2024 20:10:12.736 client @0x7f50541a3f18 <my-ip>#46986/key keyname: updating zone 'silverdev.fun/IN': adding an RR at 'silverdev.fun' TXT "Xbw9wBoEDH4u_x-DJ6n8dXCvc7uRD6H1Xze1Oo0bzaY"
bind9    | 06-Apr-2024 20:11:15.925 client @0x7f50541a3f18 <my-ip>#33000/key keyname: updating zone 'silverdev.fun/IN': deleting an RR at silverdev.fun TXT
bind9    | 06-Apr-2024 20:13:33.167 client @0x7f50541a7bd8 23.26.60.162#56454 (sl): query (cache) 'sl/ANY/IN' denied (allow-query-cache did not match)
bind9    | 06-Apr-2024 20:37:53.601 client @0x7f50541a3f18 104.223.129.153#51626 (sl): query (cache) 'sl/ANY/IN' denied (allow-query-cache did not match)
bind9    | 06-Apr-2024 20:46:15.724 client @0x7f50541a7bd8 104.223.129.149#55514 (sl): query (cache) 'sl/ANY/IN' denied (allow-query-cache did not match)
bind9    | 06-Apr-2024 21:00:58.958 client @0x7f50541a3f18 <my-ip>#48105/key keyname: updating zone 'silverdev.fun/IN': deleting rrset at 'silverdev.fun' TXT
bind9    | 06-Apr-2024 21:00:58.958 client @0x7f50541a3f18 <my-ip>#48105/key keyname: updating zone 'silverdev.fun/IN': adding an RR at 'silverdev.fun' TXT "5cs0Uj3jQFfogfGZbptEm33syPu4AYpDXf_zbcKkIA8"
bind9    | 06-Apr-2024 21:02:02.199 client @0x7f50541a3f18 <my-ip>#49052/key keyname: updating zone 'silverdev.fun/IN': deleting an RR at silverdev.fun TXT

I'm stuck here; I don't know what else I can try.

Probably a topic for the go-acme forum, they should know more.

Alternatively, you can create the TLS certs with certbot and just load them in Traefik. I created a proof of concept a while ago.

Thank you for sharing the PoC, @bluepuma77. I had read your post before going the RFC2136 route, without realizing you were the one replying here. I will try the method you mentioned. With my limited knowledge in this area, what I can see is that Traefik is able to fetch a cert for the base domain (silverdev.fun) but not for the SANs, while go-acme does not generate a cert even for the base domain.
I also tried a self-hosted acme-dns server, but I couldn't resolve the domains as I could with BIND9. Any leads on how to use an acme-dns server would be helpful too. Thanks.

The one thing we don’t self-host is a DNS server :sweat_smile:

Well, that and the first-in-line load balancer, the single point of failure.

Yes, I agree with your view, @bluepuma77, that operating a self-hosted DNS server may not be the most efficient approach. However, in my case, I'm using a BIND9 Docker container that manages only the internal domains in my home lab, which doesn't have a significant user base. Initially, I was using AdGuard as my DNS server, but I couldn't make it work with the Traefik DNS challenge for internal domains, so I moved to a full-blown BIND9 setup that forwards upstream to AdGuard for ad blocking.

After spending several weeks tweaking and experimenting with this setup, I've reached a point where I'm not making any progress and haven't been able to find a solution. I believe it might be time for me to consider other options.

Sorry, @bluepuma77. I haven't marked your PoC as the solution yet. I'd like to keep this discussion open in case someone else has a different solution.

Did you ask in go-acme discussions or create an issue?

Sorry I couldn't post this earlier. I have started a discussion in go-acme/lego:
Help needed over setting up DNS-01 challenge with self hosted BIND9 server and Step CA ACME sever using RFC2136 · go-acme/lego · Discussion #2159 (github.com)