Cannot obtain ACME certificate possibly due to inability to perform SOA DNS query

Hey, so I have been getting this error when trying to set up my Let's Encrypt certificate:

traefik  | time="2022-09-24T22:57:40+01:00" level=error msg="Unable to obtain ACME certificate for domains \"*.loki.[REDACTED DOMAIN NAME],loki.[REDACTED DOMAIN NAME]\"" routerName=https-service@file ACME CA="https://acme-staging-v02.api.letsencrypt.org/directory" error="unable to generate a certificate for the domains [*.loki.[REDACTED DOMAIN NAME] [REDACTED DOMAIN NAME]]: error: one or more domains had a problem:\n[*.[REDACTED DOMAIN NAME]] [*.loki.[REDACTED DOMAIN NAME]] acme: error presenting token: cloudflare: failed to find zone [REDACTED DOMAIN NAME].: ListZonesContext command failed: HTTP status 400: Invalid request headers (6003)\n[[REDACTED DOMAIN NAME]] [loki.[REDACTED DOMAIN NAME]] acme: error presenting token: cloudflare: failed to find zone [REDACTED].: ListZonesContext command failed: HTTP status 400: Invalid request headers (6003)\n" providerName=letsEncrypt.acme rule=

Edit: Relevant part of the log

traefik  | time="2022-09-26T09:14:49+01:00" level=debug msg="Looking for provided certificate(s) to validate [\"*.loki.[REDACTED DOMAIN]\" \"loki.[REDACTED DOMAIN]\"]..." providerName=letsEncrypt.acme ACME CA="https://acme-staging-v02.api.letsencrypt.org/directory"
traefik  | time="2022-09-26T09:14:49+01:00" level=debug msg="No ACME certificate generation required for domains [\"*.loki.[REDACTED DOMAIN]\" \"loki.[REDACTED DOMAIN]\"]." ACME CA="https://acme-staging-v02.api.letsencrypt.org/directory" providerName=letsEncrypt.acme
traefik  | time="2022-09-26T09:14:49+01:00" level=debug msg="Looking for provided certificate(s) to validate [\"*.loki.[REDACTED DOMAIN]\" \"loki.[REDACTED DOMAIN]\"]..." providerName=letsEncrypt.acme ACME CA="https://acme-staging-v02.api.letsencrypt.org/directory"
traefik  | time="2022-09-26T09:14:49+01:00" level=debug msg="No ACME certificate generation required for domains [\"*.loki.[REDACTED DOMAIN]\" \"loki.[REDACTED DOMAIN]\"]." providerName=letsEncrypt.acme ACME CA="https://acme-staging-v02.api.letsencrypt.org/directory"
traefik  | time="2022-09-26T09:14:50+01:00" level=debug msg="Using DNS Challenge provider: cloudflare" providerName=letsEncrypt.acme
traefik  | time="2022-09-26T09:14:50+01:00" level=debug msg="legolog: [INFO] [*.loki.[REDACTED DOMAIN], loki.[REDACTED DOMAIN]] acme: Obtaining bundled SAN certificate"
traefik  | time="2022-09-26T09:14:51+01:00" level=debug msg="legolog: [INFO] [*.loki.[REDACTED DOMAIN]] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/3751467894"
traefik  | time="2022-09-26T09:14:51+01:00" level=debug msg="legolog: [INFO] [loki.[REDACTED DOMAIN]] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/3751467904"
traefik  | time="2022-09-26T09:14:51+01:00" level=debug msg="legolog: [INFO] [*.loki.[REDACTED DOMAIN]] acme: use dns-01 solver"
traefik  | time="2022-09-26T09:14:51+01:00" level=debug msg="legolog: [INFO] [loki.[REDACTED DOMAIN]] acme: Could not find solver for: tls-alpn-01"
traefik  | time="2022-09-26T09:14:51+01:00" level=debug msg="legolog: [INFO] [loki.[REDACTED DOMAIN]] acme: Could not find solver for: http-01"
traefik  | time="2022-09-26T09:14:51+01:00" level=debug msg="legolog: [INFO] [loki.[REDACTED DOMAIN]] acme: use dns-01 solver"
traefik  | time="2022-09-26T09:14:51+01:00" level=debug msg="legolog: [INFO] [*.loki.[REDACTED DOMAIN]] acme: Preparing to solve DNS-01"
traefik  | time="2022-09-26T09:14:52+01:00" level=debug msg="legolog: [INFO] [loki.[REDACTED DOMAIN]] acme: Preparing to solve DNS-01"
traefik  | time="2022-09-26T09:14:52+01:00" level=debug msg="legolog: [INFO] [*.loki.[REDACTED DOMAIN]] acme: Cleaning DNS-01 challenge"
traefik  | time="2022-09-26T09:14:53+01:00" level=debug msg="legolog: [WARN] [*.loki.[REDACTED DOMAIN]] acme: cleaning up failed: cloudflare: failed to find zone [REDACTED DOMAIN].: ListZonesContext command failed: HTTP status 400: Invalid request headers (6003) "
traefik  | time="2022-09-26T09:14:53+01:00" level=debug msg="legolog: [INFO] [loki.[REDACTED DOMAIN]] acme: Cleaning DNS-01 challenge"
traefik  | time="2022-09-26T09:14:54+01:00" level=debug msg="legolog: [WARN] [loki.[REDACTED DOMAIN]] acme: cleaning up failed: cloudflare: failed to find zone [REDACTED DOMAIN].: ListZonesContext command failed: HTTP status 400: Invalid request headers (6003) "
traefik  | time="2022-09-26T09:14:54+01:00" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/3751467894"
traefik  | time="2022-09-26T09:14:55+01:00" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/3751467904"
traefik  | time="2022-09-26T09:14:55+01:00" level=error msg="Unable to obtain ACME certificate for domains \"*.loki.[REDACTED DOMAIN],loki.[REDACTED DOMAIN]\"" ACME CA="https://acme-staging-v02.api.letsencrypt.org/directory" providerName=letsEncrypt.acme routerName=https-service@file rule= error="unable to generate a certificate for the domains [*.loki.[REDACTED DOMAIN] loki.[REDACTED DOMAIN]]: error: one or more domains had a problem:\n[*.loki.[REDACTED DOMAIN]] [*.loki.[REDACTED DOMAIN]] acme: error presenting token: cloudflare: failed to find zone [REDACTED DOMAIN].: ListZonesContext command failed: HTTP status 400: Invalid request headers (6003)\n[loki.[REDACTED DOMAIN]] [loki.[REDACTED DOMAIN]] acme: error presenting token: cloudflare: failed to find zone [REDACTED DOMAIN].: ListZonesContext command failed: HTTP status 400: Invalid request headers (6003)\n"

Note: I can successfully hit CF's API with the same email and API token.

https https://api.cloudflare.com/client/v4/zones names:[REDACTED DOMAIN] [Auth-headers...]

At first glance it looks related to the CF token, but that shouldn't be the case, as I'm using the Global one (I know it's not recommended). Digging further, I found a thread mentioning that lego needs to be able to perform an SOA DNS query on the domain, so I opened a shell in the container and ran the query with nslookup:

/ # nslookup -type=soa adguard.loki.[REDACTED DOMAIN]
Server:		127.0.0.11
Address:	127.0.0.11:53

Non-authoritative answer:

I noticed the image is Alpine-based, so I also tried a plain Alpine container, which gave the same result. Yet the same command works in a vanilla Ubuntu-based container and on the host system, returning the following:

root@a0d6a19ea344:/# nslookup -type=soa adguard.loki.[REDACTED DOMAIN]
Server:		127.0.0.11
Address:	127.0.0.11#53

Non-authoritative answer:
*** Can't find adguard.loki.[REDACTED DOMAIN]: No answer

Authoritative answers can be found from:
[REDACTED DOMAIN]
	origin = nena.ns.cloudflare.com
	mail addr = dns.cloudflare.com
	serial = 2289543556
	refresh = 10000
	retry = 2400
	expire = 604800
	minimum = 3600

Traefik docker-compose.yml

version: '3.3'
networks:
  proxied_services:
    external: true
services:
  traefik:
    image: "traefik:v2.9"
    container_name: "traefik"
    #dns:
    #  - 1.1.1.1  # Tried both with and without this
    networks:
      - proxied_services
    volumes:
      - /home/cubi/docker/network_infra/traefik:/etc/traefik
      - /home/cubi/docker/network_infra/traefik/acme:/letsencrypt
      - /home/cubi/docker/network_infra/traefik/traefik.yml:/etc/traefik/traefik.yml:ro
      - /home/cubi/docker/network_infra/traefik/dynamic_config.yml:/etc/traefik/dynamic_config.yml:ro
      #- /var/run/docker.sock:/var/run/docker.sock:ro
      #- /etc/localtime:/etc/localtime:ro
    ports:
      - 80:80
      - 8080:8080
      - 443:443
    environment:
      - TZ=[REDACTED]
      - CLOUDFLARE_EMAIL=[REDACTED]
      - CLOUDFLARE_DNS_API_TOKEN=[REDACTED]

Commands used to start the Alpine and Ubuntu test containers on the same network:

docker run -dit --name ubuntu1 --network proxied_services ubuntu bash
docker run -dit --name alpine1 --network proxied_services alpine ash

I'm no DNS expert, so this might make no sense, but it seems that neither the Alpine container nor Traefik follows up on what the DNS server returns? I will dig into it with Wireshark tomorrow to compare the queries and responses.

So my questions are:

  • How can I be sure this SOA DNS thing is the actual root cause?
  • Is it expected for a plain Alpine-based container to fail that query?
  • On the other hand, I'm not sure a failing nslookup query means the system resolver fails too. Does nslookup bypass the default resolver entirely?

Thank you!

Looks to me like it might be a problem with your Cloudflare credentials and not a DNS problem. Hard to know, since the error you posted was all on one line.

Hey @kevdog, thanks. I've edited the post to include the relevant part of the log. As for the suggested root cause, I suspect the credentials aren't the problem, since I can hit CF's API with the same ones.

Hello,

failed to find zone [REDACTED DOMAIN].: ListZonesContext command failed: HTTP status 400: Invalid request headers (6003) 

The problem is not related to DNS calls but to the API. Based on the error message, it's probably an error in your credentials, because Invalid request headers (6003) comes from the Cloudflare API.
You have to check the content of the env vars that contain your credentials.

Hey @ldez. Is there a way to inspect the actual request made by Traefik? I can successfully reach Cloudflare's API using curl and httpie with the same credentials.

It's not possible: Traefik uses lego to handle ACME and lego uses the official CF client.

The only information that is sent with headers is the credentials.

I still recommend checking the content of your env vars; the cause could be a hidden character.
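One way to act on that advice is sketched below, under stated assumptions: it is a small Python check, not something Traefik or lego provides, and the function name find_suspicious_chars is made up for illustration. It reports whitespace, control and invisible formatting characters (trailing newline, non-breaking space, zero-width space, BOM) in a value without ever printing the secret itself. The variable names in the loop are the ones mentioned in this thread.

```python
import os
import unicodedata


def find_suspicious_chars(value: str) -> list[tuple[int, str]]:
    """Return (index, description) for characters that are easy to miss by eye."""
    findings = []
    for index, char in enumerate(value):
        category = unicodedata.category(char)
        # Cc = control (\n, \r, \t), Cf = invisible format characters
        # (zero-width space, BOM), Z* = separators (space, no-break space).
        if category in ("Cc", "Cf") or category.startswith("Z"):
            # Control characters have no Unicode name, hence the default.
            name = unicodedata.name(char, "UNNAMED")
            findings.append((index, f"U+{ord(char):04X} {name}"))
    return findings


if __name__ == "__main__":
    # Variable names taken from this thread; adjust to your own setup.
    for var in ("CF_API_EMAIL", "CF_API_KEY",
                "CLOUDFLARE_EMAIL", "CLOUDFLARE_DNS_API_TOKEN"):
        value = os.environ.get(var)
        if value is None:
            print(f"{var}: not set")
            continue
        problems = find_suspicious_chars(value)
        # Print only the length and the findings, never the credential.
        print(f"{var}: length={len(value)}, suspicious={problems or 'none'}")
```

The script has to run somewhere that sees the same environment as Traefik, for example a throwaway Python container started with the same environment entries. An unexpected length is often as telling as a reported character.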

Ok, sorry guys. It's completely laid out in the doc; it works using these env vars:

Provider Name: Cloudflare
Provider Code: cloudflare
Environment Variables: CF_API_EMAIL, CF_API_KEY (the Global API Key needs to be used, not the Origin CA Key)

Instead of:

      - CLOUDFLARE_EMAIL=[REDACTED]
      - CLOUDFLARE_DNS_API_TOKEN=[REDACTED]

I convinced myself it was due to the DNS query, thank you :)
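A plausible reading of why the variable names mattered, sketched rather than taken from lego's source: Cloudflare's v4 API accepts two credential styles. A scoped API token travels in an Authorization: Bearer header, while the legacy Global API Key travels in X-Auth-Email and X-Auth-Key. A Global API Key supplied through a token variable would therefore be sent as a Bearer token, which would fit the Invalid request headers (6003) response. The helper below is hypothetical (its name, parameters and precedence are assumptions, and the email and key are placeholders); it only illustrates the mapping.

```python
def cloudflare_auth_headers(email=None, global_api_key=None, api_token=None):
    """Build Cloudflare v4 API auth headers for one credential style.

    Hypothetical helper for illustration; not part of lego or Traefik.
    """
    if api_token:
        # Scoped API tokens go in a Bearer header; no email is involved.
        return {"Authorization": f"Bearer {api_token}"}
    if email and global_api_key:
        # The legacy Global API Key is always paired with the account email.
        return {"X-Auth-Email": email, "X-Auth-Key": global_api_key}
    raise ValueError("need either api_token, or both email and global_api_key")


# Original configuration: a Global API Key passed through a *token* variable,
# so it ends up in the Bearer header where Cloudflare expects a scoped token.
wrong = cloudflare_auth_headers(email="me@example.com",
                                api_token="GLOBAL_KEY_PLACEHOLDER")

# Fixed configuration: email and key sent in the X-Auth-* headers.
right = cloudflare_auth_headers(email="me@example.com",
                                global_api_key="GLOBAL_KEY_PLACEHOLDER")
```

This would also explain why the manual curl and httpie requests succeeded: they presumably sent the key in the headers that match its type, whereas the variable name decided the header on Traefik's side.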
