Unable to obtain ACME certificate for domains: timeout

Hi,

I'm having issues renewing my certificates. I have probably broken something, because they were generated without problems in the first place.

In my logs, for each domain, I see the following:

time="2024-03-31T07:48:45Z" level=debug msg="Creating middleware" routerName=ddnsupdater@docker middlewareName=pipelining middlewareType=Pipelining serviceName=ddnsupdater entryPointName=websecure
time="2024-03-31T07:48:45Z" level=debug msg="Creating load-balancer" entryPointName=websecure routerName=ddnsupdater@docker serviceName=ddnsupdater
time="2024-03-31T07:48:45Z" level=debug msg="Creating server 0 http://10.0.2.10:8000" serviceName=ddnsupdater serverName=0 entryPointName=websecure routerName=ddnsupdater@docker
time="2024-03-31T07:48:45Z" level=debug msg="child http://10.0.2.10:8000 now UP"
time="2024-03-31T07:48:45Z" level=debug msg="Propagating new UP status"
time="2024-03-31T07:48:45Z" level=debug msg="Added outgoing tracing middleware ddnsupdater" middlewareType=TracingForwarder entryPointName=websecure routerName=ddnsupdater@docker middlewareName=tracing
time="2024-03-31T07:48:45Z" level=debug msg="Creating middleware" middlewareType=Pipelining entryPointName=websecure routerName=traefik@docker serviceName=traefik middlewareName=pipelining
time="2024-03-31T07:48:45Z" level=debug msg="Creating load-balancer" routerName=traefik@docker serviceName=traefik entryPointName=websecure
time="2024-03-31T07:48:45Z" level=debug msg="Creating server 0 http://10.0.2.30:8080" routerName=traefik@docker serviceName=traefik serverName=0 entryPointName=websecure
time="2024-03-31T07:48:45Z" level=debug msg="Adding route for ddns.mydomain.com with TLS options default" entryPointName=websecure
time="2024-03-31T07:48:45Z" level=debug msg="Trying to challenge certificate for domain [ddns.mydomain.com] found in HostSNI rule" rule="Host(`ddns.mydomain.com`)" providerName=myresolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory" routerName=ddnsupdater@docker
time="2024-03-31T07:48:45Z" level=debug msg="Looking for provided certificate(s) to validate [\"ddns.mydomain.com\"]..." rule="Host(`ddns.mydomain.com`)" providerName=myresolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory" routerName=ddnsupdater@docker
time="2024-03-31T07:48:45Z" level=debug msg="No ACME certificate generation required for domains [\"ddns.mydomain.com\"]." providerName=myresolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory" routerName=ddnsupdater@docker rule="Host(`ddns.mydomain.com`)"
time="2024-03-31T07:48:54Z" level=debug msg="legolog: [INFO] [ddns.mydomain.com] acme: Waiting for DNS record propagation."
time="2024-03-31T07:49:06Z" level=debug msg="legolog: [INFO] [ddns.mydomain.com] acme: Waiting for DNS record propagation."
time="2024-03-31T07:49:18Z" level=debug msg="legolog: [INFO] [ddns.mydomain.com] acme: Waiting for DNS record propagation."
time="2024-03-31T07:49:30Z" level=debug msg="legolog: [INFO] [ddns.mydomain.com] acme: Waiting for DNS record propagation."
time="2024-03-31T07:49:42Z" level=debug msg="legolog: [INFO] [ddns.mydomain.com] acme: Waiting for DNS record propagation."
time="2024-03-31T07:49:44Z" level=debug msg="legolog: [INFO] [ddns.mydomain.com] acme: Cleaning DNS-01 challenge"
time="2024-03-31T07:49:44Z" level=debug msg="legolog: [INFO] Found CNAME entry for \"_acme-challenge.ddns.mydomain.com.\": \"ho.mydomain.com.\""
time="2024-03-31T07:49:44Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/332803205747"
time="2024-03-31T07:49:45Z" level=error msg="Unable to obtain ACME certificate for domains \"ddns.mydomain.com\": unable to generate a certificate for the domains [ddns.mydomain.com]: error: one or more domains had a problem:\n[ddns.mydomain.com] propagation: time limit exceeded: last error: read udp 172.18.x.y:35158->213.251.x.y:53: i/o timeout\n" providerName=myresolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory" routerName=ddnsupdater@docker rule="Host(`ddns.mydomain.com`)"

I really don't understand what could be blocking my DNS challenge.

Basically, I have a DDNS subdomain (ho.mydomain.com, seen in the logs above) that currently points to my home address, plus a CNAME that points *.mydomain.com to ho.mydomain.com.

Note that if I navigate to ddns.mydomain.com, I can reach it (with an invalid certificate error, obviously), so the redirection should be working?

Here is my config:

version: "3.9"

services:
  traefik:
    image: "traefik:latest"
    container_name: "traefik"
    command:
      - "--log.level=DEBUG"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.myresolver.acme.dnschallenge=true"
      - "--certificatesresolvers.myresolver.acme.dnschallenge.provider=ovh"
      - "--certificatesresolvers.myresolver.acme.email=AAAA@BBBB.com"
      - "--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json"
      - "--providers.docker.network=home-stack_default"
    ports:
      - "80:80"
      - "8080:8080"
      - "443:443"
    environment:
      - "OVH_ENDPOINT=ovh-eu"
      - "OVH_APPLICATION_KEY=AAAAAAAAAAAA"
      - "OVH_APPLICATION_SECRET=BBBBBBBBBBBBBBBBBBBBBB"
      - "OVH_CONSUMER_KEY=CCCCCCCCCCCCCCCCCCC"
    volumes:
      - config-letsencrypt:/letsencrypt
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    restart: always
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.rule=Host(`traefik.mydomain.com`)"
      - "traefik.http.routers.traefik.entrypoints=websecure"
      - "traefik.http.routers.traefik.tls.certresolver=myresolver"
      - "traefik.http.services.traefik.loadbalancer.server.port=8080"
      - "traefik.http.middlewares.force-secure.redirectscheme.scheme=https"
      - "traefik.http.middlewares.force-secure.redirectscheme.permanent=true"
      - traefik.http.routers.http-catchall.rule=HostRegexp(`{any:.+}`)
      - traefik.http.routers.http-catchall.entrypoints=web
      - traefik.http.routers.http-catchall.middlewares=force-secure

  ddns-updater:
    image: qmcgaw/ddns-updater
    container_name: ddnsupdater
    volumes:
      - config-ddns:/updater/data
    ports:
      - 8007:8000/tcp
    environment:
      - PERIOD=5m
      - CONFIG=
      - UPDATE_COOLDOWN_PERIOD=5m
      - PUBLICIP_FETCHERS=all
      - PUBLICIP_HTTP_PROVIDERS=all
      - PUBLICIPV4_HTTP_PROVIDERS=all
      - PUBLICIPV6_HTTP_PROVIDERS=all
      - PUBLICIP_DNS_PROVIDERS=all
      - PUBLICIP_DNS_TIMEOUT=3s
      - HTTP_TIMEOUT=10s

      # Web UI
      - LISTENING_PORT=8007
      - ROOT_URL=/

      # Backup
      - BACKUP_PERIOD=0 # 0 to disable
      - BACKUP_DIRECTORY=/updater/data

      # Other
      - LOG_LEVEL=info
      - LOG_CALLER=hidden
      - SHOUTRRR_ADDRESSES=
    restart: always
    healthcheck:
      disable: true
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.ddnsupdater.rule=Host(`ddns.mydomain.com`)"
      - "traefik.http.routers.ddnsupdater.entrypoints=websecure"
      - "traefik.http.routers.ddnsupdater.tls.certresolver=myresolver"
      - "traefik.http.services.ddnsupdater.loadbalancer.server.port=8000"

volumes:
  config-ddns:
    driver_opts:
      type: "nfs"
      o: "addr=192.168.0.60,nolock,rw,soft"
      device: ":/volume2/apps/config/ddns"
  config-letsencrypt:
    driver_opts:
      type: "nfs"
      o: "addr=192.168.0.60,nolock,rw,soft"
      device: ":/volume2/apps/config/letsencrypt"

Any idea what I could have messed up?

Try setting the environment variable

LEGO_DISABLE_CNAME_SUPPORT=true

on the Traefik container, even if it seems counter-intuitive.

Also try setting

delayBeforeCheck=30

(in seconds) in the Traefik static certresolver config, to wait longer for propagation.

Doc (link)
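
For reference, here is roughly how those two settings could be added to the compose file from the first post. This is an untested sketch: it assumes the resolver is still named myresolver, and only the two lines marked "New" are additions. Everything else already in the command and environment lists should stay in place.

```yaml
services:
  traefik:
    command:
      # ... existing flags from the original compose file stay as they are ...
      - "--certificatesresolvers.myresolver.acme.dnschallenge=true"
      - "--certificatesresolvers.myresolver.acme.dnschallenge.provider=ovh"
      # New: wait 30 seconds before the first propagation check
      - "--certificatesresolvers.myresolver.acme.dnschallenge.delaybeforecheck=30"
    environment:
      # ... existing OVH_* variables stay as they are ...
      # New: ask lego not to follow the CNAME on the _acme-challenge name
      - "LEGO_DISABLE_CNAME_SUPPORT=true"
```

The container has to be recreated (not just restarted) for a new environment variable to take effect.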

I added both settings you mentioned, but I still get the error:
time="2024-03-31T15:37:59Z" level=error msg="Unable to obtain ACME certificate for domains \"ddns.mydomain.com\": unable to generate a certificate for the domains [ddns.mydomain.com]: error: one or more domains had a problem:\n[ddns.mydomain.com] propagation: time limit exceeded: last error: read udp 172.18.a.b:51915->213.251.a.b:53: i/o timeout\n" routerName=ddnsupdater@docker rule="Host(ddns.mydomain.com)" providerName=myresolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory"

Is there some way to check what I'm missing?

UDP port 53 is DNS, which seems to be where the problem is. Do you run a local DNS server like Pi-hole? Try running ping google.com inside the Traefik container.

That seems to be working fine

The weird thing is that I get this error for all 10 Docker containers, not just one, so I don't believe it was a one-off timeout (especially since I have 10 Gb fiber).

The two DNS servers in charge of this domain are dns18.ovh.net and ns18.ovh.net, in case that helps.

Do you by any chance have any other idea, approach, or workaround I could try? I'm a bit desperate, since none of my services are working anymore :(

Is the 213.251.a.b address in the error one of the OVH DNS servers?

Hi,

I checked: 213.251.a.b is actually ns18.ovh.net (213.251.188.138).
I don't know what the other IP is; maybe it is the internal IP of the Traefik container.

I cannot ping ns18.ovh.net (from either my computer or the container); it doesn't seem to respond to ping.

I'm not ultra familiar with DNS, but I did some tests:

If I run host ddns.mydomain.com dns18.ovh.net on the Linux command line, it fails on my computer.
If I do the same without specifying dns18.ovh.net (so it uses my provider's DNS), it works.
If I do the same with a web tool (https://network-tools.webwiz.net/nslookup.htm) and specify the DNS server, it works.

If I understand correctly, it seems that my provider is preventing me from sending DNS requests directly to a specific DNS server? I don't have a custom firewall or local DNS server installed, and I've checked: there is nothing in my router that I can enable or disable for that.

I looked around my provider's forum, and it appears that I'm not the only one with this issue.

Do you think that, for my use case, a challenge other than the DNS one would work? Or is there any way for Traefik to use my ISP's DNS instead of directly querying my domain's DNS servers?
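
On the second question: in Traefik v2 (which the log format above suggests is in use), the dnsChallenge section has a disablePropagationCheck option that skips Traefik's own propagation check before it tells the ACME server the challenge is ready. The documentation marks it as not recommended. The fragment below is an untested sketch, assuming the myresolver name from the compose file; the 60-second delay is an arbitrary placeholder, and whether this is enough to get around an ISP-level DNS filter is an assumption.

```yaml
services:
  traefik:
    command:
      # ... existing flags from the original compose file stay as they are ...
      # Wait a fixed time after the TXT record is created
      # (60 seconds is an arbitrary placeholder value)
      - "--certificatesresolvers.myresolver.acme.dnschallenge.delaybeforecheck=60"
      # Skip Traefik's own propagation check (documented as not recommended)
      - "--certificatesresolvers.myresolver.acme.dnschallenge.disablepropagationcheck=true"
```

With the check skipped, nothing confirms that the TXT record is visible before Let's Encrypt looks for it, so the fixed delay is the only safety margin.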

I just remembered that a few months ago I got an email from my provider saying they were activating a "surf protect home" option. I wonder if that could be the issue. I just disabled it, but I guess it will take some time to take effect.

Well, some good news: now that I've disabled it, I can run nslookup from the host computer, and Traefik no longer shows the same errors.

But now I have another error, although it seems to have gotten further:

time="2024-04-01T07:39:59Z" level=error msg="Unable to obtain ACME certificate for domains \"ddns.mydomain.com\": unable to generate a certificate for the domains [ddns.mydomain.com]: error: one or more domains had a problem:\n[ddns.mydomain.com] propagation: time limit exceeded: last error: NS dns18.ovh.net. did not return the expected TXT record [fqdn: ho.mydomain.com., value: Rag_lDVgLoaEkiJm9OgYPrpnL09Uk6LRWWUmQohKRy8]: nf8vrQ_7qCZOM-waKDqj5xbRlnoX1EXL6SPnGlwWIA8 ,MNMWfpAXaFK1xsKVmBBN52HT2uMzoKyXW1MNy5M5X24 ,7MNDbfKPZpFSRWoeVJwV_YCGa3Sfsu7komGIz00Npf8 ,4ZjIiPIOiMIKZGbwcQcfNekgiAgSCvrkRUE1ksDS48A ,Ka6xSQu72u4xW8keZ9xJREbrcxA5fNlwLik9FgfCPkA ,Ub4VDxto1ZvFCHiRSBKEyV-gP243HYNKaG08Qzo9UnI ,GB5mTleEPu79g2sJkfV4zZFPGLxAYIxwiT52egV7nVk ,2DFeXfQf-F8jaH5yll-JkknF50iH8Q0gIxvqEXiOzT0 ,GfprcEAQG68OyMFk9wgRcelJkeVMz5Dru6W5yyWS5eg ,TE2FJlY5-bVjuXqLxddj6RBjop2EoxJbrz8nbS1vhRE ,hSZ3KwSAInoTePhC-8_Hvvu-ixc8VmdMkbH8r_4DxoU\n" providerName=myresolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory" routerName=ddnsupdater@docker rule="Host(ddns.mydomain.com)"

ho.mydomain.com is the one with the A record for my home address.
What is this TXT record it seems to expect?

I tried removing LEGO_DISABLE_CNAME_SUPPORT to see if it would work now, but I get another error. Do you know anything about it?

time="2024-04-01T07:54:27Z" level=debug msg="'500 Internal Server Error' caused by: tls: failed to verify certificate: x509: cannot validate certificate for 10.0.2.24 because it doesn't contain any IP SANs"
time="2024-04-01T07:54:28Z" level=debug msg="'500 Internal Server Error' caused by: tls: failed to verify certificate: x509: cannot validate certificate for 10.0.2.24 because it doesn't contain any IP SANs"
time="2024-04-01T07:54:27Z" level=debug msg="'500 Internal Server Error' caused by: tls: failed to verify certificate: x509: cannot validate certificate for 10.0.2.24 because it doesn't contain any IP SANs"
time="2024-04-01T07:56:02Z" level=debug msg="http: TLS handshake error from 10.0.0.2:46428: tls: client requested unsupported application protocols ([http/0.9 http/1.0 spdy/1 spdy/2 spdy/3 h2c hq])"
time="2024-04-01T07:56:02Z" level=debug msg="http: TLS handshake error from 10.0.0.2:46008: read tcp 10.0.0.46:443->10.0.0.2:46008: read: connection reset by peer"
time="2024-04-01T07:56:02Z" level=debug msg="http: TLS handshake error from 10.0.0.2:15340: tls: client offered only unsupported versions: [302 301]"
time="2024-04-01T07:56:02Z" level=debug msg="http: TLS handshake error from 10.0.0.2:18901: tls: client requested unsupported application protocols ([hq h2c spdy/3 spdy/2 spdy/1 http/1.0 http/0.9])"
time="2024-04-01T07:56:02Z" level=debug msg="http: TLS handshake error from 10.0.0.2:63928: tls: no cipher suite supported by both client and server"
time="2024-04-01T07:56:02Z" level=debug msg="http: TLS handshake error from 10.0.0.2:23802: read tcp 10.0.0.46:443->10.0.0.2:23802: read: connection reset by peer"
time="2024-04-01T07:54:43Z" level=debug msg="'500 Internal Server Error' caused by: tls: failed to verify certificate: x509: cannot validate certificate for 10.0.2.24 because it doesn't contain any IP SANs"
time="2024-04-01T07:56:01Z" level=debug msg="http: TLS handshake error from 10.0.0.2:55173: EOF"
time="2024-04-01T07:56:02Z" level=debug msg="http: TLS handshake error from 10.0.0.2:40429: EOF"

dnsChallenge creates a TXT record at your DNS provider, which Let's Encrypt then verifies from its external servers to confirm that you own or control the domain.

I ran a lot of different tests, and it seems one of them did work (although the browser did not show the certificates as valid). Is there a way to force certificate generation?

Delete (or rename) acme.json.

I am suddenly experiencing exactly the same problem, on a configuration that worked until two days ago. I haven't changed anything. When Traefik tried to renew the certificate, it failed with that error. Did you figure out how to solve it?