Error: one or more domains had a problem: propagation: time limit exceeded

Hi there!

I'm adding some Docker containers to my stack, and I'm running into an issue with the new ones:

time="2024-03-05T20:20:53Z" level=debug msg="legolog: [INFO] [grafana.mydomain.com] acme: Cleaning DNS-01 challenge"
time="2024-03-05T20:20:53Z" level=debug msg="legolog: [INFO] Found CNAME entry for \"_acme-challenge.grafana.mydomain.com.\": \"ho.mydomain.com.\""
time="2024-03-05T20:20:53Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/322846982607"
time="2024-03-05T20:20:53Z" level=error msg="Unable to obtain ACME certificate for domains \"grafana.mydomain.com\": unable to generate a certificate for the domains [grafana.mydomain.com]: error: one or more domains had a problem:\n[grafana.mydomain.com] propagation: time limit exceeded: last error: read udp xxxxxx:42286->xxxxxx:53: i/o timeout\n" routerName=grafana@docker rule="Host(`grafana.mydomain.com`)" providerName=myresolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory"

Here is how this container and Traefik are configured:

version: "3.9"

services:
  traefik:
    image: "traefik:latest"
    container_name: "traefik"
    network_mode: "host"
    command:
      - "--log.level=DEBUG"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.myresolver.acme.dnschallenge=true"
      - "--certificatesresolvers.myresolver.acme.dnschallenge.provider=ovh"
      - "--certificatesresolvers.myresolver.acme.email=my@domain.com"
      - "--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json"
      - "--providers.docker.network=home-stack_default"
    ports:
      - "80:80"
      - "8080:8080"
      - "443:443"
    environment:
      - "OVH_ENDPOINT=ovh-eu"
      - "OVH_APPLICATION_KEY=AAAA"
      - "OVH_APPLICATION_SECRET=BBBB"
      - "OVH_CONSUMER_KEY=CCC"
    volumes:
      - config-letsencrypt:/letsencrypt
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    restart: always
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.rule=Host(`traefik.mydomain.com`)"
      - "traefik.http.routers.traefik.entrypoints=websecure"
      - "traefik.http.routers.traefik.tls.certresolver=myresolver"
      - "traefik.http.services.traefik.loadbalancer.server.port=8080"
      - "traefik.http.middlewares.force-secure.redirectscheme.scheme=https"
      - "traefik.http.middlewares.force-secure.redirectscheme.permanent=true"
      - "traefik.http.routers.http-catchall.rule=HostRegexp(`{any:.+}`)"
      - "traefik.http.routers.http-catchall.entrypoints=web"
      - "traefik.http.routers.http-catchall.middlewares=force-secure"

  grafana:
    image: grafana/grafana-enterprise
    container_name: grafana
    restart: unless-stopped
    ports:
      - '3000:3000'
    volumes:
      - grafana-data:/var/lib/grafana
    user: "1028:100"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.grafana.rule=Host(`grafana.mydomain.com`)"
      - "traefik.http.routers.grafana.entrypoints=websecure"
      - "traefik.http.routers.grafana.tls=true"
      - "traefik.http.routers.grafana.tls.certresolver=myresolver"
      - "traefik.http.services.grafana.loadbalancer.server.port=3000" 

volumes:
  # Declared so the named volume mounted by the traefik service resolves
  config-letsencrypt:
  grafana-data:
    driver_opts:
      type: "nfs"
      o: "addr=192.168.0.60,nolock,rw,soft"
      device: ":/volume2/apps/work/grafana/data"

What am I missing? I can access the site; it's just that the SSL certificate isn't valid.

network_mode: "host"

is rather unusual, especially with

providers.docker.network=home-stack_default

I don't remember why I set it, but it seems to work with the other 6-7 containers.

Also, I guess that if it were a network issue, I would not be able to access the site at all through Traefik, right? In my case, it's just the SSL certificate that is invalid.

Hi @bluepuma77

I'm trying to reproduce the issue. I've removed network_mode: host, but I still have the issue.
I'm now noticing another error:

time="2024-03-18T19:16:56Z" level=debug msg="legolog: [INFO] [grafana.mydomain.com] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/327845168427"
time="2024-03-18T19:16:56Z" level=debug msg="legolog: [INFO] [grafana.mydomain.com] acme: Could not find solver for: tls-alpn-01"
time="2024-03-18T19:16:56Z" level=debug msg="legolog: [INFO] [grafana.mydomain.com] acme: Could not find solver for: http-01"
time="2024-03-18T19:16:56Z" level=debug msg="legolog: [INFO] [grafana.mydomain.com] acme: use dns-01 solver"
time="2024-03-18T19:16:56Z" level=debug msg="legolog: [INFO] [grafana.mydomain.com] acme: Preparing to solve DNS-01"
time="2024-03-18T19:16:56Z" level=debug msg="legolog: [INFO] Found CNAME entry for \"_acme-challenge.grafana.mydomain.com.\": \"ho.mydomain.com.\""
time="2024-03-18T19:16:57Z" level=debug msg="legolog: [INFO] Found CNAME entry for \"_acme-challenge.grafana.mydomain.com.\": \"ho.mydomain.com.\""
time="2024-03-18T19:16:57Z" level=debug msg="legolog: [INFO] [grafana.mydomain.com] acme: Trying to solve DNS-01"
time="2024-03-18T19:16:57Z" level=debug msg="legolog: [INFO] [grafana.mydomain.com] acme: Checking DNS record propagation using [127.0.0.11:53]"
time="2024-03-18T19:16:57Z" level=debug msg="legolog: [INFO] Wait for propagation [timeout: 1m0s, interval: 2s]"
time="2024-03-18T19:16:59Z" level=debug msg="legolog: [INFO] Wait for propagation [timeout: 1m0s, interval: 2s]"
time="2024-03-18T19:17:02Z" level=debug msg="Serving default certificate for request: \"grafana.mydomain.com\""
time="2024-03-18T19:17:02Z" level=debug msg="http: TLS handshake error from 10.0.0.2:33871: remote error: tls: unknown certificate"
time="2024-03-18T19:17:09Z" level=debug msg="legolog: [INFO] [grafana.mydomain.com] acme: Waiting for DNS record propagation."
time="2024-03-18T19:17:14Z" level=debug msg="http: TLS handshake error from 10.0.0.2:52769: remote error: tls: unknown certificate"
time="2024-03-18T19:17:14Z" level=debug msg="Serving default certificate for request: \"grafana.mydomain.com\""
time="2024-03-18T19:27:07Z" level=error msg="Error renewing certificate from LE: {grafana.my.domain.com []}" error="error: one or more domains had a problem:\n[grafana.mydomain.com] propagation: time limit exceeded: last error: read udp 172.18.0.15:60504->213.251.188.138:53: i/o timeout\n" providerName=myresolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory"
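
For anyone reading the log above: the propagation check goes through Docker's embedded DNS (127.0.0.11:53) and then times out on a UDP read from 213.251.188.138:53. A minimal sketch of the resolver-side options that could be tried, assuming Traefik v2 (where these `dnschallenge` flags exist) and that public resolvers are reachable from the container. The resolver addresses and the 30-second delay are illustrative values, not taken from this setup, and the two flags are alternatives to experiment with rather than a confirmed fix:

```yaml
services:
  traefik:
    command:
      # Assumption: 1.1.1.1 and 8.8.8.8 are reachable; these servers are used
      # to resolve the zone's authority instead of Docker's 127.0.0.11
      - "--certificatesresolvers.myresolver.acme.dnschallenge.resolvers=1.1.1.1:53,8.8.8.8:53"
      # Assume the TXT record has propagated after this many seconds
      # instead of finding and querying name servers (illustrative value)
      - "--certificatesresolvers.myresolver.acme.dnschallenge.delaybeforecheck=30"
```

These lines would be added to the existing `command` list of the traefik service; the other flags stay as they are.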

ho.mydomain.com is my own DNS A record pointing to my home IP (just checked, it's still the case).
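
The log shows lego following a CNAME from `_acme-challenge.grafana.mydomain.com.` to `ho.mydomain.com.`. lego follows such CNAMEs by default and documents an environment variable to switch that off. A sketch of setting it on the Traefik service, in case the CNAME (for example one produced by a wildcard record) is not meant to delegate the challenge. Whether it plays any role in the timeout here is not established in this thread:

```yaml
services:
  traefik:
    environment:
      # lego option: do not follow CNAME records on _acme-challenge names
      - "LEGO_DISABLE_CNAME_SUPPORT=true"
```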

Also, apparently Traefik is now failing to renew the certificates for all my containers. I guess it was working before because the certificates were recent enough.

I guess I changed something since it worked, but I can't figure out what :frowning:

Damn, I'm not sure why, but it suddenly started to work with "network_mode: host" removed. I don't know why it didn't change anything in my previous tries, but it seems OK now.
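
For reference, the change described above (dropping `network_mode: host` so that Traefik joins the compose network) can be sketched roughly as follows. It assumes the compose project is named `home-stack`, so that its default network is `home-stack_default`, matching the existing `providers.docker.network` flag. Only the keys relevant to networking are shown; the ACME flags, environment, and labels stay unchanged:

```yaml
services:
  traefik:
    image: "traefik:latest"
    container_name: "traefik"
    # network_mode: "host" removed; Traefik attaches to the project's default network
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      # Assumption: the project name is home-stack, so this network exists
      - "--providers.docker.network=home-stack_default"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
    ports:
      # Published ports apply in bridge mode; with host networking they are not used
      - "80:80"
      - "443:443"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
```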