DNS Challenge Timeout

TJS · May 5, 2020, 1:36am

I got the following error:

level=error msg="Unable to obtain ACME certificate for domains \"traefik.springbox-office.com\": cannot get ACME client get directory at 'https://acme-staging-v02.api.letsencrypt.org/directory': Get \"https://acme-staging-v02.api.letsencrypt.org/directory\": dial tcp: lookup acme-staging-v02.api.letsencrypt.org on 127.0.0.11:53: read udp 127.0.0.1:39698->127.0.0.11:53: i/o timeout" providerName=letsencrypt.acme routerName=api@docker rule="Host(`traefik.springbox-office.com`)"

Why does Traefik use Docker's default DNS server (i.e. 127.0.0.11) for address resolution?
The following command fails:

dig @127.0.0.11 https://acme-staging-v02.api.letsencrypt.org/directory

I have configured the following command options:

      - "--log.level=DEBUG"
      - "--providers.docker.endpoint=unix://var/run/docker.sock"
      - "--providers.docker.swarmMode=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.web.http.redirections.entryPoint.to=websecure"
      - "--entrypoints.web.http.redirections.entryPoint.scheme=https"
      - "--api.dashboard=true"
      - "--certificatesresolvers.letsencrypt.acme.email=postmaster@company.tld"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
      - "--certificatesresolvers.letsencrypt.acme.dnschallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.dnschallenge.provider=gandiv5"
      - "--certificatesresolvers.letsencrypt.acme.dnschallenge.resolvers=1.1.1.1:53,8.8.8.8:53"
      - "--ping=true"

Any ideas? Thanks for your help!

Regards

Thierry

zespri · May 5, 2020, 1:53am

The dnschallenge.resolvers option you set up is specific to the DNS challenge. It has nothing to do with normal DNS resolution for the Traefik container.

TJS · May 5, 2020, 10:08am

OK, I see. Thanks for your reply.

How can I change the default DNS server that Traefik queries? (I guess the only one available is the Docker DNS server, which cannot reach external hosts such as google.com or the Let's Encrypt endpoint.)

zespri · May 5, 2020, 10:52am

The same way as with any Docker container. Both docker and docker-compose give you this option (the --dns flag for docker run, the dns key in a compose file); look it up in their docs. Of course, Docker needs to be able to reach whatever DNS servers you specify; depending on how networking is set up, that is not always the case.

TJS · May 5, 2020, 11:45am

Thank you!
Indeed, I have a configuration issue in my Docker daemon. I probably need to change something in daemon.json (add a DNS server?).

cakiwi · May 5, 2020, 12:07pm

Traefik is just using the resolver in /etc/resolv.conf. When you are using a Docker network (which is implicit by default with docker-compose), that is the Docker DNS resolver, which is there so you can resolve other containers on the same network.
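
To see which resolver a container will actually use, it can help to list the nameserver entries from its /etc/resolv.conf. The following is only a minimal sketch: the function name parse_nameservers and the sample text are illustrative assumptions, not something taken from this thread or from Docker itself.

```python
def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of a resolv.conf-style text."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # resolv.conf comments start with '#' or ';'
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Illustrative sample (an assumption): roughly what a container attached to a
# user-defined Docker network tends to see.
SAMPLE = """\
# illustrative content, not copied from the thread
nameserver 127.0.0.11
options ndots:0
"""

print(parse_nameservers(SAMPLE))
```

Inside a container, the same function could be fed the real file, for example with open("/etc/resolv.conf").read(). If the only entry is 127.0.0.11, every lookup goes through Docker's embedded resolver.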

Your dig should be: dig acme-staging-v02.api.letsencrypt.org

TJS · May 5, 2020, 8:31pm

That one works, but I wanted to test the Docker internal DNS.

To summarize: external DNS resolution works fine outside the container (on the host) but fails inside it.
So there is something wrong with the Docker configuration on my side (I am using swarm mode).

cakiwi · May 5, 2020, 8:49pm

Your dig example had the full URL, not just the FQDN.
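
On the URL versus FQDN point: a DNS lookup wants only the host name, without the scheme or path. As a rough sketch (the helper name dig_target is hypothetical), the host part can be pulled out of a URL with Python's standard urllib.parse before handing it to dig:

```python
from urllib.parse import urlsplit


def dig_target(url_or_host):
    """Return the bare host name from a URL, or from an already bare host name."""
    # urlsplit only fills in .hostname when the input has a scheme or a
    # leading '//', so prefix bare host names before parsing.
    text = url_or_host if "//" in url_or_host else "//" + url_or_host
    return urlsplit(text).hostname or ""


print(dig_target("https://acme-staging-v02.api.letsencrypt.org/directory"))
```

The printed name is what belongs on the dig command line, optionally together with @127.0.0.11 to query the Docker resolver directly.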

Which platform are you running Docker on? There are issues like this one for Docker for Windows.

TJS · May 5, 2020, 10:41pm

I am using a Debian Linux platform.

Here is some information:

thierry@springbox01:~/test$ docker system info
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: 6k4plaqobx9i9zh1d6aok4x4f
  Is Manager: true
  ClusterID: ifub5um9dl3rfgoipu8ityswp
  Managers: 2
  Nodes: 5
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.0.1
  Manager Addresses:
   192.168.0.1:2377
   192.168.0.2:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.19.0-8-amd64
 Operating System: Debian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 5.823GiB
 Name: springbox01
 ID: 5GYF:L7WI:I4ME:2MK4:EAFD:OGHE:YORF:NJGX:IL75:Y4GF:JARJ:IEYZ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
WARNING: Running Swarm in a two-manager configuration. This configuration provides
         no fault tolerance, and poses a high risk to lose control over the cluster.
         Refer to https://docs.docker.com/engine/swarm/admin_guide/ to configure the
         Swarm for fault-tolerance.

The Traefik service is launched on an overlay network that I created just beforehand.
I am checking whether something needs to be changed there.

TJS · May 7, 2020, 1:13am

After restarting the Docker daemon, ping seems to work again (inside the containers).
Docker now seems OK.
I will retest Traefik tomorrow.

TJS · May 7, 2020, 10:41am

It is now OK: the "DNS Challenge Timeout" error has disappeared and external systems are reachable from containers as expected.
I can confirm it was a Docker problem (restarting Docker on the swarm nodes solved it).
Thank you for your help and your clarifications.
