Error obtaining certificate: Timeout during read

I am quite new to traefik and followed some helpful guides to build a new docker swarm starting with traefik and portainer on the manager node, secured with LE certificates.

After deploying the stacks for traefik and portainer, accessing the websites for traefik dashboard and portainer works.

But obtaining the certificate from LE leaves me with the following error:

error: 400 :: urn:ietf:params:acme:error:connection :: Timeout during read (your server may be slow or overloaded), url: \n" routerName=portainer-secure@docker rule="Host(hostname)" providerName=http.acme

As a new user I am not allowed to post more than 4 links, and the error message, my logs and my config files apparently contain a lot of them. So I created two pastes:
config files: config files - Pastebin.com
and
traefik log output for portainer.mydomain.com: traefik log output - Pastebin.com

It would be great if anyone could give me a hint about the configuration error.

Welcome to the forum @qroac

Your portainer service has multiple networks.

Thank you.

I have traefik configured in traefik.yml to use the network "webproxy".
The second portainer network is for communication with the portainer agents.

Communication of traefik with portainer works, as I can access the website.
Just without LE certificate.

I suggest you go reread the linked post and the parameters that it links to.

The timeout issue is indicative of the portainer router being connected to two networks.
I experienced it myself this week when I mistyped traefik in the traefik.docker.network label.

I didn't know that from the docs. Thanks for the hint.
However, it sadly didn't change anything.

I added "traefik.docker.network=webproxy" to Portainer's deploy.labels.
traefik.yml already contains providers.docker.network: webproxy

Log output after removing both stacks and redeploying stays the same as posted before.

It is likely due to inter-container networking then.

The pastebin did not have traefik.docker.network defined, but good that it is there.

You may need to look further into that webproxy network, ensure both containers are actually connected to it.

As they are in separate compose files and the network is defined external, errors can creep in.

docker network inspect webproxy confirmed there are 3 members in the network: the containers for traefik and portainer, and a webproxy-endpoint.

I can also reach the portainer interface on https by accepting the TRAEFIK DEFAULT CERT.
From a shell (sh) inside the traefik container I am also able to ping the portainer container by IP and by service name.

So to me it seems they are communicating using that network.
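
For anyone who wants to repeat this membership check without reading the JSON by eye, here is a minimal sketch in Python. It assumes the output of `docker network inspect webproxy` has been saved to a file or string; the helper names and the container names in the sample are made up for illustration. Note that on a swarm overlay network the output typically lists only the containers running on the node where the command is executed.

```python
import json


def network_members(inspect_json, network="webproxy"):
    """Return {container_name: ipv4_address} for the named network.

    `inspect_json` is the raw JSON text printed by `docker network inspect`.
    """
    for net in json.loads(inspect_json):
        if net.get("Name") == network:
            containers = net.get("Containers") or {}
            return {c["Name"]: c.get("IPv4Address", "") for c in containers.values()}
    return {}


def missing_members(inspect_json, expected_prefixes, network="webproxy"):
    """Return the expected name prefixes that match no container on the network."""
    names = network_members(inspect_json, network)
    return [p for p in expected_prefixes
            if not any(name.startswith(p) for name in names)]


# Hypothetical sample data; real container names and addresses will differ.
sample = json.dumps([{
    "Name": "webproxy",
    "Containers": {
        "a1": {"Name": "traefik_traefik.1.x", "IPv4Address": "10.0.1.3/24"},
        "b2": {"Name": "portainer_portainer.1.y", "IPv4Address": "10.0.1.5/24"},
    },
}])
print(missing_members(sample, ["traefik", "portainer"]))
```

An empty list means every expected service has at least one container attached to the network, which matches what was observed here.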

Another question for my understanding:
Does the handling of TLS challenges for letsencrypt really depend on the containers that use the hostname? Is it not handled by traefik itself?

You're correct it is handled by traefik.

Good news. Yes I agree all is well there. Is this from inside your network that you can access portainer?

Is 443 open on the firewall(s) and/or being forwarded correctly to the host/traefik?
The timeout is from the letsencrypt server to traefik.

You can use https://letsdebug.net/ to help debug your site.

The containers run on a Hetzner Cloud server and I access portainer from my home computer.
So ports 80/443 are publicly accessible.

I requested a report for the domain of my portainer instance. The error is very much the same as in the logfile:

IssueFromLetsEncrypt

ERROR

A test authorization for swarm.spicyweb.de to the Let's Encrypt staging service has revealed issues that may prevent any certificate for this domain being issued.

Timeout during read (your server may be slow or overloaded)

Very strange.

I see you have the httpChallenge commented out. Did you try that method without success or try tlsChallenge first?

Yes, I tried httpChallenge first. It results in the following error:

traefik_local_main.0.894gjy53z4s7@master | time="2020-07-06T11:30:12+02:00" level=error msg="Unable to obtain ACME certificate for domains "...": unable to generate a certificate for the domains [...]: error: one or more domains had a problem:\n[swarm.spicyweb.de] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://swarm.spicyweb.de/.well-known/acme-challenge/vqZXGXq5PWC9KIo-lLtdOTC-SQBg6kJMv3IA8IHT3WQ: Timeout after connect (your server may be slow or overloaded), url: \n" providerName=le-tls.acme routerName=portainer-secure@docker rule="Host(swarm.spicyweb.de)"

Is httpChallenge the one to prefer over tlsChallenge?

I also saved my current config to a cloud folder, with only the username/password and mail address changed.
This way I think I can better keep them in sync with config changes while searching for a solution.
https://cloud.fuermann.net/s/YQLzXkynEAHyejB

I did another experiment and deleted the AAAA DNS record for the used host entry.
Now I get the LE certificate issued.
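
A quick way to spot this kind of mismatch is to test the A and AAAA addresses separately. The sketch below is only an illustration, not a replacement for letsdebug: the function names are made up, and it has to be run from a machine outside the server that has working IPv4 and IPv6 connectivity. As far as I know, Let's Encrypt prefers the IPv6 address when a host has both record types, so an AAAA record pointing at an address that does not answer can cause a timeout even though IPv4 works fine.

```python
import socket


def resolve_by_family(host, port=443):
    """Return {'IPv4': [...], 'IPv6': [...]} with the addresses `host` resolves to."""
    result = {"IPv4": [], "IPv6": []}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # no record of this family, or resolution failed
        for info in infos:
            address = info[4][0]
            if address not in result[label]:
                result[label].append(address)
    return result


def can_connect(address, port=443, timeout=5.0):
    """Return True if a TCP connection to address:port succeeds within `timeout`."""
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((address, port))
        return True
    except OSError:
        return False


def check_host(host, ports=(80, 443)):
    """Return {(family_label, address, port): reachable} for every resolved address."""
    report = {}
    for label, addresses in resolve_by_family(host).items():
        for address in addresses:
            for port in ports:
                report[(label, address, port)] = can_connect(address, port)
    return report
```

Calling `check_host("swarm.spicyweb.de")` and printing the result shows, per address and port, whether a plain TCP connection succeeds. IPv4 entries that are reachable next to IPv6 entries that are not would be consistent with the behaviour described here.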

Haven't followed this topic for a while.
Is IPv6 still a problem for docker?
I found this one: https://zerotier.atlassian.net/wiki/spaces/SD/pages/7274520/Using+NDP+Emulated+6PLANE+Addressing+With+Docker
Would you recommend using 6plane, a different solution or no IPv6 at all?

PS: I had ipv6 and fixed-cidr-v6 in my /etc/docker/daemon.json set all the time.

I like to use tlsChallenge, personal preference. I have read that port 80 is sometimes blocked by providers (ISPs), so this can be an option in that case instead of the DNS challenge.

Nice!
Is that with httpChallenge or tlsChallenge? Can you confirm both now work, for science?
Edit: I checked via letsdebug. Green for both now.

I freely admit no knowledge on this topic :smiley:

Sorry to resurrect this thread, but in case anyone else comes across this via Google search, I wanted to (possibly) help them out.

I was having this same problem following this tutorial on a Linode provisioned server. The first time I ran docker stack deploy -c traefik.yml traefik and watched the logs for it to start up, I got

"Unable to obtain ACME certificate for domains \"<mydomain>\": unable to generate a certificate for the domains [<mydomain>]: error: one or more domains had a problem:\n[<mydomain>] acme: error: 400 :: urn:ietf:params:acme:error:dns :: DNS problem: NXDOMAIN looking up A for <mydomain> - check that a DNS record exists for this domain, url: \n" providerName=le.acme routerName=traefik-public-https@docker rule="Host(`<mydomain>`)"

Realized at that point I had forgotten to create the record. I went and did that in my Linode dashboard adding both v4 and v6 IP addresses. I deleted the service and redeployed the stack and started getting:

Unable to obtain ACME certificate for domains \"<mydomain>\": unable to generate a certificate for the domains [<mydomain>]: error: one or more domains had a problem:\n[<mydomain>] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Timeout during read (your server may be slow or overloaded), url: \n" routerName=traefik-public-https@docker providerName=le.acme rule="Host(`<mydomain>`)

After coming across this post, I tried removing the IPv6 address and removing/redeploying the service but got the same error. At this point, I was a bit stumped, but then went spelunking for more data in the volume. That's when I had the idea to completely delete the Docker volume I had made and then redeploy, and that seemed to fix it up.

I'm assuming that the IPv6 was the problem, but it might have been caused by the initial failure causing some bad state in the Docker volume. Since the error message changed after adding the records, I'm guessing it wasn't that.
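
Since the ACME error type is what changed between the two attempts (dns first, then connection), it can help to pull that field out of the log lines when comparing runs. A minimal sketch, assuming log lines shaped like the ones quoted in this thread; the function name and the regular expression are mine, not part of Traefik.

```python
import re

# Matches e.g. "urn:ietf:params:acme:error:connection :: Timeout during read (...), url:"
ACME_ERROR = re.compile(r"urn:ietf:params:acme:error:(\w+) :: (.*?)(?:, url:|$)")


def acme_error(log_line):
    """Return (error_type, detail) from a Traefik ACME log line, or None if absent."""
    match = ACME_ERROR.search(log_line)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


line = ("acme: error: 400 :: urn:ietf:params:acme:error:connection :: "
        "Timeout during read (your server may be slow or overloaded), url: ")
print(acme_error(line))
```

A type of `dns` points at missing or wrong records, while `connection` means the records resolved but the validation server could not complete a connection to the address it picked.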