Traefik Unable to Obtain ACME Certificates Due to Connection Timeouts After External IP Change

I’m encountering an issue with Traefik failing to generate ACME certificates after a VM restart caused my external IP address to change. I have updated the DNS records to reflect the new external IP, but now I’m getting connection timeouts when trying to reach the ACME challenge URL from the outside.

Environment Details:

  • Traefik Version: 3.1.5
  • Deployed Using: Docker Compose
  • Cloud Platform: Google Cloud (GCE)
  • External IP: 35.136.188.182
  • DNS Configuration: Updated to point to the new IP (confirmed propagated)
  • Ports Open: Verified that ports 80 and 443 are allowed through the firewall on GCP

Problem:

ACME certificate generation fails with the following error:

ERR Unable to obtain ACME certificate for domains error="unable to generate a certificate for the domains [client.donclemtech.com]: error: one or more domains had a problem: acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://client.donclemtech.com/.well-known/acme-challenge/...: Timeout during connect (likely firewall problem)"

When I run curl http://client.donclemtech.com, it times out:

* Trying 35.136.188.182:80...
* connect to 35.136.188.182 port 80 failed: Connection timed out

However, when I curl 35.136.188.182:80 directly (using the IP), it works and reflects in the Traefik logs.

What I Have Tried:

  • DNS Propagation: Confirmed that DNS points to the correct external IP (dig resolves properly).
  • Firewall: Verified GCP firewall allows ingress on ports 80 and 443.
  • Traefik Configuration:
    • Traefik is configured to handle HTTP-01 challenges via port 80.
    • Entry points are configured as:
      entryPoints:
        web:
          address: ":80"
        websecure:
          address: ":443"
      
  • Logs: Traefik logs confirm that it’s listening on the correct ports and attempting the ACME challenge, but the external request times out.
  • Local vs. External Access: Locally, I can curl the IP address directly, but external access to the domain times out.

Docker Compose Setup:

Here’s a summary of my docker-compose.yml related to Traefik:

services:
  traefik:
    image: traefik:v3.1.5
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./traefik.yml:/etc/traefik/traefik.yml
      - ./acme:/etc/traefik/acme

What Might Be Happening?

  • The problem started after my VM was restarted, causing the external IP to change. DNS has been updated, but the ACME challenge still fails with timeouts on port 80.
  • I suspect the issue may be related to GCP network routing or some unknown configuration, as the firewall and DNS seem correct, and Traefik’s logs show it is responding locally.

Request for Help:

  • What could be preventing external access to port 80, even though the firewall is configured correctly?
  • Are there any additional settings in GCP or Traefik that I might need to check?
  • Could there be an issue with Traefik routing requests based on the domain after the IP change?

Thank you in advance for your help!

DNS entries are cached and have a TTL. Try again after 1 hour.

Thanks for your response, just to let you know:

  1. I have been trying to debug this since yesterday.
  2. Out of impatience, I deleted the previously generated certificate in my Docker container, and when that had no effect I deleted Traefik's own volume.

After all of that, nothing works. I suspected the upgrade to 3.1.6 was the cause, which is why I downgraded to 3.1.5. Please, I need urgent help with this because my company website has been down since yesterday.

The domain needs to resolve via DNS to an IP, then a connection to IP:port is established.

It makes no sense that the domain resolves correctly and you cannot connect by domain, yet you can connect via the same IP.
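To see which of the two steps is failing, they can be run separately. Here is a minimal Python sketch using only the standard library; the function names `resolve_ipv4` and `can_connect` are made up for illustration and are not part of Traefik or any tool mentioned in this thread:

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Step 1: ask the system resolver for the IPv4 addresses of hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (ip, port)).
    # Keep the unique IPs in the order the resolver returned them.
    return list(dict.fromkeys(info[4][0] for info in infos))


def can_connect(ip: str, port: int, timeout: float = 5.0) -> bool:
    """Step 2: try a plain TCP connection to ip:port."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

Calling `resolve_ipv4("client.donclemtech.com")` and then `can_connect(ip, 80)` for each returned address shows exactly which IP the domain maps to and whether that IP accepts connections on port 80. If the address printed in step 1 is not the one you expect, the problem is in DNS rather than in Traefik or the firewall.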

So what is the possible solution to this? I am frustrated.

You tried curl to domain and IP from the same command prompt? With Traefik running?

Yes, I did. See below for each test:

  1. Direct IP Access Works (34.136.188.182):
curl -v http://34.136.188.182
*   Trying 34.136.188.182:80...
* Connected to 34.136.188.182 (34.136.188.182) port 80
> GET / HTTP/1.1
> Host: 34.136.188.182
> User-Agent: curl/8.5.0
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Fri, 18 Oct 2024 04:48:46 GMT
< Content-Length: 19
404 page not found
  2. Domain Access Resolves but Times Out (client.donclemtech.com):
curl -v http://client.donclemtech.com
* Host client.donclemtech.com:80 was resolved.
* IPv6: (none)
* IPv4: 35.136.188.182
*   Trying 35.136.188.182:80...
* connect to 35.136.188.182 port 80 from 192.168.0.192 port 52398 failed: Connection timed out
* Failed to connect to client.donclemtech.com port 80 after 134273 ms: Couldn't connect to server
  3. Other Domains Also Time Out (task.donclemtech.com):
curl -v http://task.donclemtech.com
* Host task.donclemtech.com:80 was resolved.
* IPv6: (none)
* IPv4: 35.136.188.182
*   Trying 35.136.188.182:80...
* connect to 35.136.188.182 port 80 from 192.168.0.192 port 54532 failed: Connection timed out
* Failed to connect to task.donclemtech.com port 80 after 137761 ms: Couldn't connect to server

When I stop Traefik, this is the result:

curl -v http://34.136.188.182
*   Trying 34.136.188.182:80...
* connect to 34.136.188.182 port 80 from 192.168.0.192 port 49698 failed: Connection refused
* Failed to connect to 34.136.188.182 port 80 after 206 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 34.136.188.182 port 80 after 206 ms: Couldn't connect to server

To show that Traefik works: when I run curl -v http://34.136.188.182, I get this in my Traefik log:

traefik-1  | 2024-10-18T04:56:38Z ERR Error while Peeking first byte error="read tcp 172.18.0.9:80->167.94.145.110:36070: read: connection reset by peer"

Also, please check out this link: Let's Debug

One IP starts with 34., the other with 35.. Is that intentional or a typo?
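Comparing the address DNS returns with the address the VM actually has makes this kind of difference easy to spot. A small Python sketch, assuming 34.136.188.182 is the VM's real external IP (the one that answered in the direct curl test); `dns_mismatches` is a hypothetical helper, and the inputs are simply the values from the curl output above:

```python
import ipaddress


def dns_mismatches(expected_ip: str, resolved: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per hostname, the resolved addresses that differ from expected_ip."""
    expected = ipaddress.ip_address(expected_ip)
    mismatches = {}
    for host, addresses in resolved.items():
        # Compare parsed addresses so formatting differences do not matter.
        wrong = [a for a in addresses if ipaddress.ip_address(a) != expected]
        if wrong:
            mismatches[host] = wrong
    return mismatches


# Values taken from the curl output in this thread.
resolved = {
    "client.donclemtech.com": ["35.136.188.182"],
    "task.donclemtech.com": ["35.136.188.182"],
}
print(dns_mismatches("34.136.188.182", resolved))
```

With these inputs both hostnames are reported, because their records point at 35.136.188.182 instead of 34.136.188.182. An empty result would mean DNS and the VM agree.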

Oh my God, that was a typo. Thanks a lot! It was the admin who made that mistake.
