Docker Traefik + Route53 DNS-01 challenge not renewing certificates

I have a private network for which I need a proper SSL certificate. Four months ago I set up my Docker Compose stack and everything worked: the TXT records were added to my Route53 zone and the certificate was issued.
Now Traefik refuses to create new certificates, even though the TXT records are created in Route53.

time="2023-02-07T10:43:13Z" level=debug msg="legolog: [INFO] Wait for route53 [timeout: 2m0s, interval: 4s]"
time="2023-02-07T10:43:13Z" level=debug msg="legolog: [INFO] Wait for route53 [timeout: 2m0s, interval: 4s]"
time="2023-02-07T10:43:13Z" level=debug msg="legolog: [INFO] Wait for route53 [timeout: 2m0s, interval: 4s]"
time="2023-02-07T10:43:13Z" level=debug msg="legolog: [INFO] Wait for route53 [timeout: 2m0s, interval: 4s]"
time="2023-02-07T10:43:40Z" level=debug msg="legolog: [INFO] retry due to: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/authz-v3/201548241066 :: urn:ietf:params:acme:error:badNonce :: JWS has an invalid anti-replay nonce: \"5CA2q8yi8zSiwWWkIorIuBSx-Q-r--cXbKxtDWd7IRtUAWI\""
time="2023-02-07T10:43:40Z" level=debug msg="legolog: [INFO] retry due to: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/authz-v3/201548241126 :: urn:ietf:params:acme:error:badNonce :: JWS has an invalid anti-replay nonce: \"C8782FyZy8cpoGaoa1SbVX0khXV0GEMPYiOonqGc5S__ptw\""
time="2023-02-07T10:43:40Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/201548242586"
time="2023-02-07T10:43:40Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/201548241066"
time="2023-02-07T10:43:40Z" level=error msg="Unable to obtain ACME certificate for domains \"gitea.serveris.link\": unable to generate a certificate for the domains [gitea.serveris.link]: error: one or more domains had a problem:\n[gitea.serveris.link] time limit exceeded: last error: read udp 172.23.0.2:59576->205.251.199.43:53: i/o timeout\n" providerName=route53resolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory" routerName=gitea@docker rule="Host(`gitea.serveris.link`)"
time="2023-02-07T10:43:40Z" level=debug msg="legolog: [INFO] retry due to: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/authz-v3/201548241126 :: urn:ietf:params:acme:error:badNonce :: JWS has an invalid anti-replay nonce: \"1DFA9Z_CXw37oxdzuGvuHEAbefHJFwDC6ANsONn6cbYz_W4\""
time="2023-02-07T10:43:40Z" level=debug msg="legolog: [INFO] retry due to: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/authz-v3/201548242376 :: urn:ietf:params:acme:error:badNonce :: JWS has an invalid anti-replay nonce: \"5CA2tXAGJ7ofZp7fxJdSiv-NLKq-uYPlopQl30kEV345bE8\""
time="2023-02-07T10:43:40Z" level=error msg="Unable to obtain ACME certificate for domains \"reporter.serveris.link\": unable to generate a certificate for the domains [reporter.serveris.link]: error: one or more domains had a problem:\n[reporter.serveris.link] time limit exceeded: last error: read udp 172.23.0.2:33476->205.251.193.155:53: i/o timeout\n" ACME CA="https://acme-v02.api.letsencrypt.org/directory" rule="Host(`reporter.serveris.link`)" routerName=poker_server_rails_web_secure@docker providerName=route53resolver.acme
time="2023-02-07T10:43:40Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/201548242596"
time="2023-02-07T10:43:41Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/201548241126"
time="2023-02-07T10:43:41Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/201548242376"
time="2023-02-07T10:43:41Z" level=error msg="Unable to obtain ACME certificate for domains \"whoami.serveris.link\": unable to generate a certificate for the domains [whoami.serveris.link]: error: one or more domains had a problem:\n[whoami.serveris.link] time limit exceeded: last error: read udp 172.23.0.2:41332->205.251.197.183:53: i/o timeout\n" providerName=route53resolver.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory" routerName=whoami@docker rule="Host(`whoami.serveris.link`)"

Hello,

Do you have a wildcard CNAME?

If so, you can set the environment variable LEGO_DISABLE_CNAME_SUPPORT to true.

I added LEGO_DISABLE_CNAME_SUPPORT=true to the environment, but that didn't help. I still get the same errors.

Do you have multiple instances of Traefik running?

No, just one. I also tried downgrading to versions 2.9.1 and 2.8, but that didn't help either.


My DNS records look like this. I took the screenshot while the DNS challenge was in progress, so you can see the TXT records.

One post on the Internet with the same error message was resolved by checking the domain names in use; not all of them were set up correctly.

The anti-replay nonce is not a problem because lego handles that automatically.

The timeout, however, is very often related to network issues. Check your firewall and your network configuration.

I have a feeling I'm getting closer to finding the issue.
If I run dig @1.1.1.1 +nssearch google.com, I get:

;; UDP setup with 2001:4860:4802:36::a#53(2001:4860:4802:36::a) for google.com failed: network unreachable.
;; UDP setup with 2001:4860:4802:32::a#53(2001:4860:4802:32::a) for google.com failed: network unreachable.
;; UDP setup with 2001:4860:4802:34::a#53(2001:4860:4802:34::a) for google.com failed: network unreachable.
;; UDP setup with 2001:4860:4802:38::a#53(2001:4860:4802:38::a) for google.com failed: network unreachable.
SOA ns1.google.com. dns-admin.google.com. 509006591 900 900 1800 60 from server 216.239.36.10 in 20 ms.
SOA ns1.google.com. dns-admin.google.com. 509006591 900 900 1800 60 from server 216.239.38.10 in 23 ms.
SOA ns1.google.com. dns-admin.google.com. 509006591 900 900 1800 60 from server 216.239.32.10 in 40 ms.
SOA ns1.google.com. dns-admin.google.com. 509006591 900 900 1800 60 from server 216.239.34.10 in 50 ms.

Now if I run dig @1.1.1.1 +nssearch serveris.link, I get:

;; UDP setup with 2600:9000:5301:9b00::1#53(2600:9000:5301:9b00::1) for serveris.link failed: network unreachable.
;; UDP setup with 2600:9000:5303:fe00::1#53(2600:9000:5303:fe00::1) for serveris.link failed: network unreachable.
;; UDP setup with 2600:9000:5307:2b00::1#53(2600:9000:5307:2b00::1) for serveris.link failed: network unreachable.
;; UDP setup with 2600:9000:5305:b700::1#53(2600:9000:5305:b700::1) for serveris.link failed: network unreachable.
;; communications error to 205.251.197.183#53: timed out
;; communications error to 205.251.193.155#53: timed out
;; communications error to 205.251.199.43#53: timed out
;; communications error to 205.251.195.254#53: timed out
;; NS servers could not be reached

Is this somehow related?

Why does it even need to connect to the nameserver at 205.251.197.183:53?
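
For context on that question: in a DNS-01 challenge (RFC 8555, section 8.4) the ACME client publishes a TXT record at `_acme-challenge.<domain>`. As far as I understand it, lego then checks the zone's nameservers for that record before asking Let's Encrypt to validate, and the 205.251.x.x addresses in the log match the nameservers that `dig +nssearch` lists for serveris.link. A minimal Python sketch of what that record looks like follows. The function names are my own (not lego's), and the token and thumbprint in the usage note are placeholders, since the real values come from the ACME server and the account key.

```python
import base64
import hashlib


def challenge_record_name(domain: str) -> str:
    """Fully qualified name of the TXT record used for a DNS-01 challenge."""
    return f"_acme-challenge.{domain.rstrip('.')}."


def challenge_record_value(token: str, account_thumbprint: str) -> str:
    """TXT value: unpadded base64url of SHA-256 over the key authorization."""
    # The key authorization is "<token>.<thumbprint of the account key>".
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

For example, `challenge_record_name("gitea.serveris.link")` gives the name the propagation check would look up, and `challenge_record_value("example-token", "example-thumbprint")` gives a 43-character value of the kind that shows up in the Route53 TXT records during the challenge.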

I found the GitHub issue "Traefik container not honouring DNS addresses to resolve ACME challenges" (Issue #9588, traefik/traefik) and thought to myself: finally, someone else with the same problem. And then it got closed...

After more digging I noticed that, yes, my network doesn't let UDP DNS queries to the Route53 nameservers through.

  • dig @1.1.1.1 +nssearch +tcp serveris.link: works
  • dig @1.1.1.1 +nssearch serveris.link: doesn't work

I can't find the cause in my hardware, so I opened a PR in lego to make it fall back to TCP when a UDP DNS query fails: "Fallback to TCP DNS challenge if UDP fails" by reinismu (Pull Request #1841, go-acme/lego on GitHub).
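
To illustrate the idea behind that fallback, here is a rough Python sketch using only the standard library. It is not the code from the PR (lego is written in Go), and all function names are hypothetical. It tries a TXT query over UDP first and retries over TCP if the UDP attempt times out or otherwise fails, which is the same difference the two dig commands above show.

```python
import socket
import struct

TXT = 16  # DNS record type code for TXT


def build_query(name: str, qtype: int = TXT, query_id: int = 0x1234) -> bytes:
    """Encode a single-question DNS query in RFC 1035 wire format."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack(">HH", qtype, 1)  # class IN


def send_udp(server: str, payload: bytes, timeout: float) -> bytes:
    """Send the query as one UDP datagram and wait for one reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (server, 53))
        data, _ = sock.recvfrom(4096)
        return data


def _read_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes from a stream socket."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def send_tcp(server: str, payload: bytes, timeout: float) -> bytes:
    """DNS over TCP prefixes every message with a two-byte length."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack(">H", len(payload)) + payload)
        (length,) = struct.unpack(">H", _read_exact(sock, 2))
        return _read_exact(sock, length)


def query_with_fallback(server, name, timeout=4.0, udp=send_udp, tcp=send_tcp):
    """Return (transport, raw_response), trying UDP first and then TCP."""
    payload = build_query(name)
    try:
        return "udp", udp(server, payload, timeout)
    except OSError:  # socket timeouts are a subclass of OSError
        return "tcp", tcp(server, payload, timeout)
```

Calling `query_with_fallback("205.251.197.183", "_acme-challenge.whoami.serveris.link")` from the affected network would be expected to report `"tcp"` as the transport if UDP to that nameserver is being dropped. The sketch returns the raw response without parsing it and does not handle truncated UDP replies, so it is only a connectivity probe.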