Letsencrypt wildcard rate limit - why?

I'm trying to understand why I'm getting rate limited by letsencrypt, when using a wildcard certificate and a few routers. I've replaced my real domain below.

My static configuration:

api:
  dashboard: true
  insecure: true

metrics:
  prometheus:
    entryPoint: https
    manualRouting: true

log:
  level: DEBUG

entryPoints:
  http:
    address: :80
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
  https:
    address: ":443"
    http:
      tls:
        certResolver: letsencrypt
        options: intermediate
        domains:
          - main: "example.net"
            sans:
              - "*.example.net"

# luadns credentials supplied to traefik container as ENV variables
certificatesResolvers:
  letsencryptStaging:
    acme:
      caServer: "https://acme-staging-v02.api.letsencrypt.org/directory"
      email: oscar@example.net
      storage: /etc/traefik/certs/acme-staging.json
      dnsChallenge:
        provider: luadns
        delayBeforeCheck: 0
  letsencrypt:
    acme:
      caServer: "https://acme-v02.api.letsencrypt.org/directory"
      email: oscar@example.net
      storage: /etc/traefik/certs/acme.json
      dnsChallenge:
        provider: luadns
        delayBeforeCheck: 0
  # Internal certificate authority
  stepca:
    acme:
      caServer: "https://ca.home.arpa/acme/acme/directory"
      email: oscar@example.net
      storage: /etc/traefik/certs/acme-stepca.json
      certificatesDuration: 24
      tlsChallenge: {}

providers:
  file:
    directory: "/etc/traefik/config"
    watch: true
  providersThrottleDuration: 11

From my routers configuration:

# /etc/traefik/config/10-routers.yml
---
http:
  routers:
    examplenet:
      rule: Host(`example.net`)
      tls:
        certResolver: letsencrypt
        domains:
          - main: "example.net"
            sans: "*.example.net"

    authelia:
      rule: Host(`auth.example.net`)
      service: authelia
      entryPoints: https
      middlewares:
        - hstsHeader
        - chainNoAuth

    grafana:
      rule: Host(`grafana.example.net`)
      service: grafana
      middlewares:
        - hstsHeader
        - chainNoAuth

    # And then a few more routers defined just like above, copy & paste

    # Then two routers using my internal zone and internal CA
    dashboard:
      rule: Host(`traefik.home.arpa`) && PathPrefix(`/dashboard`)
      service: api@internal
      middlewares:
        - basicAuthDashboard
      tls:
        certresolver: stepca

    metrics:
      rule: Host(`traefik.home.arpa`) && PathPrefix(`/metrics`)
      service: prometheus@internal
      middlewares:
        - prometheusScraperAllowedIPs
      tls:
        certresolver: stepca

In case it's relevant, here's my TLS configuration:

# /etc/traefik/config/40-tls.yml
---
tls:
  options:
    intermediate:
      minVersion: VersionTLS12
      cipherSuites:
        - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
        - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
        - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
        - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305

After some sed/grep on the Traefik output, this is what the legolog lines show:

jul 28 07:55:37 atomic conmon[2545646]: time="2023-07-28T05:55:37Z" level=debug msg="legolog: [INFO] [traefik.home.arpa] acme: Trying renewal with -8 hours remaining"
jul 28 07:55:37 atomic conmon[2545646]: time="2023-07-28T05:55:37Z" level=debug msg="legolog: [INFO] [traefik.home.arpa] acme: Obtaining bundled SAN certificate"
jul 28 07:55:37 atomic conmon[2545646]: time="2023-07-28T05:55:37Z" level=debug msg="legolog: [INFO] [traefik.home.arpa] AuthURL: https://ca.home.arpa/acme/acme/authz/OyrEx4U8Jou7gariJJba21q687ctlhGA"
jul 28 07:55:37 atomic conmon[2545646]: time="2023-07-28T05:55:37Z" level=debug msg="legolog: [INFO] [traefik.home.arpa] acme: use tls-alpn-01 solver"
jul 28 07:55:37 atomic conmon[2545646]: time="2023-07-28T05:55:37Z" level=debug msg="legolog: [INFO] [traefik.home.arpa] acme: Trying to solve TLS-ALPN-01"
jul 28 07:55:37 atomic conmon[2545646]: time="2023-07-28T05:55:37Z" level=debug msg="legolog: [INFO] [traefik.home.arpa] The server validated our request"
jul 28 07:55:37 atomic conmon[2545646]: time="2023-07-28T05:55:37Z" level=debug msg="legolog: [INFO] [traefik.home.arpa] acme: Validations succeeded; requesting certificates"
jul 28 07:55:38 atomic conmon[2545646]: time="2023-07-28T05:55:38Z" level=debug msg="legolog: [INFO] [example.net, *.example.net] acme: Obtaining bundled SAN certificate"
jul 28 07:55:38 atomic conmon[2545646]: time="2023-07-28T05:55:38Z" level=debug msg="legolog: [INFO] [traefik.home.arpa] Server responded with a certificate."
jul 28 07:55:39 atomic conmon[2545646]: time="2023-07-28T05:55:39Z" level=debug msg="legolog: [INFO] [*.example.net] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/249671950336"
jul 28 07:55:39 atomic conmon[2545646]: time="2023-07-28T05:55:39Z" level=debug msg="legolog: [INFO] [example.net] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/249671950346"
jul 28 07:55:39 atomic conmon[2545646]: time="2023-07-28T05:55:39Z" level=debug msg="legolog: [INFO] [*.example.net] acme: use dns-01 solver"
jul 28 07:55:39 atomic conmon[2545646]: time="2023-07-28T05:55:39Z" level=debug msg="legolog: [INFO] [example.net] acme: Could not find solver for: tls-alpn-01"
jul 28 07:55:39 atomic conmon[2545646]: time="2023-07-28T05:55:39Z" level=debug msg="legolog: [INFO] [example.net] acme: Could not find solver for: http-01"
jul 28 07:55:39 atomic conmon[2545646]: time="2023-07-28T05:55:39Z" level=debug msg="legolog: [INFO] [example.net] acme: use dns-01 solver"
jul 28 07:55:39 atomic conmon[2545646]: time="2023-07-28T05:55:39Z" level=debug msg="legolog: [INFO] [*.example.net] acme: Preparing to solve DNS-01"
jul 28 07:55:40 atomic conmon[2545646]: time="2023-07-28T05:55:40Z" level=debug msg="legolog: [INFO] [example.net] acme: Preparing to solve DNS-01"
jul 28 07:55:40 atomic conmon[2545646]: time="2023-07-28T05:55:40Z" level=debug msg="legolog: [INFO] [*.example.net] acme: Trying to solve DNS-01"
jul 28 07:55:40 atomic conmon[2545646]: time="2023-07-28T05:55:40Z" level=debug msg="legolog: [INFO] [*.example.net] acme: Checking DNS record propagation using [172.19.19.5:53]"
jul 28 07:55:42 atomic conmon[2545646]: time="2023-07-28T05:55:42Z" level=debug msg="legolog: [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]"
jul 28 07:55:44 atomic conmon[2545646]: time="2023-07-28T05:55:44Z" level=debug msg="legolog: [INFO] [*.example.net] The server validated our request"
jul 28 07:55:44 atomic conmon[2545646]: time="2023-07-28T05:55:44Z" level=debug msg="legolog: [INFO] [example.net] acme: Trying to solve DNS-01"
jul 28 07:55:44 atomic conmon[2545646]: time="2023-07-28T05:55:44Z" level=debug msg="legolog: [INFO] [example.net] acme: Checking DNS record propagation using [172.19.19.5:53]"
jul 28 07:55:46 atomic conmon[2545646]: time="2023-07-28T05:55:46Z" level=debug msg="legolog: [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]"
jul 28 07:55:47 atomic conmon[2545646]: time="2023-07-28T05:55:47Z" level=debug msg="legolog: [INFO] [example.net] The server validated our request"
jul 28 07:55:47 atomic conmon[2545646]: time="2023-07-28T05:55:47Z" level=debug msg="legolog: [INFO] [*.example.net] acme: Cleaning DNS-01 challenge"
jul 28 07:55:48 atomic conmon[2545646]: time="2023-07-28T05:55:48Z" level=debug msg="legolog: [INFO] [example.net] acme: Cleaning DNS-01 challenge"
jul 28 07:55:48 atomic conmon[2545646]: time="2023-07-28T05:55:48Z" level=debug msg="legolog: [INFO] [example.net, *.example.net] acme: Validations succeeded; requesting certificates"
jul 28 07:55:52 atomic conmon[2545646]: time="2023-07-28T05:55:52Z" level=debug msg="legolog: [INFO] [example.net] Server responded with a certificate."
jul 28 08:01:59 atomic conmon[2567357]: time="2023-07-28T06:01:59Z" level=debug msg="legolog: [INFO] [example.net, *.example.net] acme: Obtaining bundled SAN certificate"

It looks fine to me? But acme.json doesn't contain any certificates! If I check acme-stepca.json (for my internal CA), it contains certificates issued when Traefik started, i.e. at the same time as the wildcard cert should have been issued.

# ls -lah certs/
total 43K
drwxr-xr-x 2 root root    5 jul 28 08:00 .
drwxr-xr-x 5 root root    7 jul 26 22:52 ..
-rw------- 1 root root 3,5K jul 28 08:01 acme.json
-rw------- 1 root root  16K jul 26 23:00 acme-staging.json
-rw------- 1 root root  11K jul 28 07:55 acme-stepca.json

Am I missing something obvious? What happened?

How do you know you are being rate limited? I didn't find it in the log file. Is the acme file writable? I would test with LE staging without stepca.

How do you know you get rate limited?

Ah I missed that. Here's one example:

jul 28 08:01:59 atomic conmon[2567357]: time="2023-07-28T06:01:59Z" level=error msg="Unable to obtain ACME certificate for domains \"example.net,*.example.net\"" routerName=https-grafana@file error="unable to generate a certificate for the domains [example.net *.example.net]: acme: error: 429 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:rateLimited :: Error creating new order :: too many certificates (5) already issued for this exact set of domains in the last 168 hours: *.example.net,example.net, retry after 2023-07-29T14:31:51Z: see https://letsencrypt.org/docs/duplicate-certificate-limit/" rule="Host(`grafana.example.net`)" providerName=letsencrypt.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory"
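For reference, the actual wait implied by that error can be worked out from its two timestamps (the log time and the `retry after` time). This is only a minimal sketch: the function name `retry_wait` is made up for illustration, and the sample line is an abbreviated copy of the error above, keeping only the parts the regexes need.

```python
import re
from datetime import datetime, timezone

STAMP = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in the Traefik log line


def retry_wait(log_line):
    """Return (retry_after, wait) parsed from a rateLimited error line, or None."""
    logged = re.search(r'time="([0-9T:\-]+Z)"', log_line)
    retry = re.search(r"retry after ([0-9T:\-]+Z)", log_line)
    if not (logged and retry):
        return None
    t0 = datetime.strptime(logged.group(1), STAMP).replace(tzinfo=timezone.utc)
    t1 = datetime.strptime(retry.group(1), STAMP).replace(tzinfo=timezone.utc)
    return t1, t1 - t0


# Abbreviated copy of the error line above ("..." marks text cut for brevity).
line = (
    'time="2023-07-28T06:01:59Z" level=error msg="... too many certificates (5) '
    "already issued for this exact set of domains in the last 168 hours: "
    '*.example.net,example.net, retry after 2023-07-29T14:31:51Z: see ..."'
)

retry_after, wait = retry_wait(line)
print(retry_after.isoformat(), "->", wait)
```

For this particular line the wait comes out at 1 day, 8:29:52, so roughly 32.5 hours from the moment the error was logged.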

Is the acme file writeable?

All acme-files have the same permissions and are written to by the same user. The internal certificates are handled correctly so I'm fairly confident that permissions are ok. See my ls printout above.

I would test with LE staging without stepca.

Maybe that's one way forward. I tried setting the default certificate resolver to staging, but it still tried to renew the real certificates.

The 48 hour cooldown period makes this very frustrating...

The rate limiting is by LetsEncrypt, nothing Traefik can do about it.

You should be able to delete (or move) acme.json and use LE staging.

Yes, I am aware of this. However, the rate limiting was caused by Traefik running the configuration above, where it should just have gotten one wildcard certificate and used it for most routers.

Instead it seems it tried getting five or more wildcard certificates, causing the rate limiting.
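One way to check that suspicion is to count how often lego starts an order for each set of domains. The sketch below is an illustration only: `count_orders` is a hypothetical helper name, it assumes the "Obtaining bundled SAN certificate" wording seen in my debug log, and the sample lines are shortened copies of lines from that log.

```python
import re
from collections import Counter

# Matches the lego line that marks the start of a certificate order.
ORDER = re.compile(r"\[INFO\] \[([^\]]+)\] acme: Obtaining bundled SAN certificate")


def count_orders(lines):
    """Count certificate orders per (sorted) domain set in lego debug output."""
    counts = Counter()
    for line in lines:
        match = ORDER.search(line)
        if match:
            domains = tuple(sorted(d.strip() for d in match.group(1).split(",")))
            counts[domains] += 1
    return counts


# Shortened copies of three lines from the log excerpt earlier in the thread.
sample = [
    'msg="legolog: [INFO] [traefik.home.arpa] acme: Obtaining bundled SAN certificate"',
    'msg="legolog: [INFO] [example.net, *.example.net] acme: Obtaining bundled SAN certificate"',
    'msg="legolog: [INFO] [example.net, *.example.net] acme: Obtaining bundled SAN certificate"',
]

for domains, n in count_orders(sample).items():
    print(n, ", ".join(domains))
```

Passing `sys.stdin` instead of `sample` lets it read piped log output. Any domain set whose count keeps growing across restarts is a candidate for the duplicate certificate limit.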

Yes, I tried changing the certResolver (after commenting out all other routers) to staging and it worked fine.

I just want this to actually work reliably in the future. If I comment out all other routers and renew successfully later today, will it work in the future as well? I know it will take ~60 days before I know for sure but I'm not excited by the thought of doing this manually in the future...

I think I understand the problem.

Let me summarize your context:

  • you are using 2 different certificate resolvers: one for LE and one for Step (domains are not shared between resolvers).
  • you have defined a default certificate resolver (LE) on the entrypoint https with a root domain and the related wildcard.

Your router's configuration follows 3 patterns:

  • one with root and wildcard domain and LE as certificate resolver
  • one without explicit certificate resolver (it will use LE as certificate resolver)
  • one with domain with Step as certificate resolver

Ultimately, you have certificates inside acme-stepca.json and 0 certificates inside acme.json.
And after some retries, you hit one of the rate limits defined by LE.

Can you confirm that my summary is right?

The default configuration defined on the entrypoint https applies to all the routers on this entrypoint. This means that the root domain and the wildcard are defined on every router, even those with Step as a resolver.

Important: Traefik stores all the certificates internally in the same place, and duplicate domains are not possible there.

Depending on the random order of the routers, Traefik will try to get certificates, but it will end up trying to get 2 certificates for the same domains (example.net + wildcard).

To debug, I recommend enabling the API and calling /api/rawdata, you will have your effective dynamic configuration.
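As a rough aid for reading that output, the sketch below groups routers by the certificate resolver and domains they end up with. It rests on assumptions that should be checked against your own `/api/rawdata` response: that the JSON has a top-level `routers` map whose entries carry a `tls` object with `certResolver` and `domains` (`main`/`sans`), and that the API is reachable on `http://localhost:8080` (the usual default with `api.insecure: true`). The function names are made up for illustration.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def group_routers_by_tls(rawdata):
    """Group router names by (certResolver, requested domain sets)."""
    groups = defaultdict(list)
    for name, router in rawdata.get("routers", {}).items():
        tls = router.get("tls")
        if tls is None:
            continue  # router has no TLS section at all
        domain_sets = []
        for entry in tls.get("domains") or []:
            sans = entry.get("sans") or []
            if isinstance(sans, str):
                sans = [sans]
            names = [entry.get("main", "")] + list(sans)
            domain_sets.append(",".join(n for n in names if n))
        key = (tls.get("certResolver") or "", tuple(domain_sets))
        groups[key].append(name)
    return dict(groups)


def fetch_rawdata(base_url="http://localhost:8080"):
    """Fetch /api/rawdata; base_url is an assumed default, adjust as needed."""
    with urlopen(base_url + "/api/rawdata") as resp:
        return json.load(resp)


# Hand-written sample in the assumed layout; not real output from this setup.
sample = {
    "routers": {
        "https-grafana@file": {
            "tls": {
                "certResolver": "letsencrypt",
                "domains": [{"main": "example.net", "sans": ["*.example.net"]}],
            }
        },
        "dashboard@file": {"tls": {"certResolver": "stepca"}},
    }
}

for (resolver, domains), names in group_routers_by_tls(sample).items():
    print(resolver or "(none)", list(domains), "->", sorted(names))
```

Replacing `sample` with `fetch_rawdata()` runs it against a live instance. A router that shows a resolver together with domains it should not be requesting points at inherited entrypoint defaults.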

I think the problem is related to the default configuration defined on the entrypoint https, can you remove the tls.domains section?

Another solution is to explicitly define the domain (inside the TLS section) on the routers that use Step, to override the default entrypoint configuration. For example:

    dashboard:
      rule: Host(`traefik.home.arpa`) && PathPrefix(`/dashboard`)
      service: api@internal
      middlewares:
        - basicAuthDashboard
      tls:
        certresolver: stepca
        domains:
          - main: "traefik.home.arpa"

Yes! StepCA is used for internal (home.arpa) domains and LE for global domains.

Yes:

entryPoints:
  http:
    address: :80
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
  https:
    address: ":443"
    http:
      tls:
        certResolver: letsencrypt
        options: intermediate
        # domains:
        #   - main: "example.net"
        #     sans:
        #       - "*.example.net"

I've added an explicit domain to the relevant routers.

It's still some time before my rate limit expires, but I will post here once I've tried it.

My sentence was ambiguous: I meant defining an explicit domain inside the TLS section of the router configuration (see my previous example).

It was clear, I used the same pattern as in the dashboard router in your example.

It worked! Thanks for your time, @bluepuma77 and @ldez !

It looks like lego tried fetching two certificates for two different routers, but ended up on one? I've scrubbed below with sed so it should hopefully be coherent.

jul 29 21:15:48 atomic conmon[1000557]: time="2023-07-29T19:15:48Z" level=debug msg="legolog: [INFO] [bar.example.net] acme: Obtaining bundled SAN certificate"
jul 29 21:15:48 atomic conmon[1000557]: time="2023-07-29T19:15:48Z" level=debug msg="legolog: [INFO] [foo.example.net] acme: Obtaining bundled SAN certificate"
jul 29 21:15:48 atomic conmon[1000557]: time="2023-07-29T19:15:48Z" level=debug msg="legolog: [INFO] [example.net, *.example.net] acme: Obtaining bundled SAN certificate"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [bar.example.net] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/250129554436"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [bar.example.net] acme: Could not find solver for: tls-alpn-01"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [bar.example.net] acme: Could not find solver for: http-01"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [bar.example.net] acme: use dns-01 solver"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [bar.example.net] acme: Preparing to solve DNS-01"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [foo.example.net] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/250129554856"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [foo.example.net] acme: Could not find solver for: tls-alpn-01"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [foo.example.net] acme: Could not find solver for: http-01"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [foo.example.net] acme: use dns-01 solver"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [foo.example.net] acme: Preparing to solve DNS-01"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [*.example.net] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/249671950336"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [example.net] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/249671950346"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [*.example.net] acme: authorization already valid; skipping challenge"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [example.net] acme: authorization already valid; skipping challenge"
jul 29 21:15:49 atomic conmon[1000557]: time="2023-07-29T19:15:49Z" level=debug msg="legolog: [INFO] [example.net, *.example.net] acme: Validations succeeded; requesting certificates"
jul 29 21:15:50 atomic conmon[1000557]: time="2023-07-29T19:15:50Z" level=debug msg="legolog: [INFO] [bar.example.net] acme: Trying to solve DNS-01"
jul 29 21:15:50 atomic conmon[1000557]: time="2023-07-29T19:15:50Z" level=debug msg="legolog: [INFO] [bar.example.net] acme: Checking DNS record propagation using [172.19.19.5:53]"
jul 29 21:15:50 atomic conmon[1000557]: time="2023-07-29T19:15:50Z" level=debug msg="legolog: [INFO] [foo.example.net] acme: Trying to solve DNS-01"
jul 29 21:15:50 atomic conmon[1000557]: time="2023-07-29T19:15:50Z" level=debug msg="legolog: [INFO] [foo.example.net] acme: Checking DNS record propagation using [172.19.19.5:53]"
jul 29 21:15:51 atomic conmon[1000557]: time="2023-07-29T19:15:51Z" level=debug msg="legolog: [INFO] [example.net] Server responded with a certificate."
jul 29 21:15:52 atomic conmon[1000557]: time="2023-07-29T19:15:52Z" level=debug msg="legolog: [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]"
jul 29 21:15:52 atomic conmon[1000557]: time="2023-07-29T19:15:52Z" level=debug msg="legolog: [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]"
jul 29 21:15:53 atomic conmon[1000557]: time="2023-07-29T19:15:53Z" level=debug msg="legolog: [INFO] [foo.example.net] The server validated our request"
jul 29 21:15:53 atomic conmon[1000557]: time="2023-07-29T19:15:53Z" level=debug msg="legolog: [INFO] [foo.example.net] acme: Cleaning DNS-01 challenge"
jul 29 21:15:53 atomic conmon[1000557]: time="2023-07-29T19:15:53Z" level=debug msg="legolog: [INFO] [foo.example.net] acme: Validations succeeded; requesting certificates"
jul 29 21:15:53 atomic conmon[1000557]: time="2023-07-29T19:15:53Z" level=debug msg="legolog: [INFO] [bar.example.net] The server validated our request"
jul 29 21:15:53 atomic conmon[1000557]: time="2023-07-29T19:15:53Z" level=debug msg="legolog: [INFO] [bar.example.net] acme: Cleaning DNS-01 challenge"
jul 29 21:15:53 atomic conmon[1000557]: time="2023-07-29T19:15:53Z" level=debug msg="legolog: [INFO] [bar.example.net] acme: Validations succeeded; requesting certificates"
jul 29 21:15:55 atomic conmon[1000557]: time="2023-07-29T19:15:55Z" level=debug msg="legolog: [INFO] [foo.example.net] Server responded with a certificate."
jul 29 21:15:56 atomic conmon[1000557]: time="2023-07-29T19:15:56Z" level=debug msg="legolog: [INFO] [bar.example.net] Server responded with a certificate."

And for future me, what I did:

  • I removed the domain stanza from my https entrypoint
  • I made the internal domains even more explicit

My static configuration, entrypoints:

entryPoints:
  http:
    address: :80
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
  https:
    address: ":443"
    http:
      tls:
        certResolver: letsencrypt
        options: intermediate # defined elsewhere
        # removed below
        # domains:
        #   - main: "example.net"
        #     sans:
        #       - "*.example.net"

And for the routers for my internal domains:

    dashboard:
      rule: Host(`traefik.home.arpa`) && PathPrefix(`/dashboard`)
      service: api@internal
      middlewares:
        - basicAuthDashboard
      tls:
        certresolver: stepca
        # Added this
        domains:
          - main: "traefik.home.arpa"

lego just issues certificates; managing those certificates is done by Traefik.
As I said, duplicates are not possible inside the internal Traefik storage.

I had a look at acme.json, and I can find three different certificates in there.

This is fine, I was just surprised, as I have ~15 routers defined with different hostnames for my global domain. The extra certificates (foo/bar) are the same as seen in the log above.
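For future me, a small sketch for listing what a storage file holds without reading the JSON by hand. It assumes the layout I see in my own file: one top-level key per resolver, each with a `Certificates` list whose entries have a `domain` object with `main` and optional `sans`. The helper name `list_certificates` is made up, and the sample data only mirrors the domain names from this thread.

```python
import json


def list_certificates(acme):
    """Return {resolver: [[main, *sans], ...]} from a parsed acme.json."""
    result = {}
    for resolver, data in acme.items():
        # "Certificates" may be missing or null when nothing was issued yet.
        certs = (data or {}).get("Certificates") or []
        result[resolver] = [
            [c["domain"]["main"]] + list(c["domain"].get("sans") or [])
            for c in certs
        ]
    return result


# Hand-written sample in the assumed layout (certificate/key fields omitted).
sample = {
    "letsencrypt": {
        "Certificates": [
            {"domain": {"main": "example.net", "sans": ["*.example.net"]}},
            {"domain": {"main": "foo.example.net"}},
            {"domain": {"main": "bar.example.net"}},
        ]
    }
}

for resolver, certs in list_certificates(sample).items():
    print(resolver, len(certs), certs)

# Against the real file (path taken from the static configuration above):
#   with open("/etc/traefik/certs/acme.json") as f:
#       print(list_certificates(json.load(f)))
```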
