ACME cert renewal fails if only one Host() is invalid

Hello everyone,

We have a multi-container Docker setup with Node.js servers running behind a Traefik load balancer. So far, so good.
These containers serve a white-label app, which means our customers route their traffic to us via a CNAME.
It also means we need to assign all of our customers' hosts to these containers. We do it like this:

labels:
  - traefik.http.routers.mywhitelabel.rule=Host(`foo.host1.com`) || Host(`bar.anotherhost2.com`) || ...

Works!
BUT, the issue is this: we also have a Let's Encrypt ACME setup for the TLS certs of all these hosts.
Now, if only ONE of the Host()s mapped to our router has a problem with its CNAME, we get an error like this in our traefik.log (let's imagine that the CNAME setup for foo.host1.com is correct, but the CNAME is wrong or missing for bar.anotherhost2.com):

ERR Unable to obtain ACME certificate for domains error="unable to generate a certificate for the domains [foo.host1.com bar.anotherhost2.com]: error: one or more domains had a problem:\n[bar.anotherhost2.com] acme: error: 400 :: urn:ietf:params:acme:error:dns :: DNS problem: NXDOMAIN looking up A for bar.anotherhost2.com - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for bar.anotherhost2.com - check that a DNS record exists for this domain\n"

That is so weird to me! Why would Traefik fail on ALL cert renewals if only one host/domain is wrong?

This is a real issue for us! We have up to 50 Host()s mapped to our containers at a time. If just one of our customers messes up their CNAME setup, the certs for ALL the other domains will not be renewed.

Is this intended? How can we fix this?
Thanks everyone!

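I think this is how Let's Encrypt works. If you request a single cert for multiple domains, every one of them has to pass validation. As your log shows, Traefik requests one cert covering all the Host()s in the router rule, so one bad domain fails the whole request.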

Options from my point of view:

  • create individual routers for each domain
  • create individual certs externally and provide them to Traefik (related PoC)
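
As a rough sketch of the first option, assuming Traefik v2 Docker labels: each domain gets its own router, so Traefik derives one certificate request per Host() rule and a broken CNAME only affects that one domain. The router names (wl-foo-host1, wl-bar-anotherhost2), the resolver name myresolver, and port 3000 are placeholders I made up; use whatever your setup already defines.

```yaml
labels:
  # One router per customer domain; each rule contains a single Host(),
  # so each certificate request covers only that domain.
  - traefik.http.routers.wl-foo-host1.rule=Host(`foo.host1.com`)
  - traefik.http.routers.wl-foo-host1.tls.certresolver=myresolver    # assumed resolver name
  - traefik.http.routers.wl-foo-host1.service=mywhitelabel
  - traefik.http.routers.wl-bar-anotherhost2.rule=Host(`bar.anotherhost2.com`)
  - traefik.http.routers.wl-bar-anotherhost2.tls.certresolver=myresolver
  - traefik.http.routers.wl-bar-anotherhost2.service=mywhitelabel
  # Both routers point at the same backend service on the container.
  - traefik.http.services.mywhitelabel.loadbalancer.server.port=3000  # assumed app port
```

With around 50 domains you would probably want to generate these labels from your customer list rather than maintain them by hand.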