time="2020-04-16T10:13:29Z" level=error msg="Unable to obtain ACME certificate for domains \"mysubdomain.example.com\": unable to generate a certificate for the domains [mysubdomain.example.com]: acme: Error -> One or more domains had a problem:\n[mysubdomain.example.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Incorrect TXT record \"ca3-xxx\" found at _acme-challenge.mysubdomain.example.com, url: \n" routerName=staging-icecast@docker rule="Host(`mysubdomain.example.com`)" providerName=le.acme
It works sometimes, but more often than not the TXT record validation fails with the 403. I can verify in the Cloudflare audit log that the TXT record is being added to Cloudflare just fine, but it still fails verification. Setting dnsChallenge.delayBeforeCheck isn't helping.
We all upgraded to v2.2, but none of us has verified yet that the problem definitely goes away after downgrading to v2.1. I will work on doing that ASAP.
Anyone else seeing this? Anyone have an idea of what might be going wrong?
Do you think those variables would work better than just setting dnsChallenge.delayBeforeCheck?
Do you by any chance know how the default works? That link doesn't say. From the behavior, I'm inferring that the default is a very short timeout with only one check. Am I understanding correctly that if I set a polling interval of, say, 10 seconds and a propagation timeout of 5 minutes, it will keep polling every 10 seconds for up to 5 minutes, and should therefore work reliably but still quickly?
CLOUDFLARE_POLLING_INTERVAL is the time between two checks of TXT record propagation (default: 2s).
CLOUDFLARE_PROPAGATION_TIMEOUT is the maximum time to wait for propagation; if propagation is confirmed earlier, checking stops (default: 2min).
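To check my understanding of those two settings, here's a simplified Python sketch of the polling loop as described above. It is not lego's actual code (lego is written in Go); wait_for and FakeClock are hypothetical names, and the clock and sleep functions are injectable only so the example runs instantly. The loop stops as soon as a check succeeds and otherwise gives up once the timeout has elapsed.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout: float, interval: float,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `check` every `interval` seconds until it returns True.

    Returns the number of checks made, or raises TimeoutError once
    `timeout` seconds have elapsed without a successful check.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts  # propagation confirmed: stop early
        if clock() >= deadline:
            # Message chosen to echo the "time limit exceeded" wording in the log.
            raise TimeoutError(f"time limit exceeded after {attempts} checks")
        sleep(interval)  # plays the role of CLOUDFLARE_POLLING_INTERVAL


class FakeClock:
    """Simulated clock so the example runs without real waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds
```

With interval=10 and timeout=300 on a FakeClock, a check that never succeeds runs 31 times (at t = 0, 10, ..., 300 s) before the TimeoutError, while a check that succeeds on its third try returns after 20 simulated seconds.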
Another point that I forgot to mention: by default, the propagation checks use the nameservers defined in /etc/resolv.conf; you can override that with the resolvers option.
I'm using Cloudflare's DNS and getting the error MUCH faster than 2 minutes, and since the error says Incorrect TXT record "ca3-xxx" found rather than something like "couldn't find TXT record", I'm inclined to think this isn't a propagation timeout issue.
That said, I'm trying out the settings you suggested and will report back so we can rule that out as the cause.
Unable to obtain ACME certificate for domains \"whoami-test-2.example.com\": unable to generate a certificate for the domains [whoami-test-2.example.com]: acme: Error -> One or more domains had a problem:\n[whoami-test-2.example.com] time limit exceeded: last error: NS sri.ns.cloudflare.com. did not return the expected TXT record [fqdn: example.com., value: L39NFmezsVNMDMqIwubrYUDF0D8vulrOxmultPC9D08]: ca3-89aaeb4aa92a4c19a0f493729729b47c\n" providerName=le.acme routerName=whoami-test-2@docker rule="Host(`whoami-test-2.example.com`)"
Note that this error is different. It's not saying the record was incorrect; it's saying the expected record wasn't returned.
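For reference when reading the fqdn and value fields in that error: per RFC 8555 (section 8.4), a dns-01 TXT record lives at _acme-challenge.<domain>, and its value is the unpadded base64url encoding of the SHA-256 digest of the key authorization (the challenge token, a ".", and the account key thumbprint). A minimal Python sketch; challenge_fqdn and challenge_value are hypothetical helper names, not lego functions.

```python
import base64
import hashlib


def challenge_fqdn(domain: str) -> str:
    """Name of the dns-01 TXT record for `domain`, as an absolute FQDN."""
    return f"_acme-challenge.{domain.rstrip('.')}."


def challenge_value(key_authorization: str) -> str:
    """TXT value: base64url(SHA-256(key authorization)) with '=' padding removed."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

A SHA-256 digest is 32 bytes, which encodes to 43 base64url characters, the same length as the value in the error above; so comparing both the record name and the value against what dig returns is what tells you which one is off.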
Timeline:
Container started: 2020-04-16T21:12:21-07:00
TXT Record Added: 2020-04-16T21:12:22-07:00 (value=L39NFmezsVNMDMqIwubrYUDF0D8vulrOxmultPC9D08)
TXT Record Deleted: 2020-04-16T21:17:23-07:00
Traefik error: 2020-04-16T21:17:24-07:00 (looks like the CLOUDFLARE_PROPAGATION_TIMEOUT period elapsed)
Keep in mind those are timestamps from different machines (the container and Cloudflare's audit log), so the ordering might not be exact.
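To put numbers on the timeline, here's a quick Python check of the intervals (it needs Python 3.7+ for datetime.fromisoformat with a UTC offset). Subject to the clock-skew caveat above, the TXT record was deleted about 301 s after it was added and the error came at about 302 s, which lines up with a 5-minute propagation timeout.

```python
from datetime import datetime

# Timestamps copied from the timeline above (container clock and
# Cloudflare audit log, so they may be skewed relative to each other).
timeline = {
    "container_started": "2020-04-16T21:12:21-07:00",
    "txt_added": "2020-04-16T21:12:22-07:00",
    "txt_deleted": "2020-04-16T21:17:23-07:00",
    "traefik_error": "2020-04-16T21:17:24-07:00",
}
events = {name: datetime.fromisoformat(ts) for name, ts in timeline.items()}


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two named events in the timeline."""
    return (events[end] - events[start]).total_seconds()


for end in ("txt_deleted", "traefik_error"):
    print(f"txt_added -> {end}: {seconds_between('txt_added', end):.0f} s")
```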
The value of the TXT record that was added clearly matches the value the error says it couldn't find. I'm not sure what's going on here. =\
Any ideas, @ldez? Judging from the logging and the higher timeout, it looks like the TXT record is being created, but then something times out after failing to find it for 5 minutes.
One more thing: the TXT record is clearly there (I can confirm it both in the Cloudflare audit log and by running dig against both Cloudflare DNS servers that I'm specifying with dnsChallenge.resolvers=).
It might be helpful if lego told me (when log level = DEBUG) what it was looking for instead of just saying "Waiting for DNS record propagation" on this line. Something isn't matching the created TXT record and I want to know if it's looking for the wrong name or the wrong value or something else.
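Something like the following is the kind of diagnostic I have in mind. It's a hypothetical Python helper, not part of lego or Traefik: given the TXT records a nameserver returned (assumed here to be a dict mapping record names to values, e.g. filled in by hand from dig output), it reports whether the problem is the name or the value. DNS names are compared case-insensitively and with or without the trailing dot.

```python
from typing import Dict, List


def diagnose(records: Dict[str, List[str]], fqdn: str, expected: str) -> str:
    """Say whether a TXT lookup result matches the expected name and value.

    `records` is hypothetical input: record name -> TXT values returned.
    """
    name = fqdn.rstrip(".").lower()
    # Normalize returned names the same way so "A.example.com." == "a.example.com".
    normalized = {k.rstrip(".").lower(): v for k, v in records.items()}
    if name not in normalized:
        return f"wrong name: no TXT record at {name}"
    found = normalized[name]
    if expected in found:
        return "ok"
    return f"wrong value: expected {expected!r} at {name}, found {found!r}"
```

For example, if the nameserver only returns an old value at the right name, diagnose reports "wrong value" and lists what it actually found, which is exactly the detail missing from the "Waiting for DNS record propagation" message.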
EDIT: Note that the "oauth" stuff is defined in docker-compose.yaml for completeness, but I'm not using the middleware it defines in the whoami test at all, so it should be irrelevant.
But I don't see why, and if you're willing to spend another moment of your time explaining it to me, I would be most appreciative.
I see all the little differences in what you did, but I can't spot which one made a functional difference. Do you by any chance know not just what is different but what actually fixed things, @ldez? The old-style custom HTTPS redirect middleware I was using, which you moved to the entrypoint definition? Moving some of the options from labels to the command? Getting rid of the superfluous TLS options on the service router? Moving the oauth service router labels above the middleware labels? Something else I'm not even seeing?