dnsChallenge sporadically failing: TXT record invalid

A couple of other people and I are seeing the same issue with Cloudflare and the DNS challenge:

time="2020-04-16T10:13:29Z" level=error msg="Unable to obtain ACME certificate for domains \"mysubdomain.example.com\": unable to generate a certificate for the domains [mysubdomain.example.com]: acme: Error -> One or more domains had a problem:\n[mysubdomain.example.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Incorrect TXT record \"ca3-xxx\" found at _acme-challenge.mysubdomain.example.com, url: \n" routerName=staging-icecast@docker rule="Host(`mysubdomain.example.com`)" providerName=le.acme

It works sometimes, but more often than not the TXT record validation fails with the 403. I can verify in the Cloudflare audit log that the TXT record is being added to Cloudflare just fine, but it then fails verification. dnsChallenge.delayBeforeCheck isn't helping.

All of us upgraded to v2.2, but none of us has verified yet that the problem goes away when downgrading to v2.1. I will work on doing that ASAP.

Anyone else seeing this? Anyone have an idea of what might be going wrong?

Hello,

There is no change between v2.1 and v2.2 on this.

I think it's a DNS propagation issue: the propagation of TXT records across all the DNS servers can be slow.

To handle that, you have to define custom values for:

  • CLOUDFLARE_POLLING_INTERVAL: Time between DNS propagation checks
  • CLOUDFLARE_PROPAGATION_TIMEOUT: Maximum waiting time for DNS propagation

https://go-acme.github.io/lego/dns/cloudflare/#additional-configuration

dnsChallenge.delayBeforeCheck is a legacy option that I recommend not using.

Can you elaborate on what you mean by "low"?

It's just a typo: slow

Oh duh, of course, sorry!

  1. You think those variables would work better than just dnsChallenge.delayBeforeCheck?
  2. Do you by any chance know how the default works? That link doesn't mention it. I'm inferring from the behavior that the default is to have a very low timeout and then only do one check. Am I understanding correctly that by setting a polling interval of, for example, 10 seconds and a propagation timeout of 5 minutes, it would keep polling every 10 seconds for 5 minutes and should therefore work reliably but quickly?

CLOUDFLARE_POLLING_INTERVAL is the time between two checks of the propagation of the TXT records. (default: 2s)

CLOUDFLARE_PROPAGATION_TIMEOUT is the maximum time to wait for the propagation; if a propagation check succeeds before then, the checking stops. (default: 2min)
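To make the two settings concrete, here is a minimal Python sketch of the poll-until-timeout behaviour described above. It is not lego's actual code: the function name `wait_for_propagation` and its `check`, `clock`, and `sleep` parameters are assumptions made for illustration, and only the 2s and 2min defaults come from this thread.

```python
import time
from typing import Callable


def wait_for_propagation(
    check: Callable[[], bool],
    polling_interval: float = 2.0,       # default quoted in this thread: 2s
    propagation_timeout: float = 120.0,  # default quoted in this thread: 2min
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call check() every polling_interval seconds until it returns True.

    Returns the number of checks performed. Raises TimeoutError when the
    next check would fall after propagation_timeout seconds have elapsed.
    """
    deadline = clock() + propagation_timeout
    attempts = 0
    while True:
        attempts += 1
        if check():
            # Propagation confirmed: stop early, well before the timeout.
            return attempts
        if clock() + polling_interval > deadline:
            raise TimeoutError(f"time limit exceeded after {attempts} checks")
        sleep(polling_interval)
```

Under this model, an interval of 10 seconds with a timeout of 300 seconds means a record that never shows up is checked about 30 times over 5 minutes before the error is raised, while a record that shows up on the third check ends the wait after about 20 seconds.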

Another point that I forgot to mention: by default, the propagation checks use the nameservers defined in /etc/resolv.conf; you can override that by using the resolvers option.

https://docs.traefik.io/v2.2/https/acme/#resolvers

I'm using the resolvers option and have them set to Cloudflare's DNS servers (which is what made me think this wasn't an issue with propagation time):

--certificatesResolvers.le.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53

Since I'm using Cloudflare's DNS, I'm getting an error MUCH faster than 2 minutes, and since the error says Incorrect TXT record "ca3-xxx" found instead of something like "couldn't find TXT record" I'm inclined to assume that this isn't a propagation timeout issue.

I'm trying out the settings you suggested and will report back though so we can rule that out as the cause of the issue.

With:

CLOUDFLARE_POLLING_INTERVAL=10
CLOUDFLARE_PROPAGATION_TIMEOUT=300

I get the error:

Unable to obtain ACME certificate for domains \"whoami-test-2.example.com\": unable 
to generate a certificate for the domains [whoami-test-2.example.com]: acme: 
Error -> One or more domains had a problem:\n[whoami-test-2.example.com] 
time limit exceeded: last error: NS sri.ns.cloudflare.com. did not return the 
expected TXT record [fqdn: example.com., 
value: L39NFmezsVNMDMqIwubrYUDF0D8vulrOxmultPC9D08]
: ca3-89aaeb4aa92a4c19a0f493729729b47c\n" providerName=le.acme routerName
=whoami-test-2@docker rule="Host(`whoami-test-2.example.com`)

Note that this error is different. It's not saying the record was invalid; it is saying it wasn't returned.
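The error above quotes two strings: the expected value and a trailing `ca3-...` string. Per RFC 8555 section 8.4, a DNS-01 TXT value is the unpadded base64url encoding of the SHA-256 digest of the key authorization, which is always 43 characters long. The sketch below checks only that shape; the function names are hypothetical, and it says nothing about where the `ca3-...` value comes from.

```python
import base64
import hashlib
import re


def dns01_txt_value(key_authorization: str) -> str:
    """Compute the DNS-01 TXT value for a key authorization (RFC 8555, 8.4)."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # A 32-byte digest encodes to 44 base64 characters including one "=";
    # ACME uses the unpadded form, so 43 characters remain.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Exactly 43 characters from the base64url alphabet.
_DNS01_SHAPE = re.compile(r"[A-Za-z0-9_-]{43}")


def looks_like_dns01_value(value: str) -> bool:
    """Return True if value has the shape of a DNS-01 TXT digest."""
    return _DNS01_SHAPE.fullmatch(value) is not None


# Values quoted in the error message in this thread.
expected = "L39NFmezsVNMDMqIwubrYUDF0D8vulrOxmultPC9D08"
other = "ca3-89aaeb4aa92a4c19a0f493729729b47c"

print(looks_like_dns01_value(expected))  # True
print(looks_like_dns01_value(other))     # False
```

So the value lego was waiting for has the DNS-01 shape, while the other string in the message does not, which suggests it was not written as a DNS-01 response for this order.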

Timeline:

Container started:  2020-04-16T21:12:21-07:00
TXT Record Added:   2020-04-16T21:12:22-07:00 (value=L39NFmezsVNMDMqIwubrYUDF0D8vulrOxmultPC9D08)
TXT Record Deleted: 2020-04-16T21:17:23-07:00
Traefik error:      2020-04-16T21:17:24-07:00 (looks like CLOUDFLARE_PROPAGATION_TIMEOUT seconds passed)

Keep in mind those are timestamps from different machines (the container and Cloudflare's audit log) so the ordering might not be correct.
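As a quick check on the timeline, the intervals can be computed from the quoted timestamps. This is a small sketch using only the four values listed above; the variable names are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the timeline in this post.
container_started = datetime.fromisoformat("2020-04-16T21:12:21-07:00")
txt_added = datetime.fromisoformat("2020-04-16T21:12:22-07:00")
txt_deleted = datetime.fromisoformat("2020-04-16T21:17:23-07:00")
traefik_error = datetime.fromisoformat("2020-04-16T21:17:24-07:00")

# How long the TXT record existed in Cloudflare.
record_lifetime = (txt_deleted - txt_added).total_seconds()
# How long Traefik ran before logging the error.
start_to_error = (traefik_error - container_started).total_seconds()

print(record_lifetime)  # 301.0
print(start_to_error)   # 303.0
```

Both figures sit within a few seconds of CLOUDFLARE_PROPAGATION_TIMEOUT=300, which is consistent with the propagation check running for the full timeout, allowing for clock skew between the two machines.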

The TXT record value added and the error about what value it couldn't find clearly match. I'm not sure what's going on here. =\

It's also not clear to me why I need to be doing the DNS challenge for every subdomain when I'm using a wildcard subdomain certificate.

Any ideas, @ldez? It looks like, from the logging and higher timeout, that the TXT record is being created but then something is timing out after not finding it for 5 minutes.

One more thing: the TXT record is clearly there (I can confirm this both in the Cloudflare audit log and by using dig to query both Cloudflare DNS servers that I'm specifying with dnsChallenge.resolvers=).
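For anyone wanting to repeat that check, here is a minimal sketch that builds the dig invocation for each resolver. The helper names are hypothetical; it assumes dig is installed, and it strips a leading `*.` because a wildcard name is validated at the `_acme-challenge` label of its base domain.

```python
import subprocess
from typing import List


def dig_txt_command(domain: str, resolver: str) -> List[str]:
    """Build a dig command that asks `resolver` for the challenge TXT record."""
    if domain.startswith("*."):
        # *.example.com is validated at _acme-challenge.example.com
        domain = domain[2:]
    return ["dig", f"@{resolver}", f"_acme-challenge.{domain}", "TXT", "+short"]


def query_txt(domain: str, resolver: str) -> str:
    """Run dig and return its raw output (requires dig and network access)."""
    result = subprocess.run(
        dig_txt_command(domain, resolver),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# The two resolvers configured with dnsChallenge.resolvers in this thread.
for resolver in ("1.1.1.1", "1.0.0.1"):
    print(" ".join(dig_txt_command("*.example.com", resolver)))
```

Comparing the output from both resolvers against the value in the Cloudflare audit log shows whether the name or the value is what differs.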

It might be helpful if lego told me (when log level = DEBUG) what it was looking for instead of just saying "Waiting for DNS record propagation" on this line. Something isn't matching the created TXT record and I want to know if it's looking for the wrong name or the wrong value or something else.

This error message comes from Let's Encrypt, not from lego.

Could you provide a clear view of your infrastructure: static configuration, docker-compose file(s), number of routers, etc.?

Thank you for helping me investigate further, @ldez

Traefik

./traefik/cloudflare.env

CF_API_EMAIL=example@gmail.com
CF_DNS_API_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxx
CLOUDFLARE_POLLING_INTERVAL=10
CLOUDFLARE_PROPAGATION_TIMEOUT=300

./traefik/docker-compose.yaml:

version: '3.7'

volumes:
  traefik-acme:

services:
  reverse-proxy:
    container_name: traefik-reverse-proxy
    image: "traefik:v2.2"
    command:
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --entrypoints.mopidy_mpd.address=:6660
      - --entrypoints.mopidy_http.address=:6680
      - --entrypoints.icecast.address=:8000
      - --providers.docker
      - --providers.docker.exposedByDefault=false
      - --certificatesResolvers.le.acme.email=example@gmail.com
      - --certificatesResolvers.le.acme.storage=/etc/traefik/acme/acme.json
      - --certificatesResolvers.le.acme.dnsChallenge=true
      - --certificatesResolvers.le.acme.dnsChallenge.provider=cloudflare
      - --certificatesResolvers.le.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53
      # - --certificatesResolvers.le.acme.dnsChallenge.delayBeforeCheck=5
      - --log.level=DEBUG
    labels:
      # because exposedByDefault=false
      traefik.enable: "true"
      traefik.http.routers.wildcard-certs.tls.domains[0].main: "example.com"
      traefik.http.routers.wildcard-certs.tls.domains[0].sans: "*.example.com"
      # default http -> https redirection for all routers
      traefik.http.routers.http-catchall.rule: "hostregexp(`{host:.+}`)"
      traefik.http.routers.http-catchall.entrypoints: "web"
      traefik.http.routers.http-catchall.middlewares: "redirect-to-https"
      traefik.http.middlewares.redirect-to-https.redirectscheme.scheme: "https"
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
      - "6600:6600" # mopidy_mpd
      - "6680:6680" # mopidy_http
      - "8000:8000" # icecast
    env_file:
      - "./cloudflare.env"
    volumes: # @TODO Consider not allowing this to harden traefik
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "traefik-acme:/etc/traefik/acme"
    networks:
      - public_web

  ## OAuth - Forward Authentication
  oauth:
    image: thomseddon/traefik-forward-auth
    container_name: traefik-forward-auth
    hostname: oauth
    restart: unless-stopped
    environment:
      PROVIDERS_GOOGLE_CLIENT_ID: ${GOOGLE_CLIENT_ID}
      PROVIDERS_GOOGLE_CLIENT_SECRET: ${GOOGLE_CLIENT_SECRET}
      SECRET: ${OAUTH_SECRET}
      COOKIE_DOMAIN: ${DOMAINNAME}
      INSECURE_COOKIE: "false"
      AUTH_HOST: oauth.${DOMAINNAME}
      URL_PATH: /_oauth
      LOG_LEVEL: debug
      LIFETIME: 2592000 # 30 days
    labels:
      ## Middlewares definitions
      #   rate-limit
      traefik.http.middlewares.middlewares-rate-limit.rateLimit.average: "100"
      traefik.http.middlewares.middlewares-rate-limit.rateLimit.burst: "50"
      #   secure-headers
      traefik.http.middlewares.middlewares-secure-headers.headers.accessControlAllowMethods: "GET,OPTIONS,PUT"
      traefik.http.middlewares.middlewares-secure-headers.headers.accessControlMaxAge: "100"
      traefik.http.middlewares.middlewares-secure-headers.headers.hostsProxyHeaders: "X-Forwarded-Host"
      traefik.http.middlewares.middlewares-secure-headers.headers.sslRedirect: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.stsSeconds: "63072000"
      traefik.http.middlewares.middlewares-secure-headers.headers.stsIncludeSubdomains: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.stsPreload: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.forceSTSHeader: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.customFrameOptionsValue: "allow-from https:example.com"
      traefik.http.middlewares.middlewares-secure-headers.headers.contentTypeNosniff: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.browserXssFilter: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.referrerPolicy: "same-origin"
      traefik.http.middlewares.middlewares-secure-headers.headers.featurePolicy: "camera 'none'; geolocation 'none'; microphone 'none'; payment 'none'; usb 'none'; vr 'none';"
      traefik.http.middlewares.middlewares-secure-headers.headers.customResponseHeaders.X-Robots-Tag: "noindex,nofollow,nosnippet,noarchive,notranslate,noimageindex"
      #   oauth
      traefik.http.middlewares.middlewares-oauth.forwardAuth.address: "http://oauth:4181"
      traefik.http.middlewares.middlewares-oauth.forwardAuth.trustForwardHeader: "true"
      traefik.http.middlewares.middlewares-oauth.forwardAuth.authResponseHeaders: "X-Forwarded-User"
      #   chains
      traefik.http.middlewares.chain-no-auth.chain.middlewares: "middlewares-rate-limit,middlewares-secure-headers"
      traefik.http.middlewares.chain-auth.chain.middlewares: "middlewares-rate-limit,middlewares-secure-headers,middlewares-oauth"
      ## This service
      traefik.enable: "true"
      ## oauth.example.com setup: HTTP Routers
      traefik.http.routers.oauth-rtr.rule: "Host(`oauth.${DOMAINNAME}`)"
      traefik.http.routers.oauth-rtr.tls: "true"
      traefik.http.routers.oauth-rtr.tls.certresolver: "le"
      traefik.http.routers.oauth-rtr.entrypoints: "websecure"
      ## oauth.example.com setup: Middlewares
      traefik.http.routers.oauth-rtr.middlewares: "chain-auth"
      ## oauth.example.com setup: HTTP Services
      ## @TODO Should be able to get rid of this, set EXPOSE: 4181 and let Traefik figure it out?
      traefik.http.routers.oauth-rtr.service: "oauth-svc"
      traefik.http.services.oauth-svc.loadbalancer.server.port: "4181"
    networks:
      - public_web

networks:
  public_web:

Test service, ./whoami-test/docker-compose.yaml:

version: '3.7'

services:
  whoami-test-8:
    image: "containous/whoami"
    labels:
      traefik.enable: "true"
      traefik.http.routers.whoami-test-8.rule: "Host(`whoami-test-8.example.com`)"
      traefik.http.routers.whoami-test-8.tls: "true"
      traefik.http.routers.whoami-test-8.tls.certresolver: "le"
      traefik.http.routers.whoami-test-8.entrypoints: "websecure"
    networks:
      - traefik_public_web

networks:
  traefik_public_web:
    external: true

EDIT: Note that the "oauth" stuff is defined in docker-compose.yaml for completeness but I'm not using the middleware it defines in the whoami test at all, so it should be irrelevant.

version: '3.7'

volumes:
  traefik-acme:

services:
  reverse-proxy:
    container_name: traefik-reverse-proxy
    image: "traefik:v2.2"
    command:
      - --log.level=DEBUG
      - --api

      - --entrypoints.web.address=:80
      - --entrypoints.web.http.redirections.entrypoint.to=websecure
      - --entrypoints.web.http.redirections.entrypoint.scheme=https

      - --entrypoints.websecure.address=:443
      - --entrypoints.websecure.http.tls=true
      - --entrypoints.websecure.http.tls.certResolver=le
      - --entrypoints.websecure.http.tls.domains[0].main=example.com
      - --entrypoints.websecure.http.tls.domains[0].sans=*.example.com

      - --entrypoints.mopidy_mpd.address=:6660
      - --entrypoints.mopidy_http.address=:6680
      - --entrypoints.icecast.address=:8000
      
      - --providers.docker.exposedByDefault=false
      
      - --certificatesResolvers.le.acme.email=example@gmail.com
      - --certificatesResolvers.le.acme.storage=/etc/traefik/acme/acme.json
      - --certificatesResolvers.le.acme.dnsChallenge.provider=cloudflare
      - --certificatesResolvers.le.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53
      - --log.level=DEBUG
    labels:
      # because exposedByDefault=false
      traefik.enable: "true"

      # Dashboard
      traefik.http.routers.traefik.rule: Host(`traefik.${DOMAINNAME}`)
      traefik.http.routers.traefik.entrypoints: websecure
      traefik.http.routers.traefik.service: api@internal


    restart: unless-stopped
    ports:
      - 80:80
      - 443:443
      - 6600:6600 # mopidy_mpd
      - 6680:6680 # mopidy_http
      - 8000:8000 # icecast
    env_file:
      - ./cloudflare.env
    volumes: # @TODO Consider not allowing this to harden traefik
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - traefik-acme:/etc/traefik/acme
    networks:
      - public_web

  ## OAuth - Forward Authentication
  oauth:
    image: thomseddon/traefik-forward-auth
    container_name: traefik-forward-auth
    hostname: oauth
    restart: unless-stopped
    environment:
      PROVIDERS_GOOGLE_CLIENT_ID: ${GOOGLE_CLIENT_ID}
      PROVIDERS_GOOGLE_CLIENT_SECRET: ${GOOGLE_CLIENT_SECRET}
      SECRET: ${OAUTH_SECRET}
      COOKIE_DOMAIN: ${DOMAINNAME}
      INSECURE_COOKIE: "false"
      AUTH_HOST: oauth.${DOMAINNAME}
      URL_PATH: /_oauth
      LOG_LEVEL: debug
      LIFETIME: 2592000 # 30 days
    labels:
      traefik.enable: "true"
      
      ## oauth.example.com setup: HTTP Routers
      traefik.http.routers.oauth-rtr.rule: Host(`oauth.${DOMAINNAME}`)
      traefik.http.routers.oauth-rtr.entrypoints: websecure
      traefik.http.routers.oauth-rtr.middlewares: chain-auth
      ## oauth.example.com setup: HTTP Services
      ## @TODO Should be able to get rid of this, set EXPOSE: 4181 and let Traefik figure it out?
      traefik.http.routers.oauth-rtr.service: oauth-svc

      traefik.http.services.oauth-svc.loadbalancer.server.port: "4181"

      ## Middlewares definitions

      #   rate-limit
      traefik.http.middlewares.middlewares-rate-limit.rateLimit.average: "100"
      traefik.http.middlewares.middlewares-rate-limit.rateLimit.burst: "50"
      
      #   secure-headers
      traefik.http.middlewares.middlewares-secure-headers.headers.accessControlAllowMethods: "GET,OPTIONS,PUT"
      traefik.http.middlewares.middlewares-secure-headers.headers.accessControlMaxAge: "100"
      traefik.http.middlewares.middlewares-secure-headers.headers.hostsProxyHeaders: "X-Forwarded-Host"
      traefik.http.middlewares.middlewares-secure-headers.headers.sslRedirect: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.stsSeconds: "63072000"
      traefik.http.middlewares.middlewares-secure-headers.headers.stsIncludeSubdomains: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.stsPreload: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.forceSTSHeader: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.customFrameOptionsValue: "allow-from https:example.com"
      traefik.http.middlewares.middlewares-secure-headers.headers.contentTypeNosniff: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.browserXssFilter: "true"
      traefik.http.middlewares.middlewares-secure-headers.headers.referrerPolicy: "same-origin"
      traefik.http.middlewares.middlewares-secure-headers.headers.featurePolicy: "camera 'none'; geolocation 'none'; microphone 'none'; payment 'none'; usb 'none'; vr 'none';"
      traefik.http.middlewares.middlewares-secure-headers.headers.customResponseHeaders.X-Robots-Tag: "noindex,nofollow,nosnippet,noarchive,notranslate,noimageindex"
      
      #   oauth
      traefik.http.middlewares.middlewares-oauth.forwardAuth.address: "http://oauth:4181"
      traefik.http.middlewares.middlewares-oauth.forwardAuth.trustForwardHeader: "true"
      traefik.http.middlewares.middlewares-oauth.forwardAuth.authResponseHeaders: "X-Forwarded-User"
      
      #   chains
      traefik.http.middlewares.chain-no-auth.chain.middlewares: "middlewares-rate-limit,middlewares-secure-headers"
      traefik.http.middlewares.chain-auth.chain.middlewares: "middlewares-rate-limit,middlewares-secure-headers,middlewares-oauth"
      
    networks:
      - public_web

networks:
  public_web:

version: '3.7'

services:
  whoami-test-8:
    image: containous/whoami
    labels:
      traefik.enable: "true"
      traefik.http.routers.whoami-test-8.rule: Host(`whoami-test-8.example.com`)
      traefik.http.routers.whoami-test-8.entrypoints: websecure
    networks:
      - traefik_public_web

networks:
  traefik_public_web:
    external: true

Woo that did the trick! :tada::tada::tada:

But I don't see why, and if you're willing to spend another moment of your time explaining it to me, I would be most appreciative.

I see all the little differences in what you did but I can't spot what made a functional difference. Do you by any chance know not just what is different but what fixed things, @ldez? The old style custom middleware https redirect I was doing that you moved to the entrypoint definition? Moving some of the options from labels to the command? Getting rid of the superfluous tls options on the service router? Changing the oauth service router labels to be above the middleware labels? Something else I'm not even seeing?
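One way to narrow the question down is to list the differences mechanically. The sketch below compares only the `command:` flags of the two Traefik compose files posted in this thread (commented-out flags omitted); the list and function names are made up, and the result shows what changed, not which change was responsible.

```python
# Flags from the original docker-compose.yaml in this thread.
OLD_FLAGS = [
    "--entrypoints.web.address=:80",
    "--entrypoints.websecure.address=:443",
    "--entrypoints.mopidy_mpd.address=:6660",
    "--entrypoints.mopidy_http.address=:6680",
    "--entrypoints.icecast.address=:8000",
    "--providers.docker",
    "--providers.docker.exposedByDefault=false",
    "--certificatesResolvers.le.acme.email=example@gmail.com",
    "--certificatesResolvers.le.acme.storage=/etc/traefik/acme/acme.json",
    "--certificatesResolvers.le.acme.dnsChallenge=true",
    "--certificatesResolvers.le.acme.dnsChallenge.provider=cloudflare",
    "--certificatesResolvers.le.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53",
    "--log.level=DEBUG",
]

# Flags from the revised docker-compose.yaml in this thread.
NEW_FLAGS = [
    "--log.level=DEBUG",
    "--api",
    "--entrypoints.web.address=:80",
    "--entrypoints.web.http.redirections.entrypoint.to=websecure",
    "--entrypoints.web.http.redirections.entrypoint.scheme=https",
    "--entrypoints.websecure.address=:443",
    "--entrypoints.websecure.http.tls=true",
    "--entrypoints.websecure.http.tls.certResolver=le",
    "--entrypoints.websecure.http.tls.domains[0].main=example.com",
    "--entrypoints.websecure.http.tls.domains[0].sans=*.example.com",
    "--entrypoints.mopidy_mpd.address=:6660",
    "--entrypoints.mopidy_http.address=:6680",
    "--entrypoints.icecast.address=:8000",
    "--providers.docker.exposedByDefault=false",
    "--certificatesResolvers.le.acme.email=example@gmail.com",
    "--certificatesResolvers.le.acme.storage=/etc/traefik/acme/acme.json",
    "--certificatesResolvers.le.acme.dnsChallenge.provider=cloudflare",
    "--certificatesResolvers.le.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53",
    "--log.level=DEBUG",
]


def flag_diff(old, new):
    """Return (flags only in old, flags only in new), each sorted."""
    old_set, new_set = set(old), set(new)
    return sorted(old_set - new_set), sorted(new_set - old_set)


removed, added = flag_diff(OLD_FLAGS, NEW_FLAGS)
for flag in removed:
    print("-", flag)
for flag in added:
    print("+", flag)
```

This covers the flags only. The label changes, such as the wildcard-certs router labels and the per-router tls.certresolver labels that no longer appear in the revised files, would need the same comparison.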

Thank you SO much for your help and this product.

Hey @ldez!

I'm facing the same issue. Would you mind explaining what fixes the issue? Any explanation would really benefit me at the moment.

Thanks!
