Problem with TLS challenge - Getting the default Traefik certificate

Hi everyone,

I am trying to migrate Traefik from 1.7 to 2.1, taking all the breaking changes into account.
I have a Docker Swarm cluster working with Traefik 1.7 and the TLS challenge.

I configured Traefik 2.1 with the TLS challenge too, and I redirect all HTTP requests to HTTPS.
I want to get a certificate from Let's Encrypt for my Traefik service. Here is my traefik.yml:

log:
  level: DEBUG

api:
  dashboard: true
  debug: true
  insecure: true

entryPoints:
  http:
    address: ":80"
  https:
    address: ":443"

## Static configuration
serversTransport:
  maxIdleConnsPerHost: 7

providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    network: "traefik-public"
    swarmMode: true
    swarmModeRefreshSeconds: "15s"
    watch: true

certificatesResolvers:
  https:
    acme:
      email: "titi@toto.fr"
      storage: "/certificate/acme/acme.json"
      tlsChallenge: {}
      caServer: https://acme-staging-v02.api.letsencrypt.org/directory

(The email address has been replaced with a fake one, titi@toto.fr.)

Here is my docker-compose file:

version: "3.7"

networks:
  traefik-public:
    driver: overlay
    external: true

services:
  traefik:
    image: traefik:2.1
    hostname: "{{.Node.Hostname}}-{{.Service.Name}}"
    command:
      '--configFile=/etc/traefik/traefik.yml'
    networks:
      - traefik-public
    ports:
      - 80:80
      - 443:443
    volumes:
      - traefik/conf/traefik.yml:/etc/traefik/traefik.yml
      - /var/run/docker.sock:/var/run/docker.sock
      - /traefik/certificate:/certificate
    deploy:
      mode: global
      resources:
        limits:
          memory: 512M
          cpus: '0.6'
        reservations:
          memory: 256M
          cpus: '0.3'
      placement:
        constraints:
          - node.role == manager
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.api.rule=hostregexp(`{host:.+}`)"
        - "traefik.http.routers.api.rule=Host(`traefik.${DOMAIN}`)"
        - "traefik.http.routers.api.entrypoints=http"
        - "traefik.http.routers.api.middlewares=redirect-to-https@docker"
        - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
        - "traefik.http.middlewares.redirect-to-https.redirectscheme.permanent=true"
        - "traefik.http.routers.api-secured.rule=Host(`traefik.${DOMAIN}`)"
        - "traefik.http.routers.api-secured.entrypoints=https"
        - "traefik.http.routers.api-secured.tls.certresolver=https"
        - "traefik.http.services.api.loadbalancer.server.port=8080"

I can reach the Traefik dashboard.

But I cannot obtain the staging (fake) certificate from Let's Encrypt. I did get one once while testing some configuration changes, but I had to disable the HTTP redirect (I don't know why, maybe luck ^^).

Here are the Traefik logs:

time="2020-03-17T17:13:05Z" level=debug msg="Building ACME client..." providerName=https.acme
time="2020-03-17T17:13:05Z" level=debug msg="https://acme-staging-v02.api.letsencrypt.org/directory" providerName=https.acme
time="2020-03-17T17:13:05Z" level=info msg=Register... providerName=https.acme
time="2020-03-17T17:13:05Z" level=debug msg="legolog: [INFO] acme: Registering account for <titi@toto.fr>"
time="2020-03-17T17:13:06Z" level=debug msg="Using TLS Challenge provider." providerName=https.acme
time="2020-03-17T17:13:06Z" level=debug msg="legolog: [INFO] [titi.toto.fr] acme: Obtaining bundled SAN certificate"
time="2020-03-17T17:13:06Z" level=debug msg="legolog: [INFO] [titi.toto.fr] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/44221676"
time="2020-03-17T17:13:06Z" level=debug msg="legolog: [INFO] [titi.toto.fr] acme: use tls-alpn-01 solver"
time="2020-03-17T17:13:06Z" level=debug msg="TLS Challenge Present temp certificate for titi.toto.fr" providerName=acme
time="2020-03-17T17:13:06Z" level=debug msg="legolog: [INFO] [titi.toto.fr] acme: Trying to solve TLS-ALPN-01"
time="2020-03-17T17:13:13Z" level=debug msg="TLS Challenge CleanUp temp certificate for titi.toto.fr" providerName=acme
time="2020-03-17T17:13:13Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/44221676"
time="2020-03-17T17:13:13Z" level=debug msg="legolog: [INFO] Unable to deactivate the authorization: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/44221676"
time="2020-03-17T17:13:13Z" level=error msg="Unable to obtain ACME certificate for domains \"titi.toto.fr\": unable to generate a certificate for the domains [titi.toto.fr]: acme: Error -> One or more domains had a problem:\n[titi.toto.fr] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: During secondary validation: Incorrect validation certificate for tls-alpn-01 challenge. Requested titi.toto.fr from xxx.xxx.xxx.xxx:443. Received 1 certificate(s), first certificate had names \"dfe69af7b225577fe97af9811c26a458.1b7a35d8e1aa23865c04bf369d665757.traefik.default, traefik default cert\", url: \n" providerName=https.acme routerName=api-secured@docker rule="Host(`titi.toto.fr`)"

Can you tell me whether my configuration is correct, and where I could dig to find a solution?

Thanks for your time.

best regards.

Update: I still have not found a solution, so I am staying on Traefik v1.7. Does anyone have an idea?

Thank you

best regards

I think I have the same error, with a pretty similar config.

My hypothesis is that with multiple instances of Traefik running (services.traefik.deploy.mode: global), one instance may set the special TLS-ALPN-01 challenge certificate for the domain while the validation request from the ACME servers is handled by another instance, which presents the default self-signed certificate. As a result, the challenge fails and the valid certificate is not retrieved.
Sometimes, after 2 or 3 tries, the validation request is handled by the right Traefik instance, the challenge validation succeeds, and the certificate is correctly retrieved and stored: problem solved. But if the failure occurs too often, Let's Encrypt detects too many failed validations and throttles subsequent attempts, which end with 429 errors.

If my hypothesis is valid, we may solve the issue by ensuring that only one instance of Traefik is running. Or maybe we could find a way to make all Traefik instances present the same TLS-ALPN-01 certificate for the domain currently being validated? The question is: how does Traefik store this special certificate while the challenge is in progress?

Hi @alorence,

I solved the problem with help from the Let's Encrypt community:

On my side, I have a load balancer in front of my Swarm cluster, so the HTTP challenge or TLS challenge usually cannot work, because several Let's Encrypt servers try to validate the challenge. My LB just dispatches those requests, and the challenge fails.

I have migrated to the DNS challenge, which works very well.
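
For reference, a DNS challenge resolver in traefik.yml could look roughly like the sketch below. It reuses the resolver name, email, and storage path from the configuration earlier in this thread; the provider value is a placeholder assumption (the provider actually used is not stated here), and the provider's API credentials would have to be passed to the Traefik container as environment variables, as described in the Traefik documentation for each provider.

```yaml
# Minimal sketch, not the exact configuration used in this thread.
certificatesResolvers:
  https:
    acme:
      email: "titi@toto.fr"
      storage: "/certificate/acme/acme.json"
      caServer: https://acme-staging-v02.api.letsencrypt.org/directory
      # dnsChallenge replaces tlsChallenge: validation goes through a DNS TXT
      # record, so no Let's Encrypt request has to reach a specific Traefik instance.
      dnsChallenge:
        provider: cloudflare  # assumption: replace with your own DNS provider code
```

The routers keep using tls.certresolver=https, so the labels in the compose file do not need to change.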

Another point: you can't do high availability with the community edition of Traefik. When you have several instances of Traefik, one generates the certificate and stores it on an NFS server (acme.json), but the other instances won't reload the acme.json file unless you run a docker service update to restart them.

You can run the same test I did.

  1. Deploy Traefik in global mode
  2. Deploy a service with Traefik labels and TLS with an ACME-generated certificate
  3. Open your service in a browser and reload the page without cache several times. When you hit the Traefik instance that generated the certificate, it is fine, but when you hit the other Traefik instances, you get the default Traefik certificate.
  4. Run docker service update --force traefik
  5. Reload your page (it now works)

So you can deploy only one Traefik instance that generates the certificates, and the service is load balanced through the Docker routing mesh across the nodes. You may see a short delay when Traefik crashes and is restarted by Docker.
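
In the compose file, the single-instance setup could be sketched as follows. Only the deploy section differs from the stack file posted at the top of the thread (mode: global becomes one replica); the rest is assumed to stay as it was.

```yaml
# Minimal sketch: one Traefik replica instead of one instance per manager node.
services:
  traefik:
    image: traefik:2.1
    ports:
      # Published through the Swarm routing mesh, so every node still
      # accepts traffic on 80/443 and forwards it to the single replica.
      - 80:80
      - 443:443
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: on-failure
```

With a single replica, only one process writes acme.json, so the stale-certificate problem between instances does not arise.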

HA is available with Traefik Enterprise, as I understand it.

hilne
