Let's Encrypt x509: certificate signed by unknown authority

I'm running an instance of the official Traefik 2.4.9 Docker image with two cert resolvers: Let's Encrypt for public-facing services and Step CA for subdomains that only exist in the company-wide nameserver (such as Portainer and Traefik's dashboard). My traefik.toml looks as follows (some names redacted):

[global]
  sendAnonymousUsage = false
  
[serversTransport]
  rootCAs = ["/etc/traefik/acme.crt"]

[entryPoints]
  [entryPoints.http]
    address = ":80"
  [entryPoints.https]
    address = ":443"

[providers]
  [providers.docker]
    swarmMode = true
    network = "traefik"
    defaultRule = "Host(`{{ normalize .Name }}.docker.ourcompany.com`)"

[api]
  dashboard = true

[certificatesResolvers]
  [certificatesResolvers.letsencrypt]
    [certificatesResolvers.letsencrypt.acme]
      email = "traefik@ourcompany.com"
      storage = "/etc/traefik/acme.json"
      [certificatesResolvers.letsencrypt.acme.tlsChallenge]
  [certificatesResolvers.step-ca]
    [certificatesResolvers.step-ca.acme]
      email = "traefik@ourcompany.com"
      caServer = "https://docker.ourcompany.com:9001/acme/acme/directory"
      storage = "/etc/traefik/step-ca_acme.json"
      [certificatesResolvers.step-ca.acme.httpChallenge]
        entryPoint = "http"

Some explanation for the less intuitive parts:

  • /etc/traefik/acme.crt is the root cert that our Step CA uses.
  • Default resolver and http-to-https redirect are left out of the global config and configured on a per service basis.
  • The only mounted volumes are /var/run/docker.sock and /etc/traefik, so nothing that would interfere with /etc/ssl (see Traefik can't connect to lets encrypt directory)

Services that get their cert through Step CA work without any problems but when I set a service to use Let's Encrypt, I get the following error:

time="2021-07-06T10:40:37Z" level=error msg="Unable to obtain ACME certificate for domains \"umfrage.hebrech.de\": cannot get ACME client get directory at 'https://acme-v02.api.letsencrypt.org/directory': Get \"https://acme-v02.api.letsencrypt.org/directory\": x509: certificate signed by unknown authority" providerName=letsencrypt.acme rule="Host(`survey.ourcompany.com`)" routerName=limesurvey-web-secure@docker

Here are my service labels:

traefik.http.routers.limesurvey-web-secure.entrypoints=https
traefik.http.routers.limesurvey-web-secure.rule=Host(`survey.ourcompany.com`)
traefik.http.routers.limesurvey-web-secure.tls.certresolver=letsencrypt
traefik.http.routers.limesurvey-web.entrypoints=http
traefik.http.routers.limesurvey-web.middlewares=redirect-to-https
traefik.http.routers.limesurvey-web.rule=Host(`survey.ourcompany.com`)
traefik.http.services.limesurvey-service.loadbalancer.server.port=80

Opening a shell in the container and accessing https://acme-v02.api.letsencrypt.org/directory through wget (curl is not installed) works fine so it's not a problem of the company firewall redirecting us somewhere. It doesn't seem to be a general problem with Traefik either as a different server with the same version that doesn't have Step CA configured works without any problems.

I've tried the following:

  • Added the right root cert for Let's Encrypt to rootCAs
  • Added insecureSkipVerify = true
  • Added insecureSkipVerify = true and removed rootCAs completely

Nothing seems to work. What am I missing?

IIRC, rootCAs extends the list of trusted roots for connections to backend servers, not for outbound connections made by providers.

This means that if you configure rootCAs, it will not apply to external services contacted by Traefik, such as the LE endpoint.

Can you shell into your container and see if any of the CA certs listed in this file are present?

https://golang.org/src/crypto/x509/root_linux.go

Yes, /etc/ssl/certs/ca-certificates.crt is present. As I said, this is the official, unmodified Docker image traefik:v2.4.9, and retrieving the URL that appears in the log by attaching to the container and running wget works just fine.

As an additional experiment, I installed curl inside the container (via apk add curl) and ran the following:

# curl -vvv https://acme-v02.api.letsencrypt.org/directory
*   Trying 172.65.32.248:443...
* TCP_NODELAY set
* Connected to acme-v02.api.letsencrypt.org (172.65.32.248) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=acme-v01.api.letsencrypt.org
*  start date: Jul  5 21:25:13 2021 GMT
*  expire date: Oct  3 21:25:12 2021 GMT
*  subjectAltName: host "acme-v02.api.letsencrypt.org" matched cert's "acme-v02.api.letsencrypt.org"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5596eef05600)
> GET /directory HTTP/2
> Host: acme-v02.api.letsencrypt.org
> user-agent: curl/7.67.0
> accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200 
< server: nginx
< date: Fri, 16 Jul 2021 08:22:31 GMT
< content-type: application/json
< content-length: 658
< cache-control: public, max-age=0, no-cache
< x-frame-options: DENY
< strict-transport-security: max-age=604800
< 
{
  "-pRnA-eo1BM": "https://community.letsencrypt.org/t/adding-random-entries-to-the-directory/33417",
  "keyChange": "https://acme-v02.api.letsencrypt.org/acme/key-change",
  "meta": {
    "caaIdentities": [
      "letsencrypt.org"
    ],
    "termsOfService": "https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf",
    "website": "https://letsencrypt.org"
  },
  "newAccount": "https://acme-v02.api.letsencrypt.org/acme/new-acct",
  "newNonce": "https://acme-v02.api.letsencrypt.org/acme/new-nonce",
  "newOrder": "https://acme-v02.api.letsencrypt.org/acme/new-order",
  "revokeCert": "https://acme-v02.api.letsencrypt.org/acme/revoke-cert"
* Connection #0 to host acme-v02.api.letsencrypt.org left intact

As you can see, curl is completely fine with that certificate.

I've tried modifying my config in several different ways: removing step-ca, setting the log level to DEBUG, adding the HTTPS redirect and a default resolver. Nothing helps. I'm really out of ideas. Could it be that the error message is somehow wrong and the error appears while fetching an entirely different URI?

I'd be really grateful for any help. This has been delaying the rollout of this server for almost two weeks now.

Edit:
The most basic config I tried was this:

[global]
  sendAnonymousUsage = false

[entryPoints]
  [entryPoints.http]
    address = ":80"
    [entryPoints.http.http]
      [entryPoints.http.http.redirections]
        [entryPoints.http.http.redirections.entryPoint]
          to = "https"
          scheme = "https"
  [entryPoints.https]
    address = ":443"
    [entryPoints.https.http.tls]
      certResolver = "letsencrypt"

[providers]
  [providers.docker]
    swarmMode = true
    network = "traefik"
    defaultRule = "Host(`{{ normalize .Name }}.docker.ourcompany.com`)"

[api]
  dashboard = true

[certificatesResolvers]
  [certificatesResolvers.letsencrypt]
    [certificatesResolvers.letsencrypt.acme]
      email = "traefik@ourcompany.com"
      storage = "/etc/traefik/acme.json"
      [certificatesResolvers.letsencrypt.acme.tlsChallenge]

Could it be that traefik internally uses a different DNS resolver than the rest of the container it's running in and therefore doesn't actually reach the real acme-v02.api.letsencrypt.org? If so, how could I debug that?

Hello @DFYX,

It is possible that Traefik does not use the same resolver as the container. There is a Go issue that shows some examples of this happening: (net: Go DNS resolver does not read /etc/hosts · Issue #22846 · golang/go · GitHub).

There is a shell script with a test go program that does a simple lookup demonstrating the issue encountered in the ticket.
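For readers who don't want to dig through the ticket, a lookup test of this kind can be sketched in a few lines of Go. This is not the program from the issue, just a minimal stand-in; the resolve helper and the 5 second timeout are my own choices. It asks Go's default resolver for the host given on the command line (defaulting to the Let's Encrypt endpoint from the log) and prints the result.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// resolve looks up host with Go's default resolver, which is what a Go
// program uses unless it configures its own. The timeout is arbitrary.
func resolve(host string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return net.DefaultResolver.LookupHost(ctx, host)
}

func main() {
	host := "acme-v02.api.letsencrypt.org"
	if len(os.Args) > 1 {
		host = os.Args[1]
	}
	addrs, err := resolve(host)
	// Print both values so a failed lookup is visible as well.
	fmt.Println(addrs, err)
}
```

Running the binary with GODEBUG=netdns=1 set makes the Go runtime print which resolver it decided to use, which is the part worth comparing against what wget or curl see in the same container.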

What orchestration platform are you using? Docker? Kube?

We're using docker 19.03.13 in swarm mode.

As for the test script, I've changed the domain to that of Let's Encrypt, compiled it and copied it to my traefik container where it gives the correct result:

# GODEBUG=netdns=10 /etc/traefik/go_dns_letsencrypt 
go package net: built with netgo build tag; using Go's DNS resolver
go package net: hostLookupOrder(acme-v02.api.letsencrypt.org) = files,dns
[172.65.32.248 2606:4700:60:0:f53d:5624:85c7:3a2c] <nil>

Hmmmmm. @ldez any ideas?

Maybe something is messed up in your /etc/traefik/acme.json - possibly from earlier tries/setups?
If you have some Step CA certificates in that file it may cause problems with LE.
It may be worth a try to rename that file and give it a new try.

Didn't work either. I renamed acme.json to acme.json.bak and restarted traefik. It created a completely empty acme.json (not even the stuff that doesn't depend on individual domains) and threw the same error.

Is there any way to get some more debug info? Like the IP address it tries to reach and the certificate it refuses to accept?

I'm still struggling with this, even after updating to Traefik 2.5 and changing my Docker networking settings. Could someone please at least tell me how to debug this so I can find out what Traefik is doing here?

@daniel.tomcej I finally found the solution. The culprit was LEGO_CA_CERTIFICATES which I had set to my internal CA's root cert as described by the documentation. What I didn't realize was that this replaces the default certs instead of adding to them. This means that Let's Encrypt's valid cert gets rejected.
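To make the replace-versus-add difference concrete, here is a minimal Go sketch of the two behaviors. It is not lego's actual code; buildPool and its startFromSystem flag are names I invented for illustration. With the flag off, the pool contains only the certificates from the given file; with it on, the file is appended to a copy of the system roots.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// buildPool returns a cert pool holding the certificates in extraPEM.
// With startFromSystem=false the pool trusts nothing else, which is the
// "replace" behavior; with true it starts from the system roots.
func buildPool(extraPEM []byte, startFromSystem bool) (*x509.CertPool, error) {
	pool := x509.NewCertPool() // empty: trusts nothing yet
	if startFromSystem {
		sys, err := x509.SystemCertPool()
		if err != nil {
			return nil, err
		}
		pool = sys
	}
	if !pool.AppendCertsFromPEM(extraPEM) {
		return nil, errors.New("no certificate found in PEM data")
	}
	return pool, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: pooldemo /path/to/internal-root.crt")
		os.Exit(2)
	}
	pemData, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Try the Let's Encrypt endpoint once with each kind of pool.
	for _, fromSystem := range []bool{false, true} {
		pool, err := buildPool(pemData, fromSystem)
		if err != nil {
			fmt.Printf("startFromSystem=%v: %v\n", fromSystem, err)
			continue
		}
		conn, err := tls.Dial("tcp", "acme-v02.api.letsencrypt.org:443", &tls.Config{RootCAs: pool})
		if err != nil {
			fmt.Printf("startFromSystem=%v: %v\n", fromSystem, err)
			continue
		}
		conn.Close()
		fmt.Printf("startFromSystem=%v: handshake ok\n", fromSystem)
	}
}
```

Given only an internal root cert, I would expect the first attempt to fail with an unknown authority error and the second to succeed, which matches what I saw with Traefik.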

My current workaround is to merge all needed certificates into a custom acme.crt file. A better solution might be to either change the behavior of LEGO_CA_CERTIFICATES to treat it as additional certificates or to allow multiple files separated by commas. Either way, lego would need to be changed. A third option would be to install my root cert into the container's global cert store, but that would need a custom container or some scripting.
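The merge itself is just concatenating PEM files, so cat does the job. For anyone who wants to validate the inputs along the way, a small Go sketch could look like this; mergePEM and the command-line layout are my own invention, not part of any tool. It keeps every CERTIFICATE block from the input files and drops anything else, such as stray private keys.

```go
package main

import (
	"bytes"
	"encoding/pem"
	"fmt"
	"os"
)

// mergePEM concatenates every CERTIFICATE block found in the inputs and
// reports how many were kept. Blocks of any other type are skipped.
func mergePEM(inputs ...[]byte) ([]byte, int, error) {
	var out bytes.Buffer
	count := 0
	for _, data := range inputs {
		for {
			var block *pem.Block
			block, data = pem.Decode(data)
			if block == nil {
				break // no further PEM blocks in this input
			}
			if block.Type != "CERTIFICATE" {
				continue
			}
			if err := pem.Encode(&out, block); err != nil {
				return nil, 0, err
			}
			count++
		}
	}
	return out.Bytes(), count, nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: mergepem OUT.crt IN.crt [IN.crt ...]")
		os.Exit(2)
	}
	var inputs [][]byte
	for _, path := range os.Args[2:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		inputs = append(inputs, data)
	}
	merged, n, err := mergePEM(inputs...)
	if err == nil {
		err = os.WriteFile(os.Args[1], merged, 0o644)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d certificates to %s\n", n, os.Args[1])
}
```

In my setup the inputs would be the Step CA root and /etc/ssl/certs/ca-certificates.crt from the container, with the output written to a separate file that LEGO_CA_CERTIFICATES then points at.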

Edit: I filed a lego issue under Add option to have LEGO_CA_CERTIFICATES add to the local cert pool instead of replacing it · Issue #1490 · go-acme/lego · GitHub


In the next version of Traefik (v2.5.7), you will have all the options to handle your problem:

  • LEGO_CA_CERTIFICATES now accepts multiple file paths to be added by using os.PathListSeparator (: on POSIX, ; on Windows) as a separator.
  • LEGO_CA_SYSTEM_CERT_POOL initiates the cert pool with a copy of the system cert pool.