How to Force-update Let's Encrypt Certificates

On January 26, 2022, Let’s Encrypt announced that all certificates verified through a TLS-ALPN-01 challenge and created between October 29, 2021 and 00:48 UTC January 26, 2022 will be revoked starting at 16:00 UTC on January 28, 2022.

This is in response to a flaw that was discovered in the library that handles the TLS-ALPN-01 challenge. That flaw has been fixed, and the Let's Encrypt policy states that any mis-issued certificates must be revoked within five days.

Traefik Proxy and Traefik Enterprise users with certificates that meet these criteria must force-renew the certificates before that time. If this does not happen, visitors to any property secured by a revoked certificate may receive errors or warnings until the certificates are renewed.

This article presents step-by-step instructions on how to determine if you are affected by this event, and if so, how to update certificates for Traefik Proxy and Traefik Enterprise.

If you are using Traefik Enterprise v1.x, please reach out directly to Traefik Labs Support, and we will happily help you with the update.

If you have any questions about the process, or if you encounter any problems performing the updates, please reach out to Traefik Labs Support (for Traefik Enterprise customers) or post on the Community Forum (for Traefik Proxy users).

Traefik Proxy v2.x

(These instructions assume that you are using the default certificate store named acme.json.)

No Persistent Storage

If acme.json is not saved on a persistent volume (Docker volume, Kubernetes PersistentVolume, etc.), then no acme.json file is present when Traefik Proxy starts. Traefik Proxy will obtain fresh certificates from Let’s Encrypt and recreate acme.json.

If this is how your Traefik Proxy is configured, then restarting the Traefik Proxy container or Deployment will force all of the certificates to renew.

There may be a few seconds of downtime as Traefik Proxy restarts. Traefik Proxy will also use self-signed certificates for 30-180 seconds while it retrieves new certificates from Let’s Encrypt.

Persistent Storage

If your environment stores acme.json on a persistent volume (Docker volume, Kubernetes PersistentVolume, etc.), then the following steps will renew your certificates.

1. Check your certificate resolver configuration

Check if the static configuration contains certificate resolvers using the TLS-ALPN-01 challenge.

Depending on how Traefik Proxy is deployed, the static configuration for the certificate resolvers can be:

  • In a configuration file
  • In the command-line arguments
  • In the environment variables

Certificate resolvers using the TLS-ALPN-01 challenge have a tlsChallenge configuration key, which might look like this:

certificatesResolvers:
  myresolver:
    acme:
      # ...
      tlsChallenge: {}

If using command-line arguments, it might look like this:

--certificatesresolvers.myresolver.acme.tlschallenge=true

See our configuration documentation to find which type of static configuration your environment uses.

If you do not find any certificate resolvers with tlsChallenge in their configuration, then your certificates will not be revoked.

If you do find this key, continue to the next step.
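As a rough aid for this check, the sketch below scans a list of command-line arguments (for example, the command: entries of a Compose file) for resolvers that enable tlsChallenge. It is an illustration only, not an official tool: the function name is hypothetical, it covers only the command-line form shown above (not configuration files or environment variables), and it matches case-insensitively because both lower-case and mixed-case spellings of the option appear in practice.

```python
import re

# Matches the command-line form shown above, for example:
#   --certificatesresolvers.myresolver.acme.tlschallenge=true
# Group 1 is the resolver name, group 2 the optional value.
TLS_CHALLENGE_ARG = re.compile(
    r"^--certificatesresolvers\.([^.=]+)\.acme\.tlschallenge(?:=(.*))?$",
    re.IGNORECASE,
)


def resolvers_using_tls_challenge(args):
    """Return the sorted names of resolvers whose arguments enable tlsChallenge.

    Hypothetical helper: `args` is a list of static-configuration
    command-line arguments, one option per string.
    """
    names = set()
    for raw in args:
        arg = raw.strip().strip("\"'")  # tolerate quoted Compose entries
        match = TLS_CHALLENGE_ARG.match(arg)
        # A bare flag (no value) is treated as enabled; "=false" is skipped.
        if match and (match.group(2) or "true").lower() != "false":
            names.add(match.group(1))
    return sorted(names)


example_args = [
    "--providers.docker.exposedByDefault=false",
    "--entryPoints.websecure.address=:443",
    "--certificatesResolvers.lets-encrypt.acme.tlsChallenge=true",
    "--certificatesresolvers.other.acme.httpchallenge.entrypoint=web",
]
print(resolvers_using_tls_challenge(example_args))  # prints ['lets-encrypt']
```

An empty result only means that no tlsChallenge option was found among the arguments you passed in; configuration files and environment variables still need to be checked by hand.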

2. Find if the resolver is in use by any routers

A certificate resolver is only used if it is referenced by at least one router.

Review your configuration to determine if any routers use this resolver. In the example above, the resolver is named “myresolver,” and a router that uses it could look like any of the following:

Dynamic Configuration

http:
  routers:
    myrouter:
      # ...
      tls:
        certResolver: myresolver

Kubernetes Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    traefik.ingress.kubernetes.io/router.tls.certresolver: myresolver

Kubernetes IngressRoute

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: example-ingressroute
spec:
  # ...
  tls:
    certResolver: myresolver

Docker Compose

version: "3"
services:
  my-container:
    # ...
    labels:
      - "traefik.http.routers.myrouter.tls.certresolver=myresolver"

If you do not find any router using the certificate resolver you found in the first step, then your certificates will not be revoked.

If you do find a router that uses the resolver, continue to the next step.
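For Docker-based setups, a small script can help with this review. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, and it only understands the HTTP router label form shown in the Docker Compose example above. File-based dynamic configuration, Kubernetes annotations, and IngressRoute objects still need to be reviewed separately.

```python
import re

# Matches the Docker label form shown in the Compose example above:
#   traefik.http.routers.<router>.tls.certresolver=<resolver>
ROUTER_LABEL = re.compile(
    r"^traefik\.http\.routers\.([^.]+)\.tls\.certresolver=(.+)$",
    re.IGNORECASE,
)


def routers_using_resolver(labels, resolver):
    """Return the router names whose certresolver label equals `resolver`.

    Hypothetical helper: `labels` is a list of "key=value" label strings.
    """
    routers = []
    for raw in labels:
        match = ROUTER_LABEL.match(raw.strip().strip("\"'"))
        if match and match.group(2) == resolver:
            routers.append(match.group(1))
    return routers


example_labels = [
    "traefik.enable=true",
    "traefik.http.routers.myrouter.entrypoints=websecure",
    "traefik.http.routers.myrouter.tls.certresolver=myresolver",
    "traefik.http.routers.otherrouter.tls.certresolver=dnsresolver",
]
print(routers_using_resolver(example_labels, "myresolver"))  # prints ['myrouter']
```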

3. Clean acme.json and restart Traefik Proxy

Make a backup of acme.json before continuing.

Edit acme.json to remove all certificates linked to the certificate resolver (or resolvers) identified in the earlier steps.

The acme.json file has the following form:

{
  "certResolverName": { # <- Name of the certificate resolver using TLS-ALPN-01 challenge
    "Account": {
      "Email": "admin@example.com",
      "Registration": {
        "body": {
          "status": "valid",
          "contact": [
            "mailto:admin@example.com"
          ]
        },
        "uri": "https://acme-v02.api.letsencrypt.org/acme/acct/…"
      },
      "PrivateKey": "redacted",
      "KeyType": "4096"
    },
    "Certificates": [...] # <- Certificate array which needs to be cleaned
  },
  ...
}

Remove all certificates in the Certificates array that were issued before 00:48 UTC January 26, 2022.

If you prefer, you may also remove all certificates. They will all be reissued.
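If you take the "remove all certificates" route, the edit can be scripted. The following is a minimal sketch, not an official tool: the function name is hypothetical, and it assumes the file on disk is plain JSON with the layout shown above (the # annotations in the listing are explanatory and are not part of a real file). It writes a backup copy next to the original before changing anything and leaves the Account section untouched.

```python
import json
import shutil
from pathlib import Path


def clear_resolver_certificates(acme_path, resolver_names):
    """Empty the Certificates array of each named resolver in acme.json.

    Hypothetical helper. Returns a dict mapping each resolver found in the
    file to the number of certificates removed. A backup is written to
    "<acme_path>.bak" first.
    """
    path = Path(acme_path)
    # copy2 preserves the file mode, so the backup keeps restrictive permissions.
    shutil.copy2(path, path.with_name(path.name + ".bak"))

    data = json.loads(path.read_text())
    removed = {}
    for name in resolver_names:
        entry = data.get(name)
        if not isinstance(entry, dict):
            continue  # resolver not present in this file
        removed[name] = len(entry.get("Certificates") or [])
        entry["Certificates"] = []

    # Writing to the existing file keeps its current permissions.
    path.write_text(json.dumps(data, indent=2))
    return removed


# Example call (the path is an assumption; adjust it to your volume):
# clear_resolver_certificates("/letsencrypt/acme.json", ["myresolver"])
```

Stop Traefik Proxy before editing the file, or restart it right away afterwards, so that a running instance does not write its in-memory copy back over your change.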

Save the file and exit, and then restart Traefik Proxy.

Certificates that have been removed will be reissued when Traefik restarts, within the constraints of the Let’s Encrypt rate limits.

If you have such a large volume of certificates to renew that you hit the limits (300 new orders within 3 hours), consider updating your certificates in batches over a time that doesn’t exceed the limits. Alternatively, you can follow the guidance in the Let’s Encrypt forum and reach out to Let’s Encrypt to have those limits raised for this event.
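To make the batching idea concrete, here is a small planning sketch. It is only an illustration: the function name and the example domains are hypothetical, it assumes one new order per certificate, and it spaces batches a full three hours apart, which is deliberately conservative.

```python
from datetime import datetime, timedelta, timezone


def plan_batches(domains, start, max_orders=300, window=timedelta(hours=3)):
    """Split `domains` into batches of at most `max_orders`, one per window.

    Hypothetical helper. Returns a list of (scheduled_time, batch) tuples.
    """
    batches = []
    for offset in range(0, len(domains), max_orders):
        batch_index = offset // max_orders
        scheduled = start + window * batch_index
        batches.append((scheduled, domains[offset:offset + max_orders]))
    return batches


# Example with made-up domain names: 650 certificates need three batches.
example_domains = [f"site{i}.example.com" for i in range(650)]
start = datetime(2022, 1, 27, 9, 0, tzinfo=timezone.utc)
for when, batch in plan_batches(example_domains, start):
    print(when.isoformat(), len(batch))
```

Each batch would then correspond to one round of removing those certificates from acme.json and restarting Traefik Proxy.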

Traefik Proxy v1.x

File storage (acme.json)

If you use file storage in v1.7, follow the steps above for Traefik Proxy v2.x.

Key/Value (KV) Storage

When using KV Storage, each resolver is configured to store all its certificates in a single entry.

Therefore, you have two choices:

  1. Remove the entry corresponding to a resolver. This will remove all the certificates for that resolver.
  2. If you do not want to remove all certificates, then carefully edit the resolver entry to remove only certificates that will be revoked.

Trigger a reload of the dynamic configuration to make the change effective.

Traefik Enterprise v2

You can use the teectl command to obtain a list of all certificates and then force Traefik Enterprise to obtain new ones.

Execute the following steps:

1. Get the list of all ACME certificates:

teectl get acme-certs

The result of that command is the list of all certificates with their IDs.

2. Delete each certificate by using the following command:

# For Let's Encrypt production environment:
teectl delete acme-cert \
  --caserver https://acme-v02.api.letsencrypt.org/directory \
  --id=<ID>
# For Let's Encrypt staging environment:
teectl delete acme-cert \
  --caserver https://acme-staging-v02.api.letsencrypt.org/directory \
  --id=<ID>

3. Check the log file on one of the controllers to see if a new dynamic configuration has been applied. Traefik Enterprise should automatically obtain the new certificate.
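With many certificates, typing the delete command once per ID gets tedious. The sketch below only builds the command lines from step 2 for a list of IDs so that you can review them before running anything. The function name is hypothetical and the IDs are placeholders; the real ones come from the output of teectl get acme-certs, whose exact format is not assumed here.

```python
import shlex

PRODUCTION = "https://acme-v02.api.letsencrypt.org/directory"
STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"


def delete_commands(cert_ids, caserver=PRODUCTION):
    """Build one `teectl delete acme-cert` argument list per certificate ID.

    Hypothetical helper: it only constructs the commands shown in step 2
    and does not execute them.
    """
    return [
        ["teectl", "delete", "acme-cert", "--caserver", caserver, f"--id={cert_id}"]
        for cert_id in cert_ids
    ]


# Placeholder IDs; replace them with the IDs listed by `teectl get acme-certs`.
for argv in delete_commands(["<ID-1>", "<ID-2>"]):
    print(shlex.join(argv))
```

Once the printed commands look right, each argument list can be passed to subprocess.run(argv, check=True), or the printed lines can be pasted into a shell.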

If you have such a large volume of certificates to renew that you hit the limits (300 new orders within 3 hours), consider updating your certificates in batches over a time that doesn’t exceed the limits. Alternatively, you can follow the guidance in the Let’s Encrypt forum and reach out to Let’s Encrypt to have those limits raised for this event.

Traefik Enterprise v1

If you use Traefik Enterprise v1, please get in touch with support directly and we will happily help you make the necessary changes to your environment.

Conclusion

Security events are a fact of Internet life, and when they happen, a swift response is the best way to mitigate risk. Let's Encrypt has done precisely that, and while revoking certificates on short notice has sent everyone scrambling, it also ensures that no invalid or mis-issued certificates will be protecting anyone's Internet properties.

These steps will enable any user of Traefik Proxy or Traefik Enterprise to update their certificates before Let's Encrypt revokes them. If you have any questions, please reach out to Traefik Labs Support or make a post in the Community Forum.


This is a companion discussion topic for the original entry at https://traefik.io/blog/how-to-force-update-lets-encrypt-certificates/

Thanks! I made a bash one-liner for checking your current traefik instance(s) for the tlschallenge argument, since the article is indicating that's the most reliable way to know if you're affected.

It's a pity you guys spent time writing these nice instructions rather than releasing a hotfix version that automatically re-issues the problematic certificates. It's a matter of a couple of lines of code...

Here is quick automatic workaround #1 to mass renew all your certificates:

  • upgrade to 2.6.0
  • set certificatesDuration to 6480; this will force renewal of the certificates because their expiry time is <4mo.
  • remove certificatesDuration from your config
    NOTE: regardless of the large certificatesDuration parameter, new certificates will still be issued for 3mo as usual. However, Traefik has a bug: certificates that fail to renew in the very first second after start may not be renewed at all. See solution #2 in that case.

Solution #2:

  • checkout 2.6.0 source code
  • edit pkg/provider/acme/provider.go, func renewCertificates() and change condition like this:

old: if err != nil || crt == nil || crt.NotAfter.Before(time.Now().Add(renewPeriod)) {
new: if err != nil || crt == nil || crt.NotAfter.Before(time.Now().Add(renewPeriod)) || crt.NotBefore.Before(time.Date(2022, 1, 27, 9, 0, 0, 0, time.UTC)) {

  • build your own docker image after that using: make build-image

After finishing the instructions, I checked the logs and saw errors like:

Unable to obtain ACME certificate for domains ... unable to generate a certificate for the domains [dashboard.hephy.pro]: error: one or more domains had a problem:\n[dashboard.hephy.pro] acme: error: 400 :: urn:ietf:params:acme:error:tls :: remote error: tls: internal error\n"

I didn't see any messages indicating recovery from this issue, but it does appear that all certs were reissued correctly :+1: I'm going through and manually verifying each one now, as I have fewer than a dozen. I should set up some monitoring for these. What is a good way to monitor my Traefik proxies, broadly speaking, if I want to capture things like "certs about to expire" or "configured for TLS but not answering with a valid certificate"? Is there any guide or recommended solution?

This is awesome! Thanks for sharing!

Thanks for sharing your method for addressing the issue! Unfortunately, it takes two restarts and an update, which some users may not be ready for, so it would not work globally, but we appreciate your take on it.

Hello @kingdonb

Thanks a lot for joining the discussion.

In fact, this is a very generic error that occurs when Traefik is not yet ready to present the temporary certificate containing the validation token during the TLS challenge negotiation between the ACME servers and the Traefik instance that requested the certificate.

The "urn:ietf:params:acme:error:tls" error might also be related to DNS issues (e.g. a wrong AAAA record) or network issues (firewall, IP assignment).

There are no ready-to-use tools to capture such conditions; they only appear in the log file, so adding a log aggregation solution may address this requirement. Alerts can then be created based on the log entries produced by Traefik. Note, however, that Traefik automatically renews certificates before their expiration date.

Thanks for raising that; we will definitely consider writing a guide that explains the meaning of the log messages in detail.

For now, I can offer a great talk recorded by Jerome Perazzoni about Let's Encrypt and Traefik.
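As a complement to log-based alerting, an external probe can cover the two conditions asked about above ("certs about to expire" and "not answering with a valid certificate"). The following is a minimal sketch using only the Python standard library, not a Traefik feature; the function names are hypothetical. Note that Python's default TLS verification does not check revocation status, so this would not by itself detect a revoked certificate.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until(not_after, now=None):
    """Return the days remaining until `not_after`.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan 28 16:00:00 2022 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 86400


def check_host(host, port=443, timeout=5.0):
    """Connect to host:port with normal verification; return days until expiry.

    Raises ssl.SSLCertVerificationError if the presented certificate is not
    trusted for `host` (for example, a self-signed default certificate).
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_until(cert["notAfter"])


# Example call (requires network access; the hostname is a placeholder):
# print(check_host("www.example.com"))
```

Running check_host on a schedule for each hostname and alerting when the result drops below a threshold, or when it raises, would cover both cases.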

Hi,
I am using CVAT and trying to get my revoked certificate renewed.

I tried the steps listed to clean out the certificates array and restarted the server... it did not populate the certificates in acme.json

Next I tried completely removing the acme.json file, stopping all the docker containers, and exporting:

Note: I replace my real values below just like I did when I set it up the first time.
export CVAT_HOST=<YOUR_DOMAIN>
export ACME_EMAIL=<YOUR_EMAIL>

then running:

docker-compose -f docker-compose.yml -f docker-compose.https.yml up -d

It still didn't work.

Here are some of the settings:

  traefik:
    image: traefik:v2.4
    container_name: traefik
    restart: always
    command:
      - "--providers.docker.exposedByDefault=false"
      - "--providers.docker.network=cvat"
      - "--entryPoints.web.address=:8080"
services:
  cvat:
    labels:
      - traefik.http.routers.cvat.entrypoints=websecure
      - traefik.http.routers.cvat.tls.certresolver=lets-encrypt

  cvat_ui:
    labels:
      - traefik.http.routers.cvat-ui.entrypoints=websecure
      - traefik.http.routers.cvat-ui.tls.certresolver=lets-encrypt

  traefik:
    image: traefik:v2.4
    container_name: traefik
    command:
      - "--providers.docker.exposedByDefault=false"
      - "--providers.docker.network=cvat"
      - "--entryPoints.web.address=:80"
      - "--entryPoints.web.http.redirections.entryPoint.to=websecure"
      - "--entryPoints.web.http.redirections.entryPoint.scheme=https"
      - "--entryPoints.websecure.address=:443"
      - "--certificatesResolvers.lets-encrypt.acme.email=${ACME_EMAIL:?Please set the ACME_EMAIL env variable}"
      - "--certificatesResolvers.lets-encrypt.acme.tlsChallenge=true"
      - "--certificatesResolvers.lets-encrypt.acme.storage=/letsencrypt/acme.json"

Can someone please help me?
Thanks
~Randy

Hi ... replying to myself.

OK I figured it out... bottom line was firewall was blocking letsencrypt from doing its thing. You must allow 80 and 443 ingress from any. I knew that, but had a mistake in my firewall rule.

Anyway here are a few of the troubleshooting tips I wish I had from the get go.

To get the logs of traefik:
get the id of the docker by: docker ps
using the docker id of traefik: docker logs [id from above]

the logs showed me the error about "probably firewall"
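Alongside the log check, a quick reachability test can confirm whether the ports are open at all. This is a minimal sketch with a hypothetical function name. It has to be run from a machine outside your network to reflect what Let's Encrypt sees, and a successful TCP connection does not prove that the challenge itself will succeed.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example calls (the hostname is a placeholder):
# print(port_open("cvat.example.com", 80))
# print(port_open("cvat.example.com", 443))
```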

So the solution ultimately was:
open firewall
delete acme.json for me it is located at: /var/lib/docker/volumes/cvat_cvat_letsencrypt/_data/acme.json

stop all cvat dockers:
docker-compose down

make sure variables set (with your values):
export CVAT_HOST=[your domain]
export ACME_EMAIL=[your email to register with letsencrypt]

restart docker (make sure you are in the folder where the yaml files are):
docker-compose -f docker-compose.yml -f docker-compose.https.yml up -d

Hope that helps someone.

~Randy

Here is a cron job container config I created that can do this:

version: "3"
services:
   cron_traefik:
      image: mcuadros/ofelia:latest
      container_name: "${COMPOSE_PROJECT_NAME}_cron-traefik"
      command: daemon --docker
      user: 0:0
      volumes:
         - /var/run/docker.sock:/var/run/docker.sock:ro
         # allow access to docker instance
         - "/usr/local/bin/docker:/usr/local/bin/docker"
      labels:
         # 6 fields in this cron expression instead of the usual 5
         ofelia.job-local.cron_traefik_prep.schedule: "0 45 3 1 1,3,5,7,9,11 *"
         ofelia.job-local.cron_traefik_prep.command: "docker exec <container_name> rm -f /letsencrypt/acme.json"
         ofelia.job-local.cron_traefik_restart.schedule: "0 50 3 1 1,3,5,7,9,11 *"
         ofelia.job-local.cron_traefik_restart.command: "docker restart <container_name>"