Please explain the "Distributed Let's Encrypt" feature of TraefikEE

In Traefik 1 there was (is) a fundamental issue with Let's Encrypt in a distributed environment.

When several Traefik nodes request certificates, it is not safe to use the file store: more than one node can potentially write to that store, which does not work. Atomic writes on a file system have always been tricky. The solution Traefik 1 provides is a key-value store (either etcd or Consul). Both are distributed by nature, which guarantees that what one instance writes another can read, and that simultaneous writes are handled as well.
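To illustrate why the file store is a problem, here is a minimal Python sketch. It is not Traefik's code; `atomic_write_json`, `load`, and the file layout are hypothetical names made up for this example. It shows the usual write-to-temp-then-rename trick, which avoids half-written files on a single host but does nothing about two writers overwriting each other's changes.

```python
import json
import os
import tempfile


def load(path):
    """Read the certificate store, or return an empty one if it does not exist."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def atomic_write_json(path, data):
    """Write data so that readers never observe a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".acme-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Rename over the target; atomic only within one file system.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If two nodes each call `load`, add their own certificate, and then call `atomic_write_json`, the second write silently discards the first node's certificate. Each individual write is atomic, but the read-modify-write cycle as a whole is not.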

However, the atomicity of these stores is scoped to a single value, and a single value has a size limit in both etcd and Consul. Basically, neither store is designed to hold megabytes of data within a single value.

This means that all certificates for a Traefik installation need to fit into a single value. They are compressed, but even so, some customers have installations with thousands of certificates, and they quickly hit the storage size limit.
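A rough back-of-the-envelope check of that limit, as a Python sketch. The numbers are assumptions: 512 KB is the Consul figure quoted later in this thread, the 3000 bytes per certificate entry is a guess, and gzip plus a JSON layout stand in for whatever Traefik actually uses. Random bytes are used as fake key material, since real keys are also close to incompressible.

```python
import base64
import gzip
import json
import os

CONSUL_VALUE_LIMIT = 512 * 1024  # 512 KB, the figure quoted in this thread
ASSUMED_BYTES_PER_CERT = 3000    # hypothetical: certificate chain plus private key


def compressed_size(cert_count):
    """Return the gzip-compressed size of a blob holding cert_count fake certificates."""
    certs = {
        f"host{i}.example.com": base64.b64encode(
            os.urandom(ASSUMED_BYTES_PER_CERT)
        ).decode()
        for i in range(cert_count)
    }
    return len(gzip.compress(json.dumps(certs).encode()))


def fits_in_single_value(cert_count, limit=CONSUL_VALUE_LIMIT):
    """True if the compressed blob would still fit into one KV value."""
    return compressed_size(cert_count) <= limit
```

Under these assumptions a handful of certificates fits comfortably, while an installation with a thousand or more does not, because compression cannot shrink key material much.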

Unfortunately, it has turned out that there is no easy fix for that: you cannot safely write certs into multiple values because of concurrency issues, and increasing the size limit is either impossible (Consul) or comes at a hefty performance penalty (etcd).

This is the main reason that this GitHub issue is still open.

The new Traefik 2 promises "Distributed Let's Encrypt" on the Traefik Enterprise Edition page.

The new Traefik is going to be based on the Raft protocol (the same one etcd already uses) and will therefore have built-in support for High Availability.

Some information about this can be obtained from the following links:

However, even with the information above it is not really clear how "Distributed Let's Encrypt" is planned to work.

In particular, what is different about Traefik 2 EE compared to the etcd KV store that makes "Distributed Let's Encrypt" feasible when it was not with etcd in Traefik 1? How will configuration be stored and distributed in Traefik 2 in HA mode?

Hey @zespri,

thanks a lot for your interest.

TraefikEE is divided into what we call the control-plane and the data-plane. The data-plane can be seen as the regular proxy, while the control-plane handles the management, e.g. requesting data from the orchestrator or, in your specific case, managing Let's Encrypt certificates. Once the control-plane has fetched certificates, it distributes them to the data-plane.
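The split described above can be pictured with a small Python sketch. This is only an in-process illustration, not how TraefikEE is implemented: `ControlPlane`, `DataPlaneNode`, and the `issue_certificate` callable are hypothetical names, and the real product replicates state over a cluster rather than through method calls.

```python
class DataPlaneNode:
    """Stands in for a proxy node: it only serves certificates it was given."""

    def __init__(self, name):
        self.name = name
        self.certs = {}

    def receive(self, domain, cert):
        self.certs[domain] = cert


class ControlPlane:
    """The single place that requests certificates and pushes them to proxies."""

    def __init__(self, issue_certificate):
        # issue_certificate is a hypothetical stand-in for an ACME client call.
        self.issue_certificate = issue_certificate
        self.store = {}
        self.nodes = []

    def register(self, node):
        """Add a proxy node and bring it up to date with known certificates."""
        self.nodes.append(node)
        for domain, cert in self.store.items():
            node.receive(domain, cert)

    def ensure_certificate(self, domain):
        """Request a certificate at most once, then distribute it to every node."""
        if domain not in self.store:
            self.store[domain] = self.issue_certificate(domain)
        for node in self.nodes:
            node.receive(domain, self.store[domain])
        return self.store[domain]
```

The point of the design is that proxies never write to the certificate store themselves, so the concurrent-writer problem from the question does not arise at the data-plane level.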

If you're interested in trying that out by yourself, head over to https://containo.us/traefikee/ and request a free trial key :slight_smile:

If you have further questions about EE or want to discuss your project / use case, let's chat via email: manuel@containo.us - I'm happy to help :wink:

Hi @zespri!

To make the information more visible, I'm jumping in here to copy and paste the answer to the KV store question we had during the Back to Traefik 2.0 Meetup (which you also linked).

Current (pre-2.0) Traefik can store ACME certificates in a file or a KV store. The latter helps with running a cluster of Traefik instances, but the storage is restricted to 512 KB of certificate information when using Consul. Will this improve? (storing in a key per certificate instead of a single key for all, for example)

While working on 2.0, we decided it was time to rework this part. Sticking with the Unix Philosophy, “Make each program do one thing well,” Traefik is getting back to be a good old single-instance pure data plane. At the same time, clustering has been redesigned from scratch using a production-proven rock-solid raft-based implementation, shipped with Traefik Enterprise Edition (see how). Distributed features (like Let’s Encrypt) can now rely on this advanced cluster technology for these use cases. But hey! You can still use multiple Traefik instances with your favorite KV store, we only removed the unstable and experimental part :slight_smile:

In a nutshell, it means that our current plan with this is to

  1. work on (and improve) Traefik v2
  2. keep supporting v1 for bug fixes
  3. use Traefik EE's architecture for everything that is distributed

Regarding the current issue you mentioned, based on the above roadmap, it is not something we are actively working on.

I hope it clarifies our team's current vision on this!

So, based on this, I presume that clustered Traefik will eventually be a paid-only feature from v2 onwards.

Does anyone have more information about TraefikEE, like pricing model or user experience stories?

EDIT: This recorded webinar gives some insight into TraefikEE


EDIT: I'm not sure how secret Containous considers the pricing, but TraefikEE costs between $1,000 and $30,000 per year depending on the number of nodes.

Hi @csamu! :slight_smile:

I have more information on that :wink:

If you're interested, please drop me an email: manuel@containo.us. Happy to help out!