Traefik 2.0.4 letsencrypt

After a few hours I finally got Traefik 2 running with the new label format and got access to the API dashboard. I'm pretty happy with the results, apart from the things that aren't well described in the documentation regarding ports. In particular, it's unclear why with Docker Swarm you still have to expose some port even if you use api@internal; otherwise Traefik isn't happy with a container not exposing any ports.

Then I realized that Traefik wasn't using my /etc/traefik/acme.json but was creating its own /acme.json inside the container every time I restarted it. As a result, each consecutive restart forced Traefik to ask for certificates again and again, because the acme.json was removed each time...

Then I hit the rate limits and was wondering: is there a way to limit how often DNS is queried? I've seen a couple of config options, but nowhere can I find any information on what they're actually supposed to do:

For instance:

  • certificatesResolvers.sample.acme.dnsChallenge.delayBeforeCheck

I guess it's a delay but a delay of what and before a check of what? Is it a delay between checks or after a container is started?

Then my question is: since I got working certificates between a few restarts, when is it going to work again? Can I keep Traefik running and it will solve itself in a few days, or should I disable HTTPS for some time? I fear the errors are making Traefik send more requests than necessary and locking itself out perpetually.

The other thing is: now that acme.json will not get removed/cleared, is there a way to prevent Traefik from getting the server locked out like it is right now? It's not a big deal for me as it's my personal server, but I'd like to upgrade our Traefik 1.7 to 2.0 at work and I'm a bit scared of it exhausting the Let's Encrypt limits. That would be more dramatic, as it may impact quite a lot of important things.

Which things, regarding ports?

Perhaps, you misunderstood or misconfigured something around some ports? :wink:

Yep, that happens when you make a mistake. This is not really related to Traefik, but the concept is quite simple: if you want files to persist across container runs, you need to mount a volume at the Docker level, so that those files become external to the container. If you do not do this, every time the container starts it is a brand new copy that uses the image as a blueprint.

DNS is normally queried when an application needs to translate a DNS name such as www.microsoft.com to an IP address. But from the context you gave above, I don't think this is what you are asking. I think what you wanted to ask is whether it is possible to limit Traefik's certificate requests to Let's Encrypt. The answer is that this is done automatically. Traefik will not try to renew your certificate until it is close to expiring, so you have a natural limit here, given by the cert's expiration date.

acme.json contains information that Traefik writes and then uses to determine whether a new cert request is required. In order for the brand new container (as in your description) to know about previous calls to Let's Encrypt, you need to make the acme.json with this information available to that brand new container by mounting a Docker volume. There is an example in the documentation here.
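To make the volume point concrete, here is a minimal docker-compose sketch. The resolver name `sample` comes from the question above; the host path, the e-mail address, and the entrypoint name are placeholders I made up, and the challenge settings are left out on purpose.

```yaml
# docker-compose.yml fragment: a sketch, not a complete setup.
version: "3.7"
services:
  traefik:
    image: traefik:v2.0
    command:
      - --providers.docker=true
      - --entrypoints.websecure.address=:443
      # Placeholder address; use your own.
      - --certificatesresolvers.sample.acme.email=you@example.com
      # Absolute path inside the container. If this is left at its default
      # ("acme.json"), the file is created relative to the working directory,
      # which is how it ends up as /acme.json in the container.
      - --certificatesresolvers.sample.acme.storage=/etc/traefik/acme.json
      # (challenge type omitted here; add the one you actually use)
    ports:
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # Host directory (assumed path) that survives container restarts,
      # so the stored certificates are reused instead of requested again.
      - /etc/traefik:/etc/traefik
```

With something like this in place, a restart should pick up the existing acme.json rather than starting from scratch. Note that Traefik is picky about the file's permissions (it expects 600 on the storage file).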

This is described in the documentation right before the resolvers section (scroll up). Rephrasing what the documentation tells you: instead of polling the DNS server for the DNS record before issuing the Let's Encrypt request, this option simply adds a delay before the Let's Encrypt request, hoping that the delay is long enough for the DNS record to be created. This is useful when you are behind a corporate firewall and cannot issue external DNS queries from there.

This is the delay between the time Traefik sends the DNS update to the selected provider and the time it issues the DNS lookup to check that the record has appeared in the query result.
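For illustration, setting the option on the command line might look like the sketch below. The resolver name `sample` is from the question; the provider `cloudflare` and the value of 30 seconds are just assumptions for the example, and the provider's credentials (normally passed as environment variables) are not shown.

```yaml
# Fragment of the traefik service definition: a sketch only.
services:
  traefik:
    image: traefik:v2.0
    command:
      - --certificatesresolvers.sample.acme.dnschallenge=true
      # Assumed provider; replace with whichever DNS provider you use.
      - --certificatesresolvers.sample.acme.dnschallenge.provider=cloudflare
      # Wait this many seconds after creating the TXT record before
      # checking for it (0 means no extra delay).
      - --certificatesresolvers.sample.acme.dnschallenge.delaybeforecheck=30
```

So this setting only affects the timing of a single challenge. It does not limit how often certificates are requested.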

There is a page on the Let's Encrypt site that details all applicable rate limits. I don't remember the details off the top of my head, but I do remember that I had no issue looking up this information before.

My advice would be first to make sure that you have a working configuration. Use staging Let's Encrypt in the meanwhile - its rate limits are far more generous than the production ones. Until you are sure that it is configured to your satisfaction, I think it's not a very good idea to hope that "it will solve itself".
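Pointing a resolver at the staging environment is a one-line change. A sketch, again assuming the resolver is called `sample` as in the question:

```yaml
# Fragment of the traefik service definition: a sketch only.
services:
  traefik:
    image: traefik:v2.0
    command:
      # Let's Encrypt staging endpoint; certificates issued here are NOT
      # trusted by browsers, so this is for testing the setup only.
      - --certificatesresolvers.sample.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
      - --certificatesresolvers.sample.acme.storage=/etc/traefik/acme.json
```

Once everything behaves as expected, remove the `caserver` line to go back to production. You will most likely also need to clear the staging certificates out of the storage file at that point, otherwise Traefik keeps serving them.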

Even if you do not lock yourself out perpetually, it still would not be good etiquette to let your container hammer a free service :wink:

As mentioned above, there is nothing special you need to do. It just works.

Hope this helps!

Perhaps, you misunderstood or misconfigured something around some ports? :wink:

When I configure traefik api just like here: https://docs.traefik.io/operations/api/#configuration

You can see that no load balancer is defined (and that makes sense). But in Docker Swarm, it fails with a "port is missing" error and skips the proxy container. Defining one makes Traefik not skip the service, regardless of whether the port is actually used.

And this adds up to the confusion: https://docs.traefik.io/routing/providers/docker/#services

The character @ is not authorized in the service name <service_name> .

So it's difficult to understand why you have to define a service alongside api@internal just because the Docker provider expects a port to be defined on the container.

Which things, regarding ports?

Well, it's clearly explained almost nowhere how to expose ports. Even the v1 to v2 migration documentation doesn't really cover it. https://docs.traefik.io/migration/v1-to-v2/#frontends-and-backends-are-dead-long-live-routers-middlewares-and-services

Use staging Let's Encrypt in the meanwhile - its rate limits are far more generous than the production ones. Until you are sure that it is configured to your satisfaction, I think it's not a very good idea to hope that "it will solve itself".

I do have a working config now, as I already fixed the volume part. It's just a question about the domains that are already hitting the rate limit.

Even if you do not lock yourself out perpetually, it still would not be good etiquette to let your container hammer a free service

Yes, sure thing. I already disabled the non-working domains, and the working one seems to be OK.

As mentioned above, there is nothing special you need to do. It just works.

Well, I wouldn't say it just works. Traefik 1.7 just worked for me: it took a bit of time to set up, but it was pretty straightforward. Traefik 2.0 isn't as straightforward. I guess it's because it just got released and the documentation isn't very clear yet. For example, if the acme.json file had been stored relative to the config file, or by default in /etc/traefik, I wouldn't have had the problem. I could be wrong, but Traefik 1.7 looked in /etc/traefik for acme.json by default, whereas Traefik 2.0 defaulted to /. That simple issue caused my server to exceed the rate limits.

Well, yes. The example you used was not for Swarm. If you look at the Docker services reference here: https://docs.traefik.io/routing/providers/docker/#services, the section on traefik.http.services.<service_name>.loadbalancer.server.port clearly says "Mandatory for Docker Swarm".

Now, I do agree that there is room for improvement here: for api@internal, Traefik has all the information it needs without adding a useless no-op port. There is a discussion about this here.

api@internal is already defined, so you should not need to define another service, save for the useless port wrinkle I mentioned above.
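Putting the Swarm pieces together, a stack file for the dashboard might look roughly like the sketch below. The hostname, the entrypoint name `web`, the service name `dummy-svc`, and the port number 9999 are all placeholders for illustration; the port only exists to satisfy the Swarm provider and nothing needs to listen on it.

```yaml
# Swarm stack fragment: a sketch, with placeholder names throughout.
version: "3.7"
services:
  traefik:
    image: traefik:v2.0
    command:
      - --api=true
      - --providers.docker=true
      - --providers.docker.swarmmode=true
      - --entrypoints.web.address=:80
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      # In Swarm mode the labels go under deploy, not on the container.
      labels:
        - "traefik.http.routers.api.rule=Host(`traefik.example.com`)"
        - "traefik.http.routers.api.entrypoints=web"
        # The router points at the built-in service; the '@' appears only
        # in the router's service reference, never in a service name label.
        - "traefik.http.routers.api.service=api@internal"
        # No-op port so the Swarm provider does not skip this service.
        - "traefik.http.services.dummy-svc.loadbalancer.server.port=9999"
```

In a real deployment you would also want some authentication middleware in front of the dashboard router.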

If you mean Docker and Docker Swarm ports, it's really not Traefik's place to explain how to expose them; there is Docker/Swarm documentation for that. In terms of exposure, the only piece relevant to Traefik is entrypoints, which are documented here. The load balancer ports you discussed earlier have nothing to do with Traefik exposing them. On the contrary, Traefik uses them and/or points to them, which is quite natural: it needs to know where to route traffic to.

These are the relevant documentation sections for 1.7 and 2.0 respectively in case you missed them:

Thank you for the answer. I read most of the links you provided, and I'd say the issue is mostly that a lot of the sample code in the examples is only partially complete.

There is room for improvement. Maybe it was a deliberate move to make Traefik more approachable for newcomers, but the sample configs ended up oversimplified.

A reasonably complete sample configuration would define the following things:

  • a router matching an url
  • a router pointing to a service using a particular port
  • a router using multiple middlewares (currently the docs only show how to map to one middleware, and the format for several isn't clear: are they space separated or comma separated? I couldn't find documentation on the format)
  • two services defined in the same label set.

In the case of Docker, from what I could see, if you define only one service then the router will implicitly use it without errors. If you have multiple services defined but the router is not linked to one, it silently fails to do anything. The only way to find out what's going on is to start Traefik with the debug log level.
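As a sketch of what such a fuller example could look like, here is a label set covering the four points above. Every name in it (image, hostnames, router, service and middleware names, ports) is made up for illustration. Multiple middlewares are given as a comma-separated list.

```yaml
# docker-compose fragment: a sketch with hypothetical names and ports.
version: "3.7"
services:
  app:
    image: example/app:latest  # hypothetical image
    labels:
      # Router matching a host and explicitly linked to one service.
      - "traefik.http.routers.app.rule=Host(`app.example.com`)"
      - "traefik.http.routers.app.service=app-web"
      # Several middlewares on one router: comma separated.
      - "traefik.http.routers.app.middlewares=app-https,app-compress"
      # Second router linked to the second service.
      - "traefik.http.routers.app-admin.rule=Host(`admin.example.com`)"
      - "traefik.http.routers.app-admin.service=app-admin"
      # Two services on the same container, each with its own port.
      # With more than one service defined, each router needs an
      # explicit ".service" label as above.
      - "traefik.http.services.app-web.loadbalancer.server.port=8080"
      - "traefik.http.services.app-admin.loadbalancer.server.port=9090"
      # Middleware definitions referenced by the first router.
      - "traefik.http.middlewares.app-https.redirectscheme.scheme=https"
      - "traefik.http.middlewares.app-compress.compress=true"
```

For Swarm the same labels would go under `deploy.labels` instead of directly on the service.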

I guess it could be considered a bug, because the dashboard seems to have been designed to display warnings/errors, but for some reason when a config fails I see no message in the UI: my errors/warnings are always at 0% and success is always at 100%. Being able to see in the dashboard why some routers are skipped/discarded would save a lot of time and headaches.