Trying to understand Traefik routing and resolve Step CA routing issues

I have set up Traefik with Portainer and Step CA on a Debian 11 VM running on VMware ESXi, and I have documented most of it in this Git repo.

My goal is to create a repo that can be used to install and configure a VM and/or a bare-metal box using 100% infrastructure-as-code, meaning no manual steps at all, and to see it all work. I think I am running into some catch-22 issues with needing SSL before SSL is available, but I am not completely sure. For example, how can I use SSL to get an SSL cert if I don't yet have SSL?

So I have two (2) main questions.

  1. In terms of Traefik entrypoints, I am not clear whether I should be using websecure (HTTPS/SSL), web (HTTP), or both, and I have that question for each of 1. the Step CA server, 2. the special cases of Portainer and Traefik, and 3. any other container endpoints I want Step CA to generate a cert for.

    I think I have run into problems no matter which of the four (4) permutations I have used (1. web, 2. websecure, 3. web,websecure, and 4. websecure,web), but it has been hard to pinpoint exactly what the problems are and what was causing them because of the chicken-and-egg aspect of bootstrapping.

    It seems that in some cases Step CA won't respond if I don't use SSL, and in other cases it won't respond if I do. I have not been able to find a way to see what Traefik is actually requesting when Step CA is not working, either via the Traefik logs, which do not appear granular enough, or via any form of Step CA logs, so I cannot figure out what is going on inside the black box.

    What would help would be to understand:

    1. Does Step CA handle its own cert, or do I need to use Step CA to generate a cert for Step CA's own HTTPS endpoint?
    2. Does Step CA respond to HTTP, and if so for which of Step CA's internal routes and when?
    3. When does Step CA require HTTPS to respond to its own routes, and if so which ones, why and when?
  2. When using Traefik to route step-ca.local to <host-ip-address>:9000, if I request https://step-ca.local/health in the browser I get `Client sent an HTTP request to an HTTPS server.` and the Traefik log shows `level=debug msg="Request has been aborted [192.168.1.10:53691 - /health]: net/http: abort Handler" middlewareName=traefik-internal-recovery middlewareType=Recovery`, which tells me pretty much nothing about how to resolve this, or whether it is a Traefik issue or a Step CA issue.

    #justfyi 192.168.1.10 is my laptop's static IP address.

    However, if I use https://step-ca.local:9000/health, I get `{"status":"ok"}`.

    If I remove web (HTTP) from the Traefik label, leaving traefik.http.routers.step-ca.entrypoints=websecure, and request https://step-ca.local/health, I get 404 page not found, and the Traefik log message is essentially identical to the one above: `level=debug msg="Request has been aborted [192.168.1.10:50427 - /health]: net/http: abort Handler" middlewareName=traefik-internal-recovery middlewareType=Recovery`

    If I remove websecure (HTTPS) from the Traefik label, remove the HTTP->HTTPS redirection in Traefik, and request step-ca.local/health, I get 404 page not found for both https:// and http://, and nothing appears in the Traefik log for either. But if I use :9000, then https:// works and http:// gives me the `Client sent an HTTP request to an HTTPS server.` error, as expected.

    Since I don't know what Step CA expects, and I cannot see what Step CA is receiving from Traefik, I have no way to determine whether this is a Step CA issue or a Traefik issue.

    Can anyone offer any insight on either of these questions? Thank you in advance if you can help.

-Mike

What is StepCA? You want to run your own CA? How do you set up your clients to trust your own CA?

Normally you have 3 challenge types with Traefik and Let's Encrypt:

  • httpChallenge uses plain HTTP on port 80 to fetch a token file
  • tlsChallenge uses a special TLS handshake on port 443, so Traefik itself must terminate TLS for the domain to answer it
  • dnsChallenge uses TXT records in DNS; it is needed for wildcards and for targets not reachable from the Internet

Step CA is a private certificate authority whose root certificate is self-signed. I linked its name above, but here is the exposed URL:

I have it set up to serve httpChallenge and tlsChallenge via ACME, although I haven’t confirmed that the latter is working:

My expertise is Go development, not sysadmin, so I am just doing my best here, and not at all clear how everything should be configured.

As for trust, I just add Step CA’s root cert to my keychain. This is all just for use with mDNS on a local-link NAT network.

Why use multiple challenges? The regular LE process is that Traefik will check the routers for Host() and HostSNI() domains, then create certs for those. No need for multiple challenges.

Have you checked the simple Traefik example?

An alternative solution is to use regular Let's Encrypt with dnsChallenge and use public sub-domains on public DNS with internal private IPs. You can point mynas.example.com -> 192.168.1.100 and get a cert for it.

Because I did not understand it fully and thus did not know what was needed. I'll remove the tlsChallenge, thank you.

Yes, there are many examples of how to get Let's Encrypt working. However, I specifically want to set up and use Step CA and explicitly want to use .local domains, not public sub-domains.

Again, I have it all working, enough for basic needs. What I am trying to do now is learn its fine points that I do not yet understand, so I can perfect how it works and keep from breaking it in the future.