I'm running into the same error mentioned here:
When I have:
```yaml
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik-ingress-controller
      containers:
        - name: traefik
          image: traefik:v2.3
          args:
            - --api.insecure
            - --accesslog
            - --entrypoints.web.Address=:80
            - --entrypoints.websecure.Address=:443
            - --providers.kubernetescrd
            - --certificatesresolvers.myresolver.acme.tlschallenge
            - --certificatesresolvers.myresolver.acme.email=firstname.lastname@example.org
            - --certificatesresolvers.myresolver.acme.storage=acme.json
            # Please note that this is the staging Let's Encrypt server.
            # Once you get things working, you should remove that whole line altogether.
            - --certificatesresolvers.default.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
          ports:
            - name: web
              containerPort: 80
            - name: websecure
              containerPort: 443
            - name: admin
              containerPort: 8080
```
It issues the test certificate perfectly.
When I remove:
```yaml
# Please note that this is the staging Let's Encrypt server.
# Once you get things working, you should remove that whole line altogether.
- --certificatesresolvers.default.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
```
...and reapply the config, the container restarts and a few seconds later I get:
2020-10-13T20:35:44.414405078Z time="2020-10-13T20:35:44Z" level=error msg="Unable to obtain ACME certificate for domains \"redacted.com\": unable to generate a certificate for the domains [redacted.com]: error: one or more domains had a problem:\n[redacted.com] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Timeout during read (your server may be slow or overloaded), url: \n" providerName=myresolver.acme routerName=default-ingressroutetls-6d083b779d844aaf84a5@kubernetescrd rule="Host(`redacted.com`)"
If I put the staging resolver back in, it works.
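One detail worth flagging: as written, the staging flag sets `caserver` on a resolver named `default`, while every other ACME flag and the IngressRoute's `certResolver` use `myresolver`. If the intent was to point `myresolver` at the staging CA, the flag would presumably need the same resolver name, as in the sketch below. This assumes `myresolver` is the intended target; whether it accounts for the behavior described above is untested.

```yaml
# Sketch: staging CA scoped to the resolver the IngressRoute references
# (assumption: the intent was myresolver, not a separate "default" resolver)
- --certificatesresolvers.myresolver.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
```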
I finally 'tricked' it into working by removing the staging resolver, letting it error out, then changing my IngressRoute to:
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: ingressroutetls
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: HostSNI(`redacted.com`)
      kind: Rule
      services:
        - name: portal
          port: 80
  tls:
    certResolver: myresolver
```
It didn't like the HostSNI directive being there, so I changed it back to Host. This time the Traefik container did NOT restart; it picked up the new config and updated.
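For reference, this is the route after switching back to `Host`, assuming only the `match` line changed from the HostSNI version. In Traefik v2, `HostSNI` is a rule for TCP routers (`IngressRouteTCP` in the CRD provider), which is probably why the HTTP `IngressRoute` rejected it.

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: ingressroutetls
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    # Host() is an HTTP router matcher; HostSNI() belongs to TCP routers (IngressRouteTCP)
    - match: Host(`redacted.com`)
      kind: Rule
      services:
        - name: portal
          port: 80
  tls:
    certResolver: myresolver
```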
There's nothing special about my k8s cluster. It's on DigitalOcean. It has the RBAC config installed, Prometheus for metrics, and a pretty simple Django application.
For the life of me, I can't figure out why Traefik times out when I apply the config for my app. Shouldn't the Traefik container retry getting the cert after a few minutes if it fails?