I'm running into the same error mentioned here:
When my Traefik Deployment has this spec:

```yaml
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik-ingress-controller
      containers:
        - name: traefik
          image: traefik:v2.3
          args:
            - --api.insecure
            - --accesslog
            - --entrypoints.web.Address=:80
            - --entrypoints.websecure.Address=:443
            - --providers.kubernetescrd
            - --certificatesresolvers.myresolver.acme.tlschallenge
            - --certificatesresolvers.myresolver.acme.email=aaron@ctrl-alt-it.com
            - --certificatesresolvers.myresolver.acme.storage=acme.json
            # Please note that this is the staging Let's Encrypt server.
            # Once you get things working, you should remove that whole line altogether.
            - --certificatesresolvers.default.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
          ports:
            - name: web
              containerPort: 80
            - name: websecure
              containerPort: 443
            - name: admin
              containerPort: 8080
```
It issues the test certificate perfectly.
When I remove:
```yaml
# Please note that this is the staging Let's Encrypt server.
# Once you get things working, you should remove that whole line altogether.
- --certificatesresolvers.default.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
```
...and reapply the config, the container restarts and a few seconds later I get:
```
2020-10-13T20:35:44.414405078Z time="2020-10-13T20:35:44Z" level=error msg="Unable to obtain ACME certificate for domains \"redacted.com\": unable to generate a certificate for the domains [redacted.com]: error: one or more domains had a problem:\n[redacted.com] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Timeout during read (your server may be slow or overloaded), url: \n" providerName=myresolver.acme routerName=default-ingressroutetls-6d083b779d844aaf84a5@kubernetescrd rule="Host(`redacted.com`)"
```
If I put the staging resolver back in, it works.
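One detail worth double-checking in the args above: the staging `caserver` flag is set on a resolver named `default`, while the TLS challenge, email, and storage flags (and the `providerName=myresolver.acme` in the error log) all belong to `myresolver`. Traefik keys each ACME option by resolver name, so if the intent is to point `myresolver` at the staging server, the flag would presumably need to read as in this sketch:

```yaml
# Staging CA for the resolver that is actually used (myresolver).
# Remove this line to fall back to Traefik's default, the production Let's Encrypt server.
- --certificatesresolvers.myresolver.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
```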
I finally 'tricked' it into working by removing the staging resolver, letting it error out, then changing my IngressRoute to:
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: ingressroutetls
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: HostSNI(`redacted.com`)
      kind: Rule
      services:
        - name: portal
          port: 80
  tls:
    certResolver: myresolver
```
It didn't like the HostSNI directive being there, so I changed it back to Host. This time the Traefik container DID NOT restart; it just picked up the new config and updated.
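The rejection is probably expected: in Traefik v2, `HostSNI` is a rule for TCP routers (`IngressRouteTCP`), whereas an HTTP `IngressRoute` matches on `Host`. The route section after switching back would look like this (only the `match` line differs from the IngressRoute above):

```yaml
routes:
  # HTTP routers (IngressRoute) match on Host; HostSNI belongs to TCP routers (IngressRouteTCP).
  - match: Host(`redacted.com`)
    kind: Rule
    services:
      - name: portal
        port: 80
```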
There's nothing special about my k8s cluster. It's on DigitalOcean. It has the RBAC config installed, Prometheus for metrics, and a pretty simple Django application.
For the life of me, I can't figure out why Traefik times out when I apply the config for my app. Shouldn't the Traefik container retry getting the cert after a few minutes if it fails?
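Possibly related, though this is an assumption rather than a confirmed cause: with `--certificatesresolvers.myresolver.acme.storage=acme.json`, the certificate store lives in the container's own filesystem, so every reapply that replaces the pod starts with an empty store and Traefik has to request the certificate from Let's Encrypt again. A sketch of keeping `acme.json` on a volume instead, assuming a PersistentVolumeClaim named `traefik-acme` (a hypothetical name) already exists:

```yaml
# Fragment of spec.template.spec in the Deployment above.
# ASSUMPTION: a PersistentVolumeClaim named "traefik-acme" (hypothetical) already exists.
containers:
  - name: traefik
    image: traefik:v2.3
    args:
      # ...other args unchanged...
      # Store certificates on the mounted volume instead of the container filesystem.
      - --certificatesresolvers.myresolver.acme.storage=/data/acme.json
    volumeMounts:
      - name: acme-storage
        mountPath: /data
volumes:
  - name: acme-storage
    persistentVolumeClaim:
      claimName: traefik-acme
```

Only the storage path changes in `args`; with the store persisted, a restart can reuse an already-issued certificate instead of triggering a new request.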