However, the pods are failing to download the plugins due to a TLS error:
time="2023-04-21T05:11:34Z" level=error msg="Plugins are disabled because an error has occurred." error="failed to download plugin github.com/packruler/traefik-themepark: failed to call service: Get \"https://plugins.traefik.io/public/download/github.com/packruler/traefik-themepark/v1.3.0\": tls: failed to verify certificate: x509: certificate is valid for 7f45f56f42100bf88aee3f0ab3adde12.cf03e4201f51fc5b39e11448d034ebd6.traefik.default, not plugins.traefik.io"
Is there a way to disable this check for plugins, or is there genuinely a TLS problem with Traefik?
I restored all my VMs to a clean state, re-installed K3s using Ansible, and tried again today. The same problem is occurring.
❯ k exec -it --namespace traefik traefik-5777fd4fff-45m26 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ $ wget https://plugins.traefik.io/public/download/github.com/packruler/traefik-themepark/v1.3.0 -O /dev/null
Connecting to plugins.traefik.io (10.23.2.5:443)
48BB7C5E1A7F0000:error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1889:
ssl_client: SSL_connect
wget: error getting response: Connection reset by peer
But from a MetalLB pod it works:
❯ k exec -it --namespace metallb-system controller-c6c466d64-9cqkk sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
~ $ curl
sh: curl: not found
~ $ wget https://plugins.traefik.io/public/download/github.com/packruler/traefik-themepark/v1.3.0 > /dev/null
Connecting to plugins.traefik.io (104.26.3.101:443)
wget: can't open 'v1.3.0': Read-only file system
~ $ wget https://plugins.traefik.io/public/download/github.com/packruler/traefik-themepark/v1.3.0 -O /dev/null
Connecting to plugins.traefik.io (104.26.2.101:443)
saving to '/dev/null'
null 100% |************************************************************************************************************************************************************| 508k 0:00:00 ETA
'/dev/null' saved
Interestingly, the Traefik pod is trying to connect to the Traefik instance that's running on bare metal, while the MetalLB pod connects to the real IP. Both have the same resolv.conf file, so I'm not sure why.
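The investigation above pointed to a DNS issue. resolv.conf in the Traefik pods had mydomain.com as a search domain, but the MetalLB pods' did not. Digging deeper, it looks like a wildcard *.mydomain.com record pointing to 10.23.2.5 in my BIND 9 setup was causing this: with that search domain in place, plugins.traefik.io presumably gets tried as plugins.traefik.io.mydomain.com, which matches the wildcard.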
For now, I've removed the wildcard record and listed out all the DNS entries individually instead, and it's working as expected.
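To make the suspected mechanism concrete, here is a rough Python sketch of how a stub resolver expands a name through its search list. It is a simplified model, not a real DNS client. The ndots value of 5 and the three cluster search domains are assumptions based on the usual Kubernetes defaults (I have not confirmed them against the actual resolv.conf), and the function names are made up for illustration. The IP addresses are the ones from the wget output above.

```python
# Simplified model of stub-resolver search-list expansion (not a real DNS client).

WILDCARD_ZONE = "mydomain.com"   # zone that carried the *.mydomain.com record
WILDCARD_TARGET = "10.23.2.5"    # address the wildcard pointed at
PUBLIC_RECORDS = {"plugins.traefik.io": "104.26.3.101"}  # as seen from the MetalLB pod

# Assumed search list for a pod in the "traefik" namespace, plus the extra entry.
SEARCH = ["traefik.svc.cluster.local", "svc.cluster.local", "cluster.local", "mydomain.com"]


def candidate_names(name, search_domains, ndots=5):
    """Return the names to try, in order, for a given search list and ndots."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search expansion.
        return [name[:-1]]
    expanded = [f"{name}.{domain}" for domain in search_domains]
    if name.count(".") >= ndots:
        # Enough dots: try the literal name first.
        return [name] + expanded
    # Too few dots: try the search domains first, the literal name last.
    return expanded + [name]


def lookup(fqdn):
    """Toy zone data: one public record plus the wildcard record."""
    if fqdn in PUBLIC_RECORDS:
        return PUBLIC_RECORDS[fqdn]
    if fqdn.endswith("." + WILDCARD_ZONE):
        return WILDCARD_TARGET  # the wildcard answers for any name under the zone
    return None  # stands in for NXDOMAIN


def resolve(name, search_domains, ndots=5):
    """Return (name that answered, address) for the first candidate that resolves."""
    for candidate in candidate_names(name, search_domains, ndots):
        address = lookup(candidate)
        if address is not None:
            return candidate, address
    return None


if __name__ == "__main__":
    print(resolve("plugins.traefik.io", SEARCH))       # with mydomain.com in the search list
    print(resolve("plugins.traefik.io", SEARCH[:-1]))  # without it
```

In this model, plugins.traefik.io has only two dots, so every search domain is tried before the literal name, and the wildcard answers first. Dropping mydomain.com from the search list, or removing the wildcard, lets the lookup fall through to the public record.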
Modifying DNS zones feels hacky, but forcing the exclusion of mydomain.com from resolv.conf (if that's even possible) also feels hacky. Honestly, I have no idea why that search entry exists in the Traefik pods but not in the MetalLB pods.
Open to better solutions/advice on a cleaner resolution.