@dduportal Yeah, makes sense.
Will switch to nginx
Thanks for trying to reproduce my error.
Traefik v2 is deployed as a DaemonSet. As this is my test cluster, it is not a problem if things are unavailable for some time, so Traefik v1 is replaced by v2 while I am testing. Whichever Traefik version is deployed is the only ingress controller in use.
Traefik configuration (relevant part of the DaemonSet definition):
spec:
  selector:
    matchLabels:
      k8s-app: traefik-ingress-lb
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: traefik-ingress-lb
        name: traefik-ingress-lb
    spec:
      containers:
      - args:
        - --configfile=/config/traefik.yaml
        image: traefik:v2.0.5
        livenessProbe:
          failureThreshold: 2
          httpGet:
            path: /ping
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        name: traefik-ingress-lb
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        - containerPort: 7070
          name: metrics
          protocol: TCP
        - containerPort: 9090
          name: ping
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ping
            port: 9090
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 300m
            memory: 150Mi
          requests:
            cpu: 100m
            memory: 50Mi
        securityContext:
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
        volumeMounts:
        - mountPath: /config
          name: config
        - mountPath: /ssl
          name: ssl
      volumes:
      - configMap:
          defaultMode: 420
          name: traefik-conf
        name: config
      - name: ssl
        secret:
          defaultMode: 420
          secretName: traefik-web-ui
---
apiVersion: v1
data:
  traefik.yaml: |
    global:
      checkNewVersion: false
      sendAnonymousUsage: false
    log:
      level: DEBUG
    providers:
      kubernetesCRD: {}
      kubernetesIngress: {}
    entryPoints:
      ping:
        address: ":9090"
      web:
        address: ":80"
      websecure:
        address: ":443"
      metrics:
        address: ":7070"
    ping:
      entryPoint: "ping"
    metrics:
      prometheus:
        entryPoint: metrics
    api:
      dashboard: true
    tls: # (this setting is ignored according to the logs)
      stores:
        default:
          defaultCertificate:
            certFile: /ssl/tls.crt
            keyFile: /ssl/tls.key
    http:
      routers:
        api:
          rule: Host(`traefik.mydomain.com`)
          entrypoints:
          - websecure
          service: api@internal
          middlewares:
          - auth
          tls: {}
      middlewares:
        auth:
          basicAuth:
            users:
            - 'test:somepassword'
The dashboard is not working, but I guess I could fix that by not defining it in the config. That is a secondary problem, though, and I can fix it later.
Also, you can find the logs here:
https://pastebin.com/vmEy3fAb
Mmmh, I see a few things to fix or think about here:
… kubernetesCRD provider while it's in the configuration). As it is a DaemonSet, there might be a propagation issue in the cluster. Because of this, we cannot be sure that the same instance of the Traefik DaemonSet is answering the whoami request, handling the events from Kubernetes, etc. Also, there is no mention of whoami in the log: so if the Traefik instance you retrieved the logs from handled the requests, a 404 is expected, as no event about whoami was sent from Kubernetes to this Traefik.
=> To solve this, would you mind switching Traefik to a single replica (either a Deployment, or keep the DaemonSet and restrict it to a single node), so we are sure that a single pod is handling both events and routing? Of course, this is a temporary measure, and you'll go back to your initial setup once we have solved your issue.
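For the second option (keep the DaemonSet, restrict it to a single node), a minimal sketch is a strategic-merge patch that adds a nodeSelector to the pod template. It assumes the DaemonSet is named traefik-ingress-lb (its metadata is not shown above) and uses node-1 as a hypothetical node name:

```yaml
# Strategic-merge patch for the Traefik DaemonSet (assumed name: traefik-ingress-lb).
# With a nodeSelector on the pod template, the DaemonSet only runs a pod
# on nodes carrying the matching label, i.e. on a single node here.
spec:
  template:
    spec:
      nodeSelector:
        # "node-1" is hypothetical: use a real name from `kubectl get nodes`.
        kubernetes.io/hostname: node-1
```

It could be applied with `kubectl patch daemonset traefik-ingress-lb --patch "$(cat single-node-patch.yaml)"` (file name hypothetical; add `-n` with your namespace). The DaemonSet controller then removes the Traefik pods from the non-matching nodes; removing the nodeSelector again restores the original setup.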
(this setting is ignored according to the logs) in the traefik.yaml (in the ConfigMap object). Be aware that the same applies to the http: directive, which defines the router for the dashboard, and to the auth middleware as well. traefik.yaml holds the static configuration (ref. https://docs.traefik.io/v2.0/getting-started/configuration-overview/#the-static-configuration), while tls, routers, services and middlewares are objects of the dynamic configuration (ref. https://docs.traefik.io/v2.0/migration/v1-to-v2/#tls-configuration-is-now-dynamic-per-router).
=> To solve this, you have to enable the file provider (ref. https://docs.traefik.io/v2.0/providers/file/). The de facto pattern is to define a dynamic.yml (or .toml) file which contains the TLS, router and service objects, and to point the file provider to it. Alternatively, migrate the objects to another provider (for example CRDs associated with your Traefik deployment), but let's stay with your current file to minimize the changes for now.
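The separate-file pattern could look roughly like this. It is a sketch that assumes both files are keys of the existing traefik-conf ConfigMap, which the DaemonSet mounts at /config, so every key becomes a file there and the file provider can read /config/dynamic.yml. The metadata block is an assumption, since it is not shown in the original ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-conf        # matches the volume definition; namespace omitted
data:
  # Static configuration: read once at startup via --configfile.
  traefik.yaml: |
    global:
      checkNewVersion: false
      sendAnonymousUsage: false
    log:
      level: DEBUG
    providers:
      kubernetesCRD: {}
      kubernetesIngress: {}
      file:
        filename: "/config/dynamic.yml"   # the second key below
    entryPoints:
      ping:
        address: ":9090"
      web:
        address: ":80"
      websecure:
        address: ":443"
      metrics:
        address: ":7070"
    ping:
      entryPoint: "ping"
    metrics:
      prometheus:
        entryPoint: metrics
    api:
      dashboard: true
  # Dynamic configuration: TLS store, routers and middlewares.
  dynamic.yml: |
    tls:
      stores:
        default:
          defaultCertificate:
            certFile: /ssl/tls.crt
            keyFile: /ssl/tls.key
    http:
      routers:
        api:
          rule: Host(`traefik.mydomain.com`)
          entryPoints:
          - websecure
          service: api@internal
          middlewares:
          - auth
          tls: {}
      middlewares:
        auth:
          basicAuth:
            users:
            # Traefik expects htpasswd-style hashed passwords here.
            - 'test:somepassword'
```

Keeping static and dynamic settings in separate keys makes it obvious which changes need a pod restart (static) and which the file provider can pick up on its own (dynamic).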
Let's keep it simple and keep everything in traefik.yaml for now: change the providers: section in the ConfigMap to the following:
providers:
  kubernetesCRD: {}
  kubernetesIngress: {}
  file:
    filename: "/config/traefik.yaml"
Don't forget to delete your pods to be sure that the ConfigMap changes are taken into account (we changed the static configuration, AND Kubernetes does not guarantee that ConfigMap updates propagate synchronously).
Also, add accessLog: {} to the traefik.yaml file (and delete the pods again). After these changes, you might check:
If not, I'll need the logs of the single Traefik pod AND the output of kubectl get svc,pods,ingress in the namespace where you deployed whoami.
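For reference, the accessLog setting mentioned above is a top-level key of the static configuration, next to log. A minimal fragment of traefik.yaml, assuming the defaults are fine for debugging:

```yaml
log:
  level: DEBUG
# Enables the access log with default settings. By default it is written
# to stdout, so each request Traefik handles shows up in the pod logs.
accessLog: {}
```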
Thanks a lot for the insights and tips.
We are making progress here, I think.
The certificate is accepted now and works. The web interface works too.
whoami and every other Ingress are still not working.
I also deployed Traefik as a Deployment with only one replica this time.
Logs: https://pastebin.com/sU80Q92T
kubectl output:
NAME                          READY   STATUS    RESTARTS   AGE
pod/whoami-57bc4b5cfc-9zrh9   1/1     Running   0          6m46s
pod/whoami-57bc4b5cfc-fkm4j   1/1     Running   0          6m46s
pod/whoami-57bc4b5cfc-mrn97   1/1     Running   0          6m46s

NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/whoami   ClusterIP   10.233.31.222   <none>        8000/TCP   6m46s

NAME                              HOSTS                 ADDRESS   PORTS   AGE
ingress.extensions/test-ingress   whoami.mydomain.com             80      6m46s
OK, so now Traefik picked up the Ingress and added the routing configuration: I get the same kind of logs.
What are the results of the following commands, run from outside the Kubernetes cluster:
curl -v http://traefik.mydomain.com
and
curl -v http://traefik.mydomain.com
?
[edit]
Sorry, I meant curl -v http://whoami.mydomain.com for the 2nd one.
Those commands are the same.
Anyway, HTTP makes no sense for me, since HAProxy redirects to HTTPS:
curl -v https://traefik.mydomain.com
* Trying internal-ip:443...
* Connected to traefik.mydomain.com (internal-ip) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: OU=Domain Control Validated; CN=*.mydomain.com
* start date: Apr 9 14:40:20 2019 GMT
* expire date: Apr 9 14:40:20 2021 GMT
* subjectAltName: host "traefik.mydomain.com" matched cert's "*.mydomain.com"
* issuer: some ca
* SSL certificate verify ok.
> GET / HTTP/1.1
> Host: traefik.mydomain.com
> User-Agent: curl/7.66.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 401 Unauthorized
< Content-Type: text/plain
< Www-Authenticate: Basic realm="traefik"
< Date: Fri, 22 Nov 2019 14:31:49 GMT
< Content-Length: 17
<
401 Unauthorized
Sorry, I meant curl -v http://whoami.mydomain.com for the 2nd one.
No problem.
curl -v https://whoami.mydomain.com
* Trying internal-ip:443...
* Connected to whoami.mydomain.com (internal-ip) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: OU=Domain Control Validated; CN=*.mydomain.com
* start date: Apr 9 14:40:20 2019 GMT
* expire date: Apr 9 14:40:20 2021 GMT
* subjectAltName: host "whoami.mydomain.com" matched cert's "*.mydomain.com"
* issuer: some issuer
* SSL certificate verify ok.
> GET / HTTP/1.1
> Host: whoami.mydomain.com
> User-Agent: curl/7.66.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Fri, 22 Nov 2019 14:35:51 GMT
< Content-Length: 19
<
404 page not found
So you have HTTPS. Is HAProxy terminating TLS, or do you expect Traefik to do it?
HAProxy terminates HTTPS, but also uses HTTPS to talk to the backends.
OK, this is why there is an HTTP 404: by terminating TLS, HAProxy has to create a new HTTP client <-> server connection to Traefik.
So it rewrites the Host header, setting it to the IP (or domain) it uses to reach Traefik (I suppose it is the IP of one of the NodePorts).
When the request hits Traefik, it does not match the domain name, so Traefik answers with an HTTP 404.
Was the setup exactly the same with Traefik v1? I doubt this could have worked before.
=> Can you validate my assumption by removing host: whoami.mydomain.com from the Ingress and trying again?
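Removing the host leaves a rule with only an http section. A sketch of what test-ingress might look like without it, reconstructed from the kubectl output above (Service whoami, port 8000); the original manifest is not shown, so the path and the API version are assumptions:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-ingress
spec:
  rules:
  # No "host" field: the rule is not tied to a domain name, so it should
  # match whatever Host header reaches Traefik.
  - http:
      paths:
      - path: /
        backend:
          serviceName: whoami
          servicePort: 8000
```

extensions/v1beta1 matches the ingress.extensions resource listed above; newer Kubernetes versions only serve networking.k8s.io/v1, which uses a different backend syntax.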
I have doubts about your theory.
This setup works perfectly fine in production with Traefik 1.7.19.
My theory is that this works with v1 because all application traffic always comes in on port 443 of Traefik, from where it gets forwarded to the right pods. That is because of the defaultEntryPoints setting in Traefik v1.
Traefik v2 just uses all entrypoints.
While testing and going through the web UI, I saw that some routers now have a green sign next to them. So I visited the corresponding URLs. And voilà, some Ingresses work now, some don't.
Why they work puzzles me. But I noticed that the Services of the working Ingresses listen on ports 80 and 4443, which leads me to the theory that Traefik can somehow map those correctly to web and websecure? I am just guessing wildly here.
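One way to test the entrypoint theory on Traefik v2.0 is the IngressRoute CRD (the kubernetesCRD provider is already enabled): its entryPoints field restricts a route to the listed entrypoints, roughly what defaultEntryPoints did in v1. A hedged sketch, assuming the Traefik v2.0 CRD definitions are installed and the IngressRoute is created in the same namespace as the whoami Service (port 8000, from the kubectl output); the resource name is hypothetical:

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: whoami-websecure    # hypothetical name
spec:
  # Only attach this route to the :443 entrypoint, not to all of them.
  entryPoints:
    - websecure
  routes:
  - match: Host(`whoami.mydomain.com`)
    kind: Rule
    services:
    - name: whoami
      port: 8000
  # Serve the route over TLS with the default certificate.
  tls: {}
```

If whoami answers through this route but not through the plain Ingress, that would point towards the entrypoints; if it still returns a 404, the Host header theory remains the more likely explanation.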
Anyway, I removed the host directive and got the same result:
curl -v https://whoami.mydomain.com
* Trying internal-ip:443...
* Connected to whoami.mydomain.com (internal-ip) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: OU=Domain Control Validated; CN=*.mydomain.com
* start date: Apr 9 14:40:20 2019 GMT
* expire date: Apr 9 14:40:20 2021 GMT
* subjectAltName: host "whoami.mydomain.com" matched cert's "*.mydomain.com"
* issuer: some issuer
* SSL certificate verify ok.
> GET / HTTP/1.1
> Host: whoami.mydomain.com
> User-Agent: curl/7.66.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Fri, 22 Nov 2019 14:56:11 GMT
< Content-Length: 19
<
404 page not found
I don't have anything else to add: I can't reproduce this, either on my local k3s or on a remote EKS Kubernetes cluster, both with HAProxy in front. So there is definitely something that I do not understand.
If you think it's because of the entrypoints, then so be it. In that case, you'd be better off staying on Traefik v1.7 or changing ingress controller, as your Kubernetes cluster does not seem to work with Traefik v2.