Kubernetes - IngressRoutes - Gateway Timeout - Namespaces

Hello,

This is my first post here, rather than a GitHub bug report.

I managed to set up Traefik 2.0 on my Rancher 2.0 cluster by following the Kubernetes user guides in the documentation.
I understand why I should use IngressRoute rather than Ingress definitions, and that’s fine with me as long as I get my SSL certificates from my provider :love_you_gesture:

So Traefik is working fine in its own namespace, traefik, with the whoami container (which I found pretty useful :four_leaf_clover:).

Now the story begins...

I get a Gateway Timeout when I declare an IngressRoute and an IngressRouteTCP for a deployment in another namespace. When I put the deployment in the same namespace as Traefik, it works fine.

I’m currently using the file provider and the kubernetesCRD provider, and I had a look at the Docker provider documentation about networking.

I have a feeling that my issue comes from the fact that I want to use one namespace for each application I run on top of my Kubernetes/Rancher cluster.

Did I miss something? Should I add the Docker provider with a read-only mount of /var/run/docker.sock in order to reach my pods in other namespaces? Should I add a custom NetworkPolicy to allow communication from my traefik namespace to the others?

Please advise,

Thanks

Hello @Zword,

Can you please provide us with an example of your IngressRoute that is not working, and the debug logs from Traefik?

That way we can continue to troubleshoot with some more information.

Hello @daniel.tomcej,

Thank you for your answer.

Here is my current configuration:

  • A single Rancher 2.2.6 node with a one-node Kubernetes 1.14.3 cluster
  • A Traefik Deployment in a traefik namespace, like this:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  annotations:
  labels:
    app: traefik
  name: traefik
  namespace: traefik
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      containers:
      - args:
        - --configFile=/local/traefik/traefik.yml
        env:
        - name: TZ
          value: Europe/Brussels
        image: traefik:v2.0
        imagePullPolicy: IfNotPresent
        name: traefik
        ports:
        - containerPort: 8000
          hostPort: 80
          name: web
          protocol: TCP
        - containerPort: 4443
          hostPort: 443
          name: websecure
          protocol: TCP
        - containerPort: 2222
          hostPort: 22
          name: ssh
          protocol: TCP
        - containerPort: 8080
          name: admin
          protocol: TCP
        volumeMounts:
        - mountPath: /local/traefik
          name: traefik-config
        - mountPath: /local/traefik-file-provider
          name: traefik-file-provider
        - mountPath: /data
          name: traefik-ssl-storage
      volumes:
      - configMap:
          defaultMode: 256
          name: traefik-config-yml
          optional: false
        name: traefik-config
      - configMap:
          defaultMode: 256
          name: traefik-file-provider
          optional: false
        name: traefik-file-provider
      - hostPath:
          path: /mnt/system/traefik-data
          type: DirectoryOrCreate
        name: traefik-ssl-storage

My Traefik configuration:

## Static configuration
entryPoints:
  web:
    address: ":8000"

  web-secure:
    address: ":4443"

  ssh:
    address: ":2222"

  metrics:
    address: ":8082"

certificatesResolvers:
  default:
    acme:
      email: YYY@YYY.XXX
      storage: /data/acme.json
      dnsChallenge:
        provider: ovh
        resolvers:
          - "1.1.1.1:53"
          - "8.8.8.8:53"

providers:
  kubernetesCRD:
    namespaces:
      - "default"
      - "production"
      - "traefik"
      - "gitlab"
  file:
    directory: /local/traefik-file-provider

api:
  dashboard: true

metrics:
  prometheus: 
    addEntryPointsLabels: true
    addServicesLabels: true
    entryPoint: metrics


accessLog: {}
log:
  level: INFO

My dynamic configuration (file provider):

http:
  routers:
    api-dashboard:
      entryPoints:
        - "web-secure"
      rule: Host(`traefik.toto.xy`) && (PathPrefix(`/api`) || PathPrefix(`/dashboard`))
      tls:
        certResolver: default
      service: api@internal
      middlewares:
        - auth-dashboard
  middlewares:
    auth-dashboard:
      basicAuth:
        users:
          - "test:$toto" 

In my gitlab namespace, I have a gittea-deployment running on ports 80 and 22. When this deployment runs in the traefik namespace, I can use my IngressRoute without issue, but with the exact same settings in the gitlab namespace I get a Gateway Timeout:

My IngressRoute definition:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: gittea-tls
spec:
  entryPoints:
    - web-secure
  routes:
  - match: Host(`git.toto.xy`) 
    kind: Rule
    services:
    - name: gittea-front
      port: 80
  tls:
    certResolver: default
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: git-http
spec:
  entryPoints:
    - web
  routes:
  - match: Host(`git.toto.xy`)
    kind: Rule
    services:
    - name: gittea-front
      port: 80
    middlewares:
      - name: redirecthttps
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: git-ssh
spec:
  entryPoints:
    - ssh
  routes:
    - match: HostSNI(`*`)
      services:
        - name: gittea-ssh
          port: 22

I think it might be related to the ClusterRoleBinding definition:

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller

roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: traefik

As I’m working with several namespaces, should I have a service account bound to the ClusterRole for each namespace I’m working in?

OK, I tried adding another service account, traefik-ingress-controller, for my gitlab namespace to the ClusterRoleBinding, without success.

Here is the debug log for my request:

time="2019-10-11T08:14:50+02:00" level=debug msg="vulcand/oxy/roundrobin/rr: begin ServeHttp on request" Request="{\"Method\":\"GET\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"\",\"Fragment\":\"\"},\"Proto\":\"HTTP/2.0\",\"ProtoMajor\":2,\"ProtoMinor\":0,\"Header\":{\"Accept\":[\"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\"],\"Accept-Encoding\":[\"gzip, deflate, br\"],\"Accept-Language\":[\"en-US,en;q=0.5\"],\"Cache-Control\":[\"max-age=0\"],\"Dnt\":[\"1\"],\"Te\":[\"trailers\"],\"User-Agent\":[\"Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0\"],\"X-Forwarded-Host\":[\"git.xy.xy\"],\"X-Forwarded-Port\":[\"443\"],\"X-Forwarded-Proto\":[\"https\"],\"X-Forwarded-Server\":[\"traefik-664cd74f88-6ps2b\"],\"X-Real-Ip\":[\"1.1.1.1\"]},\"ContentLength\":0,\"TransferEncoding\":null,\"Host\":\"git.xy.xy\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"1.1.1.1:12319\",\"RequestURI\":\"/\",\"TLS\":null}"

time="2019-10-11T08:14:50+02:00" level=debug msg="vulcand/oxy/roundrobin/rr: Forwarding this request to URL" Request="{\"Method\":\"GET\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"\",\"Fragment\":\"\"},\"Proto\":\"HTTP/2.0\",\"ProtoMajor\":2,\"ProtoMinor\":0,\"Header\":{\"Accept\":[\"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\"],\"Accept-Encoding\":[\"gzip, deflate, br\"],\"Accept-Language\":[\"en-US,en;q=0.5\"],\"Cache-Control\":[\"max-age=0\"],\"Dnt\":[\"1\"],\"Te\":[\"trailers\"],\"User-Agent\":[\"Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0\"],\"X-Forwarded-Host\":[\"git.xy.xy\"],\"X-Forwarded-Port\":[\"443\"],\"X-Forwarded-Proto\":[\"https\"],\"X-Forwarded-Server\":[\"traefik-664cd74f88-6ps2b\"],\"X-Real-Ip\":[\"1.1.1.1\"]},\"ContentLength\":0,\"TransferEncoding\":null,\"Host\":\"git.xy.xy\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"1.1.1.1:12319\",\"RequestURI\":\"/\",\"TLS\":null}" ForwardURL="http://10.42.0.104:3000"

time="2019-10-11T08:14:51+02:00" level=debug msg="'504 Gateway Timeout' caused by: dial tcp 10.42.0.104:3000: i/o timeout"

time="2019-10-11T08:14:51+02:00" level=debug msg="vulcand/oxy/roundrobin/rr: completed ServeHttp on request" Request="{\"Method\":\"GET\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/serviceworker.js\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"\",\"Fragment\":\"\"},\"Proto\":\"HTTP/2.0\",\"ProtoMajor\":2,\"ProtoMinor\":0,\"Header\":{\"Accept\":[\"*/*\"],\"Accept-Encoding\":[\"gzip, deflate, br\"],\"Accept-Language\":[\"en-US,en;q=0.5\"],\"Cache-Control\":[\"no-cache\"],\"Pragma\":[\"no-cache\"],\"Service-Worker\":[\"script\"],\"Te\":[\"trailers\"],\"User-Agent\":[\"Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0\"],\"X-Forwarded-Host\":[\"git.calzi.be\"],\"X-Forwarded-Port\":[\"443\"],\"X-Forwarded-Proto\":[\"https\"],\"X-Forwarded-Server\":[\"traefik-664cd74f88-6ps2b\"],\"X-Real-Ip\":[\"1.1.1.1\"]},\"ContentLength\":0,\"TransferEncoding\":null,\"Host\":\"git.toto.xy\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"1.1.1.1:12319\",\"RequestURI\":\"/serviceworker.js\",\"TLS\":null}"

Based on the logs, I tried a curl 10.42.0.104:3000 from the traefik container and got no result. But I’m a bit surprised that Traefik tries to reach the container directly on port 3000 instead of going through the service, which maps port 80 to 3000...

Hope that will help :slight_smile:

Hi again,

Since last time, I have tried several options and tests without success, such as adding some "items" to the ClusterRole definitions, etc.

I’ve seen topics like mine on GitHub suggesting the Docker provider and a network definition...
I feel uncomfortable with that kind of solution, because when I had another look at the docs for the kubernetesCRD provider, they say it supports options such as namespaces, token, etc.

If this provider has a namespaces option in order to serve namespaces other than the one Traefik is running in, how can I achieve that?

The documentation could be a little more explicit about it, and I’m willing to submit a change to the docs repo as soon as I have a solution.

Currently I’m not that stuck, as I can run my apps in the traefik namespace if I want. The issue is security: my traefik namespace has an unrestricted pod policy so that it can bind a hostPath on my server. I would like to be able to use a separate namespace for each of my apps, with dedicated security settings.

I’m willing to help improve this kind of setup if you need it.

Please let me know about it.

PS: I remember being able to use Traefik 1.7 with different namespaces, but as I no longer have access to that particular cluster I can’t confirm that. Traefik 2.0 rocks btw :wink:

Hello,

Does anyone have a new hint about this behavior?

Thanks, :slight_smile:

So from what you posted, I understand that Traefik tries to connect to the gittea-front service and cannot.

First, I'd like to address your comment about Traefik reaching the container directly instead of going through the service:

For performance reasons, Traefik watches Kubernetes service endpoints and accesses them directly.
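
To illustrate with the values from this thread, here is a minimal sketch of what the Service and its Endpoints object might look like. It assumes the gittea-front Service maps port 80 to container port 3000; the selector label `app: gittea` and the port name are hypothetical, while the pod IP and port 3000 come from the debug log. Traefik resolves the Service to the addresses listed in the Endpoints object and dials them directly, which is why the log shows 10.42.0.104:3000 rather than a Service address on port 80.

```yaml
# Sketch only; the selector label is an assumption, not the poster's actual manifest.
apiVersion: v1
kind: Service
metadata:
  name: gittea-front
  namespace: gitlab
spec:
  selector:
    app: gittea          # hypothetical pod label
  ports:
  - name: http
    port: 80             # port referenced by the IngressRoute
    targetPort: 3000     # port the container listens on
    protocol: TCP
---
# Endpoints object that Kubernetes maintains for the Service above.
# Traefik watches these and forwards to ip:port directly.
apiVersion: v1
kind: Endpoints
metadata:
  name: gittea-front
  namespace: gitlab
subsets:
- addresses:
  - ip: 10.42.0.104      # pod IP seen in the debug log
  ports:
  - name: http
    port: 3000
    protocol: TCP
```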

Now with this out of the way, it seems that whatever problem you have is not really Traefik related; it has to do with how security and/or networking is set up in your cluster.

As a first step, I'd confirm that your gittea container is indeed reachable at the in-cluster address 10.42.0.104:3000 from anywhere in the cluster. If it is not, then this is the problem to solve. If it is reachable from somewhere else, but not from the Traefik pod, then you'll have to look into cluster networking (security policies, etc.), find out what's preventing the communication, and rectify it.
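
One way to run that check, as a rough sketch: a throwaway Pod that tries to fetch the address from the log. The Pod name and the choice of the busybox image are assumptions for illustration. Create it once in the traefik namespace and once in another namespace (for example gitlab), then compare the Pod status and the output of kubectl logs for the two runs.

```yaml
# Hypothetical one-off connectivity probe; name and image are illustrative choices.
apiVersion: v1
kind: Pod
metadata:
  name: reach-test
  namespace: traefik     # change to gitlab (or another namespace) for the comparison run
spec:
  restartPolicy: Never
  containers:
  - name: probe
    image: busybox
    # 5-second timeout; writes the response body to stdout on success,
    # exits with a non-zero status if the connection fails
    command: ["wget", "-T", "5", "-q", "-O", "-", "http://10.42.0.104:3000"]
```

If the Pod succeeds in one namespace and fails in the other, the difference points at namespace-level network rules rather than at Traefik.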

Good luck.

Hello @zespri :),

Thank you for your answer.
I’m going to have another look and I will let you know.

I understand now that Traefik watches the Kubernetes service endpoints.

Hello,

I finally found the culprit... the network provider inside my Kubernetes cluster.
With Rancher + Canal, a "Project isolation" setting may be enabled by default, which blocks network traffic between projects (a project can contain one namespace or several).

Now it’s just working as expected.
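
For anyone who wants to keep project isolation enabled, a NetworkPolicy along these lines might be an alternative. This is only a sketch under assumptions: the traefik namespace must first be given a label such as name=traefik (a hypothetical label, not something set up in this thread), the policy name is made up, and it relies on NetworkPolicies being additive to whatever rules the isolation feature already creates.

```yaml
# Sketch only: allows pods in namespaces labelled name=traefik to reach all pods in gitlab.
# The namespace label is an assumption; add it first, e.g.
#   kubectl label namespace traefik name=traefik
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-traefik   # hypothetical name
  namespace: gitlab
spec:
  podSelector: {}            # applies to every pod in the gitlab namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: traefik
```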

Thank you for the help.