Cannot enable sticky sessions on Kubernetes Service

Hi, I have been trying to enable sticky sessions with Traefik for a while without success. I went through the documentation and ran multiple experiments, and I have reached a point where I think it should work, but it does not.

First of all, some context. What I am trying to achieve is front-end to back-end communication over WebSocket. Because there are multiple back-end replicas, the sockets keep reconnecting. When I reduce to 1 replica, everything works.

Second, I only use standard Kubernetes objects, because using Traefik's CRDs means that I will lose compatibility with some services that require the standard objects, like cert-manager or linkerd, and there may be more. So, for a future-proof architecture, I don't think it's a good idea to use CRDs to define Services and Ingresses.

That said, I reached the conclusion (through extensive documentation research) that on Traefik v2.1.2, which I am currently on, I can only achieve this through dynamic configuration files (as sticky is apparently otherwise only supported on Traefik's CRDs).

To make this a bit more flexible and do it the Kubernetes way, I am mounting ConfigMaps as config files in the Traefik pod. The problem is that I have no idea how to check whether the configuration I am using has been properly processed. I followed the guide here (https://docs.traefik.io/v2.1/user-guides/crd-acme/) to set up Traefik but customized the deployment so it looks like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: traefik
  name: traefik-ingress-controller

---
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: traefik
  name: traefik
  labels:
    app: traefik
spec:
  replicas: 3
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik-ingress-controller
      containers:
        - name: traefik
          image: traefik:v2.1
          args:
            - --api=true
            - --api.insecure=true
            - --api.dashboard=true
            - --accesslog=true
            - --entrypoints.web.Address=:8000
            - --entrypoints.websecure.Address=:4443
            - --providers.kubernetescrd
            - --providers.kubernetesingress=true
            - --providers.kubernetescrd.ingressclass=traefik
            - --providers.file.directory=/config #<-- Where I am putting the configmap mounts
            - --providers.file.watch=true
          ports:
            - name: web
              containerPort: 8000
            - name: websecure
              containerPort: 4443
            - name: admin
              containerPort: 8080
          volumeMounts:
            - name: config-vol
              mountPath: /config
      volumes:
        - name: config-vol
          configMap:
            name: traefik-config
            items:
              - key: sticky-services
                path: sticky-services.yaml

This mounts the ConfigMap below at the path /config/sticky-services.yaml. I verified in the pod that the file exists and that I can read it.

apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-config
  namespace: traefik
data:
  sticky-services: |
    http:
      services:
        my-service.othernamespace.svc.cluster.local:
          loadBalancer:
            sticky:
              cookie: {}

Where my-service is the backend service the frontend maintains a websocket connection to.
The end result is that nothing changes. A cookie is not sent and the socket keeps disconnecting and reconnecting again.

The application works fine with one replica. TLS is enabled and certificate generation works fine under this setup. I have no CORS problems, so I get authentication cookies fine.

Is there anything that I am overlooking here?

Thank you for any help that you can provide.
Cheers,
Fábio

So, I figured this out myself. The solution does require Traefik v2.1; v2.0 won't work because it lacks support for sticky sessions in CRDs.

After a lot of trial and error, it turns out that my assumption (that IngressRoute and Ingress were mutually exclusive) was wrong. Using the Kubernetes ingress I am able to have cert-manager handling the certificates and with IngressRoute I can enable sticky sessions on the service:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: my-service-ingressroute
  namespace: othernamespace
  annotations:
    kubernetes.io/ingress.class: traefik #<-- This is optional depending on your setup.
spec:
  entryPoints:
    - websecure
  routes:
  - match: Host(`my-service.mydomain.com`)
    kind: Rule
    services:
    - name: my-service
      port: 80
      sticky:
        cookie:
          httpOnly: true
  tls:
    secretName: my-service.mydomain.com-secret
    domains:
    - main: my-service.mydomain.com

By the way, the ability to do this in the IngressRoute is still undocumented. Although sticky sessions appear in a sample in the v2.1 documentation, there is really no documentation about them except for dynamic configuration (which I couldn't get to work).
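
A possible explanation for why the ConfigMap approach had no effect, offered as a sketch rather than a confirmed diagnosis: in Traefik v2 each provider has its own namespace, so a service declared in a file becomes a separate service with an @file suffix. It does not modify the service generated by the Kubernetes provider, and with no servers and no router pointing at it, it never receives traffic. A self-contained file-provider definition would look roughly like the following. The router name, the service name, and the pod IPs are hypothetical placeholders.

```yaml
# Hypothetical dynamic configuration for the file provider (Traefik v2.1).
# Names and IPs below are placeholders, not values from the setup above.
http:
  routers:
    my-service-router:
      entryPoints:
        - websecure
      rule: "Host(`my-service.mydomain.com`)"
      service: my-service-sticky
      tls: {}
  services:
    my-service-sticky:
      loadBalancer:
        sticky:
          cookie:
            httpOnly: true
        servers:
          # Stickiness only helps if every replica is listed separately.
          # A single ClusterIP URL would still be balanced behind Traefik.
          - url: "http://10.0.0.11:80"  # placeholder pod IP
          - url: "http://10.0.0.12:80"  # placeholder pod IP
```

Since pod IPs change whenever a pod is rescheduled, maintaining this list by hand is impractical, which is one more reason the IngressRoute above is the more workable option. With --api.insecure=true as in the deployment above, a service loaded this way should show up in the dashboard on port 8080 with the @file suffix.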

It's been a wild ride (not always in the good sense) trying to replace nginx with traefik and most of it is because of poor documentation. And clearly a lot of my frustrations could have been avoided with proper documentation.

It's obvious to me that Traefik is a great product and that it should not be overshadowed by its documentation. It overcomes the limitations that I have with nginx, and if anyone on the Traefik team reads this, I would love to contribute to improving the documentation. I certainly thought about giving up multiple times for the wrong reasons.

I think it wouldn't take me more than a day to improve the user guide (which, by the way, should be more easily discoverable, for example from the getting started part) so that it's more complete and covers more scenarios than just deploying Traefik to Kubernetes, which alone does not accomplish anything.

I hope this is helpful for someone.
Cheers,
Fábio

Hello Fábio,

Thank you for your post. I am going through the same process (migrating from nginx to traefik) now and I am wondering how you use both Kubernetes Ingress and IngressRoute objects simultaneously. I also use cert-manager and it is the reason why I stick with Kubernetes Ingress. Could you share your Ingress definition? Does it look something like this:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: name
  annotations:
    kubernetes.io/ingress.class: "traefik"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - secretName: domain.com
    hosts:
    - domain.com
  rules:
  - host: domain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: service
          servicePort: 443

and then you have an additional IngressRoute for the same host, right?

Thank you.