Having issues with docker swarm, sticky session cookie and AWS Cloudfront CDN

So we recently changed from a standard `docker compose` setup to `docker swarm` and now are running into issues with our AWS Cloudfront CDN setup.

We are running sticky sessions with a set cookie name of `traefiksession` and `samesite=None` and `secure=true`.

Inside cloudfront, we make sure to also add the `traefiksession` cookie to our cache key.
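
For reference, a cache setup like the one described could be expressed as a CloudFormation cache policy. This is only a sketch: the thread does not say how the distribution is managed, so the use of CloudFormation, the resource name `StickyCookieCachePolicy`, the policy name and all TTL values are assumptions. It shows the `traefiksession` cookie being whitelisted, which forwards it to the origin and makes its value part of the cache key.

```yaml
Resources:
  # Hypothetical resource name, for illustration only.
  StickyCookieCachePolicy:
    Type: AWS::CloudFront::CachePolicy
    Properties:
      CachePolicyConfig:
        Name: sticky-cookie-cache-policy   # assumed name
        MinTTL: 0                          # assumed TTLs, in seconds
        DefaultTTL: 86400
        MaxTTL: 31536000
        ParametersInCacheKeyAndForwardedToOrigin:
          EnableAcceptEncodingGzip: true
          EnableAcceptEncodingBrotli: true
          CookiesConfig:
            # Only the sticky cookie is forwarded and included in the cache key.
            CookieBehavior: whitelist
            Cookies:
              - traefiksession
          HeadersConfig:
            HeaderBehavior: none
          QueryStringsConfig:
            QueryStringBehavior: none
```

With a policy like this, two requests for the same path but with different `traefiksession` values are cached as separate objects.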

But, seemingly at random, when we deploy a new version or make some other change to the docker service inside the swarm, some users suddenly get errors fetching the clientside JS bundle for our SPA, and the whole page fails because there is no clientside JS - duh.

This really worked like a charm before, but with the introduction of `docker swarm` and an AWS network load balancer we are now running into these issues.

Can someone help here? Does someone run a similar setup?

Sidenote: We are running meteorjs apps in node docker containers.

Also, here is a screenshot from my own Chrome, but I am not sure whether this is expected or could somehow be the issue. I am really not sure.

Thank you all!
Best, Patrick

Are you re-creating the Traefik instances or the target service instances?

Maybe share your full Traefik static and dynamic config, and compose file(s).

So, here is the full docker-stack that I am deploying:

services:
  traefik:
    image: traefik:v3.1
    command:
      # Swarm provider
      - --providers.swarm=true
      - --providers.swarm.endpoint=unix:///var/run/docker.sock
      - --providers.swarm.exposedbydefault=false

      # Entrypoints
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443

      # Ping healthcheck endpoint
      - --ping=true
      - --ping.entrypoint=web

      # Other
      - --api=true
      - --api.dashboard=true
      - --certificatesresolvers.le.acme.email=dev@orderlion.com
      - --certificatesresolvers.le.acme.storage=/acme.json
      - --certificatesresolvers.le.acme.httpchallenge.entrypoint=web
      - --certificatesresolvers.le.acme.httpchallenge=true
      - --log=true
      - --log.filepath=/var/log/traefik.log
      - --log.level=WARN
    ports:
      - 80:80
      - 443:443
      - 8080:8080 # optional dashboard
    labels:
      - container_name=traefik
      - traefik.enable=true
      - traefik.http.routers.api.entrypoints=web
      - traefik.http.routers.api.rule=Host(`traefik.orderlion.com`)
      - traefik.http.routers.api.service=api@internal
      - traefik.http.routers.api.middlewares=auth
    volumes:
      - '/var/run/docker.sock:/var/run/docker.sock:ro'
      - '/files/traefik/acme.json:/acme.json'
      - '/var/log:/var/log'
      - loadbalancerdata:/data
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager # run Traefik only on manager(s)
      restart_policy:
        condition: on-failure
    networks:
      - webswarm

  orderlion-cdntest:
    image: ${OLIMAGE:-'orderlion/orderlion-main-arm:latest'}
    environment:
      - PORT=3000
      - ROOT_URL=https://cdntest.orderlion.com
      - METEOR_SETTINGS
      - SERVER_ENV=production
      - DD_ENV=production
      - OL_CONFIG=production
      - OL_CONTAINER_NAME=orderlion-cdntest-{{.Task.Slot}}
      - CDN_URL=https://d12l7oswdn4whf.cloudfront.net
    volumes:
      - /files:/files
      - '/var/run/docker.sock:/var/run/docker.sock:ro'
    depends_on:
      - traefik
    networks:
      - webswarm
    deploy:
      labels:
        - traefik.enable=true
        - traefik.http.routers.orderlion-cdntest.rule=Host(`cdntest.orderlion.com`)
        - traefik.http.routers.orderlion-cdntest.tls=true
        - traefik.http.routers.orderlion-cdntest.tls.certresolver=le
        - traefik.http.routers.orderlion-cdntest.entrypoints=websecure
        - traefik.http.routers.orderlion-cdntest.service=orderlion-cdntest
        - traefik.http.routers.orderlion-cdntest-http.entrypoints=web
        - traefik.http.routers.orderlion-cdntest-http.rule=Host(`cdntest.orderlion.com`)
        - traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https
        - traefik.http.middlewares.redirect-to-https.redirectscheme.permanent=true
        - traefik.http.routers.orderlion-cdntest-http.middlewares=redirect-to-https
        - traefik.http.services.orderlion-cdntest.loadbalancer.server.port=3000
        - traefik.http.services.orderlion-cdntest.loadbalancer.sticky=true
        - traefik.http.services.orderlion-cdntest.loadbalancer.sticky.cookie=true
        - traefik.http.services.orderlion-cdntest.loadbalancer.sticky.cookie.name=traefiksession
        - traefik.http.services.orderlion-cdntest.loadbalancer.sticky.cookie.samesite=None
        - traefik.http.services.orderlion-cdntest.loadbalancer.sticky.cookie.secure=true
        - traefik.http.services.orderlion-cdntest.loadbalancer.healthcheck.path=/healthcheck
        - traefik.http.services.orderlion-cdntest.loadbalancer.healthcheck.interval=10s
        - traefik.http.services.orderlion-cdntest.loadbalancer.healthcheck.timeout=5s
      mode: replicated
      replicas: 2
      update_config:
        parallelism: 1
        delay: 30s
        order: start-first
      restart_policy:
        condition: on-failure

volumes:
  loadbalancerdata:

networks:
  webswarm:
    external: true

My current assumption is this:

After a lot of debugging of the HTTP requests to cloudfront and their responses, there must be something wrong with the sticky cookie setup. I am getting totally wrong Set-Cookie response headers from cloudfront, carrying traefiksession cookie values that are outdated and no longer valid.

Also, I think the initial cookie coming from my test server at cdntest.orderlion.com is not properly sent to cloudfront in the first place - best guess, because Chrome does NOT show it with SameSite=None in the dev tools (see screenshot in 1st post).
Because of that, cloudfront does not even receive a traefiksession cookie and connects to a random docker container when sending the requests, causing this mess.

-> I do not understand why Chrome does not show SameSite=None in the dev tools, although I have it set: - traefik.http.services.orderlion-cdntest.loadbalancer.sticky.cookie.samesite=None.

If I understand your question @bluepuma77 correctly: the traefik instance remains unchanged, only the orderlion-cdntest docker service changes!

If the target service is re-created (updated), I would assume that the sticky cookie is not valid anymore within Traefik and you are sent to a different instance.

I don’t know about Meteor (used it 2016), but when you update a Svelte app, the generated filenames change, so a simple backend swap will fail. At least for Svelte you need to reload the whole app in the browser.

Yes, the filenames of the clientside JS bundle DO change! There is a full reload of the browser. But still, it all fails: users end up with a broken/outdated JS bundle and can't use the app at all.
I experienced it myself yesterday. I tried reloading the page many times, no change. Not even deleting the cookies helped. The only thing that solved it was running an invalidation on /* for the cloudfront distribution.

Before the invalidation, I could still see responses from cloudfront that "tried" to set a traefiksession cookie with a value that was definitely outdated, it contained a value of a docker container from BEFORE the last deployment basically.
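
One thing that might explain the stale cookies: as far as I know, when cookies are forwarded to the origin, cloudfront stores the origin's Set-Cookie header together with the cached object and replays it on cache hits, and the AWS docs suggest `Cache-Control: no-cache="Set-Cookie"` to opt out of that. A rough sketch using a Traefik headers middleware follows. The middleware name `no-set-cookie-cache` is made up, and `customresponseheaders` replaces whatever Cache-Control header the app itself sends, so treat this as an experiment rather than a fix.

```yaml
services:
  orderlion-cdntest:
    deploy:
      labels:
        # Hypothetical middleware name; sets the directive on every response of this router.
        - 'traefik.http.middlewares.no-set-cookie-cache.headers.customresponseheaders.Cache-Control=no-cache="Set-Cookie"'
        # Attach the middleware to the existing HTTPS router.
        - traefik.http.routers.orderlion-cdntest.middlewares=no-set-cookie-cache
```

If the Set-Cookie header on cache hits disappears after this change and an invalidation, that would point at cached Set-Cookie headers rather than at the SameSite setting.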

Does Cloudfront do caching?

Yes sure, that's kinda the point of a CDN!
But it should revalidate the cache quite quickly. Also, see first post, the value of the session cookie should be part of the cache key.
I still honestly believe that something is broken with the SameSite=None on traefik with swarm somehow.

Docker Swarm itself will not touch the http headers. I think you can check in the browser cookie store if SameSite=None is set.

Exactly ... that's what I did and it is not SameSite=None for some reason. Although it is set in my docker stack yml file as I have shown you. That is exactly my point, somehow SameSite=None is not working.

Here it is, screenshotted directly from my Chrome:

I feel the old docs are a bit easier to read. Maybe change spelling:

SameSite can be none, lax, strict or empty.

The sticky example is only for file provider, but the old full reference shows that it should work.

  - "traefik.http.services.service02.loadbalancer.sticky=true"
  - "traefik.http.services.service02.loadbalancer.sticky.cookie=true"
  - "traefik.http.services.service02.loadbalancer.sticky.cookie.domain=foobar"
  - "traefik.http.services.service02.loadbalancer.sticky.cookie.httponly=true"
  - "traefik.http.services.service02.loadbalancer.sticky.cookie.maxage=42"
  - "traefik.http.services.service02.loadbalancer.sticky.cookie.name=foobar"
  - "traefik.http.services.service02.loadbalancer.sticky.cookie.path=foobar"
  - "traefik.http.services.service02.loadbalancer.sticky.cookie.samesite=foobar"
  - "traefik.http.services.service02.loadbalancer.sticky.cookie.secure=true"
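
Applied to the stack file posted above, the spelling change would look roughly like this. It assumes Traefik compares the value case-sensitively and silently drops the attribute for anything other than lowercase `none`, `lax` or `strict`, which would match what Chrome shows; that assumption is not verified against the v3.1 source.

```yaml
services:
  orderlion-cdntest:
    deploy:
      labels:
        - traefik.http.services.orderlion-cdntest.loadbalancer.sticky.cookie.name=traefiksession
        # Lowercase value instead of "None".
        - traefik.http.services.orderlion-cdntest.loadbalancer.sticky.cookie.samesite=none
        # Browsers only accept SameSite=None on cookies that are also Secure.
        - traefik.http.services.orderlion-cdntest.loadbalancer.sticky.cookie.secure=true
```

After redeploying, the Set-Cookie response header for `traefiksession` should then include `SameSite=None`, which can be checked in the dev tools network tab.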

I now tried the following: I set up a new subdomain for our cloudfront - so the CDN domain is now cdn.orderlion.com instead of d12l7oswdn4whf.cloudfront.net.
This, as I see it, should also solve the SameSite=None issue, as the root domain is the same. I can confirm this, as the network request inside Chrome now shows that cookies are sent with the request to cloudfront - great!

BUT: Somehow, it still does not work. I still sometimes get wrong responses from cloudfront (the Set-Cookie header of the response contains the wrong value!), which makes no sense to me.

And, for some reason, I even still end up with 2 traefiksession cookies on Chrome - one for my actual domain cdntest.orderlion.com, one for the actual cloudfront domain cdn.orderlion.com. Different domains, same cookie name, different values .... go figure.
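
The two cookies are probably expected: a cookie set without a Domain attribute is host-only, so cdntest.orderlion.com and cdn.orderlion.com each keep their own `traefiksession`. If the goal is one cookie shared by both hosts, the `sticky.cookie.domain` option from the label reference above might help. A sketch, assuming the Traefik version in use supports that option (worth checking for v3.1) and that `orderlion.com` is the right parent domain:

```yaml
services:
  orderlion-cdntest:
    deploy:
      labels:
        # Assumption: scopes the sticky cookie to the parent domain so that
        # both cdntest.orderlion.com and cdn.orderlion.com receive the same cookie.
        - traefik.http.services.orderlion-cdntest.loadbalancer.sticky.cookie.domain=orderlion.com
```

Existing host-only cookies would still sit in the browser next to the new one, so it is worth clearing cookies for both hosts before testing.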

I do not understand any of this. Ignoring that I am using Meteor, my setup with several containers in a docker swarm being updated while using sticky sessions AND cloudfront can't be that special, no?! Does nobody have this setup, or has nobody run into similar issues?!

Maybe you can find other Traefik and Cloudfront users at reddit.com/r/Traefik/