Help going from K3s to Docker Swarm (on Windows): Invalid request headers (6003)

Issue:
Having trouble getting SSL Certs as well as basic services up

My previous k3s setup was as follows:

Cert-Manager for SSL Certs using cloudflare/letsencrypt for issuing certs
Traefik for Ingresses listening on a specific IP provided by metallb load balancer (192.168.200.150)

A Cloudflare tunnel was used for all the exposed services, which would point to 192.168.200.150

I'm trying to replicate this using Docker Swarm.

I was able to get Traefik up and running, at least to the point where the dashboard is accessible. However, I can't seem to access it with insecure: false; I just get a 404 error.

Below are the files I have so far:

docker-compose.yml:

services:
  traefik:
    image: traefik:v2.11
    command:
      - "--configFile=/etc/traefik/traefik.yml"
    environment:
      CF_API_EMAIL: <REDACTED>
      CF_API_KEY: <REDACTED >
    ports:
      - "80:80"
      - "443:443"
      - "8081:8081"
    labels:
      - "traefik.enable=true"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - "./traefik.yml:/etc/traefik/traefik.yml"
      - "./acme.json:/acme.json"
      - "./entrypoint.sh:/startup.sh"
      - "./dynamic_conf.yml:/dynamic_conf.yml"
    networks:
      - traefik-public
    entrypoint: "/startup.sh"

  sd.lukium.ai:
    image: lukium/sd.lukium.ai:0.5.3
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.sd.rule=Host(`sd.lukium.ai`)"
      - "traefik.http.routers.sd.entrypoints=web,websecure"
      - "traefik.http.routers.sd.tls=true"
      - "traefik.http.routers.sd.tls.certresolver=letsencrypt"
    networks:
      - traefik-public

networks:
  traefik-public:
    external: true

traefik.yml:

# Global Traefik configuration
global:
  checkNewVersion: true
  sendAnonymousUsage: false

# Define HTTP and HTTPS entryPoints, including one for the dashboard
entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
          permanent: true
  websecure:
    address: ":443"
  traefik:
    address: ":8081"

# Enable Docker provider
providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    watch: true
    exposedByDefault: false
    swarmMode: true
    network: traefik-public
  file:
    filename: /dynamic_conf.yml

# API and Dashboard configuration, secured by basic auth
api:
  dashboard: true
  insecure: true

# Log settings
log:
  level: DEBUG

# Enable Let's Encrypt automatic SSL with Cloudflare DNS challenge
certificatesResolvers:
  letsencrypt:
    acme:
      email: <REDACTED> 
      storage: /acme.json  
      dnsChallenge:
        provider: cloudflare  
        delayBeforeCheck: "0"  
        resolvers:
          - "1.1.1.1:53"
          - "1.0.0.1:53"

dynamic_conf.yml:

tls:
  options:
    default:
      minVersion: VersionTLS12

http:
  routers:
    dashboard:
      rule: "Host(`traefik.lukium.ai`) && (PathPrefix(`/dashboard`) || PathPrefix(`/api`))"
      service: api@internal
      entryPoints:
        - "traefik"
      middlewares:
        - "auth"
      tls:
        certResolver: letsencrypt  
        domains:
          - main: lukium.ai
            sans:
              - '*.lukium.ai'

  middlewares:
    auth:
      basicAuth:
        users:
          # List of users allowed to access the dashboard, replace with your own
          - "<REDACTED>"

entrypoint.sh (this just sets the correct permissions on acme.json, since the file comes from Windows):

#!/bin/sh

chmod 600 /acme.json

exec /entrypoint.sh "$@"

Main errors I'm getting:

Unable to obtain ACME certificate for domains
error="unable to generate a certificate for the domains
error: one or more domains had a problem: lukium.ai *.lukium.ai
acme: error presenting token: cloudflare: failed to find zone lukium.ai
ListZonesContext command failed: Invalid request headers (6003)

I can't paste the whole log section because it complains about there being too many links (the domain name that the cert is being created for)

I know the cloudflare info is right:
I created a new token, ensuring that it has the Zone > Zone > Read and Zone > DNS > Edit permissions for all zones
I also know that the tunnel is working, because I have tested accessing the dashboard directly via the tunnel

Any help would be appreciated. Thanks!

Seems Cloudflare has an issue with lukium.ai. Is it registered there, with the right account?

I recommend placing the TLS configuration (certresolver with main/sans) directly on the entrypoint.
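
As a rough sketch of that suggestion, the resolver and wildcard domains from dynamic_conf.yml could be moved onto the websecure entrypoint in traefik.yml. This assumes the Traefik v2 static configuration layout and reuses the resolver name letsencrypt from the post; it is untested against this particular setup.

```yaml
# traefik.yml (static configuration), sketch only
entryPoints:
  websecure:
    address: ":443"
    http:
      tls:
        # Resolver name taken from the certificatesResolvers block in the post
        certResolver: letsencrypt
        domains:
          - main: lukium.ai
            sans:
              - "*.lukium.ai"
```

With TLS defined at the entrypoint, routers attached to websecure should pick it up by default, so the per-router tls.certresolver labels would likely become unnecessary.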

Usually Traefik requires labels under the deploy section when using Docker Swarm; maybe compare with the simple Traefik Swarm example.
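
For illustration, the sd.lukium.ai service from the post might look roughly like this with its labels moved under deploy. The labels are copied from the post; the only addition is the loadbalancer.server.port label, which Swarm mode generally needs because the port is not detected automatically. The value 8080 is a placeholder assumption, since the post does not say which port the container listens on.

```yaml
services:
  sd.lukium.ai:
    image: lukium/sd.lukium.ai:0.5.3
    networks:
      - traefik-public
    deploy:
      # In Swarm mode Traefik reads labels from the service, not the container
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.sd.rule=Host(`sd.lukium.ai`)"
        - "traefik.http.routers.sd.entrypoints=web,websecure"
        - "traefik.http.routers.sd.tls=true"
        - "traefik.http.routers.sd.tls.certresolver=letsencrypt"
        # Assumed placeholder port; replace with the port the app actually listens on
        - "traefik.http.services.sd.loadbalancer.server.port=8080"
```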

Personally I would not do those things on Windows. In the end it will probably run on Linux machines, so why bother with the extra hassle? I got 2 tiny VMs to experiment with Swarm.

Are you suggesting essentially to create a VM in Hyper-V with Linux and run the Docker Swarm off of the VMs?

Would this work where each host (there are 3 total) has a VM that runs the Docker Swarm Node?

Again, this is a bit of an odd scenario for me. The constraint is that this needs to be done such that the main host is Windows. Otherwise, I would just go my previous route and run Proxmox > VMs > K3S/K8S

Thanks for the reply though :)

Sorry, just disregard my comment about Windows.

No problem.

As for Cloudflare and lukium.ai, the domain is registered elsewhere, but it's set to use Cloudflare's nameservers and the DNS is managed there. I've never had an issue before, and I used a very similar setup on K3s to issue SSL certs for the same domain without any problems...

It just seems that I must be doing something wrong when getting the certs issued through traefik (where before I used cert-manager on k3s)

I'll try the link you provided though.

Thanks!

Maybe you need to use = for the environment variables:

  services:
    webapp:
      image: my-webapp-image
      environment:
        - DEBUG=TRUE
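
Applied to the traefik service from the first post, that could look like the sketch below. One assumption worth flagging: the post mentions creating a new token, and as far as I know the Cloudflare DNS provider expects a scoped API token in CF_DNS_API_TOKEN, whereas CF_API_EMAIL plus CF_API_KEY are meant for the Global API Key. Passing a token as CF_API_KEY may be what triggers the "Invalid request headers (6003)" response, but I have not verified that against this setup.

```yaml
services:
  traefik:
    image: traefik:v2.11
    environment:
      # List syntax with "=", as in the example above.
      # Assumption: the credential is a scoped API token, not the Global API Key.
      - CF_DNS_API_TOKEN=<REDACTED>
```

If the Global API Key is what is actually being used, keeping CF_API_EMAIL and CF_API_KEY would be the matching choice instead.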