Many Docker containers are handed SSL without issue, but one particular one keeps breaking

I have a very weird situation.

I am running Traefik in Docker alongside 40+ other Docker containers, many of which are proxied by Traefik 3.1.4. They have all worked great for over a year, except for one newer container that repeatedly stops working every 2-3 days and returns Bad Gateway errors. I use the same configuration for it as for the others to get an SSL certificate and proxy the container.

Here's the bizarre part: if I restart Traefik, the affected container, or any other container, the problem goes away and it is back up and responding again.

Docker config for the container:

```yaml
    networks:
      - proxy
      - vikunja

    labels:
      traefik.enable: true
      traefik.http.routers.vikunja.rule: Host(`tasks.example.com`)
      traefik.http.routers.vikunja.entrypoints: websecure
      traefik.http.routers.vikunja.tls.certResolver: myresolver

networks:
  proxy:
    external: true
  vikunja:
```

These are the same four labels I add to every Docker container that needs SSL.

The container works perfectly and very fast for 2-3 days, then one day it stops responding and I get a "Bad Gateway" error, or requests are just really slow to time out.

Initially I was restarting the container, and that fixed the problem. Then I realized restarting Traefik fixed it as well. Then I stopped another container I wasn't using anymore and noticed that fixed it too. This makes me think it is Traefik and not the container. But this is the only container that does it; all my other 25+ SSL-proxied containers run flawlessly.

I have looked at the logs of both the container and Traefik, and there are no error messages, or any messages at all. Traefik runs error-free, and the container stops writing logs once this happens (likely because it is no longer receiving traffic).

I'm completely lost as to what else to look at. It's bizarre that it is only one container, and that doing anything that touches Traefik seems to fix it for the next 2-3 days until it happens again. I've been using this setup for over a year, and it has worked fantastically for every container except this one.

When the problem happens, I have gone into the problem container and confirmed it was still up and running. I could reach the app using its internal Docker IP and port; I just can't reach it through the Traefik proxy.

Do you set docker.network in your Traefik static config?

I have --providers.docker.network in my Traefik docker compose, but I did not specify a network. I just added --providers.docker.network=proxy to see if that helps. I haven't had any issues with any of the other containers in over a year of running it.

Is it possible it is using the wrong network?

I always include the proxy network (external: true) for anything that needs to go through Traefik. If there is more than one service (like a DB), I also include a named network specific to that stack.

Update: Never mind, setting --providers.docker.network=proxy gave me 404 errors on all containers.

Share your full Traefik static and dynamic config, and docker-compose.yml if used.

```yaml
services:
  traefik:
    image: traefik:latest
    container_name: traefik
    restart: unless-stopped
    command:
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --entrypoints.ssh.address=:2222
      - --providers.docker
      - --certificatesresolvers.myresolver.acme.dnschallenge=true
      - --certificatesresolvers.myresolver.acme.email=
      - --certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json
      - --certificatesresolvers.myresolver.acme.dnschallenge.provider=desec
      - --providers.file.directory=/etc/traefik
      - --providers.file.watch=true
    networks:
      - proxy
    ports:
      - 80:80
      - 443:443
    environment:
      DESEC_TOKEN:
      DESEC_HTTP_TIMEOUT: 600
      DESEC_PROPAGATION_TIMEOUT: 600
      DESEC_TTL: 900
      DESEC_POLLING_INTERVAL: 60
    healthcheck:
      test: traefik healthcheck --ping
    volumes:
      - ./letsencrypt:/letsencrypt
      - ~/volumes/traefik:/etc/traefik
      - /var/run/docker.sock:/var/run/docker.sock:ro
    labels:
      nautical-backup.enable: false

networks:
  proxy:
    name: "proxy"
```

I do have a few file-provider config files for non-Docker hosts, but they all work fine. Only one of my 25+ Docker hosts has a problem, and it is always that one container. So the only configuration in question is the docker compose file.

This is the docker compose for Vikunja, using the same Traefik configuration as on all my other hosts. I do not believe the problem is actually related to Vikunja, since the app still works when I enter the vikunja container; it is only the proxy that fails. It also starts working again if I restart Traefik, Vikunja, or even another container.

```yaml
services:
  vikunja:
    image: vikunja/vikunja:latest
    environment:
      VIKUNJA_SERVICE_PUBLICURL: https://
      VIKUNJA_DATABASE_HOST: db
      VIKUNJA_DATABASE_PASSWORD:
      VIKUNJA_DATABASE_TYPE: postgres
      VIKUNJA_DATABASE_USER: vikunja
      VIKUNJA_DATABASE_DATABASE: vikunja
      VIKUNJA_SERVICE_JWTSECRET:
      VIKUNJA_MAILER_ENABLED: true
      VIKUNJA_MAILER_HOST: smtp-pulse.com
      VIKUNJA_MAILER_PORT: 587
      VIKUNJA_MAILER_AUTHTYPE: login
      VIKUNJA_MAILER_USERNAME:
      VIKUNJA_MAILER_PASSWORD:
      VIKUNJA_MAILER_FROMEMAIL:

    volumes:
      - ~/volumes/vikunja:/app/vikunja/files
    networks:
      - proxy
      - vikunja
    depends_on:
      db:
        condition: service_healthy
    restart: unless-stopped
    labels:
      homepage.group: Tools
      homepage.name: Vikunja
      homepage.icon: vikunja.png
      homepage.href: https://
      homepage.description: Task Tracking
      traefik.enable: true
      traefik.http.routers.vikunja.rule: Host(``)
      traefik.http.routers.vikunja.entrypoints: websecure
      traefik.http.routers.vikunja.tls.certResolver: myresolver
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD:
      POSTGRES_USER: vikunja
    networks:
      - vikunja
    volumes:
      - ~/volumes/vikunja_db:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -h localhost -U $$POSTGRES_USER"]
      interval: 2s
      start_period: 30s

networks:
  proxy:
    external: true
  vikunja:
```

Make sure to set docker.network, either statically in the command (--providers.docker.network) or dynamically in the labels of the target service (traefik.docker.network).

Your target service is attached to multiple networks, so Traefik will select a random IP from one of them, even if that is a network Traefik itself does not share.
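A minimal sketch of the per-service option, applied to the Vikunja service. It assumes the shared network Traefik is attached to is really named `proxy` on the host (the Traefik compose file pins it with `name: "proxy"`); the other labels are the same four used on every container, with the example domain from the first snippet.

```yaml
services:
  vikunja:
    networks:
      - proxy     # shared with Traefik
      - vikunja   # private stack network, used to reach the db
    labels:
      traefik.enable: true
      traefik.http.routers.vikunja.rule: Host(`tasks.example.com`)
      traefik.http.routers.vikunja.entrypoints: websecure
      traefik.http.routers.vikunja.tls.certResolver: myresolver
      # Tell Traefik which of the container's networks to take the IP from.
      # Without it, Traefik may pick the IP on "vikunja", which it cannot reach.
      traefik.docker.network: proxy

networks:
  proxy:
    external: true
  vikunja:
```

The `db` service only joins `vikunja`, so it needs no Traefik labels at all.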

1 Like

I think you are right. Looking at some of my containers that have two networks (the Traefik one, plus their own to talk to the DB), I had specified the proxy network for Traefik.

I forgot I did this in the past, thanks!

Where do I set this globally rather than per container?

In the static config (traefik.yml or command:), see the simple Traefik example.
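A minimal sketch of the global default in both static-config forms. It again assumes the shared network is named `proxy`; a traefik.docker.network label on an individual container overrides this default for that container.

```yaml
# traefik.yml (static configuration file)
providers:
  docker:
    network: proxy
```

Or, equivalently, in the command section of the Traefik compose file:

```yaml
services:
  traefik:
    command:
      - --providers.docker
      - --providers.docker.network=proxy
```

Since the value must match the actual Docker network name, it may be worth checking it with `docker network ls` before applying it globally.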

1 Like

Thanks, I get it now.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.