Error output in dashboard?

While trying to debug why a Traefik load balancer doesn't work (the router does not appear in the dashboard UI, routes return 404), I wondered how I could get more insight into what is happening. I do not get any errors anywhere (logs, UI), but via Google I found a screenshot that showed errors in a place where I have never seen them before. Am I missing some static debug setup to enable this, or is this from a feature that never got implemented?

The scenario I am currently in:

# static.yml
api:
  debug: true

log:
  level: DEBUG
  filePath: /var/log/traefik/error.log
  format: json

accessLog:
  filePath: /var/log/traefik/access.log
  format: json # common
  # (i) Performance: Increase log lines to buffer before writing to file.
  bufferingSize: 1

The whole stack is built on Docker Compose, so most configuration happens via the Docker provider, a.k.a. labels. Only the TLS configuration and middlewares use the File provider via a dynamic.yml file; they are referenced in the Docker labels with the @file suffix.
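For reference, the file-provider side looks roughly like this (a minimal sketch; the exact TLS options and the contents of the `secured` middleware are assumptions, only the name `secured` is taken from the `secured@file` label below):

```yaml
# dynamic.yml (sketch, actual values may differ)
tls:
  options:
    default:
      minVersion: VersionTLS12

http:
  middlewares:
    secured:
      headers:
        stsSeconds: 31536000
```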

The following Node.js container is not showing up in the dashboard and is not routable.

# docker-compose.nodejs.yml

version: '3'

services:
  nodejs:
    image: "node:${NODEJS_TAG:-current-alpine}"
    build:
      context: ${PWD}/nodejs
      # Only accessible during build time
      args:
        NODE_PORT: ${NODE_PORT:-3001}
        NODE_ENV: ${NODE_ENV:-production}
        NPM_CONFIG_PREFIX: ${NODE_CONFIG_PREFIX:-/home/node/.npm-global}
        NODE_EXTRA_CA_CERTS: "/usr/local/share/ca-certificates/rootCA.pem"
        SSL_CERT_DIR: "/usr/local/share/ca-certificates"
    hostname: "nodejs"
    command: sh ./bin/start ./app/server.js # <<<< Here's the spdy ExpressJS server, listening on port 3001
    restart: on-failure
    security_opt:
      - no-new-privileges:true
    networks:
      - default
    expose:
      - ${NODE_PORT:-3001}
    env_file:
      - ${PWD}/admin/dist/.env
    environment:
      NODE_PORT: ${NODE_PORT:-3001}
      NPM_CONFIG_PREFIX: ${NODE_CONFIG_PREFIX:-/home/node/.npm-global}
    working_dir: /usr/src/data
    volumes:
      - ${PWD}/admin/dist:/usr/src/data
      - cert-storage:/usr/local/share/ca-certificates
      - nodejs-hidden-modules:/usr/src/data/node_modules # (i) The `node_modules` volume
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=${DEFAULT_NETWORK}"

      - "traefik.http.routers.yolo-router.rule=Host(`yolo.${ROOT_DOMAIN}`)"
      - "traefik.http.routers.yolo-router.entrypoints=https"
      - "traefik.http.routers.yolo-router.tls=true"
      - "traefik.http.routers.yolo-router.tls.options=default"
      - "traefik.http.routers.yolo-router.service=yolo"
      - "traefik.http.routers.yolo-router.middlewares=secured@file"
      - ""

The exposed port is 3001 and I can reach the container from other containers without a problem:

$ docker-compose exec notyolo /bin/sh
wget -qO- nodejs:3001
# prints "yolo"

This is everything I get in the logs:

==> ./logs/traefik/error.log <==
{"level":"debug","msg":"Serving default certificate for request: \"yolo.demo.test\"","time":"2021-01-29T23:24:49+01:00"}

==> ./logs/traefik/access.log <==

Calling the container via Traefik from the external/host/default network results in a 404:

$ curl -k -L -v https://yolo.demo.test 

* Trying
* Connected to yolo.demo.test ( port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /Users/user/Library/Application Support/mkcert/rootCA.pem
*  CApath: /Users/user/Library/Application Support/mkcert/rootCA.pem
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: O=mkcert development certificate; OU=user@yolo.local (Firstname Lastname)
*  start date: Jan 24 23:02:26 2021 GMT
*  expire date: Apr 24 22:02:26 2023 GMT
*  issuer: O=mkcert development CA; OU=user@yolo.local (Firstname Lastname); CN=mkcert user@yolo.local (Firstname Lastname)
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7faa26012000)
> GET / HTTP/2
> Host: yolo.demo.test
> user-agent: curl/7.74.0
> accept: */*
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 404 
< content-type: text/plain; charset=utf-8
< x-content-type-options: nosniff
< content-length: 19
< date: Fri, 29 Jan 2021 22:24:49 GMT
404 page not found
* Connection #0 to host yolo.demo.test left intact

Any idea how I can continue debugging what's wrong with the load balancer config?

After two days of pulling my hair out, I finally figured out what's wrong: The container gets filtered out!

By accident, I found the following line in the log, telling me that:

Traefik is filtering unhealthy or starting containers.

You can quickly reproduce this yourself by switching the healthcheck's exit code from 0 (healthy) to 1 (unhealthy).

    healthcheck:
      test: [ "CMD-SHELL", "exit 0" ] # unhealthy: "exit 1"
      # test: [ "CMD", "node", "./healthcheck.js" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s

The corresponding log entry:

{
  "container": "nodejs-yolo-1234506db4290d8fdff012345a72e2e94e35338111229cbd12ec12345b4bdc04",
  "level": "debug",
  "msg": "Filtering unhealthy or starting container",
  "providerName": "docker"
}
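Once you suspect health filtering, you can confirm it from the Docker side. A few commands that would have saved me a lot of time (the container name `nodejs` and the log path are assumptions based on my setup):

```shell
# Current health status of the container: healthy | unhealthy | starting
docker inspect --format '{{ .State.Health.Status }}' nodejs

# Last healthcheck runs, including exit codes and probe output
docker inspect --format '{{ json .State.Health.Log }}' nodejs

# Search the Traefik debug log for the one-off filter message
grep -i "filtering unhealthy" /var/log/traefik/error.log
```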

I transitioned:

  • from a setup with an HTTPS proxy that redirected everything to a TLS port and terminated TLS behind it to continue in a "DMZ"
  • to an HTTPS-only setup

The one thing that hadn't been migrated yet was the healthcheck, which was still probing an HTTP port.
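So the fix was to make the healthcheck match the HTTPS-only setup. A sketch of what that can look like (the wget flags and port are assumptions; --no-check-certificate is used because the probe hits localhost, which won't match the certificate's hostname):

```yaml
    healthcheck:
      # Probe the same HTTPS port the app actually listens on now
      test: [ "CMD-SHELL", "wget -q --no-check-certificate -O /dev/null https://localhost:3001 || exit 1" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s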

I do not really understand why Traefik, as an ingress controller/service mesh, considers itself responsible for checking Docker containers' health status. It makes sense to monitor and react to changes in its own load balancers, but container status seems out of scope to me. Especially since there is no error, just a complete removal from the mesh and the UI. What bugs me most is that this is a single log entry. It does not repeat with later health checks (at least I couldn't find it again), which makes it incredibly hard to stumble upon, and you can count yourself extremely lucky if you do.
