Hello
I am running Traefik on bare-metal macOS (no Docker, no Kubernetes).
Traefik is installed via Homebrew and runs as a local daemon.
The whole setup is behind Cloudflare.
After startup everything is fine, but within a few hours everything suddenly breaks in an interesting way:
- I cannot access any services, e.g. grafana.example.com is down
- Traefik itself is alive: localhost:8080 displays the Traefik dashboard without any errors or warnings
- There is nothing interesting in the Traefik logs, even in debug mode
- Nothing is listening on port 443, which is probably the cause; checked with:
sudo lsof -i -P | grep LISTEN | grep :443
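A narrower variant of that check, as a minimal sketch: lsof can filter on the port and the LISTEN state itself, and it exits non-zero when nothing matches, so the result can be used in a script. The function name port_state is just illustrative, and it has to run as root to see sockets owned by other users.

```shell
# Illustrative helper: print LISTEN if some process listens on the given TCP port,
# CLOSED otherwise. Run as root, otherwise sockets of other users are not visible.
# -n / -P       skip DNS and port-name lookups
# -iTCP:<port>  only sockets on this TCP port
# -sTCP:LISTEN  only sockets in the LISTEN state
port_state() {
    if lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1; then
        echo LISTEN
    else
        echo CLOSED
    fi
}

# Example call: port_state 443
```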
While trying to figure out what is going on, I have:
- Reconfigured everything to run Traefik as root, just in case, to make sure it is not related to macOS protection (it does not like it when someone binds to low port numbers)
- Removed the fancy OIDC plugin - the idea was to get rid of any third parties
- Switched logs to debug mode - the idea was to see something related, but there is nothing
- Discovered that I was missing KeepAlive in the service plist, but because Traefik does not actually crash, the system does not restart it
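Since KeepAlive only helps when the process actually exits, one possible workaround (not a fix) is a small watchdog that restarts the job once the port stops listening. A rough sketch, assuming the job is loaded in the system domain under the label traefik from the plist; restart_if_closed is a made-up helper name, and it has to run as root.

```shell
# Hypothetical watchdog helper: restart a launchd job if nothing listens on a port.
# Assumption: the job lives in the system domain, i.e. its target is system/<label>.
restart_if_closed() {
    port="$1"
    label="$2"
    # lsof exits 0 when at least one listening socket matches
    if lsof -nP -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1; then
        return 0
    fi
    echo "$(date '+%Y-%m-%dT%H:%M:%S%z') port $port not listening, restarting $label"
    # kickstart -k kills the running instance (if any) and starts the job again
    launchctl kickstart -k "system/$label"
}

# Example call, e.g. from a periodic job: restart_if_closed 443 traefik
```

This only hides the symptom, but the timestamps it prints at least show when the listener went away.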
Setup is described here
Configs are:
static.yml
api:
  insecure: true
log:
  level: DEBUG
metrics:
  prometheus: {}
ping: {}
providers:
  file:
    filename: /Users/mini/.config/traefik/dynamic.yml
certificatesResolvers:
  cloudflare:
    acme:
      email: user@gmail.com
      storage: /Users/mini/.config/traefik/acme.json
      dnsChallenge:
        provider: cloudflare
        resolvers:
          - 1.1.1.1:53
          - 1.0.0.1:53
entryPoints:
  https:
    address: :443
    asDefault: true
    http:
      tls:
        certResolver: cloudflare
        domains:
          - main: example.com
            sans:
              - "*.example.com"
    forwardedHeaders:
      trustedIPs:
        - 173.245.48.0/20
        # ... removed to save the space
        - 2c0f:f248::/32
# # i was thinking it may be the reason but it is not
#experimental:
#  plugins:
#    google-oidc-auth-middleware:
#      moduleName: github.com/andrewkroh/google-oidc-auth-middleware
#      version: v0.1.0
As you can see, there is nothing fancy here: we are using the file-based provider, Cloudflare as the certificate resolver, and a single https entry point.
dynamic.yml
http:
  routers:
    grafana:
      rule: Host(`grafana.example.com`)
      service: grafana
    prometheus:
      rule: Host(`prometheus.example.com`)
      service: prometheus
  services:
    grafana:
      loadBalancer:
        servers:
          - url: http://localhost:3000/
    prometheus:
      loadBalancer:
        servers:
          - url: http://localhost:9090/
tls:
  options:
    default:
      sniStrict: true
      clientAuth:
        caFiles:
          - /Users/mini/.config/traefik/authenticated_origin_pull_ca.pem
        clientAuthType: RequireAndVerifyClientCert
This one is even simpler, thanks to the single default entry point.
TL;DR: we have two routes/services - prometheus and grafana
/Library/LaunchDaemons/traefik.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>traefik</string>
    <key>ProgramArguments</key>
    <array>
        <string>/opt/homebrew/opt/traefik/bin/traefik</string>
        <string>--configfile=/Users/mini/.config/traefik/static.yml</string>
    </array>
    <key>EnvironmentVariables</key>
    <dict>
        <key>CLOUDFLARE_DNS_API_TOKEN</key>
        <string>[redacted]</string>
    </dict>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>UserName</key>
    <string>root</string>
    <key>StandardErrorPath</key>
    <string>/opt/homebrew/var/log/traefik.log</string>
    <key>StandardOutPath</key>
    <string>/opt/homebrew/var/log/traefik.log</string>
    <key>WorkingDirectory</key>
    <string>/opt/homebrew/var</string>
    <key>ProcessType</key>
    <string>Background</string>
    <key>ThrottleInterval</key>
    <integer>30</integer>
</dict>
</plist>
This one is just to show how it runs. Key points:
- we are passing our static.yml
- we are running in the background
- we are running as root
Here is a snippet of the logs.
Unfortunately, there is nothing interesting or useful in it that describes what is going on.
/opt/homebrew/var/log/traefik.log
2025-03-31T02:50:20+03:00 DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:213 > Service selected by WRR: http://localhost:3000/
2025-03-31T02:50:20+03:00 DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:213 > Service selected by WRR: http://localhost:3000/
2025-03-31T02:54:28+03:00 DBG github.com/traefik/traefik/v3/pkg/middlewares/auth/basic_auth.go:82 > Authentication failed middlewareName=auth@file middlewareType=BasicAuth
2025-03-31T02:55:20+03:00 DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:213 > Service selected by WRR: http://localhost:3000/
2025-03-31T02:55:20+03:00 DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:213 > Service selected by WRR: http://localhost:3000/
2025-03-31T02:59:28+03:00 DBG github.com/traefik/traefik/v3/pkg/middlewares/auth/basic_auth.go:82 > Authentication failed middlewareName=auth@file middlewareType=BasicAuth
2025-03-31T03:00:20+03:00 DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:213 > Service selected by WRR: http://localhost:3000/
2025-03-31T03:00:20+03:00 DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:213 > Service selected by WRR: http://localhost:3000/
2025-03-31T03:00:28+03:00 DBG github.com/traefik/traefik/v3/pkg/tls/tlsmanager.go:216 > TLS: strict SNI enabled - No certificate found for domain: "nist.gov", closing connection
2025-03-31T03:00:28+03:00 DBG github.com/traefik/traefik/v3/pkg/tls/tlsmanager.go:216 > TLS: strict SNI enabled - No certificate found for domain: "nist.gov", closing connection
2025-03-31T03:00:28+03:00 DBG log/log.go:245 > http: TLS handshake error from 5.182.209.113:13426: tls: no certificates configured
2025-03-31T03:00:28+03:00 DBG log/log.go:245 > http: TLS handshake error from 5.182.209.113:13406: tls: no certificates configured
2025-03-31T03:00:28+03:00 DBG github.com/traefik/traefik/v3/pkg/tls/tlsmanager.go:216 > TLS: strict SNI enabled - No certificate found for domain: "nist.gov", closing connection
2025-03-31T03:00:28+03:00 DBG log/log.go:245 > http: TLS handshake error from 5.182.209.113:13410: tls: no certificates configured
2025-03-31T03:00:28+03:00 DBG github.com/traefik/traefik/v3/pkg/tls/tlsmanager.go:216 > TLS: strict SNI enabled - No certificate found for domain: "nist.gov", closing connection
2025-03-31T03:00:28+03:00 DBG log/log.go:245 > http: TLS handshake error from 5.182.209.113:13400: tls: no certificates configured
2025-03-31T03:04:29+03:00 DBG github.com/traefik/traefik/v3/pkg/middlewares/auth/basic_auth.go:82 > Authentication failed middlewareName=auth@file middlewareType=BasicAuth
2025-03-31T03:05:20+03:00 DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:213 > Service selected by WRR: http://localhost:3000/
2025-03-31T03:05:20+03:00 DBG github.com/traefik/traefik/v3/pkg/server/service/loadbalancer/wrr/wrr.go:213 > Service selected by WRR: http://localhost:3000/
Questions:
- Maybe I am missing some obvious setting or have made some dumb configuration mistake?
- Is there a way to tell whether Traefik still holds the port binding or has lost it?
- What else can I do to investigate this further?
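For the binding question, something like the loop below could log a timestamped line every time the port flips between open and closed, so the moment can be matched against traefik.log. This is only a sketch: probe and watch_port are hypothetical names, the log path in the example call is made up, and nc -z merely tests whether a TCP connect succeeds.

```shell
# Hypothetical helper: report whether a TCP connect to host:port succeeds.
# -z  only test the connection, send no data
# -w  give up after this many seconds
probe() {
    if nc -z -w 2 "$1" "$2" >/dev/null 2>&1; then
        echo open
    else
        echo closed
    fi
}

# Hypothetical helper: poll forever and print a line only when the state changes.
watch_port() {
    host="$1"
    port="$2"
    interval="${3:-10}"   # seconds between probes, default 10
    prev=""
    while :; do
        cur=$(probe "$host" "$port")
        if [ "$cur" != "$prev" ]; then
            echo "$(date '+%Y-%m-%dT%H:%M:%S%z') $host:$port $cur"
            prev="$cur"
        fi
        sleep "$interval"
    done
}

# Example call (made-up log path):
#   watch_port 127.0.0.1 443 10 >> /tmp/traefik-port.log
```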
So far it is not clear what is going on, so I am reaching out to the Reddit community.