This is somewhat of a continuation of a previous question I asked here.
I'm getting an infinite redirect loop between the instances of a load-balanced service instead of reaching the actual endpoint. The endpoint is a CockroachDB health check, which is supposed to return empty JSON when the node is healthy and ready to receive client connections. Everything works fine in insecure mode, but the requests are redirected endlessly when I use a secure cluster with self-signed certificates generated with the CLI tool that CockroachDB provides. The certificates are generated programmatically by a small script I wrote, which reads environment variables set in my Docker Compose file to fill in the relevant certificate information.
Here are the relevant parts of my Docker Compose file:
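To make the expected behaviour concrete, here is a minimal sketch of a health check client that refuses to follow redirects, so a 307 is reported instead of being chased. The names `NoRedirect`, `interpret`, and `check` are hypothetical and are not taken from my init tool; the CA file path would be whatever the client trusts.

```python
import json
import ssl
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so a 3xx is visible to the caller."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def interpret(status, body, location, url):
    """Classify one health check response (hypothetical labels)."""
    if status == 200:
        try:
            json.loads(body)  # a healthy node is expected to return empty JSON
        except ValueError:
            return "unexpected-body"
        return "ready"
    if status in (301, 302, 307, 308):
        # A redirect that points back at the requested URL can never resolve.
        return "self-redirect" if location == url else "redirect"
    return "not-ready"


def check(url, ca_file=None):
    """Request the health endpoint once and classify the result."""
    ctx = ssl.create_default_context(cafile=ca_file)
    opener = urllib.request.build_opener(
        NoRedirect, urllib.request.HTTPSHandler(context=ctx)
    )
    try:
        with opener.open(url, timeout=5) as resp:
            return interpret(resp.status, resp.read().decode(), None, url)
    except urllib.error.HTTPError as err:
        return interpret(err.code, "", err.headers.get("Location"), url)
```

With the responses shown further down, `interpret` would report "self-redirect", because the Location header equals the requested URL.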
services:
  roach1:
    image: cockroachdb/cockroach:v23.2.4
    hostname: roach1.crabby-userdb.io
    volumes:
      - "roach1:/cockroach/cockroach-data"
      - "certs:/certs:ro"
    command: start --cluster-name=crabby-roach-secure --logtostderr=WARNING --log-file-verbosity=INFO --http-addr=0.0.0.0:8080 --cache=.25 --accept-sql-without-tls --advertise-addr=roach1.crabby-userdb.io:26257 --join=roach1.crabby-userdb.io:26257 --certs-dir=/certs
    container_name: roach1
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=roachnet"
      # admin interface
      - "traefik.http.routers.roach_admin.rule=Host(`traefik`) || HostRegexp(`^.+\\.crabby-userdb\\.io$`) "
      - "traefik.http.routers.roach_admin.service=roach_admin_svc"
      - "traefik.http.services.roach_admin_svc.loadbalancer.server.port=8080"
      - "traefik.http.services.roach_admin_svc.loadbalancer.sticky=true"
      - "traefik.http.routers.roach_admin.tls=true"
      - "traefik.http.routers.roach_admin.entrypoints=traefik"
    networks:
      - roachnet
  db-init:
    image: baxydocker/db-init:latest
    container_name: init
    hostname: roach-init
    command: "/tool"
    environment:
      - COCKROACH_HOST=traefik
      - COCKROACH_PORT=26257
      - COCKROACH_USER=root
      - COCKROACH_INSECURE=false
      - COCKROACH_CERTS_DIR=/certs
      # uncomment the following line when first initializing a cluster
      - COCKROACH_INIT=
      - DATABASE_NAME=crabby_userdb
      - DATABASE_PASSWORD=test
      - DATABASE_USER=crabby
      - HEALTHCHECK=https://traefik:8080/api/v2/health/?ready=1
    depends_on:
      # - traefik
      certs_gen_tool:
        condition: "service_completed_successfully"
    networks:
      - roachnet
    volumes:
      - "certs:/certs"
  certs_gen_tool:
    image: baxydocker/certs_gen:latest
    container_name: certs_gen
    command: "/gen"
    environment:
      - CLIENT_USERNAME=crabby
      - NODE_ALTERNATIVE_NAMES=crabby-userdb.io *.crabby-userdb.io localhost traefik
      # - COCKROACH_SKIP_KEY_PERMISSION_CHECK=true
    volumes:
      - "certs:/.cockroach-certs"
    networks:
      - roachnet
  traefik:
    image: traefik:v3.0.3
    container_name: traefik
    hostname: traefik
    command:
      - "--api=true"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.watch=true"
      - "--providers.docker.exposedbydefault=false"
      - --providers.file.filename=dynamic.yml
      # - "--log.filepath=traefik.log"
      - "--entrypoints.web.address=:80"
      # - "--entrypoints.roach_admin.address=:8080"
      - "--entrypoints.roachdb_sql.address=:26257"
      - "--log=true"
      - "--log.filepath=/traefik.log"
      - "--log.format=json"
      - "--log.level=INFO"
      - "--accesslog=true"
      - "--accesslog.format=json"
      - "--accesslog.addinternals=true"
      - "--accesslog.filepath=/traefik.access.log"
      - "--serverstransport.rootcas=/certs/ca.crt"
    ports:
      - "69:80"
      - "8080:8080"
      - "8081:8081"
      - "26257:26257"
    labels:
      - "traefik.http.routers.dashboard.rule=Host(`traefik.localhost`) && PathPrefix(`/dashboard`))"
      # - "traefik.http.routers.dashboard.service=api@internal"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "../../config/traefik/dynamic.yml:/dynamic.yml"
      - "../../logs/traefik.log:/traefik.log"
      - "../../logs/traefik.access.log:/traefik.access.log"
      - "certs:/certs:ro"
    depends_on:
      # roach1:
      #   condition: "service_healthy"
      # roach2:
      #   condition: "service_healthy"
      # roach3:
      #   condition: "service_healthy"
      certs_gen_tool:
        condition: "service_completed_successfully"
    networks:
      - roachnet
Here is the Traefik log with the relevant startup info:
{"level":"info","version":"3.0.3","time":"2024-06-26T04:15:29Z","message":"Traefik version 3.0.3 built on 2024-06-18T14:31:20Z"}
{"level":"info","time":"2024-06-26T04:15:29Z","message":"\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://doc.traefik.io/traefik/contributing/data-collection/\n"}
{"level":"info","time":"2024-06-26T04:15:29Z","message":"Starting provider aggregator aggregator.ProviderAggregator"}
{"level":"info","time":"2024-06-26T04:15:29Z","message":"Starting provider *file.Provider"}
{"level":"info","time":"2024-06-26T04:15:29Z","message":"Starting provider *traefik.Provider"}
{"level":"info","time":"2024-06-26T04:15:29Z","message":"Starting provider *docker.Provider"}
{"level":"info","time":"2024-06-26T04:15:29Z","message":"Starting provider *acme.ChallengeTLSALPN"}
Here is a snippet of the infinite redirect loop from the Traefik access log:
{"ClientAddr":"172.19.0.2:40406","ClientHost":"172.19.0.2","ClientPort":"40406","ClientUsername":"-","DownstreamContentSize":79,"DownstreamStatus":307,"Duration":111877,"OriginContentSize":79,"OriginDuration":105723,"OriginStatus":307,"Overhead":6154,"RequestAddr":"traefik:8080","RequestContentSize":0,"RequestCount":70030,"RequestHost":"traefik","RequestMethod":"GET","RequestPath":"/api/v2/health/?ready=1","RequestPort":"8080","RequestProtocol":"HTTP/1.1","RequestScheme":"https","RetryAttempts":0,"RouterName":"roach_admin@docker","ServiceAddr":"172.19.0.4:8080","ServiceName":"roach_admin_svc@docker","ServiceURL":"http://172.19.0.4:8080","StartLocal":"2024-06-26T04:16:53.571632006Z","StartUTC":"2024-06-26T04:16:53.571632006Z","TLSCipher":"TLS_AES_128_GCM_SHA256","TLSVersion":"1.3","entryPointName":"traefik","level":"info","msg":"","time":"2024-06-26T04:16:53Z"}
{"ClientAddr":"172.19.0.2:40406","ClientHost":"172.19.0.2","ClientPort":"40406","ClientUsername":"-","DownstreamContentSize":79,"DownstreamStatus":307,"Duration":67095,"OriginContentSize":79,"OriginDuration":61100,"OriginStatus":307,"Overhead":5995,"RequestAddr":"traefik:8080","RequestContentSize":0,"RequestCount":70031,"RequestHost":"traefik","RequestMethod":"GET","RequestPath":"/api/v2/health/?ready=1","RequestPort":"8080","RequestProtocol":"HTTP/1.1","RequestScheme":"https","RetryAttempts":0,"RouterName":"roach_admin@docker","ServiceAddr":"172.19.0.6:8080","ServiceName":"roach_admin_svc@docker","ServiceURL":"http://172.19.0.6:8080","StartLocal":"2024-06-26T04:16:53.571831913Z","StartUTC":"2024-06-26T04:16:53.571831913Z","TLSCipher":"TLS_AES_128_GCM_SHA256","TLSVersion":"1.3","entryPointName":"traefik","level":"info","msg":"","time":"2024-06-26T04:16:53Z"}
{"ClientAddr":"172.19.0.2:40406","ClientHost":"172.19.0.2","ClientPort":"40406","ClientUsername":"-","DownstreamContentSize":79,"DownstreamStatus":307,"Duration":80129,"OriginContentSize":79,"OriginDuration":74819,"OriginStatus":307,"Overhead":5310,"RequestAddr":"traefik:8080","RequestContentSize":0,"RequestCount":70032,"RequestHost":"traefik","RequestMethod":"GET","RequestPath":"/api/v2/health/?ready=1","RequestPort":"8080","RequestProtocol":"HTTP/1.1","RequestScheme":"https","RetryAttempts":0,"RouterName":"roach_admin@docker","ServiceAddr":"172.19.0.4:8080","ServiceName":"roach_admin_svc@docker","ServiceURL":"http://172.19.0.4:8080","StartLocal":"2024-06-26T04:16:53.571992258Z","StartUTC":"2024-06-26T04:16:53.571992258Z","TLSCipher":"TLS_AES_128_GCM_SHA256","TLSVersion":"1.3","entryPointName":"traefik","level":"info","msg":"","time":"2024-06-26T04:16:53Z"}
{"ClientAddr":"172.19.0.2:40406","ClientHost":"172.19.0.2","ClientPort":"40406","ClientUsername":"-","DownstreamContentSize":79,"DownstreamStatus":307,"Duration":83624,"OriginContentSize":79,"OriginDuration":77065,"OriginStatus":307,"Overhead":6559,"RequestAddr":"traefik:8080","RequestContentSize":0,"RequestCount":70033,"RequestHost":"traefik","RequestMethod":"GET","RequestPath":"/api/v2/health/?ready=1","RequestPort":"8080","RequestProtocol":"HTTP/1.1","RequestScheme":"https","RetryAttempts":0,"RouterName":"roach_admin@docker","ServiceAddr":"172.19.0.5:8080","ServiceName":"roach_admin_svc@docker","ServiceURL":"http://172.19.0.5:8080","StartLocal":"2024-06-26T04:16:53.572192763Z","StartUTC":"2024-06-26T04:16:53.572192763Z","TLSCipher":"TLS_AES_128_GCM_SHA256","TLSVersion":"1.3","entryPointName":"traefik","level":"info","msg":"","time":"2024-06-26T04:16:53Z"}
{"ClientAddr":"172.19.0.2:40406","ClientHost":"172.19.0.2","ClientPort":"40406","ClientUsername":"-","DownstreamContentSize":79,"DownstreamStatus":307,"Duration":67247,"OriginContentSize":79,"OriginDuration":62063,"OriginStatus":307,"Overhead":5184,"RequestAddr":"traefik:8080","RequestContentSize":0,"RequestCount":70034,"RequestHost":"traefik","RequestMethod":"GET","RequestPath":"/api/v2/health/?ready=1","RequestPort":"8080","RequestProtocol":"HTTP/1.1","RequestScheme":"https","RetryAttempts":0,"RouterName":"roach_admin@docker","ServiceAddr":"172.19.0.4:8080","ServiceName":"roach_admin_svc@docker","ServiceURL":"http://172.19.0.4:8080","StartLocal":"2024-06-26T04:16:53.572364003Z","StartUTC":"2024-06-26T04:16:53.572364003Z","TLSCipher":"TLS_AES_128_GCM_SHA256","TLSVersion":"1.3","entryPointName":"traefik","level":"info","msg":"","time":"2024-06-26T04:16:53Z"}
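To read these entries more easily, here is a small sketch that tallies which backend each request reached, the status it returned, and the scheme used on each side of Traefik. The function name `summarize` and the `SAMPLE` list are hypothetical; the sample entries are trimmed to the fields used, with values copied from the log lines above.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def summarize(lines):
    """Tally backends, origin statuses, and scheme pairs from JSON access log lines."""
    backends, statuses, schemes = Counter(), Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        backends[entry.get("ServiceAddr")] += 1
        statuses[entry.get("OriginStatus")] += 1
        # Scheme of the client request vs. scheme Traefik used toward the backend.
        backend_scheme = urlsplit(entry.get("ServiceURL", "")).scheme
        schemes[(entry.get("RequestScheme"), backend_scheme)] += 1
    return backends, statuses, schemes


# Entries trimmed to the fields used here; values copied from the access log.
SAMPLE = [
    '{"OriginStatus":307,"RequestScheme":"https","ServiceAddr":"172.19.0.4:8080","ServiceURL":"http://172.19.0.4:8080"}',
    '{"OriginStatus":307,"RequestScheme":"https","ServiceAddr":"172.19.0.6:8080","ServiceURL":"http://172.19.0.6:8080"}',
    '{"OriginStatus":307,"RequestScheme":"https","ServiceAddr":"172.19.0.5:8080","ServiceURL":"http://172.19.0.5:8080"}',
]

if __name__ == "__main__":
    for counter in summarize(SAMPLE):
        print(dict(counter))
```

In every entry the 307 comes from the backend itself (OriginStatus), and RequestScheme is https while ServiceURL starts with http. I have not established whether that difference is related to the redirects.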
Here is the output of this command: docker exec -it roach1 curl -ksvL /dev/null https://traefik:8080/api/v2/health/?ready=1
* Trying 172.19.0.3:8080...
* Connected to traefik (172.19.0.3) port 8080 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Unknown (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=TRAEFIK DEFAULT CERT
* start date: Jun 26 04:15:29 2024 GMT
* expire date: Jun 26 04:15:29 2025 GMT
* issuer: CN=TRAEFIK DEFAULT CERT
* SSL certificate verify result: self-signed certificate (18), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Unknown (23):
* TLSv1.2 (OUT), TLS header, Unknown (23):
* TLSv1.2 (OUT), TLS header, Unknown (23):
* Using Stream ID: 1 (easy handle 0x55c49177c0a0)
* TLSv1.2 (OUT), TLS header, Unknown (23):
> GET /api/v2/health/?ready=1 HTTP/2
> Host: traefik:8080
> user-agent: curl/7.76.1
> accept: */*
>
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.2 (IN), TLS header, Unknown (23):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
* TLSv1.2 (OUT), TLS header, Unknown (23):
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.2 (IN), TLS header, Unknown (23):
* TLSv1.2 (IN), TLS header, Unknown (23):
< HTTP/2 307
< content-type: text/html; charset=utf-8
< date: Wed, 26 Jun 2024 04:23:04 GMT
< location: https://traefik:8080/api/v2/health/?ready=1
< content-length: 79
<
* TLSv1.2 (IN), TLS header, Unknown (23):
* Ignoring the response-body
* Connection #0 to host traefik left intact
* Issue another request to this URL: 'https://traefik:8080/api/v2/health/?ready=1'
* Found bundle for host traefik: 0x55c4917709f0 [can multiplex]
* Re-using existing connection! (#0) with host traefik
* Connected to traefik (172.19.0.3) port 8080 (#0)
* Using Stream ID: 3 (easy handle 0x55c49177c0a0)
* TLSv1.2 (OUT), TLS header, Unknown (23):
> GET /api/v2/health/?ready=1 HTTP/2
> Host: traefik:8080
> user-agent: curl/7.76.1
> accept: */*
>
* TLSv1.2 (IN), TLS header, Unknown (23):
< HTTP/2 307
< content-type: text/html; charset=utf-8
< date: Wed, 26 Jun 2024 04:23:04 GMT
< location: https://traefik:8080/api/v2/health/?ready=1
< content-length: 79
<
* TLSv1.2 (IN), TLS header, Unknown (23):
* Ignoring the response-body
* Connection #0 to host traefik left intact
* Issue another request to this URL: 'https://traefik:8080/api/v2/health/?ready=1'
* Found bundle for host traefik: 0x55c4917709f0 [can multiplex]
* Re-using existing connection! (#0) with host traefik
* Connected to traefik (172.19.0.3) port 8080 (#0)
* Using Stream ID: 5 (easy handle 0x55c49177c0a0)
* TLSv1.2 (OUT), TLS header, Unknown (23):
> GET /api/v2/health/?ready=1 HTTP/2
> Host: traefik:8080
> user-agent: curl/7.76.1
> accept: */*
>
* TLSv1.2 (IN), TLS header, Unknown (23):
< HTTP/2 307
< content-type: text/html; charset=utf-8
< date: Wed, 26 Jun 2024 04:23:04 GMT
< location: https://traefik:8080/api/v2/health/?ready=1
< content-length: 79
<
* TLSv1.2 (IN), TLS header, Unknown (23):
* Ignoring the response-body
* Connection #0 to host traefik left intact
* Issue another request to this URL: 'https://traefik:8080/api/v2/health/?ready=1'
* Found bundle for host traefik: 0x55c4917709f0 [can multiplex]
* Re-using existing connection! (#0) with host traefik
* Connected to traefik (172.19.0.3) port 8080 (#0)
* Using Stream ID: 7 (easy handle 0x55c49177c0a0)
* TLSv1.2 (OUT), TLS header, Unknown (23):
> GET /api/v2/health/?ready=1 HTTP/2
> Host: traefik:8080
> user-agent: curl/7.76.1
> accept: */*
>
* TLSv1.2 (IN), TLS header, Unknown (23):
< HTTP/2 307
< content-type: text/html; charset=utf-8
< date: Wed, 26 Jun 2024 04:23:04 GMT
< location: https://traefik:8080/api/v2/health/?ready=1
< content-length: 79
<
* TLSv1.2 (IN), TLS header, Unknown (23):
* Ignoring the response-body
* Connection #0 to host traefik left intact
* Issue another request to this URL: 'https://traefik:8080/api/v2/health/?ready=1'
* Found bundle for host traefik: 0x55c4917709f0 [can multiplex]
* Re-using existing connection! (#0) with host traefik
* Connected to traefik (172.19.0.3) port 8080 (#0)
* Using Stream ID: 9 (easy handle 0x55c49177c0a0)
* TLSv1.2 (OUT), TLS header, Unknown (23):
> GET /api/v2/health/?ready=1 HTTP/2
> Host: traefik:8080
> user-agent: curl/7.76.1
> accept: */*
I think you get the point. It terminates with:
* TLSv1.2 (IN), TLS header, Unknown (23):
< HTTP/2 307
< content-type: text/html; charset=utf-8
< date: Wed, 26 Jun 2024 04:23:04 GMT
< location: https://traefik:8080/api/v2/health/?ready=1
< content-length: 79
<
* TLSv1.2 (IN), TLS header, Unknown (23):
* Ignoring the response-body
* Connection #0 to host traefik left intact
* Maximum (50) redirects followed
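curl only gives up after 50 redirects. As an illustration, a client could stop as soon as a URL repeats. This is a minimal sketch, assuming a caller-supplied `fetch(url)` that returns a `(status, location)` pair; both `follow` and `fetch` are hypothetical names.

```python
REDIRECTS = (301, 302, 303, 307, 308)


def follow(url, fetch, limit=50):
    """Follow redirects via fetch(url) -> (status, location); stop on a repeated URL."""
    seen = []
    while len(seen) < limit:
        if url in seen:
            return "loop", seen  # this URL was already requested once
        seen.append(url)
        status, location = fetch(url)
        if status not in REDIRECTS or not location:
            return "done", seen
        url = location
    return "limit", seen
```

With the responses shown above, where Location is identical to the requested URL, this reports a loop after a single request instead of 50.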
Finally, these are the containers listed by "docker network inspect roachnet":
[
    {
        "Name": "roachnet",
        "Id": "5d261f8a829912dcd8b74aa289237ac68e4cdbebe6d4f311e0f7ae022515e12e",
        "Created": "2024-04-28T03:21:21.470836064Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.19.0.0/16",
                    "Gateway": "172.19.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "2e726eb6925f649dc1aba2e77343cda44285a96dd75ae36957b569db0ce31858": {
                "Name": "roach3",
                "EndpointID": "b6f63952b227932d54e16de8717ae1be4fda067b20d01a14374e7521dca05979",
                "MacAddress": "02:42:ac:13:00:05",
                "IPv4Address": "172.19.0.5/16",
                "IPv6Address": ""
            },
            "48878add56f7832188744838459da411cd2825a35e55d92e197be43f1ac53aa7": {
                "Name": "roach2",
                "EndpointID": "842e3e9092fad32206112cdf6c5ecff94cdd255195c6ebd617ff041c95ca1d1a",
                "MacAddress": "02:42:ac:13:00:04",
                "IPv4Address": "172.19.0.4/16",
                "IPv6Address": ""
            },
            "73a556ad11e18cdd8b281a3908712602b7c5deadfeafdb9c4d7587bd463ed552": {
                "Name": "traefik",
                "EndpointID": "fce2c7908670bc99fba1d65344601ab3ea1879d394a72c71507772f20954907c",
                "MacAddress": "02:42:ac:13:00:03",
                "IPv4Address": "172.19.0.3/16",
                "IPv6Address": ""
            },
            "a05cb646207aa02a365f44ab5c1ffa58cd1e7a23e880c8ed33dcf6c58c3d2708": {
                "Name": "roach1",
                "EndpointID": "563dbffe557ab21b7971d6e5fa632286a6bdfa016fb1b5757caefb46af0bc406",
                "MacAddress": "02:42:ac:13:00:06",
                "IPv4Address": "172.19.0.6/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]
roachnet is a network I originally created for testing CockroachDB; I chose to keep using it for communication between the cluster nodes.
We can see that the IPs ending in 6, 4, and 5 are all being hit when the request reaches Traefik, so the service and router work as intended. I just don't understand why the nodes keep redirecting instead of returning a response. The nodes log nothing informative once they have completed their startup: no errors, no info, nothing.
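For reference, this is how I match the backend IPs in the access log to container names. It is a minimal sketch that assumes the text passed in is the JSON array printed by "docker network inspect roachnet"; the function name `ip_to_name` is hypothetical.

```python
import json


def ip_to_name(inspect_json):
    """Map each container's IPv4 address (without the /prefix) to its name."""
    mapping = {}
    for network in json.loads(inspect_json):
        for container in network.get("Containers", {}).values():
            ip = container["IPv4Address"].split("/")[0]
            mapping[ip] = container["Name"]
    return mapping


if __name__ == "__main__":
    import sys

    # Example: docker network inspect roachnet | python this_script.py
    for ip, name in sorted(ip_to_name(sys.stdin.read()).items()):
        print(ip, name)
```

Applied to the output above, 172.19.0.6 is roach1, 172.19.0.4 is roach2, 172.19.0.5 is roach3, and 172.19.0.3 is traefik. The ClientAddr in the access log, 172.19.0.2, does not appear in that list.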