Traefik socket proxy connection timeout error

I'm running Traefik as a reverse proxy with Cloudflare / Let's Encrypt and recently added a socket proxy as an additional security layer. The setup seems to work (the dashboard is accessible), but I'm getting some odd socket proxy errors in the logs; see the pastebins for the docker log & traefik log.

Any idea what's going on here? It's showing up every 4 minutes. See relevant configs below:

docker compose:

##### NETWORKS
networks:
  default:
    driver: bridge
  socket_proxy:
    name: socket_proxy
    driver: bridge
  t3_proxy:
    name: t3_proxy
    driver: bridge

##### SECRETS
secrets:
  basic_auth_credentials:
    file: ./secrets/basic_auth_credentials
  cf_api_email:
    file: ./secrets/cf_api_email
  cf_dns_api_token:
    file: ./secrets/cf_dns_api_token

##### SERVICES
include:
  # CORE
  - compose/socket-proxy.yml
  - compose/traefik.yml

traefik.yml

services:
  traefik:
    container_name: traefik
    image: traefik:3.0
    security_opt:
      - no-new-privileges=true
    restart: unless-stopped
    ports:
      - 8008:80
      - 8443:443
    networks:
      t3_proxy:
      socket_proxy:
    command:
      - --global.checkNewVersion=true
      - --global.sendAnonymousUsage=true
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --entrypoints.websecure.http.tls=true
      - --entrypoints.web.http.redirections.entrypoint.to=websecure
      - --entrypoints.web.http.redirections.entrypoint.scheme=https
      - --entrypoints.web.http.redirections.entrypoint.permanent=true
      - --api=true
      - --api.dashboard=true
      - --entrypoints.websecure.forwardedHeaders.trustedIPs=$CLOUDFLARE_IPS,$LOCAL_IPS
      - --log=true
      - --log.filePath=/logs/traefik.log
      - --log.level=INFO
      - --accessLog=true
      - --accessLog.filePath=/logs/access.log
      - --accessLog.bufferingSize=100
      - --accessLog.filters.statusCodes=204-299,400-499,500-599
      - --providers.docker=true
      # - --providers.docker.endpoint=unix:///var/run/docker.sock # Disable for Socket Proxy. Enable otherwise.
      - --providers.docker.endpoint=tcp://socket-proxy:2375 # Enable for Socket Proxy. Disable otherwise.
      - --providers.docker.exposedByDefault=false
      - --providers.docker.network=t3_proxy
      - --entrypoints.websecure.http.tls.options=tls-opts@file
      - --entrypoints.websecure.http.tls.certresolver=dns-cloudflare
      - --entrypoints.websecure.http.tls.domains[0].main=$DOMAINNAME
      - --entrypoints.websecure.http.tls.domains[0].sans=*.$DOMAINNAME
      - --providers.file.directory=/rules
      - --providers.file.watch=true
      # - --certificatesResolvers.dns-cloudflare.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory # LetsEncrypt Staging Server - uncomment when testing
      - --certificatesResolvers.dns-cloudflare.acme.storage=/acme.json
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.provider=cloudflare
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.delayBeforeCheck=90
    volumes:
      # - /var/run/docker.sock:/var/run/docker.sock:ro # Enable if not using Socket Proxy
      - $DOCKERDIR/appdata/traefik/letsencrypt/acme.json:/acme.json
      - $DOCKERDIR/appdata/traefik/logs:/logs
      - $DOCKERDIR/appdata/traefik/rules:/rules
    environment:
      - TZ=$TZ
      - CF_API_EMAIL_FILE=/run/secrets/cf_api_email
      - CF_DNS_API_TOKEN_FILE=/run/secrets/cf_dns_api_token
      - DOMAINNAME
    secrets:
      - basic_auth_credentials
      - cf_dns_api_token
      - cf_api_email
    labels:
      - "traefik.enable=true"
      # HTTP Routers
      - "traefik.http.routers.traefik-rtr.entrypoints=websecure"
      - "traefik.http.routers.traefik-rtr.rule=Host(`traefik.$DOMAINNAME`)"
      # Services - API
      - "traefik.http.routers.traefik-rtr.service=api@internal"
      # Middlewares
      - "traefik.http.routers.traefik-rtr.middlewares=chain-basic-auth@file"

socket-proxy.yml

services:
  socket-proxy:
    image: lscr.io/linuxserver/socket-proxy:latest
    container_name: socket-proxy
    environment:
      - LOG_LEVEL=info
      - CONTAINERS=1
      - NETWORKS=1
      - POST=0
    volumes: [
      {type: bind, source: /var/run/docker.sock, target: /var/run/docker.sock, read_only: true},
      {type: tmpfs, target: /run}
    ]
    networks:
      - socket_proxy
    restart: always
    security_opt:
      - no-new-privileges=true
    read_only: true

And the million dollar question: how do I fix this?

(EDIT: I made some updates to my traefik config in the meantime, adding additional middlewares & chains. Still encountering the same issue as described above.)

This example has been running for 10 minutes without error:

services:
  traefik:
    image: traefik:v3.0
    ports:
      - 80:80
      - 443:443
    networks:
      - socket
      - proxy
    volumes:
      - letsencrypt:/letsencrypt
      #- /var/log:/var/log
    command:
      - --api.dashboard=true
      - --log.level=DEBUG
      #- --log.filepath=/var/log/traefik.log
      - --accesslog=true
      #- --accesslog.filepath=/var/log/traefik-access.log
      - --entrypoints.web.address=:80
      - --entrypoints.web.http.redirections.entrypoint.to=websecure
      - --entryPoints.web.http.redirections.entrypoint.scheme=https
      - --entrypoints.websecure.address=:443
      - --entrypoints.websecure.http.tls.certresolver=myresolver
      - --entrypoints.websecure.asDefault=true
      - --certificatesresolvers.myresolver.acme.email=mail@example.com
      - --certificatesresolvers.myresolver.acme.tlschallenge=true
      - --certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json
      - --providers.docker.endpoint=tcp://socketproxy:2375
      - --providers.docker.exposedByDefault=false
      - --providers.docker.network=proxy
    labels:
      - traefik.enable=true
      - traefik.http.routers.mydashboard.rule=Host(`traefik.example.com`)
      - traefik.http.routers.mydashboard.service=api@internal
      - traefik.http.routers.mydashboard.middlewares=myauth
      - traefik.http.middlewares.myauth.basicauth.users=test:$$apr1$$H6uskkkW$$IgXLP6ewTrSuBkTrqE8wj/

  socketproxy:
    image: tecnativa/docker-socket-proxy:edge
    networks:
      - socket
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - LOG_LEVEL=warning
      - POST=0
      - EVENTS=1
      - PING=1
      - VERSION=1
      - CONTAINERS=1

  whoami:
    image: traefik/whoami:v1.10
    networks:
      - proxy
    labels:
      - traefik.enable=true
      - traefik.http.routers.mywhoami.rule=Host(`whoami.example.com`) || PathPrefix(`/whoami`)
      - traefik.http.services.mywhoami.loadbalancer.server.port=80

networks:
  proxy:
    name: proxy
  socket:
    name: socket

volumes:
  letsencrypt:
    name: letsencrypt

Make sure to update to the latest Traefik, v3.0.1.

thx for your great help as always @bluepuma77, the error is indeed gone :clap:

What would be the recommended order of adding existing code from my original config back in to identify the part that's causing the issue? Any services or commands to start with?

After swapping the socketproxy container you provided above back to the socket-proxy from linuxserver.io, the upstream error comes back again (see traefik log).

I kept the configuration changes in traefik.yml to a bare minimum: I only set the certresolver back to my original values (dns-cloudflare) to remove earlier errors around certificates, and added the secrets for Cloudflare, see below.

      - --entrypoints.websecure.http.tls.certresolver=dns-cloudflare
      - --certificatesresolvers.dns-cloudflare.acme.email=$EMAIL
      - --certificatesResolvers.dns-cloudflare.acme.storage=/acme.json
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.provider=cloudflare
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53
    environment:
      - CF_API_EMAIL_FILE=/run/secrets/cf_api_email
      - CF_DNS_API_TOKEN_FILE=/run/secrets/cf_dns_api_token
    secrets:
      - cf_dns_api_token
      - cf_api_email

Using the standard config provided by socket-proxy (link here).

I can't completely rule out that the error also occurs with the socketproxy from tecnativa, but I didn't see anything in the traefik log. Is there any way to check the logs for that container specifically?

Try adding NETWORKS=1 and SERVICES=1 to the socket proxy environment variables.

Try updating this:

      - --providers.docker.endpoint=tcp://socketproxy:2375

to:

      - --providers.docker.endpoint=tcp://socket-proxy:2375
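
A minimal sketch of both suggestions, assuming the compose service is named socket-proxy and both containers share the socket_proxy network, as in the original socket-proxy.yml. Only the relevant keys are shown; the traefik image tag and everything omitted here are placeholders for the rest of your own config:

```yaml
services:
  socket-proxy:                      # service name doubles as the DNS name on the shared network
    image: lscr.io/linuxserver/socket-proxy:latest
    environment:
      - CONTAINERS=1
      - NETWORKS=1                   # suggested addition
      - SERVICES=1                   # suggested addition
      - POST=0
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - socket_proxy

  traefik:
    image: traefik:v3.0.1
    command:
      # host part of the endpoint must match the service name above
      - --providers.docker.endpoint=tcp://socket-proxy:2375
    networks:
      - socket_proxy                 # must share a network with socket-proxy

networks:
  socket_proxy:
    name: socket_proxy
```

If the host part of the endpoint does not match a service (or container) name reachable on a shared network, the name lookup from inside the traefik container will fail.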

Both Docker images use the same code. Just make sure not to use tecnativa latest, as that is 3 years old.

Update: do use latest, as that is maintained now; it's edge that seems stale instead (link) :rofl:

As @badfella mentioned, make sure to use the Docker proxy service name in compose as proxy endpoint.


I added the NETWORKS & SERVICES variables @badfella suggested; the tcp endpoint was already set correctly. Still the same issue with the linuxserver container.

I just changed to the tecnativa image and it's totally broken now:

2024-05-28T07:13:23Z ERR github.com/traefik/traefik/v3/pkg/provider/docker/pdocker.go:156 > Provider error, retrying in 1.417934746s error="error during connect: Get \"http://socket-proxy:2375/v1.24/version\": dial tcp: lookup socket-proxy on 127.0.0.11:53: server misbehaving" providerName=docker
2024-05-28T07:13:25Z ERR github.com/traefik/traefik/v3/pkg/provider/docker/pdocker.go:85 > Failed to retrieve information of the docker client and server host error="error during connect: Get \"http://socket-proxy:2375/v1.24/version\": dial tcp: lookup socket-proxy on 127.0.0.11:53: server misbehaving" providerName=docker

For visibility, see the socket-proxy and traefik configs below:

services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy:latest
    container_name: socket-proxy
    environment:
      - LOG_LEVEL=warning
      - CONTAINERS=1
      - EVENTS=1
      - PING=1
      - POST=0
      - VERSION=1
      - NETWORKS=1
      - SERVICES=1
    networks:
      - socket
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: unless-stopped
    read_only: true
    tmpfs:
      - /run

services:
  traefik:
    image: traefik:v3.0.1
    ports:
      - 8008:80
      - 8443:443
    networks:
      - socket
      - proxy
    volumes:
      - $DOCKERDIR/appdata/temp-traefik/letsencrypt:/letsencrypt
      - $DOCKERDIR/appdata/temp-traefik/logs:/logs
    command:
      - --api.dashboard=true
      - --log=true
      - --log.level=DEBUG
      - --log.filepath=/logs/traefik.log
      - --accesslog=true
      - --accesslog.filepath=/logs/access.log
      - --entrypoints.web.address=:80
      - --entrypoints.web.http.redirections.entrypoint.to=websecure
      - --entryPoints.web.http.redirections.entrypoint.scheme=https
      - --entrypoints.websecure.address=:443
      - --entrypoints.websecure.http.tls.certresolver=dns-cloudflare
      - --certificatesresolvers.dns-cloudflare.acme.email=$EMAIL
      - --certificatesResolvers.dns-cloudflare.acme.storage=/acme.json
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.provider=cloudflare
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53
      - --providers.docker.endpoint=tcp://socket-proxy:2375
      - --providers.docker.exposedByDefault=false
      - --providers.docker.network=proxy
    environment:
      - CF_API_EMAIL_FILE=/run/secrets/cf_api_email
      - CF_DNS_API_TOKEN_FILE=/run/secrets/cf_dns_api_token
    secrets:
      - cf_dns_api_token
      - cf_api_email
    labels:
      - traefik.enable=true
      - traefik.http.routers.mydashboard.rule=Host(`traefik.$DOMAINNAME`)
      - traefik.http.routers.mydashboard.service=api@internal
      - traefik.http.routers.mydashboard.middlewares=myauth
      - traefik.http.middlewares.myauth.basicauth.users=XXX

What am I doing wrong?

I've stopped all running containers, ran docker system prune --all and also stopped docker. Still getting the same issue when running compose.

Where is the IP 127.0.0.11 coming from?

And you added some settings:

Are you sure it’s running with those?

I also tried with those settings removed; still the same issue. No idea where 127.0.0.11 is coming from. I assume it's a dynamic IP assigned through the main config, temp-docker-compose.yml?

##### NETWORKS
networks:
  proxy:
    name: proxy
  socket:
    name: socket

##### SECRETS
secrets:
#  basic_auth_credentials:
#    file: ./secrets/basic_auth_credentials
  cf_api_email:
    file: ./secrets/cf_api_email
  cf_dns_api_token:
    file: ./secrets/cf_dns_api_token

##### SERVICES
include:
  - compose/temp-traefik.yml
  - compose/temp-socket-proxy.yml
#  - compose/temp-whoami.yml

Ok this is getting interesting, after 5-6 minutes it suddenly appears to start working:

2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/provider/docker/pdocker.go:89 > Provider connection established with docker 26.1.3 (API 1.45) providerName=docker
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/provider/docker/config.go:184 > Filtering disabled container container=socket-proxy-docker-892a076e5899ba6a7f0977ab4285295ab5b503f6c3ca9df8f6c6fe097dc80fa6 providerName=docker
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/server/configurationwatcher.go:227 > Configuration received config={"http":{"middlewares":{"myauth":{"basicAuth":{"users":["test:$apr1$H6uskkkW$IgXLP6ewTrSuBkTrqE8wj/"]}}},"routers":{"mydashboard":{"middlewares":["myauth"],"rule":"Host(`traefik.mydomain.com`)","service":"api@internal"}},"services":{"traefik-docker":{"loadBalancer":{"passHostHeader":true,"responseForwarding":{"flushInterval":"100ms"},"servers":[{"url":"http://192.168.160.2:80"}]}}}},"tcp":{},"tls":{},"udp":{}} providerName=docker
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/server/aggregator.go:51 > No entryPoint defined for this router, using the default one(s) instead entryPointName=["web","websecure"] routerName=mydashboard
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/tls/tlsmanager.go:321 > No default certificate, fallback to the internal generated certificate tlsStoreName=default
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/middlewares/auth/basic_auth.go:33 > Creating middleware entryPointName=web middlewareName=myauth@docker middlewareType=BasicAuth routerName=mydashboard@docker
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/middlewares/observability/middleware.go:33 > Adding tracing to middleware entryPointName=web middlewareName=myauth@docker routerName=mydashboard@docker
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/middlewares/redirect/redirect_scheme.go:29 > Creating middleware entryPointName=web middlewareName=redirect-web-to-websecure@internal middlewareType=RedirectScheme routerName=web-to-websecure@internal
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/middlewares/redirect/redirect_scheme.go:30 > Setting up redirection to https 443 entryPointName=web middlewareName=redirect-web-to-websecure@internal middlewareType=RedirectScheme routerName=web-to-websecure@internal
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/middlewares/recovery/recovery.go:22 > Creating middleware entryPointName=web middlewareName=traefik-internal-recovery middlewareType=Recovery
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/middlewares/auth/basic_auth.go:33 > Creating middleware entryPointName=websecure middlewareName=myauth@docker middlewareType=BasicAuth routerName=websecure-mydashboard@docker
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/middlewares/observability/middleware.go:33 > Adding tracing to middleware entryPointName=websecure middlewareName=myauth@docker routerName=websecure-mydashboard@docker
2024-05-28T07:47:56Z DBG github.com/traefik/traefik/v3/pkg/middlewares/recovery/recovery.go:22 > Creating middleware entryPointName=websecure middlewareName=traefik-internal-recovery middlewareType=Recovery

What do I make of this? Waiting about 5 minutes for socket-proxy to boot up after each rebuild is not really ideal.

Quick update: just did systemctl docker restart temp-docker-compose and then recomposed everything. Now the errors are gone and it immediately boots up. Perhaps that did the trick?

I also noticed another error saying that I requested too many certificates (5) and have to wait 32 hours before being able to obtain a new ACME certificate. I assume that's not a big error to worry about (it probably could've been avoided by using the staging server)?

Yup, next time use the staging server for testing while you don't have a working stack yet.

Make sure to persist your acme.json file using an absolute path in a Docker bind mount on host or in a Docker volume.

Otherwise you might end up without any TLS certs after 5 restarts.
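
As a rough sketch of both points, assuming the dns-cloudflare resolver and the $DOCKERDIR variable from the configs earlier in this thread (the host directory name is only illustrative). The staging caServer line is the one already commented out in the original traefik.yml, and the storage path sits inside the bind-mounted directory so acme.json survives container recreation:

```yaml
services:
  traefik:
    image: traefik:v3.0.1
    command:
      # Let's Encrypt staging server: remove this line once the stack works
      - --certificatesResolvers.dns-cloudflare.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.provider=cloudflare
      # storage must point inside a mounted path, otherwise it is lost with the container
      - --certificatesResolvers.dns-cloudflare.acme.storage=/letsencrypt/acme.json
    volumes:
      # absolute host path (via $DOCKERDIR) bind-mounted into the container
      - $DOCKERDIR/appdata/traefik/letsencrypt:/letsencrypt
```

Note that in the temporary config posted above, the storage flag points to /acme.json while only /letsencrypt is mounted, so the file would not be persisted. When switching from staging to production you will most likely also need to clear the staging certificates out of acme.json so that real ones get requested.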

Got everything running now, in socket-proxy log I get the following warning:

30/05/2024 07:01:33 [WARNING] 150/050133 (1) : config : missing timeouts for backend 'docker-events'.
30/05/2024 07:01:33    | While not properly invalid, you will certainly encounter various problems
30/05/2024 07:01:33    | with such a configuration. To fix this, please ensure that all following
30/05/2024 07:01:33    | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
30/05/2024 07:01:33 [WARNING] 150/050133 (1) : Can't open global server state file '/var/lib/haproxy/server-state': No such file or directory
30/05/2024 07:01:33 [NOTICE] 150/050133 (1) : New worker #1 (12) forked

Here's my socket proxy config:

services:
  socket-proxy:
#    image: lscr.io/linuxserver/socket-proxy:latest
    image: tecnativa/docker-socket-proxy
    container_name: socket-proxy
    environment:
      - LOG_LEVEL=warning
      - CONTAINERS=1
      - EVENTS=1
      - IMAGES=1 # Portainer
      - INFO=1 # Portainer
      - NETWORKS=1
      - PING=1
      - POST=0
      - SERVICES=1
      - TASKS=1 # Portainer
      - VERSION=1
      - VOLUMES=1 # Portainer
    networks:
      - socket
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
#      - $DOCKERDIR/appdata/socket-proxy/proxy.conf:/etc/nginx/proxy.conf
    restart: unless-stopped
#    read_only: true
#    tmpfs:
#      - /run

I'm not getting a clear answer when searching around. What does it mean? Anything to worry about? I also set the log level to debug and the other lines look good, only showing 200s after the warning.

I suggest you create an issue or discussion on the GitHub repository.

ok will do, thx for helping in this matter and the solution shared earlier.
