Pingable docker socket proxy (falsely?) reports as unresolvable at times

I am trying to harden my environment and am seeing some strange behavior that I am hoping someone can help me understand.

I am incorporating a Docker socket proxy in my environment. My dev portion is a 3-node Swarm with two Leaders.

I had it working for all services that needed the socket (three, I think), but as I progressed I noticed that Traefik would sporadically report that it could not resolve the proxy's address, and that it had changed the endpoint definition from tcp:// to http://. The (rather unspecific) error is as follows:

Provider connection error error during connect: Get \"http://dockerproxy:2375/v1.24/version\": dial tcp: lookup dockerproxy on 127.0.0.11:53: no such host, retrying in 610.626583ms

Despite the retries, it never starts working again.

Strangely, if I open an interactive shell in the Traefik container, I can ping dockerproxy by name and it resolves. Stranger still, to troubleshoot I created a config containing only Traefik and the socket proxy and tested with that. While the issue is occurring, I can actually see activity over the socket (presumably from Traefik, since it is the only other running service):


time=2025-05-28T13:09:37.430Z level=INFO msg="socket-proxy running and listening..."
time=2025-05-28T13:09:37.430Z level=DEBUG msg="watchdog running"
time=2025-05-28T13:09:37.786Z level=DEBUG msg="allowed request" method=GET URL=/v1.24/version client=10.0.4.4:50798
time=2025-05-28T13:09:37.792Z level=DEBUG msg="allowed request" method=GET URL=/v1.24/services client=10.0.4.4:50798
time=2025-05-28T13:09:37.794Z level=DEBUG msg="allowed request" method=GET URL=/v1.24/version client=10.0.4.4:50798
time=2025-05-28T13:09:37.800Z level=DEBUG msg="allowed request" method=GET URL="/v1.24/networks?filters=%7B%22scope%22%3A%7B%22swarm%22%3Atrue%7D%7D" client=10.0.4.4:50798
time=2025-05-28T13:09:37.802Z level=DEBUG msg="allowed request" method=GET URL="/v1.24/tasks?filters=%7B%22desired-state%22%3A%7B%22running%22%3Atrue%7D%2C%22service%22%3A%7B%22dpz9mw28zcmu882ztc166l0eq%22%3Atrue%7D%7D" client=10.0.4.4:50798
time=2025-05-28T13:09:37.803Z level=DEBUG msg="allowed request" method=GET URL="/v1.24/tasks?filters=%7B%22desired-state%22%3A%7B%22running%22%3Atrue%7D%2C%22service%22%3A%7B%22jbva8z9glfm3ez231xq7fyom9%22%3Atrue%7D%7D" client=10.0.4.4:50798
time=2025-05-28T13:09:52.806Z level=DEBUG msg="allowed request" method=GET URL=/v1.24/services client=10.0.4.4:50798
time=2025-05-28T13:09:52.807Z level=DEBUG msg="allowed request" method=GET URL=/v1.24/version client=10.0.4.4:50798
time=2025-05-28T13:09:52.813Z level=DEBUG msg="allowed request" method=GET URL="/v1.24/networks?filters=%7B%22scope%22%3A%7B%22swarm%22%3Atrue%7D%7D" client=10.0.4.4:50798
time=2025-05-28T13:09:52.815Z level=DEBUG msg="allowed request" method=GET URL="/v1.24/tasks?filters=%7B%22desired-state%22%3A%7B%22running%22%3Atrue%7D%2C%22service%22%3A%7B%22dpz9mw28zcmu882ztc166l0eq%22%3Atrue%7D%7D" client=10.0.4.4:50798
time=2025-05-28T13:09:52.816Z level=DEBUG msg="allowed request" method=GET URL="/v1.24/tasks?filters=%7B%22desired-state%22%3A%7B%22running%22%3Atrue%7D%2C%22service%22%3A%7B%22jbva8z9glfm3ez231xq7fyom9%22%3Atrue%7D%7D" client=10.0.4.4:50798
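
For anyone reading the log above: the `filters` values are percent-encoded JSON, which makes it hard to see what is actually being requested. The following is a minimal sketch for decoding them with the Python standard library; the helper name `decode_filters` is made up for illustration, and the URLs are copied from the log lines.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_filters(url: str) -> dict:
    """Return the JSON object carried in the `filters` query parameter."""
    query = urlsplit(url).query
    params = parse_qs(query)  # parse_qs also percent-decodes the values
    raw = params.get("filters", ["{}"])[0]
    return json.loads(raw)


if __name__ == "__main__":
    # URLs copied from the socket-proxy log above.
    networks = "/v1.24/networks?filters=%7B%22scope%22%3A%7B%22swarm%22%3Atrue%7D%7D"
    tasks = (
        "/v1.24/tasks?filters=%7B%22desired-state%22%3A%7B%22running%22%3Atrue%7D"
        "%2C%22service%22%3A%7B%22dpz9mw28zcmu882ztc166l0eq%22%3Atrue%7D%7D"
    )
    print(decode_filters(networks))
    print(decode_filters(tasks))
```

Decoded, the requests are a listing of swarm-scoped networks and a listing of running tasks for a single service ID, which looks like a client polling Swarm state.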

So it is actually resolving and communicating; it just reports that it is not, sometimes. Cycling the stack reproduces the issue sporadically.

I thought maybe this could be because Traefik and the socket proxy were not on the same node, but experimenting with that theory disproved it. I set the proxy to run globally, and although that seemed to reduce the issue, it still occurred.

I am super confused, and this reads to me like an issue in how Traefik reports service resolution. Any ideas? TIA

I am also practicing improved security with my own Docker socket proxy (repo). Sometimes restarted or recreated Docker target services/containers are not recognized.

Restarting Traefik, the proxy, or the target service helps in that case, but it is not an ideal solution.

Thanks for the response. For me, this is not even a solution but a workaround, and a less than ideal one at that.

Do you see the same behavior? Does the protocol for the configured socket target in Traefik change from tcp:// to http://? Does Traefik report that the name is not resolvable? Can you ping the same name from an interactive shell in the Traefik container?
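
A single ping only shows that the name resolved once. To catch an intermittent failure, it may help to resolve the name repeatedly from a container attached to the same overlay network. Below is a rough sketch using only the Python standard library. The hostname `dockerproxy` and port 2375 come from the error message; the function name `probe_dns`, the attempt count, and the delay are arbitrary choices for illustration, and it assumes a throwaway container with Python is available on that network.

```python
import socket
import time
from typing import Callable, List, Tuple


def probe_dns(
    host: str,
    port: int = 2375,
    attempts: int = 20,
    delay: float = 0.5,
    resolve: Callable[..., list] = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, str]]:
    """Resolve `host` repeatedly and return (attempt, result) pairs.

    Each result is a comma-separated list of addresses, or "FAIL: <error>"
    when the lookup raised socket.gaierror.
    """
    results = []
    for attempt in range(1, attempts + 1):
        try:
            infos = resolve(host, port, type=socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr).
            addrs = sorted({info[4][0] for info in infos})
            results.append((attempt, ",".join(addrs)))
        except socket.gaierror as exc:
            results.append((attempt, f"FAIL: {exc}"))
        if attempt < attempts:
            sleep(delay)
    return results


if __name__ == "__main__":
    # Hostname taken from the Traefik error message.
    for attempt, result in probe_dns("dockerproxy"):
        print(attempt, result)
```

If some attempts fail while others succeed, that would point at the embedded DNS on 127.0.0.11 rather than at Traefik's reporting. If every attempt succeeds while Traefik is still logging the lookup error, that would suggest looking at Traefik itself.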

To me, this is like being asked to restart your computer up to three times, checking each time whether an application issue is fixed. How do you propose monitoring for failure and resolution? I was going to have Prometheus do its thing like I do with other services, but I was not sure I wanted it on my socket network.
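
One option that might avoid putting Prometheus on the socket network is a small check that runs from something already attached to it and only reports a status. This is a sketch under assumptions, not a recommendation: the URL is the one from the Traefik error message (the proxy log shows `/v1.24/version` being allowed), while the function name `check_proxy` and the status strings are invented for illustration.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable


def check_proxy(
    url: str = "http://dockerproxy:2375/v1.24/version",
    timeout: float = 3.0,
    opener: Callable = urllib.request.urlopen,
) -> str:
    """Classify one request to the proxy.

    Returns "ok", "http_<code>", "dns_failure", or "unreachable".
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return "ok" if resp.status == 200 else f"http_{resp.status}"
    except urllib.error.HTTPError as exc:
        # The proxy answered, but with an error status (e.g. a denied request).
        return f"http_{exc.code}"
    except urllib.error.URLError as exc:
        # A failed name lookup surfaces as URLError wrapping socket.gaierror.
        if isinstance(exc.reason, socket.gaierror):
            return "dns_failure"
        return "unreachable"
    except OSError:
        # Covers timeouts and resets raised outside URLError.
        return "unreachable"


if __name__ == "__main__":
    status = check_proxy()
    print(status)
    # A non-zero exit code lets a scheduler or healthcheck treat this as failed.
    raise SystemExit(0 if status == "ok" else 1)
```

Separating "dns_failure" from "unreachable" matters here because the symptom in question is specifically a failed lookup while the proxy itself appears to be up.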

I found another instance of this with a different socket proxy that mirrors my experience:

Is this a known issue that has been accepted as such yet? I see guides on how to set up this config on both Docker's and Traefik's sites, so it must be a supported and even desirable config from a security standpoint.