I've been experimenting with ways to get better isolation between stacks in my Docker Swarm. Currently, every service that needs Traefik to route traffic to it has to live on the same network. Thus, service Alpha can do a DNS lookup and see service Gamma.
Not the end of the world, but I like defense in depth, and would prefer that Alpha and Gamma not know the other one exists.
My current attempt is to have multiple instances of Traefik. A global instance that is accessible to the outside world, and then an instance per app stack. The idea being that only the Traefik instances will be able to see each other, and the app containers will not have to share the global Traefik network.
Currently that isn't working, so I thought I'd ask if anyone else has figured something out while I try and work around the problems...
So, anyone have any ideas they'd be willing to share?
The idea is that you have all the containers that Traefik needs to talk to on a single Docker network, then trafficjam will dynamically add iptables rules to prevent the containers on that network from talking to each other (except Traefik, which is whitelisted and can talk to all of them). It works on Swarm too, and it's a relatively simple bash script, so it should be pretty secure.
I've been operating on the idea that you can only attach Traefik to one network, and then you attach your services to that network if you need Traefik to route traffic to them.
But a reread of the Traefik Docker documentation makes me wonder if maybe you just don't need to set the main network setting at all, and Traefik will figure it out from your individual service labels.
We have Traefik attached to 10+ of our own services, all in the same "proxy" Docker network.
The next security improvement will be a docker-socket-proxy: a simple self-made one, not an unknown 3-year-old image from the Internet. After that, a dedicated non-root user and group. Traefik is the first point of contact and, in my view, carries a high security risk.
If we wanted to separate networks, my first thought would be 10 Docker networks, one for each service, with Traefik attached to all of them. I assume the Traefik container would not route between the networks. Then set docker.network for every target service in the labels, though that's probably not even required if the target service has no other networks (like one to a DB).
So I did some testing. If you make sure that every stack's individual network is external and added to the Traefik stack/service, then Traefik doesn't need all the apps to share the same network like I thought. So, if you put anet and bnet on Traefik, and anet on stacka, and bnet on stackb, then Traefik will route traffic properly, but containers in stacka won't see containers in stackb.
If you don't put anet and bnet on the Traefik stack, it won't work. I tried.
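The layout from that test could be scripted roughly as follows. This is only a sketch: the function name and the Traefik service name traefik_traefik are hypothetical, and the code just builds the docker commands (docker network create and docker service update --network-add) instead of running them, so they can be reviewed first.

```python
def isolation_commands(traefik_service, stack_networks):
    """Build docker CLI commands that give each stack its own overlay
    network and attach the Traefik service to every one of them."""
    commands = []
    for net in stack_networks:
        # One overlay network per stack; the stack file would then
        # reference it as an external network.
        commands.append(
            ["docker", "network", "create", "--driver", "overlay", net]
        )
        # Traefik has to be on the network too, or it cannot reach the stack.
        commands.append(
            ["docker", "service", "update", "--network-add", net, traefik_service]
        )
    return commands


if __name__ == "__main__":
    # "traefik_traefik" is a placeholder for the real service name.
    for cmd in isolation_commands("traefik_traefik", ["anet", "bnet"]):
        print(" ".join(cmd))
```

With anet and bnet as input this yields four commands, two per network. Containers in stacka only join anet and containers in stackb only join bnet, so only Traefik sits on both.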
The one thing I did find that is confusing me is that running Traefik outside of Docker doesn't seem to work. It is supposed to, right? I'm pretty sure it's a misconfiguration on my end, but it just isn't routing traffic from outside the Swarm to any of the containers in the stacks.
I'd like to run it outside of Docker so that it can pick up the new networks and stacks as they come in automatically instead of making me add them to the stack configuration and have to redeploy Traefik every time I add a stack to my Swarm.
Well, the idea is that you can use it without knowing a thing about iptables, but the rules are deliberately simple nonetheless: just basic drops/returns on subnets/IPs pulled from Docker. Feel free to ask any questions if you want to dive in, either here or on GitHub.
I am sure you can set Swarm services to publish their port on the host on a random port (useful when running multiple instances).
Then you can create a script to inspect the stack/services to pull the IPs+ports and create a dynamic config file with routers and services, maybe even with labels (or env) for the URL, to make everything dynamic.
That just should not be done on a cloud VM with a public IP, unless you can put a real firewall in front of it to block all ports, so no external party can directly access your internal services.
Never mind. That solves the issue of Traefik not being attached to individual Docker networks. But it does not improve isolation.
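The script idea above (collect IPs and ports, then write a dynamic config with routers and services) might look something like this. It is a sketch under assumptions: the backend list is typed in by hand where a real script would inspect the services, the names and addresses are made up, and the output follows the http.routers / http.services layout of Traefik's file provider.

```python
def render_dynamic_config(backends):
    """Render a Traefik file-provider config (YAML text) with one router
    and one service per backend.

    backends: list of dicts with the keys name, host, ip and port.
    """
    routers = ["http:", "  routers:"]
    services = ["  services:"]
    for b in backends:
        name, host, ip, port = b["name"], b["host"], b["ip"], b["port"]
        routers += [
            f"    {name}:",
            f"      rule: 'Host(`{host}`)'",
            f"      service: {name}",
        ]
        services += [
            f"    {name}:",
            "      loadBalancer:",
            "        servers:",
            f"          - url: 'http://{ip}:{port}'",
        ]
    return "\n".join(routers + services) + "\n"


if __name__ == "__main__":
    # Hypothetical backend; a real script would pull this from Docker.
    example = [
        {"name": "alpha", "host": "alpha.example.com",
         "ip": "10.0.0.5", "port": 32768},
    ]
    print(render_dynamic_config(example), end="")
```

Writing the result to a file that Traefik watches would make it pick up changes without a redeploy, but as noted, this only removes the need to attach Traefik to each network; it does nothing for isolation.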
I solved this scenario with the following procedure if anyone is interested.
1. Init the swarm and join the other nodes (an overlay network can only be created once the node is in swarm mode).
2. Create the overlay network for Traefik; let's call it traefik-net.
3. Create a dummy global service (--mode global) with the normal Traefik flags, attached to traefik-net.
4. Create iptables rules on every worker node to block inter-container communication in the traefik-net namespace. In modern Docker versions the network namespaces are located at /var/run/docker/netns. For example,
NETNS is a variable containing the full path of the traefik-net namespace, for example /var/run/docker/netns/1-abcdefghij, where abcdefghij is part of the traefik-net ID.
The first and second iptables rules allow traffic from/to the Traefik load balancer.
The third rule prevents inter-container communication.
The limitation is that if no tasks (containers) are running on a node, the network namespace is destroyed along with all its iptables rules. With the global dummy service running almost forever, though, the chances of that are very low. (I used the traefik/whoami image.) If you want something automatic to create the iptables rules, try the trafficjam project. (I took this idea from there, thank you @kayson.)
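For anyone scripting this, finding the value for NETNS could be done along these lines. This is a sketch, not part of the procedure above: the function name is hypothetical, and it assumes the part of the namespace name after the dash is a leading fragment of the network ID, as in the 1-abcdefghij example.

```python
import os

NETNS_DIR = "/var/run/docker/netns"  # location given in the post above


def find_netns(network_id, names, netns_dir=NETNS_DIR):
    """Return the full path of the namespace that belongs to network_id,
    or None if no entry matches.

    names: entries of the netns directory, e.g. from os.listdir().
    Assumes overlay namespaces are named like '1-abcdefghij', where the
    part after the dash is the start of the network ID.
    """
    for name in names:
        _, dash, fragment = name.partition("-")
        # Entries without a dash cannot match the assumed pattern.
        if dash and fragment and network_id.startswith(fragment):
            return os.path.join(netns_dir, name)
    return None
```

On a node it would be called with the traefik-net ID and os.listdir(NETNS_DIR), which usually needs root to read.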
This is pretty much what I'm planning to move to. It's a bit annoying to have to manually add the traefik container to each of the new networks though.