Hi
First, an acknowledgement: this is the first time I've encountered traefik, and wow, guys, great work! Amazing.
Anyway. I think I've got an interesting challenge.
In short
I'm looking for a setup with one endpoint, one router and two services: one backed by a docker container, the other external to the cluster. When the container runs, the router should prefer it. When the container is stopped, the router should fall back to the external service.
There's a demo repo at the end that does it manually; I'm looking for a smarter traefik config that can do it automatically.
In detail
Consider the following background:
- a corporate dev team that works on a complex service mesh of μ-Services
- the team includes Golden-Cage developers (developers that can't work without an IDE)
- developers use a dev-env version of docker-compose with ~50 services, including dbs, queues, and all, but the lion's share are services managed by the team.
- dev team OSes are mac and windows (linux is not supported by corporate IT)
Consider the following requirement:
A setup where a developer can choose to run any service the team maintains as a process on their own machine, using the sources from which the docker image in the mesh is built, and in the same way they are used to from "home" (i.e. IDE, file-watch restarts, bundlers, whatever). When they run such a service locally in dev-mode, it needs to take the place of its counterpart in the mesh expressed by docker-compose, i.e. consume services from the mesh and, more importantly, be consumed by other services in the mesh.
I discovered that the developer machine is found on the bridged network as host.docker.internal, which is great: I can use it in traefik services. So far, so good.
How far did I get
So what I got so far is the following:
- all inter-service communications pass through traefik. traefik implements the service bus, and is the sole entity that knows what service runs where. (like k8s ingress, but without the kubelet and all. Just the routing.)
- services in the mesh look for each other using the container name + port. However, no matter who they ask for, they get traefik. This is done using the links section of services in docker-compose.
- the consumed endpoint base URL is also injected using the environment section of services in docker-compose. When run on a developer machine, services use baked-in defaults of http://localhost:<port> (relying on the unique ports scheme).
- the total config holds an entry-point, a service and a router per node in the mesh. Entry points are identified by ports, which are allocated uniquely across the mesh, i.e. you know what service you'll get by the port. (Actually, I considered identifying services by host names, and ruled against it. Intuitively I'd rather have thin, focused entry-points than a single entry-point with many rules.)
- when a developer needs to develop a service on their local host, this is what they have to do:
  - stop the specific docker container of that service in the mesh
  - launch the service on their machine as a local process
  - update the config behind the file-provider so that the service is routed to their machine using host.docker.internal:<port> instead of <container>:<port>.
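To make the links + environment wiring from the bullets above concrete, here is a trimmed-down sketch of one consumer in docker-compose. It is not copied from the real file: the names cart, orders and ORDERS_BASE_URL, the image names and port 8081 are all made up for illustration, and I'm assuming the SERVICE:ALIAS form of links is what points the consumed container name at traefik.

```yaml
# docker-compose.yml (fragment) - hypothetical names, not the real mesh
services:
  traefik:
    image: traefik:v2.10
    volumes:
      # directory holding the traefik config that gets edited by hand
      - ./traefik:/etc/traefik

  cart:
    image: example/cart          # hypothetical consumer service
    links:
      # SERVICE:ALIAS - inside cart, the name "orders" points at traefik
      - "traefik:orders"
    environment:
      # base URL of the consumed service: container name + unique port
      ORDERS_BASE_URL: "http://orders:8081"
```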
All of this is obviously very manual, and all of it relies on the file provider.
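For concreteness, the manual swap in the last step looks roughly like this in the file behind the file provider. This is only a sketch assuming Traefik v2 YAML syntax. The orders name, port 8081 and the catch-all rule are placeholders, and the matching entry point (orders, listening on :8081) is assumed to be declared in the static config.

```yaml
# dynamic config watched by the file provider (hypothetical names)
http:
  routers:
    orders:
      entryPoints:
        - orders                # entry point assumed to listen on :8081
      rule: "PathPrefix(`/`)"   # thin entry point: everything on this port
      service: orders

  services:
    orders:
      loadBalancer:
        servers:
          # default: the container in the mesh
          - url: "http://orders:8081"
          # developing locally: comment out the line above, enable this one
          # - url: "http://host.docker.internal:8081"
```

With the file provider's watch option on, saving the file is enough for traefik to pick up the change, but somebody still has to edit it by hand.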
Where do we go from here
I hope to get rid of the last step, relying on the magnificent dynamic capabilities of traefik and the docker provider.
I believe I will have to keep the file provider side by side with the docker provider, but maybe that's only because I don't understand how to use the power of the docker provider. I'll be thankful to be educated there too, but that's not the main goal.
The main goal is a traefik configuration with which the developer only needs to stop the docker container, and traefik falls back to serving the service from the process on the developer machine (or returns bad gateway if it is not running).
Starting the container again should make it take precedence over the developer machine again.
But I don't understand my options well enough to be able to do it, and this is where you step in.
I understand that the router is selected as a function of the request, so the logic would have to move into some inner hierarchy of services: weighted, load-balanced, prioritized, mirrored... I kinda got lost there, and I need a pointer in the right direction.
(I also considered just having the developer run the service on the developer machine and having traefik load-balancers discover it and prefer it, but this would mean a lot of useless health-checks fired at the docker host for all the services, including ones that should stay in the mesh. So I ruled against it.)
bonus section - informative error pages
A really nice-to-have bonus layer: when the container is stopped AND the process is not found on the dev machine, I'd like the error page to be more descriptive and tell which URL did not reply. If we could do that too I'd be really happy, but that's just a bonus layer. I'm here for the other stuff.