Use Traefik as a service bus for a development setup with docker-compose


First, an acknowledgement: this is the first time I've encountered Traefik and, wow, guys, great work! Amazing.

Anyway. I think I've got an interesting challenge.

In short

I'm looking for a setup with an entry point, a router, and two services: one backed by a Docker container, the other external to the cluster. While the container runs, the router should prefer it. When the container is stopped, the router should fall back to the external service.

There's a demo repo at the end that does it manually; I'm looking for a smarter Traefik config that can do it automatically.

In detail

Consider the following background:

  • a corporate dev team that works on a complex service mesh of μ-services
  • the team includes Golden-Cage developers (developers who can't work without an IDE)
  • developers use a dev-env version of docker-compose with ~50 services, including dbs, queues, and all, but the lion's share are services managed by the team.
  • dev team machines run macOS and Windows (Linux is not supported by corporate IT :frowning:)

Consider the following requirement:

A setup where developers can choose to run any service the team maintains as a process on their own machine, using the sources from which the Docker image in the mesh is built, and in the same way they are used to from "home" (i.e. IDE, file-watch restarts, bundlers, whatever). When they run such a service locally in dev mode, it must take the place of its counterpart in the mesh described by docker-compose: it must consume services from the mesh and, more importantly, be consumed by other services in the mesh.

I discovered that the developer machine is reachable from the bridge network as host.docker.internal, which is great. I can use it in Traefik services. So far, so good.

How far did I get

So what I got so far is the following:

  • all inter-service communication passes through Traefik.
    Traefik implements the service bus and is the sole entity that knows which service runs where.
    (like a k8s ingress, but without the kubelet and all. Just the routing.)
  • services in the mesh look for each other using the container name + port.
    however, no matter whom they ask for, they get Traefik.
    This is done using the links section of services in docker-compose.
  • The consumed endpoint base URL is also injected using the environment section of services in docker-compose.
    (when run on the developer machine, services use baked-in defaults of http://localhost:<port>, relying on the unique-ports scheme)
  • the total config holds an entry point, a service and a router per node in the mesh.
    entry points are identified by ports, which are allocated uniquely across the mesh.
    i.e. you know which service you'll get by the port.
    (actually, I considered identifying services by host name and decided against it.
    Intuitively, I'd rather have thin, focused entry points than a single entry point with many rules).
  • when a developer needs to develop a service on their local machine, this is what they have to do:
    • stop the specific docker container of that service in the mesh
    • launch the service on his machine as a local process
    • update the config behind the file-provider so that the service is routed to his machine using host.docker.internal:<port> instead of <container>:<port>.

All of this is obviously very manual, and it all relies on the file provider.
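To make the manual setup above concrete, here is a minimal sketch of what one node's entry in the file-provider config might look like. It is not copied from the repo: the names `svcx` and `dynamic.yml`, and the ports 3000 and 3001, are placeholders I made up for illustration, and it assumes an entry point named `svcx` is already declared in the static config.

```yaml
# dynamic.yml, loaded by the Traefik file provider (hypothetical file name).
# Assumes the static config declares an entry point named "svcx" on the port
# allocated to this service.
http:
  routers:
    svcx:
      entryPoints:
        - svcx
      # One entry point per service, so the rule can match everything.
      rule: "PathPrefix(`/`)"
      service: svcx
  services:
    svcx:
      loadBalancer:
        servers:
          # Default: route to the container in the mesh.
          - url: "http://svcx:3000"
          # Manual switch (the third step above): comment out the line above
          # and uncomment the line below to route to the local process.
          # - url: "http://host.docker.internal:3001"
```

The edit to the `servers` list is the step I would like Traefik to make unnecessary.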

Where do we go from here

I hope to get rid of the last step by relying on the magnificent dynamic capabilities of Traefik and the Docker provider.
I believe I will have to keep the file provider side by side with the Docker provider, but maybe that's only because I don't understand how to use the power of the Docker provider. I'd be thankful to be educated here too, but that's not the main goal.

The main goal is a Traefik configuration with which the developer only needs to stop the Docker container, and Traefik then falls back to serving the service from the process on the developer machine (or returns Bad Gateway if that process is not running).
Starting the container again should make it take precedence over the developer machine again.

But I don't understand my options well enough to be able to do it, and this is where you step in :slight_smile:

I understand that a router is a function of the request, so the logic would have to move into some inner hierarchy of services: weighted, load-balanced, prioritized, mirrored. I kind of got lost there, and I need a pointer in the right direction.

(I also considered just having the developer run the service on the developer machine and having Traefik load balancers discover it and prefer it, but this would mean a lot of useless health checks fired at the Docker host for all the services, including ones that should stay in the mesh. So I decided against it.)

Bonus section - informative error pages

A really nice-to-have bonus layer: when the container is stopped AND the process is not found on the dev machine, make the error page more descriptive and have it name the URL that did not reply. If we could do that too, I'd be really happy, but that's just a bonus layer. I'm here for the other stuff.

I prepared this repo with an isolated demo of everything I got so far:

I hope it helps us discuss and demonstrate more concrete cases and config samples :slight_smile:

After getting over my fixation, I saw I was looking in the wrong place.

I found two ways to do that, and I'd appreciate an honest opinion about them.

Way One: a service that sends to the localhost loopback.

Although this way looks cleaner in the diagram, I have a gut feeling it's dirtier on the network.

Way Two: add a router per endpoint.

At least in theory.

Once I get it working I'll commit this to the repo. If I get both ways working - it will be two different branches.


It worked!

I committed to master the solution based on a dynamic router per endpoint that uses that service. I may be convinced to try the other way too if the experts here advise it, but for now I'm going with my gut feeling :slight_smile:

Any expert comment will be appreciated.

Hmm... note an error in the diagrams: port 3000 is bound to Traefik, and therefore svc (lc) is bound to another port. As a result, svcX@file points to that port.

Sorry I missed that, but the forum system won't let me fix it anymore.
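For readers who don't open the repo, the router-per-endpoint idea could look roughly like the sketch below. It is an illustration under assumptions, not the repo's actual config: the names `svcx`, `svcx-docker` and `svcx-local`, the image, the ports, and the explicit priority values are all placeholders of mine. It relies on two Traefik behaviours: the Docker provider only creates routers for running containers, and when several routers on the same entry point match a request, the one with the higher priority wins.

```yaml
# docker-compose.yml (fragment): the container announces its own router.
# This router exists only while the container is running.
services:
  svcx:
    image: example/svcx   # hypothetical image name
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.svcx-docker.entrypoints=svcx"
      - "traefik.http.routers.svcx-docker.rule=PathPrefix(`/`)"
      # Higher priority than the file-provider router below.
      - "traefik.http.routers.svcx-docker.priority=100"
      - "traefik.http.routers.svcx-docker.service=svcx-docker"
      - "traefik.http.services.svcx-docker.loadbalancer.server.port=3000"
---
# dynamic.yml (file provider): fallback router that is always present.
http:
  routers:
    svcx-local:
      entryPoints:
        - svcx
      rule: "PathPrefix(`/`)"
      priority: 1
      service: svcx-local
  services:
    svcx-local:
      loadBalancer:
        servers:
          # Process on the developer machine, on a port other than the one
          # Traefik itself publishes (3001 is a made-up example).
          - url: "http://host.docker.internal:3001"
```

With something like this, stopping the container removes the `svcx-docker` router and requests fall through to `svcx-local`. If nothing listens on the developer machine, Traefik answers with Bad Gateway.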