Traefik 1.7.12 exiting unexpectedly - swarm mode

I’m using Traefik 1.7.12 with no .toml file, just command-line parameters and Docker labels. We’re running swarm on AWS with CloudWatch logging. Several times now the Traefik container has stopped with no log output and with either exit code 0 or exit code 137. Inspecting the stopped container shows OOMKilled: false, and there are sufficient resources on the swarm managers. Has anyone seen this? What could be the problem?

Running traefik as a docker service with the following command line options to the container:

            "Args": [
                "--docker",
                "--docker.swarmmode",
                "--docker.domain=traefik",
                "--docker.watch",
                "--docker.exposedbydefault=false",
                "--web"
            ],

And the labels for the containers:

            "traefik.docker.network": "traefiknet",
            "traefik.enable": "true",
            "traefik.frontend.entryPoints": "http",
            "traefik.frontend.rule": "Host:someprefix.somedomain.io",
            "traefik.port": "80"

The configuration works just fine, everything is correctly routed.

It’s just that about once a week (though we recently had three in one week) the Traefik process exits with code 0 or 137.
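
For context on those two exit codes: by the usual shell and Docker convention, a status above 128 means the process was ended by a signal (status minus 128), so 137 corresponds to signal 9 (SIGKILL), while 0 is a normal exit. Here is a minimal Python sketch of that convention. The helper name describe_exit_code is made up for illustration, and it assumes a Linux host for the signal names.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a container exit code using the 128 + signal convention."""
    if code == 0:
        return "clean exit (process returned 0)"
    if code > 128:
        signum = code - 128
        try:
            # Map the signal number to its name, e.g. 9 -> SIGKILL on Linux.
            name = signal.Signals(signum).name
        except ValueError:
            # Unknown signal number on this platform: fall back to the number.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"application error (status {code})"


if __name__ == "__main__":
    for code in (0, 137):
        print(code, "->", describe_exit_code(code))
```

This only says which signal ended the process, not who sent it, so it narrows the search rather than answering the question.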

There is nothing in cloudwatch [docker log] for the traefik daemon.

I know this isn’t much to go on, but would appreciate some help in what to investigate next. Thanks.

Hi @Harper, when you say "There is nothing in cloudwatch [docker log] for the traefik daemon",
do you mean zero lines of logs, or some logs but nothing related to the error?

Also, can you try one of the following options to check if you can get more information?

Finally, if you feel that there might be a resource issue on your swarm managers:

  • What memory and CPU limits (or limits of any other kind) are applied to your Traefik container?
  • You might want to run Traefik on a worker node and use a "docker socket" forwarder inside an encrypted network. Example here: https://gist.github.com/dduportal/fe6f07f447e2a88b302b376e36aba934
  • Do you have a monitoring stack for your metrics? It could be interesting to watch resource usage at the machine level (cAdvisor/Grafana/Prometheus might be a good start if you don't have one yet).
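
To answer the question about limits, one option is to read them from the service definition. The sketch below is a rough Python helper, not an official tool: it parses the JSON printed by `docker service inspect <service>` and pulls out the memory and CPU limits. The function name resource_limits and the SAMPLE string are invented for illustration; the field names (Spec.TaskTemplate.Resources.Limits with MemoryBytes and NanoCPUs) follow the Docker service spec.

```python
import json


def resource_limits(service_inspect_json: str) -> dict:
    """Extract memory/CPU limits from `docker service inspect` JSON output."""
    services = json.loads(service_inspect_json)  # inspect prints a JSON array
    spec = services[0].get("Spec") or {}
    task = spec.get("TaskTemplate") or {}
    resources = task.get("Resources") or {}
    limits = resources.get("Limits") or {}
    mem = limits.get("MemoryBytes")
    cpus = limits.get("NanoCPUs")
    return {
        # None means no limit is set on the service.
        "memory_mib": mem / (1024 * 1024) if mem else None,
        "cpus": cpus / 1e9 if cpus else None,
    }


# Hypothetical, trimmed inspect output used only to exercise the helper.
SAMPLE = (
    '[{"Spec": {"Name": "traefik", "TaskTemplate": {"Resources": '
    '{"Limits": {"MemoryBytes": 134217728, "NanoCPUs": 500000000}}}}}]'
)

if __name__ == "__main__":
    print(resource_limits(SAMPLE))
```

In practice you would pipe the real output in, for example by saving `docker service inspect <service>` to a file and passing its contents to the function. If both values come back as None, no limits are set on the service.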

Let us know!