502 Bad Gateway (SOLVED)

I just set up the same solution we are running on some other data centers without any issues.
However, all the services just return "Bad Gateway" in the browser.

This is a docker swarm network, and Traefik is running on the swarm manager.
We are running Traefik v2.0.4
Docker is running version 19.03.4

Traefik is launched through a stack with the following configuration:

version: '3.7'
services:
  traefik:
    image: traefik:v2.0.2
    command: 
      - "--api=true" 
      - "--api.insecure=true"
      - "--api.dashboard=true"
      - "--ping"
      - "--providers.docker=true"
      - "--providers.docker.endpoint=unix:///var/run/docker.sock"
      - "--providers.docker.network=traefik-public"
      - "--providers.docker.swarmMode=true"
      - "--entrypoints.web.address=:80"
      - "--accesslog=true"
      - "--accesslog.bufferingsize=100"
      - "--accesslog.filepath=/var/log/traefik.log"
      - "--metrics.prometheus=true"
      - "--entryPoints.metrics.address=:8082"
      - "--metrics.prometheus.entryPoint=metrics"
      - "--metrics.prometheus.buckets=0.1,0.3,1.2,5.0"
      - "--tracing.jaeger=true"
      - "--tracing.jaeger.samplingServerURL=http://localhost:5778/sampling"
      - "--tracing.jaeger.samplingType=const"
      - "--tracing.jaeger.samplingParam=1.0"
      - "--tracing.jaeger.localAgentHostPort=127.0.0.1:6831"
      - "--tracing.jaeger.propagation=jaeger"
      - "--tracing.jaeger.traceContextHeaderName=uber-trace-id"
    ports:
      - 80:80 
      - 8081:8080
      - 8082:8082
    networks:
      - monitor-net
      - traefik-public
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/log/traefik.log:/var/log/traefik.log
    deploy:
      placement:
        constraints:
          - node.role == manager
    labels:
      - "traefik.docker.network=traefik-public"          
networks:
  monitor-net:
  traefik-public:
    driver: overlay
    external: true

And then a service is added through another stack with the following configuration. This is running on a docker node in the swarm:

version: '3.7'
services:
  backend:
    image: enevodocker/enevoleadsbe:production
    ports: 
      - 5000
    environment:
      - SERVICE_PORTS=5000
      - ASPNETCORE_ENVIRONMENT=Production
    deploy:
      labels:
        - "traefik.http.routers.byndlebackendrouter.rule=Host(`api.byndle.no`)"
        - "traefik.http.services.byndlebackendservice.loadbalancer.server.port=5000"
        - "traefik.http.services.byndlebackendservice.loadbalancer.sticky=true"
        - "traefik.http.services.byndlebackendservice.loadbalancer.sticky.cookie.httponly=false"
        - "traefik.http.services.byndlebackendservice.loadbalancer.sticky.cookie.name=bndlstickyroute"
        - "traefik.http.services.byndlebackendservice.loadbalancer.sticky.cookie.secure=false"
      replicas: 6 
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
        max_attempts: 2
        window: 60s
      placement:
        constraints: [node.role != manager]
    networks:
      - web
      - traefik-public
    volumes:
      - type: volume
        source: leads
        target: /leads
        volume:
          nocopy: true      
  frontend:
    image: enevodocker/enevoleadsfe:production
    ports:
      - 80
    environment:
      - SERVICE_PORTS=80
    deploy:
      labels:
        - "traefik.http.routers.portalproxy.rule=Host(`portal.byndle.no`)"
        - "traefik.http.services.portalservice.loadbalancer.server.port=80"
        - "traefik.docker.network=traefik-public"
      replicas: 4
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
        max_attempts: 2
        window: 60s
      placement:
        constraints: [node.role != manager]
    networks:
      - web
      - traefik-public
    labels:
      - "traefik.docker.network=traefik-public"         
volumes:
  leads:
    driver: nfs
    driver_opts:
      share: 10.47.2.20:/diskstation/byndle
  redis-data:
  solrdata:
    driver: nfs
    driver_opts:
      share: 10.47.2.20:/diskstation/byndle/solr
networks:
  web:
    driver: overlay
    external: false
  traefik-public:
    driver: overlay
    external: true

When I inspect the traefik container as well as the frontend container, both are connected to the traefik-public network.

Now, if I visit http://myportal.no it just returns "bad gateway".

I am now stuck debugging this and can't find anything else to go on.
I have tried deleting the network and adding it again, deleting the stacks and redeploying them, and so on.

Anyone with any further suggestions?

Hi @ole, can you update your post with the following elements:

Also, can you check the names of the networks with the command docker network ls? The docker-compose file syntax uses network names that are not strictly the "real names" of the networks in the end (the final name is generally prefixed by <stack name>_).

Updated the original question now. I also forgot to mention that we use these exact same compose files to spin everything up on other swarm environments without issues. On the Traefik dashboard everything looks fine, too. It's only when accessing the URL that we get the "Bad Gateway" error.

Thanks @ole. Your configuration looks fine on the Traefik side. The "Bad Gateway" error is clearly network-related. You can verify this by accessing the Docker engine on the swarm node where your Traefik container is located and running docker exec -ti <container id> sh to spawn an interactive shell. From there, install curl with apk add --no-cache curl and try to reach the backend container with curl -v http://<service ip>:<service port>. If that works, then Traefik is the culprit and we should look for a configuration issue I missed. If it does not work, the issue lies within the swarm network.
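A minimal sketch of that debugging session (the container id abc123 and the task address 10.0.7.24:5000 are hypothetical placeholders, not from the thread):

```shell
# Open a shell inside the Traefik container on the manager node
docker exec -ti abc123 sh

# Inside the container: install curl (the traefik image is Alpine-based)
apk add --no-cache curl

# Try to reach a backend task directly over the overlay network
curl -v http://10.0.7.24:5000
# "connect: no route to host" here points at the swarm network, not Traefik
```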

However, can you check the name of the networks that are really used with the command docker network ls ?

Based on https://docs.docker.com/compose/compose-file/#configs-configuration-reference:

  • The directive driver: overlay is not taken into account: you can remove it, as the network is created outside the stacks.
  • The name traefik-public is only a reference inside the docker-compose file, used by docker-compose directives. It is not the real name that Traefik expects. My guess is that, on the environments where it works, the network was created with this exact name, but not on the one where it fails.
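To verify which name the network really carries, a short sketch (mystack is a hypothetical stack name for illustration):

```shell
# List the real network names as Docker sees them
docker network ls

# A network declared inside a stack file gets prefixed with the stack
# name, e.g. "mystack_traefik-public"; a network created manually keeps
# its exact name. Creating the external network yourself on the manager
# pins the name that Traefik's providers.docker.network setting expects:
docker network create --driver overlay --attachable traefik-public
```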

I've narrowed this down to a networking issue now, after enabling log level DEBUG.

In the traefik log I could see level=debug msg="'502 Bad Gateway' caused by: dial tcp 10.0.7.24:80: connect: no route to host"

The network names are exactly the same and share the same IDs on the swarm manager and the worker node.

And then it got solved, by taking down the swarm manager and creating the swarm again.
I think it might have happened because I changed the hostname of the swarm manager after the swarm was created.


For others still stuck: disable the firewall and restart Docker.
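Rather than disabling the firewall entirely, opening the ports Docker Swarm needs between nodes is usually enough. A sketch for firewalld (assumes a firewalld-based host; adapt for ufw or raw iptables):

```shell
# Ports required by Docker Swarm between nodes:
#   2377/tcp      cluster management (manager)
#   7946/tcp+udp  node discovery (gossip)
#   4789/udp      overlay network (VXLAN) data traffic
firewall-cmd --permanent --add-port=2377/tcp
firewall-cmd --permanent --add-port=7946/tcp
firewall-cmd --permanent --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --reload

# Restart Docker so the overlay networks are rebuilt
systemctl restart docker
```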


Hi,
I have the same issue and would like to recreate the swarm. However, can you confirm that if I recreate the swarm, I will have to re-register all the services currently running on the manager?

Would a simple docker swarm leave --force and a docker swarm init on the manager be enough?
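Roughly, yes: leaving the swarm destroys its service definitions, so the stacks have to be redeployed afterwards. A hedged sketch of the full cycle (the <manager ip> placeholder and the stack/file names are hypothetical):

```shell
# On each worker, then on the manager (this destroys all swarm state):
docker swarm leave --force

# Re-initialize the swarm on the manager
docker swarm init --advertise-addr <manager ip>

# Rejoin each worker with the token printed by "swarm init"
# ("docker swarm join-token worker" on the manager reprints it)

# Recreate the external network, then redeploy the stacks
docker network create --driver overlay --attachable traefik-public
docker stack deploy -c traefik-stack.yml traefik
docker stack deploy -c app-stack.yml app
```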

Thanks for your help.