502 Bad Gateway (SOLVED)

I just set up the same solution we are running on some other data centers without any issues.
However, all the services just return "Bad Gateway" in the browser.

This is a docker swarm network, and Traefik is running on the swarm manager.
We are running Traefik v2.0.4
Docker is running version 19.03.4

Traefik is launched through a stack with the following configuration:

version: '3.7'
services:
  traefik:
    image: traefik:v2.0.2
    command: 
      - "--api=true" 
      - "--api.insecure=true"
      - "--api.dashboard=true"
      - "--ping"
      - "--providers.docker=true"
      - "--providers.docker.endpoint=unix:///var/run/docker.sock"
      - "--providers.docker.network=traefik-public"
      - "--providers.docker.swarmMode=true"
      - "--entrypoints.web.address=:80"
      - "--accesslog=true"
      - "--accesslog.bufferingsize=100"
      - "--accesslog.filepath=/var/log/traefik.log"
      - "--metrics.prometheus=true"
      - "--entryPoints.metrics.address=:8082"
      - "--metrics.prometheus.entryPoint=metrics"
      - "--metrics.prometheus.buckets=0.1,0.3,1.2,5.0"
      - "--tracing.jaeger=true"
      - "--tracing.jaeger.samplingServerURL=http://localhost:5778/sampling"
      - "--tracing.jaeger.samplingType=const"
      - "--tracing.jaeger.samplingParam=1.0"
      - "--tracing.jaeger.localAgentHostPort=127.0.0.1:6831"
      - "--tracing.jaeger.propagation=jaeger"
      - "--tracing.jaeger.traceContextHeaderName=uber-trace-id"
    ports:
      - 80:80 
      - 8081:8080
      - 8082:8082
    networks:
      - monitor-net
      - traefik-public
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/log/traefik.log:/var/log/traefik.log
    deploy:
      placement:
        constraints:
          - node.role == manager
    labels:
      - "traefik.docker.network=traefik-public"          
networks:
  monitor-net:
  traefik-public:
    driver: overlay
    external: true

And then a service is added through another stack with the following configuration. This is running on a docker node in the swarm:

version: '3.7'
services:
  backend:
    image: enevodocker/enevoleadsbe:production
    ports: 
      - 5000
    environment:
      - SERVICE_PORTS=5000
      - ASPNETCORE_ENVIRONMENT=Production
    deploy:
      labels:
        - "traefik.http.routers.byndlebackendrouter.rule=Host(`api.byndle.no`)"
        - "traefik.http.services.byndlebackendservice.loadbalancer.server.port=5000"
        - "traefik.http.services.byndlebackendservice.loadbalancer.sticky=true"
        - "traefik.http.services.byndlebackendservice.loadbalancer.sticky.cookie.httponly=false"
        - "traefik.http.services.byndlebackendservice.loadbalancer.sticky.cookie.name=bndlstickyroute"
        - "traefik.http.services.byndlebackendservice.loadbalancer.sticky.cookie.secure=false"
      replicas: 6 
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
        max_attempts: 2
        window: 60s
      placement:
        constraints: [node.role != manager]
    networks:
      - web
      - traefik-public
    volumes:
      - type: volume
        source: leads
        target: /leads
        volume:
          nocopy: true      
  frontend:
    image: enevodocker/enevoleadsfe:production
    ports:
      - 80
    environment:
      - SERVICE_PORTS=80
    deploy:
      labels:
        - "traefik.http.routers.portalproxy.rule=Host(`portal.byndle.no`)"
        - "traefik.http.services.portalservice.loadbalancer.server.port=80"
        - "traefik.docker.network=traefik-public"
      replicas: 4
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
        max_attempts: 2
        window: 60s
      placement:
        constraints: [node.role != manager]
    networks:
      - web
      - traefik-public
    labels:
      - "traefik.docker.network=traefik-public"         
volumes:
  leads:
    driver: nfs
    driver_opts:
      share: 10.47.2.20:/diskstation/byndle
  redis-data:
  solrdata:
    driver: nfs
    driver_opts:
      share: 10.47.2.20:/diskstation/byndle/solr
networks:
  web:
    driver: overlay
    external: false
  traefik-public:
    driver: overlay
    external: true

When I inspect the traefik container as well as the frontend container, both are connected to the traefik-public network.

Now, if I visit http://myportal.no it just returns "bad gateway".

I am now stuck debugging this and can't find anything else to go on.
I have tried deleting the network and adding it again, deleting the stacks and redeploying them, and so on.

Anyone with any further suggestions?

Hi @ole, can you update your post with the following elements:

Also, can you check the names of the networks with the command docker network ls? The docker-compose file syntax uses network names that are not strictly the "real names" of the networks in the end (the final name is generally prefixed by <stack name>_).

Updated the original question now. I also forgot to mention that we use these exact same compose files to spin everything up on other swarm environments without issues. On the Traefik dashboard everything looks fine, too. It's only when accessing the URL that we get the "Bad Gateway" error.

Thanks @ole. Your configuration looks fine on the Traefik side. The "Bad Gateway" error is clearly network-related. You can verify this by accessing the Docker engine on the swarm node where your Traefik container is located and running docker exec -ti <container id> sh to spawn an interactive shell. From there, install curl with apk add --no-cache curl and try to reach the backend container with curl -v http://<service ip>:<service port>. If that works, then Traefik is the culprit and we should look for a configuration issue I missed. If it does not work, the issue lies within the swarm network.
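A minimal sketch of that debugging session (the container id abc123 and the task address 10.0.7.24:5000 are hypothetical placeholders, not from the thread):

```shell
# Open a shell inside the Traefik container on the manager node
docker exec -ti abc123 sh

# Inside the container: install curl (the traefik image is Alpine-based)
apk add --no-cache curl

# Try to reach a backend task directly over the overlay network
curl -v http://10.0.7.24:5000
# "connect: no route to host" here points at the swarm network, not Traefik
```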

However, can you check the name of the networks that are really used with the command docker network ls ?

Based on https://docs.docker.com/compose/compose-file/#configs-configuration-reference:

  • The directive driver: overlay is not taken into account: you can remove it, as the network is created outside the stacks.
  • The name traefik-public is only a reference inside the docker-compose file, used by docker-compose directives. It is not the real name that Traefik expects. My guess is that, on the environments where it works, the network was created with this exact name, but not on the one where it fails.
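To verify which name the network really carries, a short sketch (mystack is a hypothetical stack name for illustration):

```shell
# List the real network names as Docker sees them
docker network ls

# A network declared inside a stack file gets prefixed with the stack
# name, e.g. "mystack_traefik-public"; a network created manually keeps
# its exact name. Creating the external network yourself on the manager
# pins the name that Traefik's providers.docker.network setting expects:
docker network create --driver overlay --attachable traefik-public
```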

I've narrowed this down to a networking issue now, after enabling log level DEBUG.

In the traefik log I could see level=debug msg="'502 Bad Gateway' caused by: dial tcp 10.0.7.24:80: connect: no route to host"

The network names are exactly the same and share the same IDs on the swarm manager and the worker node.

And then it got solved, by taking down the swarm manager and creating the swarm again.
I think it might have happened because I changed the hostname of the swarm manager after the swarm was created.


For others still stuck: disable the firewall and restart Docker.
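Rather than disabling the firewall entirely, opening the ports Docker Swarm needs between nodes is usually enough. A sketch for firewalld (assumes a firewalld-based host; adapt for ufw or raw iptables):

```shell
# Ports required by Docker Swarm between nodes:
#   2377/tcp      cluster management (manager)
#   7946/tcp+udp  node discovery (gossip)
#   4789/udp      overlay network (VXLAN) data traffic
firewall-cmd --permanent --add-port=2377/tcp
firewall-cmd --permanent --add-port=7946/tcp
firewall-cmd --permanent --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --reload

# Restart Docker so the overlay networks are rebuilt
systemctl restart docker
```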


Hi,
I have the same issue and would like to recreate the swarm. However, can you confirm that if I recreate the swarm, I will have to re-register all the services currently running on the manager?

Would a simple docker swarm leave --force and a docker swarm init on the manager be enough?
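Roughly, yes: leaving the swarm destroys its service definitions, so the stacks have to be redeployed afterwards. A hedged sketch of the full cycle (the <manager ip> placeholder and the stack/file names are hypothetical):

```shell
# On each worker, then on the manager (this destroys all swarm state):
docker swarm leave --force

# Re-initialize the swarm on the manager
docker swarm init --advertise-addr <manager ip>

# Rejoin each worker with the token printed by "swarm init"
# ("docker swarm join-token worker" on the manager reprints it)

# Recreate the external network, then redeploy the stacks
docker network create --driver overlay --attachable traefik-public
docker stack deploy -c traefik-stack.yml traefik
docker stack deploy -c app-stack.yml app
```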

Thanks for your help.