Service Discovery with Multiple Networks Involved

I'm cross-posting this because I ran into the same issue with a service other than Mattermost. The issue was first posted in the Mattermost forum: https://forum.mattermost.org/t/gateway-timout-when-running-mattermost-behind-traefik-proxy/10970

Issue

I have a (in principle) working docker-compose setup with Mattermost, PostgreSQL and Traefik as reverse proxy. Most of the time, but not always, it returns a "Gateway Timeout" error when I try to access Mattermost. Reachability changes not only between restarts but also while the containers are running.

Mostly, people seem to run into this kind of issue by forgetting to specify the respective networks. However, I double-checked that and it's not the case (and it wouldn't make sense, as it works sometimes).

Versions tested

  • mattermost 5.30.1/5.29.1
  • postgres 13.1/9.4
  • traefik 2.3

UPDATE

I found the problem, but no solution yet. What I had not checked before were the IPs of the services, and it turns out that Traefik just uses a random IP if a service has multiple. This is the case whenever a service is attached to multiple networks, which is a MUST in my case.

I came across this thread where a user suggests setting traefik.docker.network=service-gateway for all services in order to tell Traefik which IP to associate with the respective service. However, it seems to have no effect. Adding the project prefix to the network name does not help either.

My Setup

I provide my Mattermost configuration as well, but since this issue also occurred with another, different service, I'm pretty sure it's on Traefik's side.

TL;DR Networks

  • traefik reverse proxy
    • service-socket-proxy (internal) for docker proxy
    • service-gateway (external) to communicate with services
  • mattermost
    • service-mattermost (internal) for database communication
    • service-gateway (external) to communicate with traefik
  • other service(s)
    • service-other (internal) for database communication
    • service-gateway (external) to communicate with traefik
  • ...

Reverse Proxy

docker-compose.yml

version: "3"

networks:
  service-socket-proxy:
    external: false
  service-gateway:
    external: true

services:

  socket-proxy:
    image: tecnativa/docker-socket-proxy:latest
    container_name: "socket-proxy"
    restart: unless-stopped
    privileged: yes
    environment:
      CONTAINERS: 1
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - service-socket-proxy

  traefik:
    image: "traefik:v2.3"
    container_name: "traefik"
    depends_on:
      - socket-proxy
    restart: unless-stopped
    privileged: no
    volumes:
      - "./conf:/etc/traefik:ro"
      - "./letsencrypt:/letsencrypt"
    ports:
      - "80:80"
      - "443:443"
    labels:
      - "traefik.enable=true"
      # secure router
      - "traefik.http.routers.traefik.rule=Host(`traefik.example.com`, `www.traefik.example.com`)"
      - "traefik.http.routers.traefik.entrypoints=websecure"
      - "traefik.http.routers.traefik.tls.certresolver=LetsEncrypt"
      - "traefik.http.routers.traefik.service=api@internal"
      # middlewares
      - "traefik.http.routers.traefik.middlewares=traefik-auth"
      - "traefik.http.middlewares.traefik-auth.basicauth.users=user:password."
    networks:
      - service-socket-proxy
      - service-gateway

conf/traefik.yml

# Uncomment for Development
log:
  level: DEBUG
  
api:
  dashboard: true
  
providers:
  # Pseudo provider that holds some middlewares that cannot be configured statically
  file:
    filename: "/etc/traefik/dyn.yml"
    watch: true
  # Default docker provider behind socket proxy
  docker:
    network: "service-socket-proxy"
    endpoint: "tcp://socket-proxy:2375"
    exposedByDefault: false
    
entryPoints:
  # HTTP entry point - does nothing but redirecting to HTTPS
  web:
    address: ":80"
    http:
      middlewares:
        - http-redirect@file
  # HTTPS entry point
  websecure:
    address: ":443"
    http:
      middlewares:
        - www-redirect@file

certificatesResolvers:
  LetsEncrypt:
    acme:
      email: "admin@example.com"
      storage: "/letsencrypt/acme.json"
      tlschallenge: {}

conf/dyn.yml

http:
  middlewares:
    # Prune all "www" Prefixes
    www-redirect:
      redirectRegex:
        regex: "^https?://www\\.(.*)"
        replacement: "https://${1}"
        permanent: true
    # Enforce HTTPS
    http-redirect:
      redirectScheme:
        port: "443"
        scheme: https
        permanent: true

Mattermost

docker-compose.yml

version: "3"

networks:
  service-gateway:
    external: true
  service-mattermost:
    external: false

services:

  db:
    build: db
    read_only: true
    container_name: "mattermost-db"
    restart: unless-stopped
    volumes:
      - ./volumes/db/var/lib/postgresql/data:/var/lib/postgresql/data
      - /etc/localtime:/etc/localtime:ro
    environment:
      - POSTGRES_USER=mmuser
      - POSTGRES_PASSWORD=mmuser_password
      - POSTGRES_DB=mattermost
    networks:
      - service-mattermost

  app:
    build:
      context: app
      args:
        - edition=team
        - PUID=1000
        - PGID=1000
    container_name: "mattermost-app"
    restart: unless-stopped
    depends_on:
      - db
    # bypassing traefik by exposing mm directly works just fine
    # ports:
    #  - "8080:8000"
    volumes:
      - ./volumes/app/mattermost/config:/mattermost/config:rw
      - ./volumes/app/mattermost/data:/mattermost/data:rw
      - ./volumes/app/mattermost/logs:/mattermost/logs:rw
      - ./volumes/app/mattermost/plugins:/mattermost/plugins:rw
      - ./volumes/app/mattermost/client-plugins:/mattermost/client/plugins:rw
      - /etc/localtime:/etc/localtime:ro
    environment:
      - MM_USERNAME=mmuser
      - MM_PASSWORD=mmuser_password
      - MM_DBNAME=mattermost
      - MM_SQLSETTINGS_DATASOURCE=postgres://mmuser:mmuser_password@db:5432/mattermost?sslmode=disable&connect_timeout=10
    labels:
      - "traefik.enable=true"
      # insecure router
      - "traefik.http.routers.mm-router.rule=Host(`mattermost.example.com`, `www.mattermost.example.com`)"
      - "traefik.http.routers.mm-router.entrypoints=web"
      # secure router
      - "traefik.http.routers.mm-router-sec.rule=Host(`mattermost.example.com`, `www.mattermost.example.com`)"
      - "traefik.http.routers.mm-router-sec.entrypoints=websecure"
      - "traefik.http.routers.mm-router-sec.tls.certresolver=LetsEncrypt"
    networks:
      - service-mattermost
      - service-gateway

app/Dockerfile

FROM alpine:3.10

# Some ENV variables
ENV PATH="/mattermost/bin:${PATH}"
ENV MM_VERSION=5.30.1

# Build argument to set Mattermost edition
ARG edition=enterprise
ARG PUID=2000
ARG PGID=2000
ARG MM_BINARY=


# Install some needed packages
RUN apk add --no-cache \
	ca-certificates \
	curl \
	jq \
	libc6-compat \
	libffi-dev \
    libcap \
	linux-headers \
	mailcap \
	netcat-openbsd \
	xmlsec-dev \
	tzdata \
	&& rm -rf /tmp/*

# Get Mattermost
RUN mkdir -p /mattermost/data /mattermost/plugins /mattermost/client/plugins \
    && if [ ! -z "$MM_BINARY" ]; then curl $MM_BINARY | tar -xvz ; \
      elif [ "$edition" = "team" ] ; then curl https://releases.mattermost.com/$MM_VERSION/mattermost-team-$MM_VERSION-linux-amd64.tar.gz?src=docker-app | tar -xvz ; \
      else curl https://releases.mattermost.com/$MM_VERSION/mattermost-$MM_VERSION-linux-amd64.tar.gz?src=docker-app | tar -xvz ; fi \
    && cp /mattermost/config/config.json /config.json.save \
    && rm -rf /mattermost/config/config.json \
    && addgroup -g ${PGID} mattermost \
    && adduser -D -u ${PUID} -G mattermost -h /mattermost -D mattermost \
    && chown -R mattermost:mattermost /mattermost /config.json.save /mattermost/plugins /mattermost/client/plugins \
    && setcap cap_net_bind_service=+ep /mattermost/bin/mattermost

USER mattermost

# Healthcheck to make sure container is ready
HEALTHCHECK CMD curl --fail http://localhost:8000 || exit 1

# Configure entrypoint and command
COPY entrypoint.sh /
ENTRYPOINT ["/entrypoint.sh"]
WORKDIR /mattermost
CMD ["mattermost"]

# Expose port 8000 of the container
EXPOSE 8000

# Declare volumes for mount point directories
VOLUME ["/mattermost/data", "/mattermost/logs", "/mattermost/config", "/mattermost/plugins", "/mattermost/client/plugins"]

app/entrypoint.sh

#!/bin/sh

# Function to generate a random salt
generate_salt() {
  tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 48 | head -n 1
}

# Read environment variables or set default values
DB_HOST=${DB_HOST:-db}
DB_PORT_NUMBER=${DB_PORT_NUMBER:-5432}
MM_DBNAME=${MM_DBNAME:-mattermost}
MM_CONFIG=${MM_CONFIG:-/mattermost/config/config.json}

if [ "${1:0:1}" = '-' ]; then
    set -- mattermost "$@"
fi

if [ "$1" = 'mattermost' ]; then
  # Check CLI args for a -config option
  for ARG in $@;
  do
      case "$ARG" in
          -config=*)
              MM_CONFIG=${ARG#*=};;
      esac
  done

  if [ ! -f $MM_CONFIG ]
  then
    # If there is no configuration file, create it with some default values
    echo "No configuration file" $MM_CONFIG
    echo "Creating a new one"
    # Copy default configuration file
    cp /config.json.save $MM_CONFIG
    # Substitute some parameters with jq
    jq '.ServiceSettings.ListenAddress = ":8000"' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.LogSettings.EnableConsole = true' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.LogSettings.ConsoleLevel = "ERROR"' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.FileSettings.Directory = "/mattermost/data/"' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.FileSettings.EnablePublicLink = true' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.FileSettings.PublicLinkSalt = "'$(generate_salt)'"' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.EmailSettings.SendEmailNotifications = false' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.EmailSettings.FeedbackEmail = ""' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.EmailSettings.SMTPServer = ""' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.EmailSettings.SMTPPort = ""' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.EmailSettings.InviteSalt = "'$(generate_salt)'"' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.EmailSettings.PasswordResetSalt = "'$(generate_salt)'"' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.RateLimitSettings.Enable = true' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.SqlSettings.DriverName = "postgres"' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.SqlSettings.AtRestEncryptKey = "'$(generate_salt)'"' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
    jq '.PluginSettings.Directory = "/mattermost/plugins/"' $MM_CONFIG > $MM_CONFIG.tmp && mv $MM_CONFIG.tmp $MM_CONFIG
  else
    echo "Using existing config file" $MM_CONFIG
  fi

  # Configure database access
  if [[ -z "$MM_SQLSETTINGS_DATASOURCE" && ! -z "$MM_USERNAME" && ! -z "$MM_PASSWORD" ]]
  then
    echo -ne "Configure database connection..."
    # URLEncode the password, allowing for special characters
    ENCODED_PASSWORD=$(printf %s $MM_PASSWORD | jq -s -R -r @uri)
    export MM_SQLSETTINGS_DATASOURCE="postgres://$MM_USERNAME:$ENCODED_PASSWORD@$DB_HOST:$DB_PORT_NUMBER/$MM_DBNAME?sslmode=disable&connect_timeout=10"
    echo OK
  else
    echo "Using existing database connection"
  fi

  # Wait another second for the database to be properly started.
  # Necessary to avoid "panic: Failed to open sql connection pq: the database system is starting up"
  sleep 1

  echo "Starting mattermost"
fi

exec "$@"

db/Dockerfile

FROM postgres:13.1-alpine

ENV DEFAULT_TIMEZONE UTC

# update packages
RUN apk upgrade --no-cache && rm -rf /var/cache/apk/* /tmp/* /var/tmp/*

#Healthcheck to make sure container is ready
HEALTHCHECK CMD pg_isready -U $POSTGRES_USER -d $POSTGRES_DB || exit 1

# Add and configure entrypoint and command
COPY entrypoint.sh /
ENTRYPOINT ["/entrypoint.sh"]
CMD ["postgres"]

VOLUME ["/var/run/postgresql", "/usr/share/postgresql/", "/var/lib/postgresql/data", "/tmp"]

db/entrypoint.sh

#!/bin/bash

function update_conf () {
  # PGDATA is defined in upstream postgres dockerfile
  config_file=$PGDATA/postgresql.conf

  # Check if configuration file exists. If not, it probably means that database is not initialized yet
  if [ ! -f $config_file ]; then
    return
  fi

  # Reinitialize config
  sed -i "s/log_timezone =.*$//g" $config_file
  sed -i "s/timezone =.*$//g" $config_file

  echo "log_timezone = $DEFAULT_TIMEZONE" >> $config_file
  echo "timezone = $DEFAULT_TIMEZONE" >> $config_file
}

if [ "${1:0:1}" = '-' ]; then
  set -- postgres "$@"
fi

if [ "$1" = 'postgres' ]; then
  # Update postgresql configuration
  update_conf

  # Run the postgresql entrypoint
  docker-entrypoint.sh postgres
fi

Thanks in advance!

For a specific service you can use the traefik.docker.network label to specify the Docker network. This is the network name as shown by docker network ls, NOT the short name you see in the compose file.
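
As a rough sketch of the per-service variant, assuming the setup from the question: the label goes on the container that Traefik routes to (here the app service of the Mattermost compose file). The value service-gateway is an assumption based on that network being declared as external, so it should appear under exactly that name in docker network ls; a network created by compose itself would normally carry the project prefix.

```yaml
# Fragment of the Mattermost docker-compose.yml (only the relevant keys shown)
services:
  app:
    labels:
      - "traefik.enable=true"
      # Tell Traefik which network's IP to use for this container.
      # Assumption: "service-gateway" is the name listed by `docker network ls`.
      - "traefik.docker.network=service-gateway"
    networks:
      - service-mattermost
      - service-gateway
```

The existing router labels stay as they are; only the one extra label is added.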

Or you can set a default one at the provider level.
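
A minimal sketch of the provider-level variant, assuming the conf/traefik.yml from the question. There the docker provider's network option points at service-socket-proxy, a network the proxied services are not attached to, which may be why the chosen IP was unpredictable. Pointing it at the shared network would look roughly like this:

```yaml
# Fragment of conf/traefik.yml (static configuration)
providers:
  docker:
    # The socket proxy is still reached over its own network via this endpoint
    endpoint: "tcp://socket-proxy:2375"
    exposedByDefault: false
    # Default network Traefik uses to connect to containers.
    # Assumption: every proxied service is attached to "service-gateway".
    network: "service-gateway"
```

A traefik.docker.network label on an individual container still takes precedence over this default.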

I usually do this at the service level, as I don't have many services that use two networks that also are fronted by traefik.

Well, I had this solution already, I was just writing it to the wrong config files without noticing :sweat_smile:.

However, I find it a bad pattern to silently choose a random network rather than fail with a corresponding error in such ambiguous cases. Sure, one could notice the different subnets displayed in the dashboard, but that can easily be overlooked.

Whatever, now it works like a charm :slight_smile:

Thank you!
