Marathon and traefik 2.1 not reading /etc/hosts of container resulting in 504 Gateway timeout

Bug

What did you do?

I have deployed the following app on marathon:

{
  "id": "/whoami",
  "cpus": 0.1,
  "mem": 256.0,
  "instances": 3,
  "labels": {
    "traefik.enable": "true",
    "traefik.http.routers.whoami.rule": "Host(`traefik-testing.mydomain.com`)",
    "traefik.http.routers.whoami.entrypoints": "web-secure",
    "traefik.http.routers.whoami.tls.certresolver": "letsencryptStaging",
    "traefik.http.routers.whoami.tls.domains[0].main": "traefik-testing.mydomain.com",
    "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme": "https",
    "traefik.http.routers.redirs.rule": "hostregexp(`{host:.+}`)",
    "traefik.http.routers.redirs.entrypoints": "web",
    "traefik.http.routers.redirs.middlewares": "redirect-to-https"
  },
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "containous/whoami",
      "network": "BRIDGE",
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 0,
          "name": "http-api",
          "protocol": "tcp"
        }
      ]
    },
    "volumes": [
    ]
  },
  "readinessChecks": [
    {
      "name": "readinessCheck",
      "protocol": "HTTP",
      "path": "/",
      "portName": "http-api",
      "intervalSeconds": 30,
      "timeoutSeconds": 10,
      "httpStatusCodesForReady": [200],
      "preserveLastResponse": false
    }
  ],
  "healthChecks": [
    {
      "portIndex": 0,
      "protocol": "TCP",
      "gracePeriodSeconds": 30,
      "intervalSeconds": 10,
      "timeoutSeconds": 30,
      "maxConsecutiveFailures": 3
    },
    {
      "path": "/",
      "portIndex": 0,
      "protocol": "HTTP",
      "gracePeriodSeconds": 30,
      "intervalSeconds": 10,
      "timeoutSeconds": 30,
      "maxConsecutiveFailures": 3
    }
  ]
}

This is the docker-compose.yml used to start the Traefik container on localhost, which can reach Marathon on the LAN through http://192.168.0.22:8080:

version: '3'

services:
  reverse-proxy:
    image: traefik:v2.1
    network_mode: "host"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "./traefik.yaml:/etc/traefik/traefik.yaml"
      - "./letsencrypt:/letsencrypt"
      - "./staging/fakelerootx1.pem:/etc/ssl/certs/fakelerootx1.pem"

What did you expect to see?

I expect this command to succeed:

curl -H 'Host: traefik-testing.mydomain.com' -L https://localhost  --insecure

What did you see instead?

Curl response:

Gateway Timeout

traefik debug log:

Output of traefik version: (What version of Traefik are you using?)

v2.1

What is your environment & configuration (arguments, toml, provider, platform, ...)?

This is my traefik.yaml

## Static configuration

global:
  checkNewVersion: true
  sendAnonymousUsage: false

serversTransport:
  insecureSkipVerify: false

log:
  level: "DEBUG"

entryPoints:
  web:
    address: ":80"
  web-secure:
    address: ":443"

api:
  insecure: true # enable WEB UI
  dashboard: true
  debug: true

providers:
  marathon:
    endpoint: "http://192.168.0.22:8080"
    watch: true
    exposedByDefault: false
    respectReadinessChecks: true
certificatesResolvers:
  letsencrypt:
    acme:
      email: "me@mydomain.com"
      storage: "/letsencrypt/acme.json"
      caServer: "https://acme-v02.api.letsencrypt.org/directory"
      dnsChallenge:
        provider: ovh
        delayBeforeCheck: 10
  letsencryptStaging:
    acme:
      email: "me@mydomain.com"
      storage: "/letsencrypt/acme-staging.json"
      caServer: "https://acme-staging-v02.api.letsencrypt.org/directory"
      dnsChallenge:
        provider: ovh
        delayBeforeCheck: 10

If applicable, please paste the log output in DEBUG level (--log.level=DEBUG switch)

reverse-proxy_1  | time="2019-12-11T13:45:56Z" level=debug msg="'504 Gateway Timeout' caused by: dial tcp 212.95.74.75:31400: i/o timeout"

Note: the IP it dials has nothing to do with my LAN. It is owned by my ISP, and it is not even my WAN address.
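A quick way to confirm that the dialed address is outside the LAN is to check whether it falls in a private range. This is only a minimal sketch using Python's standard `ipaddress` module; the helper name `is_lan_address` is made up for illustration, and the two addresses are the ones quoted in this report.

```python
import ipaddress


def is_lan_address(ip: str) -> bool:
    """Return True if ip belongs to a private (non-routable) range."""
    return ipaddress.ip_address(ip).is_private


# The address Traefik dialed versus the Marathon endpoint on the LAN.
for ip in ("212.95.74.75", "192.168.0.22"):
    print(ip, "private" if is_lan_address(ip) else "public")
```

A public result for the dialed address suggests it was not taken from the LAN at all and more likely came from name resolution.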

On the dashboard, the IPs are correctly detected, and all of them are reachable by telnet from the Traefik host.

I am out of ideas. It simply does not work with Marathon. Any idea what is failing here?

Thanks and best!

Hi @kopax, aren't the IPs coming from the Marathon API? These IPs could be private ones, valid only inside Marathon's infrastructure, and they may be colliding with Docker's private network.

Hi @dduportal. No, the IPs aren't from the Marathon API, but I have found where that address comes from...

curl -L https://212.95.74.75 --insecure

<!DOCTYPE html><!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7" lang="fr"> <![endif]-->
<!--[if IE 7]>    <html class="no-js lt-ie9 lt-ie8" lang="fr"> <![endif]-->
<!--[if IE 8]>    <html class="no-js lt-ie9" lang="fr"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="fr"> <!--<![endif]-->
	<head>
		<meta charset="utf-8" />
		<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
		<meta name="viewport" content="width=device-width,initial-scale=1, maximum-scale=1,user-scalable=0">
		<link href="https://assistance.numericable.fr" rel="canonical">
		<link href="style.css" media="screen" rel="stylesheet" type="text/css">
		<meta name="robots" content="noindex">
		<title>Assistance Numericable</title>		
		<meta name="description" content="Numericable.tv, le site r&amp;eacute;f&amp;eacute;rence des programmes tv, des films et s&amp;eacute;ries en vid&amp;eacute;o à la demande (VOD) et des &amp;eacute;missions de t&amp;eacute;l&amp;eacute; en Replay ! D&amp;eacute;couvrez notre Guide TV personnalisable afin de programmer votre propre soir&amp;eacute;e t&amp;eacute;l&amp;eacute;vision, selon vos go&amp;ucirc;ts, vos humeurs ou vos envies.">		
		<link rel="shortcut icon" type="image/x-icon" href="favicon.ico" />
		<link rel="stylesheet" type="text/css" href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,700,700italic" />
	</head>
	<body>
		<div class="assistance_globalContainer">
            <div id="mainContainer">
                <header class="assistance_header">
					<div class="assistance_container">
						<div class="assistance_header_left">
							<div class="assistance_header_baseline">Abonnements Internet Très Haut Débit, fibre optique et ADSL, télévision 3D HD, forfaits mobiles</div>
							<a href="/" class="assistance_header_logo">
								<img src="numericable-sfr.png" alt="Assistance Numericable">
							</a>
						</div>
					</div>
				</header>
				<div id="mainContent">        
					<div class="assistance_home">
						<div class="assistance_section assistance_sectionIntro">
							<div class="assistance_section_container">
								<div class="assistance_section_content">
									<h2>O&ugrave; trouver de l'aide ?</h2>
									<p>
										<p>Votre Espace Assistance a fermé ses portes le 23 novembre 2017</p>

<p>Vous pouvez dès à présent utiliser l'Assistance SFR, disponible à l'adresse https://assistance.sfr.fr<br>

<p>Pour retrouver vos paramètres d'installation, rendez-vous sur installation.numericable.fr</p>
									<a href="https://assistance.sfr.fr/" class="assistance_popinLink assistance_button assistance_button-green assistance_button-home">Acc&eacute;der &agrave; assistance.sfr.fr</a>&nbsp;&nbsp;<a href="http://installation.numericable.fr/" class="assistance_popinLink assistance_button assistance_button-green assistance_button-home">Acc&eacute;der &agrave; installation.numericable.fr</a>&nbsp;&nbsp;<a href="https://moncompte.numericable.fr/?link=ME-CO" class="assistance_popinLink assistance_button assistance_button-green assistance_button-home">Contactez-nous</a>
								</div>
							</div>
						</div>
						<div class="assistance_homeFooter">
							<div class="assistance_homeFooter_container assistance_homeFooter_content">
								&copy; 2017 Numericable. Tous droits r&eacute;serv&eacute;s.
							</div>
							
						</div>
					</div>            
				</div>
			</div>
		</div>

	</body>
</html>

My ISP uses that page to indicate that an unknown DNS name was requested.

Marathon shows each running task together with the host it runs on. For example, for me Marathon shows dev-11.yeutech.vn:31601. This name is (normally) resolvable, but I am pretty sure that Traefik can't resolve it and gets a fallback IP from my ISP instead.

I am 99% sure this is the issue: if I tell Marathon not to use the hostname, I can show that using the IP instead of the DNS name works.

The Traefik reverse proxy forwards to that DNS name, so Traefik MUST be able to resolve it.

It can't resolve the name even though the host has no network misconfiguration. If it were an IP, it could probably serve the page.

To me this is a big bug in Traefik: either Traefik skips some DNS resolution step, or it performs some DNS resolution it should not.

What does Traefik do to resolve DNS names? Any idea what I can do?

My ISP hosts a page for the case where a DNS name is not known. This is the IP that Traefik ends up with, while it should get another IP (the one from Marathon). I have never configured that address anywhere; it's not mine.

I don't get why Traefik behaves like this. Any idea what to do?

Edit

This is the DNS name returning the wrong page: https://testing-traefik.icimatin.com/

I ran some tcpdump captures.

My /etc/resolv.conf:

# Generated by NetworkManager
search numericable.fr
nameserver 8.8.8.8
nameserver 8.8.4.4

It seems that dev-11.numericable.fr really resolves to 212.95.74.75

I was configuring Marathon with an intranet hostname (dev-11). This was the tcpdump output:

/ # tcpdump -ni any port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes


23:37:25.891422 IP 192.168.0.16.34846 > 8.8.8.8.53: 378+ AAAA? dev-11.numericable.fr. (39)
23:37:25.891479 IP 192.168.0.16.50642 > 8.8.8.8.53: 43249+ A? dev-11.numericable.fr. (39)
23:37:25.904896 IP 8.8.8.8.53 > 192.168.0.16.34846: 378 1/1/0 CNAME nc-ass-vip.sdv.fr. (120)
23:37:25.930865 IP 8.8.8.8.53 > 192.168.0.16.50642: 43249 2/0/0 CNAME nc-ass-vip.sdv.fr., A 212.95.74.75 (84)
23:37:26.739694 IP 192.168.0.16.37848 > 8.8.8.8.53: 4284+ A? static.asm.skype.com. (38)
23:37:26.801338 IP 8.8.8.8.53 > 192.168.0.16.37848: 4284 3/0/0 CNAME static-asm-skype.trafficmanager.net., CNAME neu1-authgw.cloudapp.net., A 13.94.102.123 (138)
23:37:41.825827 IP 192.168.0.16.52410 > 8.8.8.8.53: 11463+ A? static.asm.skype.com. (38)
23:37:41.838653 IP 8.8.8.8.53 > 192.168.0.16.52410: 11463 3/0/0 CNAME static-asm-skype.trafficmanager.net., CNAME neu1-authgw.cloudapp.net., A 13.94.102.123 (138)
23:37:41.865510 IP 192.168.0.16.58649 > 8.8.8.8.53: 5711+ A? c1.iggcdn.com. (31)
23:37:41.880401 IP 8.8.8.8.53 > 192.168.0.16.58649: 5711 1/0/0 A 34.102.138.247 (47)
23:37:41.903823 IP 192.168.0.16.57996 > 8.8.8.8.53: 58412+ A? pixel.adsafeprotected.com. (43)
23:37:41.925646 IP 8.8.8.8.53 > 192.168.0.16.57996: 58412 2/0/0 CNAME anycast.pixel.adsafeprotected.com., A 199.166.0.26 (81)
23:37:41.947066 IP 192.168.0.16.60551 > 8.8.8.8.53: 42171+ A? fonts.googleapis.com. (38)
23:37:41.961610 IP 8.8.8.8.53 > 192.168.0.16.60551: 42171 1/0/0 A 216.58.215.42 (54)
23:37:50.180251 IP 192.168.0.16.48654 > 8.8.8.8.53: 14439+ A? d3cv4a9a9wh0bt.cloudfront.net. (47)
23:37:50.191554 IP 8.8.8.8.53 > 192.168.0.16.48654: 14439 4/0/0 A 13.225.84.215, A 13.225.84.133, A 13.225.84.80, A 13.225.84.126 (111)

It was falling back to dev-11.numericable.fr, so I configured an FQDN (dev-11.yeutech.vn) in Marathon. Now the tcpdump output looks like this:

 # tcpdump -ni any port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
23:45:51.408693 IP 192.168.0.16.52437 > 8.8.8.8.53: 60928+ A? static.asm.skype.com. (38)
23:45:51.435049 IP 8.8.8.8.53 > 192.168.0.16.52437: 60928 3/0/0 CNAME static-asm-skype.trafficmanager.net., CNAME neu1-authgw.cloudapp.net., A 13.94.102.123 (138)
23:45:51.457513 IP 192.168.0.16.58807 > 8.8.8.8.53: 13877+ A? edge.skype.com. (32)
23:45:51.465513 IP 8.8.8.8.53 > 192.168.0.16.58807: 13877 3/0/0 CNAME edge-skype-com.s-0001.s-msedge.net., CNAME s-0001.s-msedge.net., A 13.107.3.128 (110)
23:45:51.488500 IP 192.168.0.16.51746 > 8.8.8.8.53: 41661+ A? csm.nl.eu.criteo.net. (38)
23:45:51.497684 IP 8.8.8.8.53 > 192.168.0.16.51746: 41661 1/0/0 A 178.250.2.150 (54)
23:45:56.754038 IP 192.168.0.16.56726 > 8.8.8.8.53: 6818+ AAAA? dev-11.yeutech.vn. (35)
23:45:56.754055 IP 192.168.0.16.42437 > 8.8.8.8.53: 59618+ A? dev-11.yeutech.vn. (35)
23:45:57.177981 IP 8.8.8.8.53 > 192.168.0.16.56726: 6818 NXDomain 0/1/0 (91)
23:45:57.178003 IP 8.8.8.8.53 > 192.168.0.16.42437: 59618 NXDomain 0/1/0 (91)
23:45:57.178478 IP 192.168.0.16.40609 > 8.8.8.8.53: 17534+ AAAA? dev-11.yeutech.vn.numericable.fr. (50)
23:45:57.178504 IP 192.168.0.16.54313 > 8.8.8.8.53: 63898+ A? dev-11.yeutech.vn.numericable.fr. (50)
23:45:57.205267 IP 8.8.8.8.53 > 192.168.0.16.54313: 63898 2/0/0 CNAME nc-ass-vip.sdv.fr., A 212.95.74.75 (95)
23:45:57.217886 IP 8.8.8.8.53 > 192.168.0.16.40609: 17534 1/1/0 CNAME nc-ass-vip.sdv.fr. (131)
23:46:06.521472 IP 192.168.0.16.34661 > 8.8.8.8.53: 33403+ A? static.asm.skype.com. (38)
23:46:06.552800 IP 8.8.8.8.53 > 192.168.0.16.34661: 33403 3/0/0 CNAME static-asm-skype.trafficmanager.net., CNAME neu1-authgw.cloudapp.net., A 13.94.102.123 (138)
23:46:06.574352 IP 192.168.0.16.43737 > 8.8.8.8.53: 24754+ A? static-asm.secure.skypeassets.com. (51)
23:46:06.597444 IP 8.8.8.8.53 > 192.168.0.16.43737: 24754 6/0/0 CNAME 1180c.wpc.azureedge.net., CNAME 1180c.ec.azureedge.net., CNAME lb.apr-1180c.edgecastdns.net., CNAME hlb.apr-1180c-0.edgecastdns.net., CNAME cs10.wpc.v0cdn.net., A 68.232.34.200 (225)
23:46:06.618327 IP 192.168.0.16.44711 > 8.8.8.8.53: 51941+ A? www.googleapis.com. (36)
23:46:06.634170 IP 8.8.8.8.53 > 192.168.0.16.44711: 51941 1/0/0 A 216.58.201.234 (52)
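The two captures are consistent with how a stub resolver usually applies the `search` line of /etc/resolv.conf. Below is a simplified sketch of that expansion, assuming the common default of `ndots:1`. The function name `candidate_names` is made up for illustration, and real resolvers have more rules than this.

```python
def candidate_names(name, search, ndots=1):
    """Return the names a stub resolver would typically try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search expansion.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots: try the search list first, then the name as given.
    return expanded + [absolute]


search = ["numericable.fr"]  # from the resolv.conf shown above
print(candidate_names("dev-11", search))
print(candidate_names("dev-11.yeutech.vn", search))
```

Under this model the short name dev-11 is tried as dev-11.numericable.fr first, and dev-11.yeutech.vn is tried as given and then, after NXDomain, as dev-11.yeutech.vn.numericable.fr. That matches the order of the queries in both captures.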

I am still confused, because my /etc/hosts has:

/ # cat /etc/hosts | grep dev-11
192.168.0.22 dev-11.yeutech.vn dev-11

The hosts file seems to be ignored by Traefik...

After doing:

  • using marathon with FQDN dev-11.yeutech.vn
  • creating a real DNS, dev-11.yeutech.vn
  • opening marathon private port publicly

I was able to query my server. But this is not a real fix, because the port can't remain publicly open, and I still don't know why /etc/hosts is not used for resolution, which is what causes all these issues.

Any idea?

Not related to #1243. It seems that /etc/hosts is totally ignored by Traefik.

Related to https://stackoverflow.com/questions/49476452/traefik-forwarding-to-a-host-and-overriding-ip

/etc/nsswitch.conf is absent from the system. Running:

echo "hosts: files dns" > /etc/nsswitch.conf

solves the issue of /etc/hosts being ignored.
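Why the order in `hosts: files dns` matters can be illustrated with a toy model. The sketch below assumes that, without /etc/nsswitch.conf, the resolver in use ends up asking DNS before reading /etc/hosts; that ordering is an assumption made for illustration, not something confirmed in this thread. `parse_hosts`, `lookup`, and `isp_dns` are hypothetical helpers, and `isp_dns` only imitates the ISP answer seen in the tcpdump capture.

```python
def parse_hosts(text):
    """Map every hostname and alias in /etc/hosts-style text to its IP."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) >= 2:
            ip, *names = fields
            for host in names:
                table.setdefault(host, ip)  # first entry wins
    return table


def lookup(name, order, hosts, dns):
    """Try each source in the given order and return the first answer."""
    for source in order:
        if source == "files" and name in hosts:
            return hosts[name]
        if source == "dns":
            answer = dns(name)
            if answer is not None:
                return answer
    return None


HOSTS = parse_hosts("192.168.0.22 dev-11.yeutech.vn dev-11\n")


def isp_dns(name):
    # Stand-in for the capture: via the search domain, every lookup
    # ends up at the ISP help page address.
    return "212.95.74.75"


print(lookup("dev-11.yeutech.vn", ["files", "dns"], HOSTS, isp_dns))  # 192.168.0.22
print(lookup("dev-11.yeutech.vn", ["dns", "files"], HOSTS, isp_dns))  # 212.95.74.75
```

With `files` first, the /etc/hosts entry wins. With `dns` first, the ISP answer hides it, which is the symptom described above.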

Have you tried this option?
https://docs.traefik.io/providers/marathon/#forcetaskhostname

Yes I did.

I submitted a fix, https://github.com/containous/traefik/pull/6012, but it was rejected because it was not made against the official Dockerfile. I don't know where the official Dockerfile is.

Status: you opened https://github.com/containous/traefik-library-image/pull/75.