Traefik V2 and ConsulCatalog

Hi! I'm not sure whether this is a bug, a missing feature, or a config error on my part, but Traefik 2.1 with ConsulCatalog causes severe outages during deployments in a Nomad/Consul cluster, where services deregister and register again. In 1.7.X, backends being removed and added are reflected in the routing table instantly. With 2.1, old services stay present, so traffic is sent to a service that no longer exists.
In 1.7.X I can do a

while :; do curl domain/echo; done

and not miss a single reply when deploying a new version.
With 2.1.X it's 45 seconds of mixed successful responses and Bad Gateway responses.

I have watch = true in 1.7.x even though it's not documented; I don't know if it makes a difference. It's available as a configuration option for the Consul provider in 1.7 (not Consul Catalog). In 2.1.X, Traefik won't even start with that config option.
Could someone please point me in the right direction?

Hello,

watch=true for Consul Catalog does nothing in Traefik v1, which is why it's not documented.

The Consul and Consul Catalog configurations are not related.

The core code of the Consul Catalog (CC) provider is largely the same in Traefik v1 and Traefik v2.

So could you give the exact version of Traefik that you are using, plus more information about your configuration and the Traefik log (not the access log)?

Could you provide a simple reproducible case?

Hi. Sure. Here's some information about the environment I'm running it in:
Small cluster with 3 Nomad servers and 2 workers in AWS behind an ALB. Worker nodes run Traefik on them.
Nomad 0.10.4
Consul 1.7.2
Traefik 2.1.4

Traefik Config:

[global]
  checkNewVersion = false
  sendAnonymousUsage = false

[entryPoints]
  [entryPoints.http]
    address = ":9999"
    [entryPoints.http.forwardedHeaders]
      insecure = true
  [entryPoints.traefik]
    address = ":9998"

[providers]
  [providers.consulCatalog]
    prefix = "traefik"
    requireConsistent = true     # Also tried with False here, no difference
    exposedByDefault = false
    [providers.consulCatalog.endpoint]
      address = "http://127.0.0.1:8500"
      scheme = "http"

[api]
  insecure = true
  dashboard = true
  debug = true

[metrics]
  [metrics.prometheus]
    buckets = [0.1,0.3,0.5,1.0,1.5,5.0]
    entryPoint = "traefik"


[ping]
  entryPoint = "traefik"

[log]
  level = "debug"

Nomad jobfile:

job "http-echo" {
    region      = "eu-central-1"
    datacenters = ["eu-central-1"]
    type        = "service"

    group "debug" {
        count = 2

        constraint {
            operator = "distinct_hosts"
            value = "true"
        }
        constraint {
            #Place workload in different Availability Zones
            distinct_property = "${attr.platform.aws.placement.availability-zone}"
        }
        update {
            max_parallel = 1
            min_healthy_time = "10s"
            healthy_deadline = "5m"
        }
        task "http-echo" {
            driver = "docker"
            config {
                image = "hashicorp/http-echo"
                port_map {
                    echo = 5678
                }
                args  = ["-text", "version2: ${node.unique.name}, ip: ${attr.unique.network.ip-address}"]
            }
            service {
                name = "http-echo"
                tags = [
                    "http-echo",
                    "traefik.enable=true",
                    "traefik.http.routers.http-echo.rule=Host(`new-test2.example.com`)",
                ]

                port = "echo"
                check {
                    type     = "http"
                    path     = "/"
                    interval = "5s"
                    timeout  = "2s"
                    port     = "echo"
                }
            }
            resources {
                cpu = 20
                memory = 20
                network {
                    port "echo" { }
                }
            }
        }
    }
}

Curl result during a deploy (normally I run it without the sleep and there's no problem, not a single Bad Gateway, but I added a sleep here to avoid spamming too much irrelevant info):

while :; do echo -n $(date) " "; curl https://new-test2.example.com; sleep 0.5;done
Wed 04 Mar 2020 03:03:07 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:08 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:09 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:09 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:10 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:10 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:11 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:12 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:12 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:13 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:14 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:14 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:15 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:15 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:16 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:17 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:17 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:18 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:19 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:19 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:20 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:20 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:21 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:22 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:22 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:23 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:24 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:24 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:25 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:26 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:26 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:27 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:27 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:28 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:29 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:29 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:30 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:31 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:31 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:32 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:32 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:33 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:34 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:34 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:35 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:36 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:36 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:37 PM CET  version1: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:37 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:38 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:39 PM CET  Bad Gateway
Wed 04 Mar 2020 03:03:39 PM CET  Bad Gateway
Wed 04 Mar 2020 03:03:40 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:41 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:41 PM CET  Bad Gateway
Wed 04 Mar 2020 03:03:42 PM CET  Bad Gateway
Wed 04 Mar 2020 03:03:42 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:43 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:44 PM CET  Bad Gateway
Wed 04 Mar 2020 03:03:44 PM CET  Bad Gateway
Wed 04 Mar 2020 03:03:45 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:46 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:46 PM CET  Bad Gateway
Wed 04 Mar 2020 03:03:47 PM CET  Bad Gateway
Wed 04 Mar 2020 03:03:48 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:48 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:49 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:50 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:50 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:51 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:51 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:52 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:53 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:53 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:54 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:55 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:55 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:56 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:57 PM CET  version1: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:03:57 PM CET  Bad Gateway
Wed 04 Mar 2020 03:03:58 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:58 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:03:59 PM CET  Bad Gateway
Wed 04 Mar 2020 03:04:00 PM CET  Bad Gateway
Wed 04 Mar 2020 03:04:00 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:01 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:02 PM CET  Bad Gateway
Wed 04 Mar 2020 03:04:02 PM CET  Bad Gateway
Wed 04 Mar 2020 03:04:03 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:03 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:04 PM CET  version2: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:04:05 PM CET  version2: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:04:05 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:06 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:07 PM CET  version2: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:04:07 PM CET  version2: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:04:08 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:08 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:09 PM CET  version2: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:04:10 PM CET  version2: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:04:10 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:11 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:12 PM CET  version2: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:04:12 PM CET  version2: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:04:13 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:13 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
Wed 04 Mar 2020 03:04:14 PM CET  version2: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:04:15 PM CET  version2: ip-10-0-1-223, ip: 10.0.1.223
Wed 04 Mar 2020 03:04:15 PM CET  version2: ip-10-0-3-139, ip: 10.0.3.139
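
To put a number on a run like the one above, the probe output can be saved to a file and summarised. This is only a minimal sketch: tally_probe_log is a hypothetical helper name that does not appear in the thread, and it assumes failed probes show up as lines containing "Bad Gateway", as they do in this log.

```shell
# Hypothetical helper (not from the original thread): summarise a saved probe
# log, counting "Bad Gateway" lines versus every other non-empty line.
# Reads the files given as arguments, or stdin when called without any.
tally_probe_log() {
  awk '
    /Bad Gateway/ { bad++; next }   # failed probe
    NF            { ok++ }          # any other non-empty line counts as a reply
    END { printf "ok=%d bad=%d\n", ok, bad }
  ' "$@"
}
```

For example, the curl loop could be piped through tee probe.log during a deploy, stopped with Ctrl-C, and then summarised with tally_probe_log probe.log.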

Logs from 10.0.1.223
Logs from 10.0.3.139

Config that works on 1.7.19:

# traefik.toml
################################################################
# Global configuration
################################################################

[entryPoints]
  [entryPoints.http]
  address = ":9999"

  [entryPoints.traefik_api]
  address = ":9998"

[ping]
  entryPoint = "traefik_api"

[api]
  entryPoint = "traefik_api"
  dashboard = true
  debug = true

################################################################
# Consul Catalog configuration backend
################################################################

[consulCatalog]
  endpoint = "127.0.0.1:8500"
  domain = "service.consul"
  prefix = "traefik"
  watch = true

[metrics]
  [metrics.prometheus]
    entryPoint = "traefik_api"
    buckets = [0.1,0.3,1.2,5.0]

Could you try with Traefik v2.1.6?

Getting the same issue with 2.1.6

I revisited this with Traefik 2.2.0 today; unfortunately the problem persists. Is there anything I can do to help troubleshoot this?

Hi amase!
You can use the endpointWaitTime option.

@thanhkm12 Hi! Thanks for the suggestion. Unfortunately I don't see any improvement. I've tried with endpointWaitTime = "1s" and endpointWaitTime = "30s"; at 1s I still got the same number of dropped requests, and with 30s I got a lot more dropped requests.

Any updates on this issue?

It turned out that requests were also being dropped with earlier versions of Traefik (although not as frequently as in 2.X). I went back to using Fabio, which works without any issues during service deploys, and since this problem is over half a year old I don't believe there is any interest in fixing it. (FYI, I did a quick test with 2.3.1, but the issue is still present.)

I've been suffering from this too :) My suspicion is the code in consul_catalog.go (traefik/traefik on GitHub, at v2.4), where Traefik fetches the HealthAny state from Consul. In your case, setting it to HealthPassing will likely solve this problem.
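
One way to check this suspicion against a running cluster is to compare what Consul itself returns with and without the passing filter on its health endpoint during a deploy. The sketch below is illustrative only: it assumes a local Consul agent at 127.0.0.1:8500 and the http-echo service from the job file above, and health_url and CONSUL_ADDR are hypothetical names introduced here. It does not change what Traefik queries.

```shell
# Assumption: a local Consul agent, as in the Traefik config earlier in the thread.
CONSUL_ADDR="${CONSUL_ADDR:-http://127.0.0.1:8500}"

# Hypothetical helper: build the Consul health endpoint URL for a service.
# Pass "passing" as the second argument to ask Consul to return only
# instances whose health checks are all passing.
health_url() {
  service="$1"
  filter="${2:-any}"
  url="$CONSUL_ADDR/v1/health/service/$service"
  if [ "$filter" = "passing" ]; then
    url="$url?passing=true"
  fi
  printf '%s\n' "$url"
}
```

Running curl -s "$(health_url http-echo)" and curl -s "$(health_url http-echo passing)" side by side while a deploy is in progress should show whether instances that are no longer healthy still appear in the unfiltered result.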

For those who are suffering from this: by default Traefik refreshes the Consul catalog every 15 seconds, so you can lower that interval with the provider's refreshInterval option (something like 1s is fine) and set shutdown_delay in your Nomad job to something like 2s. With this approach, you can do the switch without downtime.