Unstable IngressRouteTCP with MySQL / PostgreSQL services

I have Traefik 2.10.1 deployed in Kubernetes using the Helm chart (v23.0.1).

Two extra ports are defined to expose the MySQL and PostgreSQL services outside the cluster:

ports:
  traefik:
    port: 9000
    expose: false
    exposedPort: 9000
  web:
    port: 8000
    expose: true
    exposedPort: 80
    nodePort: 30000
  websecure:
    port: 8443
    expose: true
    exposedPort: 443
    nodePort: 30001
  mysql:
    port: 32001
    expose: true
    exposedPort: 32001
    nodePort: 32001
  pgsql:
    port: 32002
    expose: true
    exposedPort: 32002
    nodePort: 32002

A sample IngressRouteTCP is created for the MySQL service:

apiVersion: traefik.io/v1alpha1
kind: IngressRouteTCP
metadata:
  name: pxc-haproxy
  namespace: pxc
spec:
  entryPoints:
    - mysql
  routes:
    - match: HostSNI(`*`)
      services:
        - name: pxc-muc2-haproxy
          port: 3306

When connected to the database, either with the standard mysql shell or from applications outside the cluster, the connection is frequently dropped:

mysql> select count(*) from users;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    1271682
Current database: zabbix
  5795:20230606:070822.443 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
  5717:20230606:070834.373 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]

Things I have tried so far:

  • experiment with nativeLB true/false - doesn't change the behavior
  • increase / disable terminationDelay - same results
  • expose a similar TCP service for PostgreSQL - same issue
  • with a plain kubectl port-forward to the exact same service, there are no interruptions

Is there anything I am missing here?

What does the Traefik debug log tell you?

Not much:

time="2023-06-08T05:25:16Z" level=debug msg="Handling TCP connection from 10.1.2.13:53852 to 172.20.6.154:3306"
time="2023-06-08T05:25:21Z" level=debug msg="Handling TCP connection from 10.1.2.13:53924 to 172.20.7.62:3306"
time="2023-06-08T05:25:25Z" level=debug msg="Handling TCP connection from 10.1.2.13:53964 to 172.20.9.247:3306"
time="2023-06-08T05:25:38Z" level=debug msg="Handling TCP connection from 10.1.2.13:54136 to 172.20.6.154:3306"
time="2023-06-08T05:25:42Z" level=debug msg="Handling TCP connection from 10.1.2.13:54190 to 172.20.7.62:3306"
time="2023-06-08T05:25:43Z" level=debug msg="Handling TCP connection from 10.1.2.13:54210 to 172.20.9.247:3306"
time="2023-06-08T05:25:43Z" level=debug msg="Handling TCP connection from 10.1.2.13:54226 to 172.20.6.154:3306"
time="2023-06-08T05:25:43Z" level=debug msg="Handling TCP connection from 10.1.2.13:54246 to 172.20.7.62:3306"
time="2023-06-08T05:25:44Z" level=debug msg="Handling TCP connection from 10.1.2.13:54260 to 172.20.9.247:3306"
time="2023-06-08T05:25:44Z" level=debug msg="Handling TCP connection from 10.1.2.13:54274 to 172.20.6.154:3306"
time="2023-06-08T05:25:44Z" level=debug msg="Handling TCP connection from 10.1.2.13:54286 to 172.20.7.62:3306"
time="2023-06-08T05:25:44Z" level=debug msg="Handling TCP connection from 10.1.2.13:54294 to 172.20.9.247:3306"
time="2023-06-08T05:25:48Z" level=debug msg="Handling TCP connection from 10.1.2.13:54350 to 172.20.6.154:3306"
time="2023-06-08T05:25:53Z" level=debug msg="Handling TCP connection from 10.1.2.13:54426 to 172.20.7.62:3306"

While on the Zabbix side:

7426:20230608:072519.946 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
7435:20230608:072519.961 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
7470:20230608:072521.826 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host=some-hostname and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null]
7471:20230608:072523.569 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host=some-hostname and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null]
7424:20230608:072524.828 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid,r.delay from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and r.nextcheck<=1686201924 and mod(r.druleid,1)=0]
7416:20230608:072525.135 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select hostid,key_,evaltype,formula,lifetime from items where itemid=80725]
7469:20230608:072526.864 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host=some-hostname and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null]
7415:20230608:072544.831 [Z3005] query failed: [2013] Lost connection to MySQL server during query [select hostid,key_,evaltype,formula,lifetime from items where itemid=80744]
7429:20230608:072544.896 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
7405:20230608:072547.184 [Z3005] query failed: [1158] Got an error reading communication packets [select discovery_groupid,snmptrap_logging,severity_name_0,severity_name_1,severity_name_2,severity_name_3,severity_name_4,severity_name_5,hk_events_mode,hk_events_trigger,hk_events_internal,hk_events_discovery,hk_events_autoreg,hk_services_mode,hk_services,hk_audit_mode,hk_audit,hk_sessions_mode,hk_sessions,hk_history_mode,hk_history_global,hk_history,hk_trends_mode,hk_trends_global,hk_trends,default_inventory_mode,db_extension,autoreg_tls_accept,compression_status,compress_older,instanceid,default_timezone,hk_events_service,auditlog_enabled from config order by configid]
zabbix_server [7405]: ERROR [file and function: <dbconfig.c,DCsync_configuration>, revision:d2032721bc8, line:6673] Something impossible has just happened.
7405:20230608:072547.184 === Backtrace: ===
7405:20230608:072547.185 9: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.797177 sec, syncing configuration](zbx_backtrace+0x3c) [0x56400611a6ec]
7405:20230608:072547.185 8: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.797177 sec, syncing configuration](DCsync_configuration+0xc46) [0x5640060c3de6]
7405:20230608:072547.185 7: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.797177 sec, syncing configuration](dbconfig_thread+0x2a7) [0x564005fc5437]
7405:20230608:072547.185 6: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.797177 sec, syncing configuration](zbx_thread_start+0x27) [0x56400611cab7]
7405:20230608:072547.185 5: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.797177 sec, syncing configuration](+0x71e4a) [0x564005fb7e4a]
7405:20230608:072547.185 4: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.797177 sec, syncing configuration](MAIN_ZABBIX_ENTRY+0x9d3) [0x564005fb8e73]
7405:20230608:072547.185 3: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.797177 sec, syncing configuration](daemon_start+0x220) [0x56400611a400]
7405:20230608:072547.185 2: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.797177 sec, syncing configuration](main+0x49e) [0x564005fb046e]
7405:20230608:072547.185 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7fd6a4b5109b]
7405:20230608:072547.185 0: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.797177 sec, syncing configuration](_start+0x2a) [0x564005fb723a]
7434:20230608:072547.902 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
7480:20230608:072548.187 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
7438:20230608:072549.955 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
7426:20230608:072553.966 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
7427:20230608:072553.969 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
7435:20230608:072553.981 [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]

Hi @backaf, thanks for your interest in Traefik!

Could you share your static configuration?

This is the full Helm values file:

providers:
  kubernetesCRD:
    enabled: true
    allowCrossNamespace: true

deployment:
  enabled: true
  replicas: 3

globalArguments:
  - "--global.checknewversion"

ports:
  traefik:
    port: 9000
    expose: false
    exposedPort: 9000
  web:
    port: 8000
    expose: true
    exposedPort: 80
    nodePort: 30000
  websecure:
    port: 8443
    expose: true
    exposedPort: 443
    nodePort: 30001
  mysql:
    port: 32001
    expose: true
    exposedPort: 32001
    nodePort: 32001
  pgsql:
    port: 32002
    expose: true
    exposedPort: 32002
    nodePort: 32002

additionalArguments:
  - "--entryPoints.web.proxyProtocol.trustedIPs=127.0.0.1/32,10.111.222.12/32,10.111.222.13/32"
  - "--entryPoints.web.forwardedHeaders.trustedIPs=127.0.0.1/32,10.111.222.12/32,10.111.222.13/32"
  - "--entryPoints.websecure.proxyProtocol.trustedIPs=127.0.0.1/32,10.111.222.12/32,10.111.222.13/32"
  - "--entryPoints.websecure.forwardedHeaders.trustedIPs=127.0.0.1/32,10.111.222.12/32,10.111.222.13/32"

service:
  enabled: true
  type: NodePort
  spec:
    externalTrafficPolicy: Local

nodeSelector:
  ingress-endpoint: "true"

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/name
          operator: In
          values:
          - traefik
      topologyKey: kubernetes.io/hostname

I believe this is related to our environment and not Traefik. We run Kubernetes on-premises, with HAProxy and Keepalived handling failover and load balancing in front of our Traefik ingress pods.

This morning, I switched from the HAProxy address to connecting directly to one of Traefik's NodePorts, and I was no longer able to reproduce the issue.

I later increased MySQL's net_read_timeout and net_write_timeout, together with HAProxy's client and server timeouts. The default values seem to have been too low.

With a timeout of 300s, I rarely see the MySQL errors. I will keep testing for a few more days to confirm this is the issue.
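
A minimal sketch of what the HAProxy side of such a change might look like, assuming a plain TCP listener in front of the Traefik NodePort. The section name, bind port, connect timeout, and node addresses are hypothetical placeholders, not taken from the actual setup. Only the 300s value and NodePort 32001 come from this thread.

```haproxy
# Hypothetical haproxy.cfg fragment; names and addresses are placeholders.
defaults
    mode tcp
    timeout connect 5s      # assumed value, not from the thread
    timeout client  300s    # max inactivity allowed on the client side
    timeout server  300s    # max inactivity allowed on the server side

listen mysql
    bind *:3306             # assumed front-end port
    # Forward to the Traefik "mysql" entry point exposed on NodePort 32001.
    # With externalTrafficPolicy: Local, only nodes running a Traefik pod
    # answer on the NodePort, so health checks keep other nodes out of rotation.
    server traefik-node1 10.0.0.11:32001 check
    server traefik-node2 10.0.0.12:32001 check
```

On the MySQL side, the matching settings are the net_read_timeout and net_write_timeout system variables mentioned above. Keeping the proxy timeouts at least as long as the database timeouts is a reasonable starting point, though the right values depend on the workload.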

Some references:
MySQL load balancing with HAProxy | Severalnines (MySQL Server has gone away part)
I use haproxy as banlancer for mariadb cluster,but got lost connection during query - Stack Overflow
Haproxy + Mysql Lost connection to MySQL server during query - Database Administrators Stack Exchange

Thanks for your help!