Tip/info for MariaDB Galera cluster load balancing

So, I'm trying to build a MariaDB Galera cluster with three nodes (VMs) on my infrastructure.
I'm currently debating if it's better to use keepalived to point directly to port 3306 of the cluster, use keepalived to point to a Træfik instance with load balancing of all three servers' IPs, or perhaps (if possible) to use just Træfik and let it manage a virtual IP on its own in load balancing mode (having three instances, one on each node, instead of one on the primary node or on an external system).

All this because I would like no single point of failure in software either (hardware has already been taken care of: three Proxmox servers, each VM on a different host, networking managed by two MikroTik switches with MLAG, each server has two network ports and is connected to both switches, disks are on a Ceph cluster with high availability and redundancy, etc.).

So, community, for those of you who manage a highly available MariaDB cluster, what is the best option (also for consistency)? Virtual IP managed by keepalived? Træfik or not Træfik? If Træfik, VIP managed by Træfik? How many instances of Træfik? Or just configure the applications to connect to the servers' IPs directly? (please tell me no, lol)

Thanks in advance for all the info!

PS: don't tell me "just use k8s", lol.
I know k8s exists and I'm learning it, but I want to "jump through the hoops" before using the "latest and greatest" technologies. I want to know how the bloody things work under the hood. I'm also doing k8s the hard way, so that might give you an idea of how much time it will take me to grasp the concepts :sweat_smile:

Different approach:

We currently use a MongoDB cluster and every app is set up with a connection pool to all 3 servers.

Can the driver for MariaDB handle multiple servers? That way it could probably optimize read/write.
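Some MariaDB connectors document multi-host connection strings with failover, so it is worth checking the docs for the driver you use. Where the driver does not offer this, the same idea can be approximated in the application. Below is a minimal Python sketch, assuming a hypothetical `connect(host, port)` callable that wraps whatever driver you use. The function name, the `retry_on` parameter and the example IPs are illustrative and not part of any MariaDB API.

```python
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_failover(
    hosts: Sequence[str],
    connect: Callable[[str, int], T],
    port: int = 3306,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Try each host in order and return the first successful connection.

    `connect` is a hypothetical wrapper around your real driver call.
    `retry_on` lists the exception types that mean "try the next host";
    a real driver usually raises its own error class, so add it there.
    """
    errors = []
    for host in hosts:
        try:
            return connect(host, port)
        except retry_on as exc:
            # Remember the failure and move on to the next node.
            errors.append((host, repr(exc)))
    raise ConnectionError(f"all hosts failed: {errors}")


if __name__ == "__main__":
    # Fake connect function standing in for a real driver (assumption).
    def fake_connect(host: str, port: int):
        if host == "10.0.0.1":
            raise OSError("node down")
        return (host, port)

    nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    print(connect_with_failover(nodes, fake_connect))
```

This only covers connection-time failover. It does not split reads and writes, and it does not check whether a Galera node is actually synced.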

I would put Traefik on each node where your DB client runs, and the DB client would connect to localhost (one hop saved). Traefik would load balance and prioritize DB servers based on expected latency.
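To make the "prefer the closest server" idea concrete, here is a small Python sketch of the selection logic only. It is not Traefik configuration. The function name `pick_backend` and the `is_healthy` callable are hypothetical, and it assumes "closest" simply means the DB server on the same host as the client.

```python
from typing import Callable, Optional, Sequence


def pick_backend(
    backends: Sequence[str],
    local_host: str,
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the local backend if it is healthy, else the first healthy remote one."""
    # Prefer the DB server on the same host to save a network hop.
    if local_host in backends and is_healthy(local_host):
        return local_host
    # Otherwise fall back to the remaining nodes in their listed order.
    for host in backends:
        if host != local_host and is_healthy(host):
            return host
    # No healthy backend at all.
    return None


if __name__ == "__main__":
    nodes = ["db1", "db2", "db3"]
    down = {"db1"}  # pretend the local node is failing
    print(pick_backend(nodes, "db1", lambda h: h not in down))
```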

From Traefik Services doc:

For now, only round robin load balancing is supported


I was aiming for the scenario where the DB client and DB server run on the same host, prioritizing the local server to save another hop. The documentation suggests one can use service failover or weights to achieve that. But it looks to me like Traefik might not be able to run proper health checks against MariaDB endpoints (it looks as if it expects HTTP and gRPC upstreams only). I would suggest using a local instance of HAProxy instead.
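For context on what a "proper" health check for a Galera node might look at: an open port 3306 does not mean the node is usable, so checks commonly inspect the `wsrep_*` status variables instead. Here is a minimal Python sketch, assuming the status rows (for example from `SHOW GLOBAL STATUS LIKE 'wsrep_%'`) have already been fetched into a dict. The function name is hypothetical and the strictness of the rule is a judgment call.

```python
from typing import Mapping


def is_galera_node_healthy(status: Mapping[str, str]) -> bool:
    """Decide whether a node should receive traffic, given its wsrep status.

    `status` maps status variable names to their string values. Fetching
    it from the server is left to whatever driver or script you use.
    """
    return (
        # The node accepts queries.
        status.get("wsrep_ready") == "ON"
        # The node belongs to the primary component (it has quorum).
        and status.get("wsrep_cluster_status") == "Primary"
        # The node is fully synced, not joining or acting as a donor.
        and status.get("wsrep_local_state_comment") == "Synced"
    )


if __name__ == "__main__":
    sample = {
        "wsrep_ready": "ON",
        "wsrep_cluster_status": "Primary",
        "wsrep_local_state_comment": "Synced",
    }
    print(is_galera_node_healthy(sample))
```

A check like this could sit behind a small script that the load balancer calls. Whether a donor node should still count as healthy depends on your setup, so the strict "Synced only" rule here is an assumption.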

You can use the regular Docker healthchecks for the target services/containers, so pretty much anything is possible.

You mean using a Docker health check on the DB node to restart or shut down a failing node, so that Traefik would mark it as unhealthy and remove it from load balancing?

Traefik will only load balance to Docker nodes with a good health check, if the health check is enabled.