So, I'm trying to build a MariaDB Galera cluster with three nodes (VMs) on my infrastructure.
I'm currently debating if it's better to use keepalived to point directly to port 3306 of the cluster, use keepalived to point to a Træfik instance with load balancing of all three servers' IPs, or perhaps (if possible) to use just Træfik and let it manage a virtual IP on its own in load balancing mode (having three instances, one on each node, instead of one on the primary node or on an external system).
All this because I would like no single point of failure in software either (hardware has already been taken care of: three Proxmox servers, each VM on a different host, networking managed by two MikroTik switches with MLAG, each server with two network ports connected to both switches, disks on a Ceph cluster with high availability and redundancy, etc.).
So, community, for those of you who manage a highly available MariaDB cluster, what is the best option (also for consistency)? Virtual IP managed by keepalived? Træfik or not Træfik? If Træfik, VIP managed by Træfik? How many instances of Træfik? Or just configure the applications to connect to the servers' IPs directly? (please tell me no, lol)
PS: don't tell me "just use k8s", lol.
I know k8s exists and I'm learning it, but I want to "jump through the hoops" before using the "latest and greatest" technologies. I want to know how the bloody things work under the hood. Also, I'm doing k8s the hard way, so that might give you an idea of how much time it will take me to grasp the concepts.
I would put Traefik on each node where a DB client runs, and have the DB client connect to it on localhost (one hop saved). Traefik would load balance and prioritize DB servers based on expected latency.
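To make the idea concrete, here is a minimal Python sketch of that selection logic, independent of Traefik itself. The Backend type, the pick_backend function, and the addresses and latency numbers are all made up for illustration; a real proxy would measure health and latency on its own.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Backend:
    """Hypothetical description of one DB server as seen from a client node."""
    host: str
    port: int
    latency_ms: float  # expected latency from this client node (assumed known)
    healthy: bool      # result of whatever health check is in use


def pick_backend(backends: Iterable[Backend]) -> Optional[Backend]:
    """Return the healthy backend with the lowest expected latency, or None."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.latency_ms)


# Example addresses and latencies are invented for the sketch.
nodes = [
    Backend("127.0.0.1", 3306, 0.1, healthy=False),  # local server, currently down
    Backend("10.0.0.12", 3306, 0.6, healthy=True),
    Backend("10.0.0.13", 3306, 0.9, healthy=True),
]
chosen = pick_backend(nodes)
```

With the local server marked unhealthy, the client falls through to the next closest node; once it recovers, it is preferred again.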
I was aiming for the scenario where the DB client and DB server run on the same host, so that the local server is prioritized and another hop is saved. The documentation suggests one can use service failover or weights to achieve that. But it looks to me like Traefik might not be able to run proper health checks against MariaDB endpoints (it looks as if it expects HTTP and gRPC upstreams only). I would suggest using a local instance of HAProxy instead.
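On what a "proper" health check means here: a plain TCP connect only proves the port is open, while a Galera node can accept connections and still be unusable, for example while it is not synced or sits in a non-primary component. A minimal Python sketch of the decision, assuming you have already fetched the output of SHOW GLOBAL STATUS LIKE 'wsrep_%' into a dict (the function name is hypothetical; the status variable names are Galera's):

```python
from typing import Mapping

SYNCED = "4"  # wsrep_local_state value for a node in the Synced state


def is_galera_node_usable(status: Mapping[str, object]) -> bool:
    """Decide whether a node should receive traffic, given its wsrep_* status.

    `status` maps status variable names to values, as returned by
    SHOW GLOBAL STATUS LIKE 'wsrep_%'. Fetching it is left out of the sketch.
    """
    return (
        status.get("wsrep_ready") == "ON"
        and status.get("wsrep_connected") == "ON"
        and status.get("wsrep_cluster_status") == "Primary"
        and str(status.get("wsrep_local_state")) == SYNCED
    )


# Example status snapshots, invented for the sketch.
synced_node = {
    "wsrep_ready": "ON",
    "wsrep_connected": "ON",
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state": "4",
}
partitioned_node = dict(synced_node, wsrep_cluster_status="non-Primary")
```

Something like this is usually wrapped in a small script or endpoint that the proxy's health check calls, so the proxy only sees a simple up/down answer.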
You mean using a Docker healthcheck on the DB node to restart or shut down a failing node, so that Traefik would mark it as unhealthy and remove it from load balancing?