504 Gateway timeout errors on Kubernetes


I just started using Traefik and Kubernetes in an effort to create a single-node server for a staging/lab environment.

I elected to create a k8s cluster (albeit single node) with kubeadm on Debian 11 and installed Flannel as the network plugin.

Following various documentation and tutorials, I managed to get the cluster working and added my first IngressRoute to expose Traefik's dashboard on a domain name with basic auth. This works as expected.

Traefik has been installed with the latest Helm chart and very few of the default values changed.

One thing to note is that the server runs on a private IP behind a NAT from a pfSense router, behind our ISP. The external IP of the cluster is set to the private IP and this seems to work ok for now. I only added NAT rules for ports 80 and 443 from the Internet to the server.

I then wanted to try a simple whoami before going to more complicated loads like the Kubernetes Dashboard, Portainer and eventually real workloads.

The problem is that I can create IngressRoutes without issue, they show up correctly in Traefik's dashboard and they point to the correct internal service addresses, but whenever I try to access them from a domain name, I get 504 Gateway Timeout errors. This only happens when the IngressRoute rules match (i.e., I get the regular 404 when requesting something that no rule matches).
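The 404 versus 504 distinction already narrows the search: Traefik answers 404 itself when no router matches, while a 504 means a router matched and the request was forwarded, but the backend did not respond in time. A minimal Python sketch of that triage follows; the function name and the wording of the hints are illustrative assumptions, not anything Traefik provides.

```python
def triage_status(status: int) -> str:
    """Map an HTTP status seen through the proxy to a likely failure area.

    Hypothetical helper for illustration; the hints are rules of thumb.
    """
    if status == 404:
        return "no router matched: check the host/path rule"
    if status == 502:
        return "router matched: backend refused or reset the connection"
    if status == 504:
        return "router matched: backend did not answer in time, check pod networking"
    if 200 <= status < 400:
        return "request reached the backend"
    return "inconclusive"
```

Under this reading, the symptoms described above point at the path between the Traefik pod and the service endpoints rather than at the IngressRoute definition.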

I enabled the access logs in Traefik but I can't see any more useful information regarding the error. Browsing this forum I found the topic "Kubernetes - IngressRoutes - Gateway Timeout - Namespaces" and thought my problem was similar (namespaces), so I tried deploying whoami in the same namespace as Traefik, to no avail.

I think my problem might be somehow related to the network stack and its interaction with Traefik, but I can't be sure. To clarify, I can make curl requests to the internal addresses of the services from the node and they are reachable. It's only through Traefik IngressRoutes that they don't work.
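The comparison described above (service reachable directly, timing out through Traefik) can be scripted so it is easy to repeat after each change. This is only a sketch using Python's standard library; the helper names `probe` and `compare` are made up for illustration, and any addresses or hostnames passed in are placeholders to be replaced with real values.

```python
import urllib.error
import urllib.request


def probe(url, host=None, timeout=5.0):
    """Return the HTTP status code for url, or a short error string."""
    request = urllib.request.Request(url)
    if host:
        # Override the Host header so the proxy can match its router rule
        # even when the URL uses an IP address.
        request.add_header("Host", host)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. 404 or 504) are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, socket timeout, and so on.
        return f"error: {err}"


def compare(direct_url, proxy_url, host):
    """Probe the service directly and through the proxy, side by side."""
    return {
        "direct": probe(direct_url),
        "via_proxy": probe(proxy_url, host=host),
    }
```

For example, `compare("http://<service-cluster-ip>/", "http://<node-ip>/", "whoami.example.com")` with the placeholders filled in would be expected to show 200 for the direct request and 504 through the proxy in the situation described here. Note that a request made from the node does not exercise the same network path as one made from inside the Traefik pod, so a successful direct request does not rule out a pod-to-pod networking problem.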

Any advice would be much appreciated! I can give configuration and details if needed.

Well... This is embarrassing, but I managed to get it working.

I first thought that the network plugin was causing the problem and proceeded to replace Flannel with Calico. There were no visible changes afterwards.

Then, in a last-ditch effort to eliminate any other variables, I decided to... reboot the server completely. And now everything works as intended... Go figure!

I don't know whether the switch to Calico did it, or the reboot, or both, but I'm glad it works. If anyone has any insight into what might have happened, I would be very interested to know why it worked out this way.
