Intermittent bad gateway 504 from AWS load balancer


I am trying to run Traefik in Kubernetes on an AWS EC2 instance.
It sits behind an AWS load balancer.

Traefik is exposed via a NodePort, e.g. 30080, and passes connections to the service containers, which include an nginx instance, a Node.js instance, and a containous/whoami instance.

Traefik is set up to route requests by host name to the three services. For testing I have one set of host names pointed at the load balancer, and another set pointed directly at the EC2 public IP.

If I connect through the AWS load balancer, sometimes it works fine and sometimes I get "bad gateway 504" errors.
All three services exhibit the same intermittent behaviour.

If I connect to the EC2 instance on port 30080 directly, it always works fine.

When I get the 504s from the AWS load balancer, nothing shows up in the Traefik logs at all (except the periodic health checks from the load balancer).
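
To put numbers on "sometimes", a probe along these lines could hit both sets of host names and tally the status codes. This is only a minimal sketch: the two URLs are hypothetical placeholders for my real host names, and the attempt count and pause are arbitrary.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or the exception name on a transport error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including a 504 from the load balancer) land here.
        return err.code
    except OSError as err:
        # Connection refused, DNS failure, socket timeout, etc.
        return type(err).__name__


def summarize(statuses):
    """Count how often each status (or error name) occurred."""
    return Counter(statuses)


def probe(url, attempts=20, pause=1.0, fetch=fetch_status):
    """Request url repeatedly, pausing between attempts, and tally the results."""
    statuses = []
    for _ in range(attempts):
        statuses.append(fetch(url))
        time.sleep(pause)
    return summarize(statuses)


if __name__ == "__main__":
    # Hypothetical host names: one resolves to the load balancer,
    # the other to the EC2 public IP (NodePort 30080).
    targets = [
        ("via load balancer", "http://whoami-lb.example.com/"),
        ("direct to NodePort", "http://whoami-direct.example.com:30080/"),
    ]
    for label, url in targets:
        print(label, dict(probe(url)))
```

Varying the pause (for example, a few seconds versus close to a minute) might show whether the failures correlate with how long the connection between the load balancer and Traefik has been idle.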

Everything I've read suggests that it's related to HTTP keep-alive. The AWS load balancer's idle timeout is set to the default 60 seconds, and the AWS docs suggest that the backend needs a keep-alive timeout slightly longer than this.
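
As I understand the reasoning, the trouble arises when the backend closes an idle connection that the load balancer still considers open. The toy model below is a simplified sketch of that timing rule only. The function name and the 30 and 65 second backend values are hypothetical (they are not Traefik's actual defaults, which I don't know), and it ignores races and how connections are pooled in practice.

```python
def request_outcome(gap_s, lb_idle_s=60.0, backend_idle_s=65.0):
    """Classify a request that arrives gap_s seconds after the previous one
    on the same load-balancer-to-backend connection (simplified model)."""
    if gap_s >= lb_idle_s:
        # The load balancer has already dropped the idle connection itself,
        # so it opens a fresh one to the backend.
        return "new connection"
    if gap_s >= backend_idle_s:
        # The backend closed the connection first, but the load balancer
        # still believes it is usable: the request can fail without ever
        # reaching the backend's logs.
        return "stale connection"
    return "reused connection"


if __name__ == "__main__":
    for gap in (10, 45, 70):
        # Backend keep-alive shorter than the load balancer idle timeout.
        short = request_outcome(gap, lb_idle_s=60, backend_idle_s=30)
        # Backend keep-alive slightly longer, as the AWS docs suggest.
        longer = request_outcome(gap, lb_idle_s=60, backend_idle_s=65)
        print(f"gap={gap}s  backend=30s: {short:<17}  backend=65s: {longer}")
```

In this model the "stale connection" case can only occur when the backend timeout is shorter than the load balancer's, which would match both the intermittent failures and the empty Traefik logs.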

Q1 - Am I on the right track? I cannot find any keep-alive settings in Traefik. What are the defaults? Can I set them, and if so, how?

Q2 - Should SSL termination be done at the AWS load balancer or at Traefik? I've tried both ways: using an HTTP/HTTPS load balancer with AWS certificates pointing at Traefik's NodePort 30080, and using a classic TCP load balancer proxying 80 to 30080 and 443 to 30443 with Traefik handling ACME certificates. I had it working both ways, but both exhibited the same intermittent "bad gateway 504" issue.