Problem with 404s when upgrading Helm Chart

Hello,

I recently upgraded the Traefik Helm chart from v12.0.0 to v21.2.1, which takes Traefik from version 2.9.1 to 2.9.9.

Unfortunately, after the upgrade, all of my Ingresses and IngressRoutes return nothing but 404s.

I have:

  • Fixed service/deployment labels as described in traefik helm chart upgrade notes
  • Turned on the Traefik Dashboard and confirmed that these routes exist
  • Ensured that CRDs are up to date
  • Tried using ingressClassName: traefik on the Ingress, as well as what I previously had in place (the kubernetes.io/ingress.class: traefik annotation)
  • Checked logging and turned up verbosity. It looks like Ingresses are getting picked up properly
time="2023-04-24T02:14:30Z" level=debug msg="Adding route for prom.staging.url with TLS options default" entryPointName=websecure
time="2023-04-24T02:14:30Z" level=debug msg="Adding route for cost.staging.url with TLS options default" entryPointName=websecure

But then curl -i https://cost.staging.url returns 404

  • Confirmed that requests are reaching Traefik (access logs, packet tracing, etc.)
  • Access log entries look like this:
my.ip.address - - [24/Apr/2023:02:45:19 +0000] "GET / HTTP/1.1" - - "-" "-" 379 "-" "-" 0ms

Does anyone have ideas about how to gather more diagnostic information? Do I need to start from scratch on this cluster, i.e. delete and recreate Traefik and its load balancer? I'd rather avoid that, since it means redefining DNS and whatnot. Moreover, I am not optimistic that it would solve the problem (again, the routes all exist as expected in the Traefik Dashboard!).

In fact, the IngressRoute created by the Helm chart is the only one that I can get to work. Is it possible that my almost exclusive use of Host() rules is breaking things for some reason?

I'm hopeful I am missing something obvious :smile:

Hi @_cole! Thanks for your interest in Traefik!

Could you run Traefik in DEBUG mode and post the output?

Thanks!

Howdy @svx !

Am I understanding correctly that this just means running Traefik with the DEBUG log level? Or is there something more to it?

Thanks!

Hey @_cole,

Yes, that is right! Just run Traefik with log level set to DEBUG.

:muscle: !

I will admit, it is a bit much :grimacing: I can try to create a simpler example if that would be helpful?

In any case, it was 10x the length of what I was allowed to post, so I put it into a gist :sweat_smile:

I'm wondering if something related to my plugin setup may be the problem. Just strange b/c it all worked pre-upgrade.

Hi @_cole,
We can't find anything in the log about your configuration.
There should be a line "Configuration received ..."

Did you remove this part from the log file?

I didn't. :open_mouth: Is the "Configuration loaded from flags" line not sufficient? I am using the Helm chart, so I believe just about everything comes from either flags or env vars, with no ConfigMap or anything like that.

Maybe another way of attacking this problem: does anyone have an example of a deployment to EKS with service type LoadBalancer that uses Ingresses to route traffic? A fairly vanilla deployment to the same cluster is also struggling, all without any clear logging :frowning_face: I'll keep digging. There is still something funky about this first non-default ingress class with a label selector.

Hi @_cole,
Just an observation from your log.
Are you sure that you're reaching Traefik?

We don't see anything in the logs.

Merp. Sorry about that. I must have forgotten to hit the endpoint after restarting :frowning_face:

I will generate a new log here in a bit. Basically, the only line missing from it was the access log line that I mentioned above. The lack of a status is confounding. Maybe that's just how things work when a request doesn't match a router? But why it doesn't match a router is the most surprising and befuddling part:

my.ip.addr.90 - - [26/Apr/2023:14:33:27 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 51 "ping@internal" "-" 0ms
my.ip.addr.90 - - [26/Apr/2023:14:33:27 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 52 "ping@internal" "-" 0ms
my.ip.addr.90 - - [26/Apr/2023:14:33:37 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 53 "ping@internal" "-" 0ms
my.ip.addr.90 - - [26/Apr/2023:14:33:37 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 54 "ping@internal" "-" 0ms
my.ip.addr.90 - - [26/Apr/2023:14:33:47 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 56 "ping@internal" "-" 0ms
my.ip.addr.90 - - [26/Apr/2023:14:33:47 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 55 "ping@internal" "-" 0ms
my.ip.addr.32 - - [26/Apr/2023:14:33:50 +0000] "GET / HTTP/1.1" - - "-" "-" 57 "-" "-" 0ms
my.ip.addr.32 - - [26/Apr/2023:14:33:51 +0000] "GET / HTTP/1.1" - - "-" "-" 58 "-" "-" 0ms
my.ip.addr.32 - - [26/Apr/2023:14:33:52 +0000] "GET / HTTP/1.1" - - "-" "-" 59 "-" "-" 0ms
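
To make the comparison between the /ping lines and the / lines easier to read, here is a minimal Python sketch that splits one of these access log lines into named fields. It assumes the field order of Traefik's documented common log format: status, size, referrer, user agent, request count, router name, server URL, duration. The names parse_access_log and matched_router are hypothetical helpers invented for this illustration, not part of Traefik.

```python
import re

# Field order assumed from Traefik's documented common log format (CLF).
ACCESS_LOG_RE = re.compile(
    r'(?P<client>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\S+) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<request_count>\d+) "(?P<router>[^"]*)" "(?P<server_url>[^"]*)" '
    r'(?P<duration_ms>\d+)ms'
)


def parse_access_log(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = ACCESS_LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def matched_router(entry):
    """Return the router name, or None when the field is the "-" placeholder."""
    router = entry["router"]
    return None if router == "-" else router
```

Run against the lines above, the /ping requests come back with router "ping@internal" and status "200", while the GET / requests have "-" in both the status and the router field. That only restates what the log shows (no router name was recorded for those requests); it does not explain why.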

Could you do a GET request to the /api/rawdata endpoint of Traefik and share the output?

Could you also share an example curl command for a request, so that we can see whether, and how, an entrypoint was reached?
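
Since the rawdata output can be very long, a short script can boil it down to one row per router. The following is a rough Python sketch, not an official tool. The URL is an assumption (for example, a port-forward to the pod's internal "traefik" entrypoint, which the Helm chart exposes on port 9000 by default), and the JSON keys used here ("routers", "rule", "entryPoints", "status", "error") are how I understand the Traefik v2 API output, so please check them against what your instance actually returns. fetch_rawdata and summarize_routers are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the Traefik API is reachable at this address, e.g. through a
# port-forward to the pod's internal "traefik" entrypoint. Adjust as needed.
RAWDATA_URL = "http://localhost:9000/api/rawdata"


def fetch_rawdata(url=RAWDATA_URL):
    """Fetch and decode the JSON document served by /api/rawdata."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def summarize_routers(rawdata):
    """Return one small dict per router, sorted by router name."""
    rows = []
    for name, router in sorted(rawdata.get("routers", {}).items()):
        rows.append({
            "name": name,
            "rule": router.get("rule", ""),
            "entry_points": router.get("entryPoints", []),
            "status": router.get("status", ""),
            # Assumed key: routers with problems carry an "error" list.
            "errors": router.get("error", []),
        })
    return rows
```

Calling summarize_routers(fetch_rawdata()) and printing each row gives a compact view of which rule is attached to which entrypoint, and whether any router reports errors, which is easier to paste into a post than the full dump.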

~ curl -i https://staging.domain.com/
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Thu, 04 May 2023 17:01:54 GMT
Content-Length: 19

404 page not found

my.ip.addr.57 - - [04/May/2023:17:01:54 +0000] "GET / HTTP/1.1" - - "-" "-" 145165 "-" "-" 0ms

And the rawdata endpoint:

Unfortunately, this is still a very complicated example, as I have been pulled into other teams and will probably have to roll back this upgrade for now. This is a super useful debugging note, though, thank you!! If you see anything obvious, please do let me know :slight_smile: It is a bit much though :sweat_smile: