Issues with Prometheus/Traefik Metrics - "Context Deadline Exceeded" Error

FuriousGopher · August 14, 2024, 1:46pm

Hello Traefik community,

I'm currently running Traefik in a Kubernetes cluster, and I'm encountering an issue with Prometheus scraping the metrics from Traefik. Here's a detailed overview of my setup and the problem I'm facing:

Cluster Details:

Kubernetes Environment: Running on a cloud provider (provider not specified)
Traefik Version: Using the official Traefik Helm chart
Namespace: traefik
Prometheus Version: Deployed using the Prometheus Helm chart

Traefik Helm Values:

globalArguments:
  - "--global.sendanonymoususage=false"
  - "--global.checknewversion=false"

additionalArguments:
  - "--serversTransport.insecureSkipVerify=true"
  - "--log.level=DEBUG"
  - "--accessLog=true"
  - "--accessLog.fields.headers.defaultMode=keep"
  - "--accessLog.fields.headers.names.Authorization=drop"
  - "--accessLog.fields.headers.names.Cookie=drop"
  - "--entrypoints.web.address=:8000"
  - "--entrypoints.websecure.address=:8443"
  - "--metrics.prometheus=true"
  - "--metrics.prometheus.entryPoint=metrics"
  - "--entryPoints.metrics.address=:9100"

ports:
  web:
    redirectTo:
      port: websecure
      priority: 10
  websecure:
    http3:
      enabled: true
    advertisedPort: 443 
    tls:
      enabled: true
  cockroachdb:
    port: 26257
  metrics:
    port: 9100
    expose:
      enabled: true
      port: 9100
      protocol: TCP
      entryPoint: metrics

The Problem:

In the Prometheus UI, the Traefik metrics endpoint appears as DOWN with the following error message:

http://5.5.176.214:9100/metrics  DOWN  
instance="5.5.176.214:9100" job="traefik"
Get "http://5.5.176.214:9100/metrics": context deadline exceeded

Troubleshooting Steps Taken:

Verified the Traefik Metrics Endpoint:

Tried to curl the metrics endpoint from both inside and outside the cluster, but the request timed out.

Checked Traefik Logs:

No errors related to metrics exposure were found in the logs.

Adjusted Prometheus Scrape Timeout:

Increased the scrape timeout to 30 seconds in the Prometheus config, but the issue persists.
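
For reference, a scrape job with the timeout raised might look roughly like the sketch below. The job name and target come from the target listing above. The static target and the 60s interval are assumptions, since the actual Prometheus configuration is not shown in this post and the target may come from Kubernetes service discovery instead.

```yaml
# Sketch of a Prometheus scrape job; not the actual config from this cluster.
scrape_configs:
  - job_name: traefik              # matches job="traefik" in the target list
    metrics_path: /metrics
    scrape_interval: 60s           # assumed value; not stated in the post
    scrape_timeout: 30s            # must not exceed scrape_interval
    static_configs:
      - targets:
          - "5.5.176.214:9100"     # instance shown in the error message
```

Since curl against the same address also times out, a longer scrape_timeout alone is unlikely to help: the connection does not appear to be established at all, so the scrape simply waits longer before failing.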

Checked Network Policies and Firewall Settings:

Ensured that no network policies or firewall rules are blocking port 9100.
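
If any NetworkPolicy does select the Traefik pods, ingress on port 9100 has to be allowed explicitly, because a pod selected by at least one Ingress policy only accepts traffic that some policy permits. A minimal sketch of such a rule follows. The policy name, the pod label, and the monitoring namespace are assumptions for illustration and would need to match the real cluster.

```yaml
# Sketch only: allows Prometheus (assumed to run in "monitoring") to reach
# the Traefik metrics port. Names and labels below are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-to-traefik-metrics   # hypothetical name
  namespace: traefik
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: traefik         # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring   # assumed namespace
      ports:
        - protocol: TCP
          port: 9100                          # metrics entrypoint from the values above
```

If no policy selects the Traefik pods at all, Kubernetes allows all ingress to them by default, and a rule like this is not needed.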

Checked Resource Usage:

Verified that the Traefik pod isn't resource-constrained (normal CPU and memory usage).

Restarted Traefik:

Tried restarting the Traefik pod, but the problem remains.

Request for Help:

I'm seeking advice on how to further troubleshoot and resolve this issue. Has anyone encountered similar problems with Prometheus scraping Traefik metrics? Are there any specific configurations or logs I should look into?

Thank you in advance for your help!

aloisiobilck · September 5, 2024, 7:49am

I have the same problem, using Helm chart version 31.0.0.
