Metrics are not reflecting what is seen in the logs

hyperion · August 14, 2024, 7:35am

Hello, I am trying to get a better understanding of what is happening:
I got a Sentry notification for a 503. I confirmed it in the Stackdriver logs, where I see a couple of 5XX errors within the last hour or so. However, when I query traefik_service_requests_total in Grafana, I get no results back. Here is the query I used:
traefik_service_requests_total{service=~"production.*", code=~"503"}

I read somewhere that this could be because counters are reset when services are scaled down and back up, and because querying the metric only returns counts for services that are currently live. To account for this, we should use rate/increase, which tries to automatically adjust for counter resets (docs).
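
To illustrate the reset adjustment described above, here is a minimal Python sketch of the idea. It is not Prometheus's actual implementation (the real increase also extrapolates to the edges of the time window), and the function name and sample values are hypothetical.

```python
from typing import Optional, Sequence


def reset_aware_increase(samples: Sequence[float]) -> Optional[float]:
    """Sum the growth of a counter across a window, treating any drop as a reset.

    Simplified illustration only: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        # A single sample gives nothing to compare against.
        return None

    total = 0.0
    previous = samples[0]
    for current in samples[1:]:
        if current < previous:
            # The counter restarted from zero, so all of `current` is new growth.
            total += current
        else:
            total += current - previous
        previous = current
    return total


# Hypothetical scrapes of one 503 counter; the service restarts after the third scrape.
scrapes = [4, 5, 7, 1, 2]
print(reset_aware_increase(scrapes))  # 5.0
```

Simply subtracting the first sample from the last would give -2 here, which is why a plain counter query can be misleading across restarts.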

I then changed my query to use rate to see whether it would detect the 503 error that I see in both Sentry and the logs. I tried 3hr, 7hr, and 12hr ranges in my rate query, and it still returned a value of 0. Should the metrics accurately reflect the logs? Why is there a discrepancy between the 5XX errors we see in the logs and what appears in the metrics? I am new to time series/analytics and may be misunderstanding how it works.

Thank you for your time!
