Capture Traefik Metrics for Apps on Kubernetes with Prometheus

Monitoring distributed systems is one of the core precepts of site reliability engineering (SRE), as defined by Google. When Traefik is deployed as a Kubernetes ingress controller, it becomes an integral part of this practice.

This is the second in a series of blog posts on using Traefik to help enable SRE practices. The previous entry discussed how the Elastic stack of monitoring software can connect to Traefik to turn logs into visualizations and actionable intelligence. This installment explores how to use Prometheus and Grafana to derive similar insights from metrics generated by Traefik.

Prerequisites

As in the previous installment, if you want to follow along with this tutorial, you'll need to have a few things set up first.

  1. A Kubernetes cluster running at localhost. The Traefik Labs team often uses k3d for this purpose, which creates a local cluster in Docker containers. However, k3d comes bundled with the latest version of k3s, and k3s comes packaged with Traefik version 1.7 and metrics-server. You'll want to disable both so that you can work with Prometheus and the latest version of Traefik:

    k3d cluster create dev -p "8081:80@loadbalancer" --k3s-server-arg --disable=traefik --k3s-server-arg --disable=metrics-server
    
  2. The kubectl command-line tool, configured to point to your cluster. (If you created your cluster using k3d and the instructions above, this will already be done for you.)

  3. A recent version of the Helm package manager for Kubernetes.

  4. The set of configuration files that accompany this article, which are available on GitHub:

    git clone https://github.com/traefik-tech-blog/traefik-sre-metrics/
    

You do not need to have Traefik 2.x preinstalled, as you'll do that in the next step.

Deploy Traefik

The easiest way to deploy Traefik on Kubernetes is to use the official Helm chart. Add Traefik to Helm's repositories using the following commands:

helm repo add traefik https://helm.traefik.io/traefik
helm repo update

Next, deploy the latest version of Traefik in the kube-system namespace. For this example, you'll want to make sure that Traefik exposes Prometheus metrics. This is done by passing Traefik the --metrics.prometheus=true flag, which the supplied traefik-values.yaml file takes care of when you install Traefik with Helm:

helm install traefik traefik/traefik -n kube-system -f ./traefik-values.yaml

Next, create a traefik-dashboard service for Traefik's traefik entry point (port 9000). That entry point serves both the dashboard and the /metrics endpoint, and Prometheus will use the service to scrape Traefik's metrics:

kubectl apply -f traefik-dashboard-service.yaml

The Traefik Dashboard is not exposed by default, but you can make it accessible using port forwarding:

kubectl port-forward service/traefik-dashboard 9000:9000 -n kube-system

Open the dashboard at http://localhost:9000/dashboard/ in your web browser (note the trailing slash, which is required). The "Features" section of the dashboard should show that Prometheus metrics are enabled.

You can also open the http://localhost:9000/metrics endpoint to see the raw metrics that Traefik generates.
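The /metrics endpoint returns plain text in the Prometheus exposition format. Each sample sits on its own line, preceded by # HELP and # TYPE comment lines. As a small illustration that is not part of the original repository, the Python sketch below counts the traefik_* samples in that text. SAMPLE is a made-up excerpt. The metric and label names you actually see depend on your Traefik version and configuration.

```python
import urllib.request
from collections import Counter


def scrape(url: str = "http://localhost:9000/metrics") -> str:
    """Fetch raw metrics through the port-forward (not called below)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def count_samples(exposition: str, prefix: str = "traefik_") -> Counter:
    """Count sample lines per metric name, skipping comments and blank lines."""
    counts: Counter = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (start of labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            counts[name] += 1
    return counts


# Hypothetical excerpt; real output contains many more series.
SAMPLE = """\
# TYPE traefik_entrypoint_open_connections gauge
traefik_entrypoint_open_connections{entrypoint="web",method="GET",protocol="http"} 0
traefik_entrypoint_requests_total{code="200",entrypoint="web",method="GET",protocol="http"} 42
traefik_entrypoint_requests_total{code="404",entrypoint="web",method="GET",protocol="http"} 3
go_goroutines 97
"""

print(count_samples(SAMPLE))
```

With the port-forward running, count_samples(scrape()) gives the same kind of summary for your live instance.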

Deploy Prometheus Stack

The Prometheus metrics stack consists of a number of components. Deploying all of them, along with the required configuration, is beyond the scope of this post. Instead, you'll use the Prometheus Community Kubernetes Helm Charts to deploy the following components:

  • Prometheus Metrics server
  • Alertmanager
  • Metrics Exporter
  • Grafana

To add the repository:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

The above repository provides many charts. To see the full list, you can use the search command:

helm search repo prometheus-community

From this list, you should install the kube-prometheus-stack chart, which will deploy the required components. (This process could take a few moments, so be patient.)

$ helm install prometheus-stack prometheus-community/kube-prometheus-stack
NAME: prometheus-stack
LAST DEPLOYED: Fri Jan 22 13:09:15 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace default get pods -l "release=prometheus-stack"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
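If you prefer a scripted check to reading the pod list by eye, the sketch below reports pods that are not yet Ready. It parses the JSON form of the status command printed in the NOTES. Both functions are hypothetical helpers. The release label selector is taken from the NOTES output and may not match every subchart's pods, and the sample pod list is invented.

```python
import json
import subprocess


def not_ready(pod_list: dict) -> list:
    """Names of pods whose Ready condition is not "True" in `kubectl get pods -o json` output."""
    names = []
    for pod in pod_list.get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)
        if not ready:
            names.append(pod["metadata"]["name"])
    return names


def release_pods(release: str = "prometheus-stack", namespace: str = "default") -> dict:
    """JSON version of the status command from the chart's NOTES (not called below)."""
    out = subprocess.run(
        ["kubectl", "--namespace", namespace, "get", "pods",
         "-l", f"release={release}", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Hypothetical, heavily trimmed pod list.
pods = {"items": [
    {"metadata": {"name": "pod-a"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "pod-b"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
]}
print(not_ready(pods))  # ['pod-b']
```

Against the cluster, not_ready(release_pods()) returns an empty list once everything has started.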

Configure Traefik Monitoring

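The kube-prometheus-stack chart installs the Prometheus Operator along with its custom resource definitions (CRDs). To tell Prometheus where to find Traefik's metrics, you add a ServiceMonitor, one of those custom resources. It describes which services to scrape, and on which port and path.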

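# traefik-service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik
  namespace: default
  labels:
    release: prometheus-stack  # matched by kube-prometheus-stack's default serviceMonitorSelector
spec:
  jobLabel: traefik-metrics
  selector:
    matchLabels:
      app.kubernetes.io/instance: traefik
      app.kubernetes.io/name: traefik-dashboard
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: traefik
    path: /metrics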

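Per the above configuration, Prometheus will scrape the /metrics path on the port named traefik of any service in kube-system that carries both selector labels, which here is the traefik-dashboard service. That service lives in the kube-system namespace while the ServiceMonitor is deployed in the default namespace, which is why the namespaceSelector is needed. Apply the ServiceMonitor: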

kubectl apply -f traefik-service-monitor.yaml
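A ServiceMonitor's selector follows standard Kubernetes matchLabels semantics. A service is selected only if it carries every listed label with the same value and lives in one of the namespaceSelector.matchNames namespaces. The sketch below models just those two fields and ignores matchExpressions and any: true. It assumes the traefik-dashboard service carries the two labels the ServiceMonitor selects on.

The same subset rule decides whether Prometheus picks up the ServiceMonitor itself. By default, kube-prometheus-stack's Prometheus is expected to select only ServiceMonitors labeled release: prometheus-stack (the Helm release name used here). Treat that as an assumption to check against the chart's serviceMonitorSelector settings if no Traefik target shows up.

```python
def matches(labels: dict, match_labels: dict) -> bool:
    """matchLabels semantics: every key/value pair must be present on the object."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def is_selected(service: dict, match_labels: dict, namespaces: list) -> bool:
    """Approximate ServiceMonitor selection (matchLabels + namespaceSelector.matchNames only)."""
    return service["namespace"] in namespaces and matches(service["labels"], match_labels)


selector = {
    "app.kubernetes.io/instance": "traefik",
    "app.kubernetes.io/name": "traefik-dashboard",
}
# Hypothetical services, reduced to the fields used above.
dashboard_svc = {"namespace": "kube-system", "labels": dict(selector, extra="ignored")}
same_labels_elsewhere = {"namespace": "default", "labels": dict(selector)}

print(is_selected(dashboard_svc, selector, ["kube-system"]))          # True
print(is_selected(same_labels_elsewhere, selector, ["kube-system"]))  # False
```

Extra labels on the service don't matter; a missing or different label, or the wrong namespace, does.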

Let's now verify that Prometheus has started scraping Traefik metrics. To reach the Prometheus web UI, forward port 9090 of the Prometheus service to localhost:9090:

kubectl port-forward service/prometheus-stack-kube-prom-prometheus 9090:9090

You can now open the Service Discovery page at http://localhost:9090/service-discovery.

It should list the default/traefik ServiceMonitor. Details for the Traefik target should also be available under the Status > Targets view.
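The same information is available from Prometheus's HTTP API at /api/v1/targets, which is handy for scripted checks. This sketch assumes the port-forward is still running. It also assumes the target's job label is traefik-dashboard, the service name, which is also what the alert rule in the next section filters on. The payload is a trimmed, hypothetical response.

```python
import json
import urllib.request


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """GET /api/v1/targets through the port-forward (not called below)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets") as resp:
        return json.load(resp)


def target_health(payload: dict, job: str) -> list:
    """(scrapeUrl, health, lastError) for each active target with the given job label."""
    return [
        (t["scrapeUrl"], t["health"], t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t["labels"].get("job") == job
    ]


payload = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "traefik-dashboard", "namespace": "kube-system"},
                "scrapeUrl": "http://10.42.0.9:9000/metrics",
                "health": "up",
                "lastError": "",
            },
            {
                "labels": {"job": "other"},
                "scrapeUrl": "http://10.42.0.7:8080/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
        ]
    },
}

print(target_health(payload, "traefik-dashboard"))
```

An empty list means Prometheus has no active target for that job, which usually points back to the ServiceMonitor's selectors.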

Configure Traefik Alerts

To see what else Prometheus can do, you can add a rule that raises alerts when certain conditions match. Prometheus rule expressions are beyond the scope of this post, but you can read more about them in the official documentation.

# Excerpt from traefik-rules.yaml: the rules of a rule group in a PrometheusRule resource
rules:
- alert: TooManyRequest
  expr: avg(traefik_entrypoint_open_connections{job="traefik-dashboard",namespace="kube-system"})
    > 5
  for: 1m
  labels:
    severity: critical

The above rule will raise a TooManyRequest alert if the average number of open connections on Traefik's entry points stays above 5 for 1 minute. Go ahead and apply the rule:

kubectl apply -f traefik-rules.yaml

The Prometheus dashboard should show the newly created rule under Status > Rules.
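The for: 1m clause means the condition must hold on every evaluation for at least a minute before the alert fires. Until then Prometheus reports the alert as pending, and a single evaluation below the threshold resets it. The function below is a simplified model of that behavior, for illustration only. It ignores details such as evaluation intervals and staleness, and the sample values are invented.

```python
def alert_states(samples, threshold=5.0, hold_seconds=60):
    """Simplified model of a Prometheus alert with a `for:` clause.

    samples: (timestamp_seconds, value) pairs from successive rule evaluations.
    Returns the state after each evaluation: "inactive", "pending" or "firing".
    """
    states = []
    active_since = None
    for ts, value in samples:
        if value > threshold:  # mirrors the rule's "> 5" comparison
            if active_since is None:
                active_since = ts
            held = ts - active_since
            states.append("firing" if held >= hold_seconds else "pending")
        else:
            active_since = None  # condition broken: start over
            states.append("inactive")
    return states


print(alert_states([(0, 6.0), (30, 7.0), (60, 8.0), (90, 2.0)]))
# ['pending', 'pending', 'firing', 'inactive']
```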

Grafana Charts

Previously you deployed Grafana using the kube-prometheus-stack Helm chart. Now you can configure a dashboard for Traefik metrics. But first, you'll need to forward port 80 from the Grafana service to a local port, so you can reach it at http://localhost:10080:

kubectl port-forward service/prometheus-stack-grafana 10080:80

When you access the Grafana GUI at http://localhost:10080, it asks for a username and password. The default username is admin and the default password is prom-operator. Both are stored in the prometheus-stack-grafana Kubernetes secret.
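Kubernetes stores the password base64-encoded in the secret's data field, and the sketch below decodes it. It assumes the secret is named prometheus-stack-grafana, on the basis that the Grafana subchart names it after the release, like the prometheus-stack-grafana service used for port forwarding. It also assumes the password sits under the admin-password key. The example secret is hypothetical.

```python
import base64
import json
import subprocess


def decode_secret_key(secret: dict, key: str) -> str:
    """Kubernetes stores Secret data base64-encoded; return one key as text."""
    return base64.b64decode(secret["data"][key]).decode("utf-8")


def grafana_admin_password(name: str = "prometheus-stack-grafana",
                           namespace: str = "default") -> str:
    """Read the Secret with kubectl and decode admin-password (not called below)."""
    out = subprocess.run(
        ["kubectl", "get", "secret", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return decode_secret_key(json.loads(out), "admin-password")


# Hypothetical Secret, reduced to its data field.
example = {"data": {"admin-password": base64.b64encode(b"prom-operator").decode("ascii")}}
print(decode_secret_key(example, "admin-password"))  # prom-operator
```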

Rather than building new Grafana dashboards from scratch, you can import them from Grafana's marketplace, which hosts community-created dashboards. To add one, click the four-squares icon in the left navigation bar and choose Dashboards > Manage.

Click the Import button and input 11462 as the dashboard ID, which corresponds to the Traefik 2 dashboard contributed by user timoreymann.

After clicking Load, you should see the summary of the imported dashboard.

From the dropdown at the bottom, select the Prometheus data source, then click Import to generate the dashboard.

Deploy Application

Now that the cluster is working and metrics are being fed to Prometheus and Grafana, you'll need an application to monitor. For this, deploy the HTTPBin service, which provides many endpoints that can be used to generate different types of synthetic user traffic. The Service and the IngressRoute can be deployed using a single configuration file:

kubectl apply -f httpbin.yaml

The httpbin route matches the hostname httpbin.local and forwards requests to the httpbin service. Let's test it with curl:

$ curl -I http://localhost:8081/  -H "host:httpbin.local"
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Length: 9593
Content-Type: text/html; charset=utf-8
Date: Sun, 10 Jan 2021 06:14:53 GMT
Server: gunicorn/19.9.0

When you created the cluster, you mapped port 80 of the k3d load balancer to port 8081 on your machine. That's why the command above targets localhost:8081.
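The curl command works because Traefik routes on the Host header, not on the address you connect to. For scripted checks, the same request can be built with Python's standard library. httpbin_request is a hypothetical helper, and the port and hostname are the ones used in this tutorial.

```python
import urllib.request


def httpbin_request(path: str = "/", method: str = "GET",
                    port: int = 8081, host: str = "httpbin.local") -> urllib.request.Request:
    """Build a request to the k3d load balancer that Traefik routes by Host header."""
    req = urllib.request.Request(f"http://localhost:{port}{path}", method=method)
    # Without this header the request would carry Host: localhost:8081 and
    # would not match the httpbin.local route.
    req.add_header("Host", host)
    return req


req = httpbin_request("/get")
print(req.get_method(), req.full_url, req.get_header("Host"))
# With the cluster running, urllib.request.urlopen(req).status should be 200.
```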

Simulate User Traffic

You can use ab (ApacheBench) to send traffic to HTTPBin. Traefik records metrics for each of these requests. Run the following commands:

ab -c 5 -n 10000  -m PATCH -H "host:httpbin.local" -H "accept: application/json" http://localhost:8081/patch
ab -c 5 -n 10000  -m GET -H "host:httpbin.local" -H "accept: application/json" http://localhost:8081/get
ab -c 5 -n 10000  -m POST -H "host:httpbin.local" -H "accept: application/json" http://localhost:8081/post
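If ab isn't installed, a rough stand-in can be written with a thread pool. This sketch mimics only ab's -n (total requests) and -c (concurrency) options and tallies status codes. Unlike ab, it does not report latency statistics. make_sender and generate_load are hypothetical helpers, and the example run uses a dummy sender so it doesn't need the cluster.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def make_sender(url: str, method: str = "GET", host: str = "httpbin.local") -> Callable[[], int]:
    """Return a callable that sends one request and returns its HTTP status code."""
    def send() -> int:
        req = urllib.request.Request(
            url, method=method,
            headers={"Host": host, "Accept": "application/json"},
        )
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status
        except urllib.error.HTTPError as err:  # 4xx/5xx responses still count as traffic
            return err.code
    return send


def generate_load(send: Callable[[], int], total: int = 100, concurrency: int = 5) -> Counter:
    """Run `send` `total` times on `concurrency` threads (like ab's -n/-c) and tally status codes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send) for _ in range(total)]
        return Counter(f.result() for f in futures)


# Against the cluster (not run here):
# generate_load(make_sender("http://localhost:8081/patch", "PATCH"), total=10000)
print(generate_load(lambda: 200, total=20))
```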

If you check the Grafana dashboard after a while, you'll see it populated with data.

Some salient data points include:

  • Uptime
  • Average response time
  • Total requests
  • Request counts based on HTTP method and service

The above panels come from the community dashboard, but you can also build your own charts to suit your observability needs.
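Panels such as average response time and request counts are usually computed from Traefik's counters and histograms rather than stored directly. We have not inspected the queries in dashboard 11462. A common PromQL pattern, however, is rate(traefik_service_request_duration_seconds_sum[5m]) / rate(traefik_service_request_duration_seconds_count[5m]). Over a single window, that ratio reduces to the arithmetic below, ignoring rate()'s extrapolation and counter-reset handling. The scrape values are made up.

```python
import math


def average_latency(prev: dict, curr: dict) -> float:
    """Mean request duration (seconds) between two scrapes of one histogram series.

    prev/curr hold the cumulative "sum" and "count" samples, e.g. of
    traefik_service_request_duration_seconds_sum and _count.
    """
    requests = curr["count"] - prev["count"]
    if requests <= 0:  # no new requests, or a counter reset this sketch doesn't handle
        return math.nan
    return (curr["sum"] - prev["sum"]) / requests


def per_second(prev_total: float, curr_total: float, seconds: float) -> float:
    """Average per-second increase of a counter such as traefik_service_requests_total."""
    return (curr_total - prev_total) / seconds


# Hypothetical scrapes taken 60 seconds apart.
prev = {"sum": 10.0, "count": 100}
curr = {"sum": 12.5, "count": 150}
print(average_latency(prev, curr))  # 0.05 seconds per request
print(per_second(100, 150, 60))     # about 0.83 requests per second
```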

Alerts

Finally, as you simulate application traffic, Prometheus may raise alerts. For example, if you generate a lot of traffic, the TooManyRequest alert you created earlier will appear on the Alertmanager dashboard. To view it, forward port 9093 and open http://localhost:9093:

$ kubectl port-forward service/prometheus-stack-kube-prom-alertmanager 9093:9093
Forwarding from 127.0.0.1:9093 -> 9093
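Alertmanager also exposes its alerts through an HTTP API at /api/v2/alerts. That makes it easy to check from a script whether TooManyRequest is firing. The sketch assumes the port-forward above is running. The response shown is trimmed and hypothetical, with Watchdog standing in for any other alert.

```python
import json
import urllib.request


def fetch_alerts(base_url: str = "http://localhost:9093") -> list:
    """GET /api/v2/alerts through the port-forward (not called below)."""
    with urllib.request.urlopen(f"{base_url}/api/v2/alerts") as resp:
        return json.load(resp)


def active_alerts(alerts: list, name: str) -> list:
    """Alerts with the given alertname whose status.state is "active"."""
    return [
        a for a in alerts
        if a["labels"].get("alertname") == name and a["status"]["state"] == "active"
    ]


# Trimmed, hypothetical API response.
alerts = [
    {"labels": {"alertname": "TooManyRequest", "severity": "critical"},
     "status": {"state": "active", "silencedBy": [], "inhibitedBy": []}},
    {"labels": {"alertname": "Watchdog", "severity": "none"},
     "status": {"state": "active", "silencedBy": [], "inhibitedBy": []}},
]

for alert in active_alerts(alerts, "TooManyRequest"):
    print(alert["labels"]["alertname"], alert["labels"]["severity"])
```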

Wrap Up

The more visibility you have into the services running on your Kubernetes clusters, the better-equipped you will be to take action in the event of anomalous performance. In this tutorial, you've seen how easy it is to connect Traefik to Prometheus and Grafana to create visualizations from Traefik metrics.

As you familiarize yourself with these tools, you'll be able to create unique dashboards that expose the data points that are most critical for your environment.

The next entry in this series on SRE techniques will focus on application tracing with Traefik and Jaeger. Until then, if you'd like to learn more about gaining visibility and control over your Traefik instances, check out Traefik Pilot, our SaaS monitoring and management platform for Traefik.


This is a companion discussion topic for the original entry at https://traefik.io/blog/capture-traefik-metrics-for-apps-on-kubernetes-with-prometheus/

Dear! Thank you for your tutorial! I had some issues accessing the Traefik metrics, even after following your instructions. If you have a couple of minutes, could you give me some advice?
I described my problem here: Can you help me getting traefik metrics ? (version 2.2.8)
Regards

Thanks for your time; I appreciate the intention, but I'd like these core know-how sagas to be edited and checked by someone before they get onto the front page of GOOG.

Why? Among others:

  • You're using a Helm chart and deploying a dashboard service manually. Why? It's in the Helm chart at ports:
  • The Prometheus custom resource definition (CRD) will be used to configure Traefik metrics. #saywhat?
  • Deploying the service monitor in the default namespace while Traefik is in the kube-system namespace. Credibility completely shattered. Who does that?
    We're in year 7 of k8s; best practices are written literally all over the place.
  • You have a typo in the Prometheus service discovery screenshot: default/treafik
  • Also one at TooManyRequests instead of TooManyRequest

Hello @jbtruffault

Thanks for using Traefik and for trying to configure it with Prometheus. Have you already resolved the issue you reported? I noticed that the answers already provide a lot of useful information, but I would like to make sure that you were able to go through the tutorial and complete it successfully.

Please let me know,
Thank you,

Dear,
I succeeded in linking Prometheus and Traefik and getting metrics.
But I don't like the way it's done: I had to expose a TCP port to the internet, whereas I only want it to be used internally.

Do you have an idea how to expose a TCP port only within my cluster?

Regards

Hello @jbtruffault

Can you please share your configuration files? You don't need to expose anything externally; all the network communication is managed within the cluster.

You can refer to the example I used during my last workshop:

Hope that helps,

Jakub