Monitoring distributed systems is one of the core precepts of site reliability engineering (SRE), as defined by Google. When Traefik is deployed as a Kubernetes ingress controller, it becomes an integral part of this practice.
This is the second in a series of blog posts on using Traefik to help enable SRE practices. The previous entry discussed how the Elastic stack of monitoring software can connect to Traefik to turn logs into visualizations and actionable intelligence. This installment explores how to use Prometheus and Grafana to derive similar insights from metrics generated by Traefik.
Prerequisites
As in the previous installment, if you want to follow along with this tutorial, you'll need to have a few things set up first.
- A Kubernetes cluster running at localhost. The Traefik Labs team often uses k3d for this purpose, which creates a local cluster in Docker containers. However, k3d comes bundled with the latest version of k3s, and k3s comes packaged with Traefik v1.7 and metrics-server. You'll want to disable both so that you can work with Prometheus and the latest version of Traefik:
k3d cluster create dev -p "8081:80@loadbalancer" --k3s-server-arg --disable=traefik --k3s-server-arg --disable=metrics-server
- The kubectl command-line tool, configured to point to your cluster. (If you created your cluster using k3d and the instructions above, this will already be done for you.)
- A recent version of the Helm package manager for Kubernetes.
- The set of configuration files that accompany this article, which are available on GitHub:
git clone https://github.com/traefik-tech-blog/traefik-sre-metrics/
You do not need to have Traefik 2.x preinstalled, as you'll do that in the next step.
Deploy Traefik
The easiest way to deploy Traefik on Kubernetes is to use the official Helm chart. Add Traefik to Helm's repositories using the following commands:
helm repo add traefik https://helm.traefik.io/traefik
helm repo update
Next, deploy the latest version of Traefik in the kube-system namespace. For this example, you'll want to ensure that Prometheus metrics are enabled in your cluster. This is done by passing Traefik the --metrics.prometheus=true configuration flag, which you can do by applying the supplied traefik-values.yaml file when installing Traefik with Helm:
helm install traefik traefik/traefik -n kube-system -f ./traefik-values.yaml
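For reference, the supplied traefik-values.yaml needs little more than the metrics flag passed as an additional argument. The following is a minimal sketch; the file in the accompanying repository may contain additional settings:

```yaml
# traefik-values.yaml (minimal sketch)
# Passes the Prometheus flag through to the Traefik binary.
additionalArguments:
  - "--metrics.prometheus=true"
```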
You should also create a traefik-dashboard
service for the traefik
endpoint, which Prometheus will use to monitor the Traefik metrics:
kubectl apply -f traefik-dashboard-service.yaml
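A sketch of what traefik-dashboard-service.yaml might look like is shown below. The labels and selector here are assumptions, not the repository's exact file, but the metadata labels must line up with the ServiceMonitor you'll create later:

```yaml
# traefik-dashboard-service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: traefik-dashboard
  namespace: kube-system
  labels:
    app.kubernetes.io/instance: traefik
    app.kubernetes.io/name: traefik-dashboard
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/instance: traefik
    app.kubernetes.io/name: traefik
  ports:
    # Named port "traefik" exposing Traefik's internal (dashboard/metrics) port 9000
    - name: traefik
      port: 9000
      targetPort: traefik
      protocol: TCP
```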
The Traefik Dashboard is not exposed by default, but you can make it accessible using port forwarding:
kubectl port-forward service/traefik-dashboard 9000:9000 -n kube-system
With the Traefik Dashboard accessible from your web browser, you should now see that Prometheus metrics are enabled in the "Features" section of the dashboard, which you can access at http://localhost:9000/dashboard/ (note the trailing slash in the URL, which is required):
Additionally, you can access the http://localhost:9000/metrics endpoint to see some generated metrics:
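The metrics endpoint serves plain text in the Prometheus exposition format: one sample per line, with a metric name, optional labels, and a value. As a quick illustration, here is a short Python sketch that parses a few hypothetical Traefik counter samples (the numbers below are made up, not real cluster output):

```python
# Parse a small sample of the Prometheus text exposition format, as served
# by Traefik's /metrics endpoint. The sample is illustrative only.
sample = """\
# HELP traefik_entrypoint_requests_total How many HTTP requests were processed
# TYPE traefik_entrypoint_requests_total counter
traefik_entrypoint_requests_total{code="200",entrypoint="web",method="GET"} 137
traefik_entrypoint_requests_total{code="404",entrypoint="web",method="GET"} 4
"""

def parse_metrics(text):
    """Return (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        metric, value = line.rsplit(" ", 1)      # value is the last field
        name, _, labels = metric.partition("{")  # split name from label set
        samples.append((name, labels.rstrip("}"), float(value)))
    return samples

for name, labels, value in parse_metrics(sample):
    print(f"{name} [{labels}] = {value}")
```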
Deploy Prometheus Stack
The Prometheus metrics stack consists of a number of components. Deploying all of these components, along with the required configuration, is beyond the scope of this blog post. Instead, you'll use the Prometheus Community Kubernetes Helm Charts to deploy the following components:
- Prometheus metrics server
- Alertmanager
- Metrics exporters
- Grafana
To add the repository:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
The above repository provides many charts. To see the full list, you can use the search
command:
helm search repo prometheus-community
From this list, you should install the kube-prometheus-stack
chart, which will deploy the required components. (This process could take a few moments, so be patient.)
$ helm install prometheus-stack prometheus-community/kube-prometheus-stack
NAME: prometheus-stack
LAST DEPLOYED: Fri Jan 22 13:09:15 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace default get pods -l "release=prometheus-stack"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
Configure Traefik Monitoring
Prometheus custom resource definitions (CRDs) will be used to configure the collection of Traefik metrics. You need to add a ServiceMonitor, which tells Prometheus where to read the data.
# traefik-service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik
  namespace: default
  labels:
    release: prometheus-stack
spec:
  jobLabel: traefik-metrics
  selector:
    matchLabels:
      app.kubernetes.io/instance: traefik
      app.kubernetes.io/name: traefik-dashboard
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - port: traefik
      path: /metrics
Per the above configuration, Prometheus will look at the /metrics
endpoint of the traefik-dashboard
service. The traefik-dashboard
service is created in the kube-system
namespace, while the ServiceMonitor is deployed in the default
namespace:
kubectl apply -f traefik-service-monitor.yaml
Let's now validate that Prometheus has started scraping Traefik metrics. The service should be visible in the Prometheus dashboard. To access it, forward port 9090 to localhost:9090:
kubectl port-forward service/prometheus-stack-kube-prom-prometheus 9090:9090
You can now open the Service Discovery dashboard:
This should show the default/traefik
service. Details for the service should also be available under the Status > Targets
view.
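Once the target is up, you can also try a PromQL query in the Prometheus Graph tab. For example, the following query (assuming Traefik 2.x metric names) charts the per-second request rate broken down by entrypoint:

```promql
# Per-second HTTP request rate over the last five minutes, by entrypoint
sum(rate(traefik_entrypoint_requests_total[5m])) by (entrypoint)
```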
Configure Traefik Alerts
To see what else Prometheus can do, you can add a rule that raises alerts under matching conditions. The details of Prometheus rule expressions are beyond the scope of this blog post, but you can read more about them in the official documentation.
# traefik-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: traefik-rules
  labels:
    release: prometheus-stack
spec:
  groups:
    - name: traefik
      rules:
        - alert: TooManyRequest
          expr: avg(traefik_entrypoint_open_connections{job="traefik-dashboard",namespace="kube-system"}) > 5
          for: 1m
          labels:
            severity: critical
The above rule will raise a TooManyRequest alert if there are, on average, more than five open connections for one minute. Go ahead and apply the rule:
kubectl apply -f traefik-rules.yaml
The Prometheus dashboard should show the newly created rule under Status > Rules
:
Grafana Charts
Previously, you deployed Grafana using the kube-prometheus-stack Helm chart. Now you can configure a dashboard for Traefik metrics. First, you'll need to forward port 80 from the Grafana service to a local port so that you can reach it in your browser:
kubectl port-forward service/prometheus-stack-grafana 10080:80
When you access the Grafana UI at http://localhost:10080, it asks for a login and password. The default username is admin and the default password is prom-operator. (The password can also be read from the prometheus-stack-grafana Kubernetes secret.)
Rather than building new Grafana dashboards from scratch, you can import them from Grafana's marketplace, which hosts community-created dashboards. Add a dashboard by navigating to Dashboards > Manage
by clicking the four-square icon on the left navigation bar.
Click the Import button and input 11462
as the dashboard ID, which corresponds to the Traefik 2 dashboard contributed by user timoreymann
.
After clicking Load, you should see a summary of the imported dashboard. Select the Prometheus data source from the dropdown at the bottom, then click Import to generate the following dashboard:
Deploy Application
Now that the cluster is working and metrics are being fed to Prometheus and Grafana, you'll need an application to monitor. For this, deploy the HTTPBin service, which provides many endpoints that can be used to generate different types of synthetic user traffic. The Service and the IngressRoute can be deployed using a single configuration file:
kubectl apply -f httpbin.yaml
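For reference, here is a sketch of what httpbin.yaml might contain. The exact file in the repository may differ, and the Deployment is omitted for brevity:

```yaml
# httpbin.yaml (sketch; Deployment omitted)
apiVersion: v1
kind: Service
metadata:
  name: httpbin
spec:
  selector:
    app: httpbin
  ports:
    - name: http
      port: 8000
      targetPort: 80
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: httpbin
spec:
  entryPoints:
    - web
  routes:
    # Route requests for httpbin.local to the httpbin Service
    - match: Host(`httpbin.local`)
      kind: Rule
      services:
        - name: httpbin
          port: 8000
```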
The httpbin route will match the hostname httpbin.local and forward the requests to the httpbin service. Let's look up the service using the curl command:
$ curl -I http://localhost:8081/ -H "host:httpbin.local"
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Length: 9593
Content-Type: text/html; charset=utf-8
Date: Sun, 10 Jan 2021 06:14:53 GMT
Server: gunicorn/19.9.0
As part of your setup, you mapped port 80 of the k3d load balancer to local port 8081. Thus, in the above command, the address to look up is localhost:8081.
Simulate User Traffic
You can hit HTTPBin with ab, the Apache HTTP benchmarking tool, to generate some traffic. These requests will in turn generate metrics. Execute the following commands:
ab -c 5 -n 10000 -m PATCH -H "host:httpbin.local" -H "accept: application/json" http://localhost:8081/patch
ab -c 5 -n 10000 -m GET -H "host:httpbin.local" -H "accept: application/json" http://localhost:8081/get
ab -c 5 -n 10000 -m POST -H "host:httpbin.local" -H "accept: application/json" http://localhost:8081/post
If you consult the Grafana dashboard after a while, you'll notice it shows rich information:
Some salient data points include:
- Uptime
- Average response time
- Total requests
- Request counts based on HTTP method and service
The above information is provided by the community chart, but you can also generate more charts for your observability needs.
Alerts
Finally, as you simulate application traffic, Prometheus may raise alerts. For example, if you generate a lot of traffic, the TooManyRequest alert that you created earlier will be shown on the Alertmanager dashboard.
$ kubectl port-forward service/prometheus-stack-kube-prom-alertmanager 9093:9093
Forwarding from 127.0.0.1:9093 -> 9093
Wrap Up
The more visibility you have into the services running on your Kubernetes clusters, the better-equipped you will be to take action in the event of anomalous performance. In this tutorial, you've seen how easy it is to connect Traefik to Prometheus and Grafana to create visualizations from Traefik metrics.
As you familiarize yourself with these tools, you'll be able to create unique dashboards that expose the data points that are most critical for your environment.
The next entry in this series on SRE techniques will focus on application tracing with Traefik and Jaeger. Until then, if you'd like to learn more about gaining visibility and control over your Traefik instances, check out Traefik Pilot, our SaaS monitoring and management platform for Traefik.
This is a companion discussion topic for the original entry at https://traefik.io/blog/capture-traefik-metrics-for-apps-on-kubernetes-with-prometheus/