
Kubernetes Metrics Server

Monitoring and managing resources is an important daily task for Kubernetes administrators. The Kubernetes Metrics Server is a lightweight add-on that collects CPU and memory metrics, which are helpful for understanding which applications consume the most resources. It is not meant to be a stand-alone monitoring tool; rather, it exists to enable the cluster autoscalers.

The Kubernetes Metrics Server collects metrics on how much CPU and memory each pod uses and exposes them through the Kubernetes API server. Kubernetes controllers consume these metrics to make pod scaling work: both the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA) require them to adjust to your application’s demands automatically.

This article discusses the Kubernetes Metrics Server and the benefits of installing it. Note that the Kubernetes Metrics Server differs from the also popular kube-state-metrics add-on, and we’ll discuss some of the differences between the two in this article.

Summary of key Kubernetes Metrics Server and autoscaling concepts

Concept | Description
What is the Kubernetes Metrics Server? | The Kubernetes Metrics Server exposes near real-time CPU and memory utilization via an API that can be consumed by autoscalers.
The Horizontal Pod Autoscaler (HPA) | The HPA uses data from the Metrics Server to scale an application by increasing or decreasing the number of replicas to maintain a defined utilization of CPU and memory.
The Vertical Pod Autoscaler (VPA) | The VPA uses data from the Metrics Server to scale an application by increasing or decreasing the values of requests and limits.
Kubernetes Metrics Server vs. kube-state-metrics | These two similarly named Kubernetes projects are often confused with each other. The Kubernetes Metrics Server is meant for near real-time status, while kube-state-metrics is meant more for long-term monitoring and consumption through Prometheus.

What is the Kubernetes Metrics Server?

The primary role of the Kubernetes Metrics Server is to make CPU and memory metrics available via the API so they can be used by the autoscalers. In addition to enabling autoscaling, the Metrics Server provides a lightweight, near real-time view of which pods and nodes are consuming which resources via the kubectl top command.

The Metrics Server is a lightweight deployment that collects metrics every fifteen seconds by default. As a result, autoscaling responds quickly to changes in resource demand without placing a meaningful burden on the cluster.

Since the main purpose of the Metrics Server is to inform the HPA and VPA, only the most recent scrape of the metrics is kept. This means it is not suitable for forwarding metrics to third parties or for historical analysis; the kube-state-metrics add-on is better suited for this, as we’ll discuss later in this article.

Pod autoscalers

The Metrics Server enables two types of scaling: horizontal scaling with the Horizontal Pod Autoscaler (HPA) and vertical scaling with the Vertical Pod Autoscaler (VPA). The HPA functionality is built into the cluster, while the VPA must be installed before use.

Both the HPA and the VPA adjust the resources available to an application, but they do it differently. We’ll explain them briefly here and then go further into the details when we show how to install and configure them.

The HPA scales the number of pods running your application and adjusts the replica count for resources like a deployment or a replica set. For example, you can define settings to start your deployment with five replicas—five pods running your application. Based on resource demands, the HPA can scale up to a predefined number of pods automatically and can also reduce the number of pods once demands have died down.

The VPA scales the resources allocated to the pod within a deployment or replica set. For example, if you set the requests to 50 millicores and 1Gi of memory, the VPA could periodically adjust those resources to 100 millicores and 2Gi of memory. Like the HPA, the VPA can also scale down as demand decreases.


Installing the Kubernetes Metrics Server

Now that we know how the Kubernetes Metrics Server integrates into your cluster, let’s look at installing it. We will use minikube to work through the demos in this article. If you are unfamiliar with minikube or need to install it, you can follow the steps here and return to the article once it is up and running.

Set up a test cluster

Let’s create a local testing cluster using minikube. First, start minikube on your local machine:

$ minikube start --memory 8000 --cpus 2 --nodes 2

After a few minutes, verify that your cluster is ready:

$ kubectl get nodes
NAME           STATUS   ROLES           AGE     VERSION
minikube       Ready    control-plane   2m16s   v1.27.4
minikube-m02   Ready    <none>          38s     v1.27.4

Now let’s have a look at what happens if you try to run kubectl top:

$ kubectl top nodes
error: Metrics API not available

As anticipated, this command fails. We need to install the Metrics Server first.

Install the Kubernetes Metrics Server

The easiest way to install the Metrics Server is to apply the official manifest file to your cluster. Alternatively, the official Helm chart gives you more control over how the server is deployed; if you already use Helm, it may be the more convenient option.
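
If you prefer the Helm route, the installation looks roughly like the following. The repository URL and chart name come from the metrics-server project and may change over time, so verify them against the project’s documentation:

$ helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
$ helm repo update
$ helm upgrade --install metrics-server metrics-server/metrics-server --namespace kube-system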

The example below installs the Metrics Server via the official manifest file:

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

The next step patches the deployment so that it accepts the Kubelets’ self-signed certificates. This is needed because we are running a test cluster; in production, you should ensure that the Kubelet certificates are signed by the cluster CA:

$ kubectl -n kube-system patch deployment/metrics-server --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
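
If you’d like to wait for the patched deployment to finish rolling out before testing, you can watch its status:

$ kubectl -n kube-system rollout status deployment/metrics-server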

After a few seconds, the deployment will restart. Verify that the kubectl top command works as follows:

$ kubectl top nodes   
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
minikube       100m         10%     1552Mi          20%       

Use kubectl top to view usage

Once the Metrics Server is installed, you can view near real-time usage for pods and nodes using the kubectl top subcommand. Here are some examples.

View usage for the nodes:

$ kubectl top nodes   
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
minikube       155m         7%     1442Mi          18%       
minikube-m02   72m          3%     1039Mi          13% 

View usage for pods across all namespaces:

$ kubectl top pods -A
NAMESPACE     NAME                               CPU(cores)   MEMORY(bytes)   
kube-system   coredns-7db6d8ff4d-ks4lr           4m           12Mi            
kube-system   etcd-minikube                      51m          34Mi            
kube-system   kindnet-bfl8s                      1m           7Mi             
kube-system   kindnet-rg6kv                      1m           7Mi             
kube-system   kube-apiserver-minikube            118m         184Mi           
kube-system   kube-controller-manager-minikube   35m          41Mi            
kube-system   kube-proxy-hkrml                   1m           10Mi            
kube-system   kube-proxy-splqc                   1m           10Mi            
kube-system   kube-scheduler-minikube            5m           14Mi            
kube-system   metrics-server-d994c478f-mzkd2     17m          16Mi            
kube-system   storage-provisioner                4m           7Mi  

You can also break down usage at the container level:

$ kubectl top pods -A --containers
NAMESPACE     POD                                NAME                      CPU(cores)   MEMORY(bytes)   
kube-system   coredns-7db6d8ff4d-ks4lr           coredns                   5m           12Mi            
kube-system   etcd-minikube                      etcd                      51m          34Mi            
kube-system   kindnet-bfl8s                      kindnet-cni               1m           7Mi             
kube-system   kindnet-rg6kv                      kindnet-cni               1m           7Mi             
kube-system   kube-apiserver-minikube            kube-apiserver            149m         184Mi           
kube-system   kube-controller-manager-minikube   kube-controller-manager   40m          41Mi            
kube-system   kube-proxy-hkrml                   kube-proxy                1m           10Mi            
kube-system   kube-proxy-splqc                   kube-proxy                1m           10Mi            
kube-system   kube-scheduler-minikube            kube-scheduler            7m           14Mi            
kube-system   metrics-server-d994c478f-mzkd2     metrics-server            11m          16Mi            
kube-system   storage-provisioner                storage-provisioner       4m           7Mi

Because top is a standard kubectl subcommand, you can filter by labels just as you can with other kubectl commands.
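
For example, you could view usage for only the pods carrying a hypothetical app=example label in the default namespace (substitute your own labels):

$ kubectl top pods -n default -l app=example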

Get the raw metrics

You can also retrieve the raw metrics directly from the Metrics Server API. Here is the command for nodes:

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{},"items":[{"metadata":{"name":"minikube","creationTimestamp":"2024-02-23T08:30:26Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"minikube","kubernetes.io/os":"linux","minikube.k8s.io/commit":"fd7ecd9c4599bef9f04c0986c4a0187f98a4396e","minikube.k8s.io/name":"minikube","minikube.k8s.io/primary":"true","minikube.k8s.io/updated_at":"2024_02_22T19_41_45_0700","minikube.k8s.io/version":"v1.31.2","node-role.kubernetes.io/control-plane":"","node.kubernetes.io/exclude-from-external-load-balancers":""}},"timestamp":"2024-02-23T08:30:13Z","window":"22.119s","usage":{"cpu":"75613805n","memory":"1473008Ki"}},{"metadata":{"name":"minikube-m02","creationTimestamp":"2024-02-23T08:30:26Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"minikube-m02","kubernetes.io/os":"linux"}},"timestamp":"2024-02-23T08:30:07Z","window":"11.033s","usage":{"cpu":"23841678n","memory":"991528Ki"}}]}

Here is a similar command for the pods:

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
{"kind":"PodMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{},"items":[{"metadata":{"name":"coredns-5d78c9869d-vd5hx","namespace":"kube-system","creationTimestamp":"2024-02-23T08:30:29Z","labels":{"k8s-app":"kube-dns","pod-template-hash":"5d78c9869d"}},"timestamp":"2024-02-23T08:30:15Z","window":"14.505s","containers":[{"name":"coredns","usage":{"cpu":"673303n","memory":"12432Ki"}}]}, etc.

You can run the following if you’d like to display the raw metrics in a nicer format:

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods | python -m json.tool

Install and configure the autoscalers

Now that the Metrics Server is installed and we have verified that it works, let’s look at configuring the autoscalers.

For these examples, we will start with the following deployment, which runs one replica and sets the CPU request and limit to 100m and 200m, respectively:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example-container
        image: nginx
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 200m

Save the example as example-demo.yaml and apply it using the following command:

kubectl apply -f example-demo.yaml

Horizontal Pod Autoscaler

Shown below is an example HPA configuration. It sets a minimum of two replicas for the example deployment and a maximum of 10.

The targetCPUUtilizationPercentage value dictates that when average utilization exceeds 50% of the requested CPU, additional pods will be deployed to bring utilization across all pods back below 50%.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
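
Save this manifest as example-hpa.yaml (the filename is arbitrary), apply it, and check the autoscaler’s status:

$ kubectl apply -f example-hpa.yaml
$ kubectl get hpa example-hpa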

For this example, you can manually scale the deployment to one or three replicas and watch the HPA bring the count back to two.


The command below scales the deployment to three replicas and then runs a watch command, which shows the HPA scaling the deployment back down to two because three pods are not needed to maintain the target CPU utilization:

kubectl scale --replicas=3 deployment/example-deployment && watch kubectl get pods

Refer to the Kubernetes documentation for a full breakdown of how you can scale your pods using the HPA.

Vertical Pod Autoscaler

The VPA adjusts the requests and limits of the pods rather than the number of pods. You end up with pods that can consume more resources instead of more pods. This is helpful if your application is not designed to run across multiple pods, but be aware that pods are restarted when their resources need to be updated.

The VPA is not installed by default, but installation is straightforward. Per its GitHub repository, you install the VPA using the following steps:

git clone https://github.com/kubernetes/autoscaler.git
./autoscaler/vertical-pod-autoscaler/hack/vpa-up.sh 

After installation, you should see three new deployments related to the VPA:

kubectl get deployments -n kube-system  | grep vpa
vpa-admission-controller   1/1     1            1           11h
vpa-recommender            1/1     1            1           11h
vpa-updater                1/1     1            1           11h

These deployments correspond to the three components of the VPA:

  • Admission controller: Sets the updated resource requests on new pods as they are created.
  • Recommender: Monitors current and past usage and recommends CPU and memory requests.
  • Updater: Checks whether running pods have up-to-date request values and, if not, evicts them so that they are recreated with the recommended resources.

Here is an example of a VPA configuration that will automatically update the deployment called example-deployment:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       example-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: example-container
      minAllowed:
        cpu: 50m
        memory: 100Mi
      maxAllowed:
        cpu: 500m
        memory: 1Gi
      controlledValues: "RequestsAndLimits"

This example also shows that resource allocation can be controlled down to the container level via the containerPolicies section.
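
To try it out, save the manifest as example-vpa.yaml (again, an arbitrary filename) and apply it. After the recommender has had a few minutes to collect data, you can inspect its suggestions:

$ kubectl apply -f example-vpa.yaml
$ kubectl describe vpa example-vpa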

Other resource management tools

As this article shows, the Kubernetes Metrics Server is primarily used to enable the HPA and the VPA. It is not meant for long-term resource monitoring; there are better tools for that.

The Kubernetes Metrics Server is sometimes confused with the kube-state-metrics add-on because, on the surface, both provide metrics about the cluster. The kube-state-metrics service watches the Kubernetes API and exposes metrics about the state of cluster objects in a format that a tool such as Prometheus can scrape. Unlike the Metrics Server, it is intended to be consumed by third-party tools.

Using Prometheus, you can track your resource usage over a historical period. This data can help you better understand how you are allocating your resources.

When you want to tie those resources to costs, Kubecost can help you identify which deployments or resources are costing you the most money and recommend right-sizing and overall resource optimization. As you automate the scaling of your cluster through the HPA and VPA, you will gain insight into how your costs are fluctuating.

Conclusion

A key benefit of running containerized applications on a platform like Kubernetes is the ability to autoscale your application to meet its demands. This article covered the Kubernetes Metrics Server, which enables both the HPA and VPA to scale your application.

You want to scale your application automatically to respond to demands and continue to serve requests, but you don’t want to let something run automatically without some checks and balances. Using the HPA and VPA with Kubecost combines the ability to scale automatically with insight into how much it costs you over time. This combination provides a complete picture of automation and cost monitoring for your application.

The Kubernetes Metrics Server is essential because it is the key component enabling autoscaling for your pods.

