Learn Kubernetes VPA’s functionality and limitations with demo examples and understand how to use it with other Kubernetes autoscaling methods.

Chapter 1:

Kubernetes VPA

Kubernetes Vertical Pod Autoscaler (VPA) is an autoscaler that enables automatic CPU and memory request and limit adjustments based on historical resource usage measurements. When used correctly, it can help you efficiently and automatically allocate the resources of a Kubernetes cluster—at the container level.

Of course, as with any technology in the world of Kubernetes (K8s), understanding how VPA works and precisely what it does from a technical perspective allows you to implement it effectively.

This article will present the three types of autoscalers and then explore VPA’s usage and benefits in detail—so you can hit the ground running with Kubernetes Vertical Pod Autoscaler!

The three types of Kubernetes autoscalers

There are three types of K8s autoscalers, each serving a different purpose. They are:

  1. Horizontal Pod Autoscaler (HPA): adjusts the number of replicas of an application. HPA scales the number of pods in a replication controller, deployment, replica set, or stateful set based on CPU utilization. HPA can also be configured to make scaling decisions based on custom or external metrics.
  2. Cluster Autoscaler (CA): adjusts the number of nodes in a cluster. The Cluster Autoscaler automatically adds or removes nodes in a cluster when nodes have insufficient resources to run a pod (adds a node) or when a node remains underutilized, and its pods can be assigned to another node (removes a node).
  3. Vertical Pod Autoscaler (VPA): adjusts the resource requests and limits (which we’ll define in this article) of containers in the cluster.

What is Kubernetes VPA?

Kubernetes Vertical Pod Autoscaler (VPA) is a component you install in your cluster. It increases and decreases container CPU and memory resource configuration to align cluster resource allotment with actual usage.

Next, we’ll review some important VPA concepts.

Kubernetes VPA resource configuration types

With VPA, there are two different types of resource configurations that we can manage on each container of a pod:

  1. Requests
  2. Limits

What is a request?

Requests define the minimum amount of resources that containers need. For example, an application can use more than 256MB of memory, but Kubernetes will guarantee a minimum of 256MB to the container if its request is 256MB of memory.

What are limits?

Limits define the maximum amount of resources that a given container can consume. Your application might require at least 256MB of memory, but you might want to ensure that it doesn’t consume more than 512MB, i.e., to limit its memory consumption to 512MB.
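As a minimal sketch of how the two settings sit side by side, the snippet below writes a pod manifest whose single container requests 256Mi of memory and is limited to 512Mi. The pod name, container name, image, and file name are placeholders chosen for illustration (they are not part of this article's demo), the amounts are written with the Mi suffix as an assumption, and the apply command is left as a comment because it needs a running cluster.

```shell
# Write an illustrative pod manifest; every name below is hypothetical.
cat > requests-limits-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: requests-limits-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "256Mi"   # amount the scheduler reserves for the container
      limits:
        memory: "512Mi"   # the container may be OOM-killed above this
EOF

# With a cluster available, the manifest could be applied like this:
# kubectl apply -f requests-limits-demo.yaml
```

The request is what the scheduler uses to place the pod on a node, while the limit is enforced at runtime.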

Kubernetes VPA vs. HPA

Fundamentally, the difference between VPA and HPA lies in how they scale. HPA scales by adding or removing pods—thus scaling capacity horizontally. VPA, however, scales by increasing or decreasing CPU and memory resources within the existing pod containers—thus scaling capacity vertically. The table below explains the differences between Kubernetes VPA and HPA in more detail.

CAPACITY ADJUSTMENT DESIRED | HORIZONTAL SCALING (HPA) | VERTICAL SCALING (VPA)
More resources | Add more pods | Increase CPU or memory resources of existing pod containers
Less resources | Remove pods | Decrease CPU or memory resources of existing pod containers

Refer to the below diagram to understand how VPA works:

A visual explanation of the VPA functionality.

Components of VPA

A VPA deployment has three main components: VPA Recommender, VPA Updater, and VPA Admission Controller. Let’s take a look at what each component does.

The VPA Recommender:

  • Monitors resource utilization and computes target values.
  • Looks at the metric history, OOM events, and the VPA deployment spec and suggests fair requests. The limits are raised/lowered based on the limits-requests proportion defined.
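To make the limits-requests proportion concrete, here is a rough sketch in shell arithmetic. It assumes a container that starts with a 100m CPU request and a 200m CPU limit (a 1:2 ratio) and a hypothetical recommendation of 250m. The function name scale_limit is made up for this illustration and is not part of VPA.

```shell
# scale_limit ORIG_REQUEST ORIG_LIMIT NEW_REQUEST (all in millicores)
# Prints the limit that keeps the original limit-to-request ratio.
# Integer arithmetic only, so fractional millicores are truncated.
scale_limit() {
  orig_request=$1
  orig_limit=$2
  new_request=$3
  echo $(( new_request * orig_limit / orig_request ))
}

# Hypothetical values: request 100m, limit 200m, recommended request 250m.
new_limit=$(scale_limit 100 200 250)
echo "new CPU limit: ${new_limit}m"
```

With these assumed numbers, the request moves from 100m to 250m and the limit follows from 200m to 500m, so the 1:2 ratio is preserved.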

The VPA Updater:

  • Evicts those pods that need the new resource limits.
  • Implements whatever the Recommender recommends if “updateMode: Auto” is defined.

The VPA Admission Controller:

  • Changes the CPU and memory settings (using a webhook) before a new pod starts whenever the VPA Updater evicts and restarts a pod.
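  • Does not evict pods itself. When the Vertical Pod Autoscaler is set with an updateMode of “Auto,” the VPA Updater evicts a pod whose resource requests need to change, and the Admission Controller applies the new values when the replacement pod is created. In the Kubernetes version used in this article, the only way to modify the resource requests of a running pod is to recreate the pod.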


How does Kubernetes VPA work?

Now that we’ve defined the components of VPA, let’s explore how they work together in practice.

The diagram below provides a practical example of how Kubernetes VPA works and is followed by a numbered explanation of each step.

How Kubernetes VPA allocates resources.

Let’s walk through exactly what’s happening in the diagram:

  1. The user configures VPA.
  2. VPA Recommender reads the VPA configuration and the resource utilization metrics from the metrics server.
  3. VPA Recommender provides pod resource recommendations.
  4. VPA Updater reads the pod resource recommendations.
  5. VPA Updater initiates the pod termination.
  6. The deployment realizes the pod was terminated and will recreate the pod to match its replica configuration.
  7. When the pod is in the recreation process, the VPA Admission Controller gets the pod resource recommendation. Since the Kubernetes version used here does not support changing the resource requests and limits of a running pod, VPA cannot update existing pods in place. Instead, the VPA Updater terminates pods that are using outdated values. When the pod’s controller requests the replacement from the Kubernetes API server, the VPA Admission Controller injects the updated resource request and limit values into the new pod’s specification.
  8. Finally, the VPA Admission Controller writes the recommended values into the pod. In our example, the VPA Admission Controller sets the pod’s CPU to “250m”.

Note:

We can also run VPA in recommendation mode. In this mode, the VPA Recommender will update the status field of the workload’s Vertical Pod Autoscaler resource with its suggested values, but will not terminate pods or alter pod API requests.
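A recommendation-only setup can be sketched as follows. The snippet writes a VerticalPodAutoscaler manifest with updateMode set to "Off". The object name example-vpa, the target Deployment name example-app, and the file name are hypothetical, and the kubectl commands are shown as comments because they need a cluster with VPA installed.

```shell
# Write a recommendation-only VPA manifest; names are illustrative.
cat > vpa-recommend-only.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: example-app
  updatePolicy:
    updateMode: "Off"   # recommend only; pods are never evicted or changed
EOF

# With a cluster available:
# kubectl apply -f vpa-recommend-only.yaml
# kubectl describe vpa example-vpa   # the recommendation appears in the status
```

Starting in this mode is a low-risk way to see what VPA would suggest before letting it act on running pods.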

Limitations of Kubernetes VPA

VPA is useful in many applications, but there are several important limitations to keep in mind.

  • Do not use Vertical Pod Autoscaler together with a Horizontal Pod Autoscaler that scales on the same resource metrics, such as CPU and memory usage. When one of those metrics reaches its defined threshold, both VPA and HPA react to it at the same time, and the combined effect is unpredictable and may lead to issues.
  • VPA might recommend more resources than available in the cluster, thus causing the pod to not be assigned to a node (due to insufficient resources) and therefore never run. To overcome this limitation, it’s a good idea to set the LimitRange to the maximum available resources. This will ensure that pods do not ask for more resources than the LimitRange defines.
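The LimitRange suggestion could look roughly like the sketch below. The object name, namespace, file name, and the 1 CPU / 2Gi ceilings are assumptions chosen for illustration; in practice the ceilings should be derived from the capacity of your own nodes.

```shell
# Write an illustrative LimitRange that caps per-container resources.
cat > container-limitrange.yaml <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: container-max
  namespace: default
spec:
  limits:
  - type: Container
    max:
      cpu: "1"       # assumed ceiling; keep it below a single node's capacity
      memory: 2Gi    # assumed ceiling
EOF

# With a cluster available:
# kubectl apply -f container-limitrange.yaml
```

A LimitRange applies to every container in its namespace, so it is a coarse safety net rather than a per-workload setting.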

EKS Example: How to configure VPA

Now that we’ve reviewed VPA concepts, let’s look at a real-world example of how to install and use VPA. In this section, we’ll walk through a VPA deployment on Amazon Elastic Kubernetes Service (Amazon EKS) by following these high-level steps:

  1. Create an EKS cluster
  2. Install the metrics server
  3. Install the VPA
  4. Demo: example of VPA

Create an EKS Cluster:

To begin, we create an EKS cluster on AWS. There are multiple ways of doing this, but in this article, we will use “eksctl”, a simple CLI tool that AWS recommends. To learn more about “eksctl”, refer to the official eksctl website.

Make sure you have an active AWS account configured on your local workstation/laptop. If not, please refer to this AWS doc. Once you have your account configured, create the below file and run the below command to create the EKS cluster:

$ cat eks.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
  version: "1.20"
availabilityZones:
- us-east-1a
- us-east-1b
managedNodeGroups:
- name: general
  labels:
    role: general
  instanceType: t3.medium
  minSize: 1
  maxSize: 10
  desiredCapacity: 1
  volumeSize: 20

Create the cluster:

$ eksctl create cluster -f eks.yaml

Verify that you can connect to the cluster:

$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.100.0.1  443/TCP 13m

Install the metrics server

Now that we have the EKS cluster, the next step is to install the metrics server on it. We can confirm whether it is already installed by running the below command:

$ kubectl get apiservice | grep -i metrics

If there is no output, we don't have a metrics server configured in our EKS cluster. We also can use the below command to see if we have metrics available:

$ kubectl top pods -n kube-system
error: Metrics API not available

Let’s install the metrics server. Clone the below repo:

$ git clone --branch v1.0.0 git@github.com:nonai/k8s-example-files.git

Change into the directory of the cloned repository that contains the metrics server manifests and apply all of the files as shown below:

$ kubectl apply -f .
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

Verify the deployment:

$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
aws-node-8g4wk 1/1 Running 0 29m
coredns-86d9946576-g49sk 1/1 Running 0 38m
coredns-86d9946576-kxw4h 1/1 Running 0 38m
kube-proxy-64gjd 1/1 Running 0 29m
metrics-server-9f459d97b-j4mnr 1/1 Running 0 117s

List API services and check for metrics server:

$ kubectl get apiservice | grep -i metrics
v1beta1.metrics.k8s.io kube-system/metrics-server True 2m26s

List services in the kube-system namespace:

$ kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.100.0.10  53/UDP,53/TCP 40m
metrics-server ClusterIP 10.100.152.164  443/TCP 2m58s

We can access metrics API directly:

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1 | jq

Use kubectl to get metrics:

$ kubectl top pods -n kube-system
NAME CPU(cores) MEMORY(bytes)
aws-node-8g4wk 4m 40Mi
coredns-86d9946576-g49sk 2m 8Mi
coredns-86d9946576-kxw4h 2m 8Mi
kube-proxy-64gjd 1m 11Mi
metrics-server-9f459d97b-j4mnr 3m 17Mi

Install the VPA

Now that we have created the EKS cluster and deployed the metrics server onto the cluster, let’s create the VPA.

Clone the below repository, check out the specific commit (which is used for this tutorial), and change the directory to “autoscaler/vertical-pod-autoscaler”:

$ git clone https://github.com/kubernetes/autoscaler.git

$ cd autoscaler

$ git checkout bb860357f691313fca499e973a5241747c2e38b2

$ cd vertical-pod-autoscaler

We can preview the installation by using the below command:

$ ./hack/vpa-process-yamls.sh print

Then, install the VPA:

$ ./hack/vpa-up.sh
customresourcedefinition.apiextensions.k8s.io/verticalpodautoscalercheckpoints.autoscaling.k8s.io created
customresourcedefinition.apiextensions.k8s.io/verticalpodautoscalers.autoscaling.k8s.io created
clusterrole.rbac.authorization.k8s.io/system:metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:vpa-actor created
clusterrole.rbac.authorization.k8s.io/system:vpa-checkpoint-actor created
clusterrole.rbac.authorization.k8s.io/system:evictioner created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-actor created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-checkpoint-actor created
clusterrole.rbac.authorization.k8s.io/system:vpa-target-reader created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-target-reader-binding created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-evictionter-binding created
serviceaccount/vpa-admission-controller created
clusterrole.rbac.authorization.k8s.io/system:vpa-admission-controller created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-admission-controller created
clusterrole.rbac.authorization.k8s.io/system:vpa-status-reader created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-status-reader-binding created
serviceaccount/vpa-updater created
deployment.apps/vpa-updater created
serviceaccount/vpa-recommender created
deployment.apps/vpa-recommender created
Generating certs for the VPA Admission Controller in /tmp/vpa-certs.
Generating RSA private key, 2048 bit long modulus (2 primes)
........................+++++
.................................+++++
e is 65537 (0x010001)
Generating RSA private key, 2048 bit long modulus (2 primes)
.....................................................+++++
..........+++++
e is 65537 (0x010001)
Signature ok
subject=CN = vpa-webhook.kube-system.svc
Getting CA Private Key
Uploading certs to the cluster.
secret/vpa-tls-certs created
Deleting /tmp/vpa-certs.
deployment.apps/vpa-admission-controller created
service/vpa-webhook created

Check the status of the pods (you should see some VPA related pods running):

$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
aws-node-8g4wk 1/1 Running 0 42m
coredns-86d9946576-g49sk 1/1 Running 0 51m
coredns-86d9946576-kxw4h 1/1 Running 0 51m
kube-proxy-64gjd 1/1 Running 0 42m
metrics-server-9f459d97b-j4mnr 1/1 Running 0 14m
vpa-admission-controller-6cd546c4f-qsf82 1/1 Running 0 3m11s
vpa-recommender-6855ff754-6bb4g 1/1 Running 0 3m15s
vpa-updater-9fd7bfbd5-pvfrj 1/1 Running 0 3m18s

Demo: example of VPA

With everything configured, let’s take a basic application and deploy it on the cluster we just built. We’ll focus only on CPU usage metrics for this demo and resize the pod based on the VPA recommendation.

Since we already have cloned the repository, just change the directory to “kubernetes-tutorials/src/master/001/vpa-demo/”:

$ cd kubernetes-tutorials/src/master/001/vpa-demo/

$ ls -1
deployment.yaml
vpa.yaml

Here, we have two files:

deployment.yaml # ---> will have a config for the application.
vpa.yaml # ---> will contain the config for the VPA.

We are going to perform these steps to test the VPA:

  1. Deploy the sample application with the CPU 100m
  2. Allow the pod to run at least 5 minutes and check its CPU usage
  3. Check the VPA recommendation
  4. Manually update the CPU to 180m
  5. Apply the changes
  6. Check the status of the pod

Deploy the sample application with the CPU 100m:

$ kubectl apply -f .
deployment.apps/hamster created
verticalpodautoscaler.autoscaling.k8s.io/hamster-vpa created

Allow the pod to run at least 5 mins and check its CPU usage:

$ kubectl get vpa
NAME MODE CPU MEM PROVIDED AGE
hamster-vpa Off 271m 262144k True 5m10s

Check the VPA recommendation:

$ kubectl describe vpa hamster-vpa

Sample output:


Here’s what we can see:

  • The lower bound is the minimum estimation for the container.
  • The upper bound is the maximum recommended resource estimation for the container.
  • Target estimation is the one we will use for setting resource requests.
  • All of these estimations are capped based on the minAllowed and maxAllowed container policies.
  • The uncapped target estimation is the target estimation that would be produced if there were no minAllowed and maxAllowed restrictions.
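The relationship between the uncapped target and the capped target can be sketched with a small clamp function. This is an illustration of the capping idea, not VPA’s actual code. The function name clamp, the bounds (minAllowed of 250m, maxAllowed of 500m), and the uncapped values are assumptions chosen for the example.

```shell
# clamp VALUE MIN MAX (all in millicores)
# Prints VALUE limited to the range [MIN, MAX].
clamp() {
  value=$1
  min=$2
  max=$3
  if [ "$value" -lt "$min" ]; then
    echo "$min"
  elif [ "$value" -gt "$max" ]; then
    echo "$max"
  else
    echo "$value"
  fi
}

# Hypothetical uncapped targets checked against minAllowed=250m, maxAllowed=500m.
for uncapped in 100 271 800; do
  echo "uncapped ${uncapped}m -> target $(clamp "$uncapped" 250 500)m"
done
```

With these assumed bounds, an uncapped target of 100m is raised to 250m, 271m passes through unchanged, and 800m is lowered to 500m.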

For this demo, we will manually update the CPU request from 100m to 180m in the deployment.yaml file:

---
apiVersion: apps/v1
kind: Deployment
…
...
requests:
  cpu: 180m
  memory: 50Mi
limits:
  cpu: 600m
  memory: 100Mi
…
...

Apply the changes:

$ kubectl apply -f deployment.yaml
deployment.apps/hamster configured

Check the status of the pod. When we change the CPU request, the hamster pod gets terminated, and a new pod is provisioned with the new CPU value we declared.

If we want to automate this, we should change the “updateMode: Off” parameter to “Auto” in the vpa.yaml file.

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
hamster-5dff8d44d6-w42hm 1/1 Running 0 3s
hamster-7cfc7d5644-kq4qw 1/1 Terminating 0 77s

Check the events to see VPA in action in the background:

$ kubectl get event -n kube-system

Kubernetes VPA Auto-Update Mode

There are multiple valid options for updateMode in VPA. They are:

  • Off – VPA will only provide the recommendations, and it will not automatically change resource requirements.
  • Initial – VPA only assigns resource requests on pod creation and never changes them later.
  • Recreate – VPA assigns resource requests on pod creation time and updates them on existing pods by evicting and recreating them.
  • Auto – VPA recreates the pod based on the recommendation. In the VPA version used in this article, this behaves the same as Recreate.

We increased the CPU request in the above demo and then manually applied the change to resize the pod. We can do this automatically by using the updateMode: "Auto" parameter.

Here is an example using Auto mode:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
  namespace: vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "nginx"
      minAllowed:
        cpu: "250m"
        memory: "100Mi"
      maxAllowed:
        cpu: "500m"
        memory: "600Mi"

Because we declared updateMode: "Auto", VPA will automatically adjust the pod’s resource requests based on its recommendations.

Excluding Scaling for a Container

Let’s assume we have a pod running two containers, and we want only one container to scale based on VPA. The other container (say a container used to host a performance monitoring agent) should not scale, because it does not require scaling.

We can achieve this by opting out the containers that do not need scaling. In this case, we should set mode: "Off" for the container hosting the performance monitoring tool’s agent:

Example: Exclude scaling

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-opt-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       my-vpa-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: newrelic-sidecar-container
      mode: "Off"

Usage and Cost Reporting With VPA

It’s easier to measure usage and calculate costs in a static Kubernetes cluster. It’s much harder to do it when the resources allocated to pods routinely change.

Vertical autoscaling adds to this challenge, and requires administrators to rely on specialized tools to measure and allocate usage by cluster tenant, application, environment, and label (read our guide on Kubernetes labels).

The open source Kubecost tool addresses this very problem, by analyzing usage metrics and correlating them with your actual cloud billing data to provide dashboards and reports for a complete picture of usage and cost data. The screenshot below shows the main Kubecost dashboard summarizing the cluster costs, efficiency, and health:


Kubecost is easy to install using a single Helm command and can also leverage your existing Prometheus and Grafana installations to facilitate data collection and display. You can start here and try it out for free.

Summary

We’ve covered a lot of ground on Kubernetes VPA in this article. Here is a summary of the key takeaways:

  • There are three types of autoscaling in Kubernetes: Horizontal Pod Autoscaler, Cluster Autoscaler, and Vertical Pod Autoscaler.
  • Each of the three autoscaler types is different, and understanding how different autoscalers work will help you better configure your cluster.
  • Using HPA without VPA could accumulate wasted resources by replicating under-utilized pods to meet an increasing workload.
  • You can combine HPA and VPA for many cases, but make sure to use custom metrics to drive HPA. If you only use CPU and memory metrics, you cannot use HPA and VPA together.
  • VPA should be used in stateful workloads because stateful workloads are harder to scale horizontally. VPA provides an automated way to scale up resource consumption.
  • When using VPA, make sure to set the maximum resources per pod in the Vertical Pod Autoscaler object as the VPA recommendation might exceed the available resources in the cluster.
  • VPA makes measuring usage, cost, and efficiency more challenging by introducing variability in resource allocation. You can use Kubecost to overcome these challenges by automating the measurement and analysis processes.