Create the Kubernetes service
For our next step, we have to create a service. The sample application will be reachable through this service's endpoint. Create a service configuration file with the following content:
$ cd /Users/bob/hpa/
$ cat service.yaml
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo-deployment
  labels:
    run: hpa-demo-deployment
spec:
  ports:
  - port: 80
  selector:
    run: hpa-demo-deployment
This service acts as a front end to the deployment we created above, and we can access it via port 80.
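When `targetPort` is omitted, as above, Kubernetes routes traffic to the container port with the same number as `port`. If the container listened on a different port (say 8080, purely as a hypothetical), the service spec would add a `targetPort`:

```
spec:
  ports:
  - port: 80          # port the service exposes
    targetPort: 8080  # hypothetical container port the traffic is forwarded to
  selector:
    run: hpa-demo-deployment
```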
Apply the changes:
$ kubectl apply -f service.yaml
service/hpa-demo-deployment created
We have created the service. Next, let’s list the service and see the status:
$ kubectl get svc
NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
hpa-demo-deployment   ClusterIP   10.100.124.139   <none>        80/TCP    7s
kubernetes            ClusterIP   10.100.0.1       <none>        443/TCP   172m
Here, we can see:
- hpa-demo-deployment = Service Name
- 10.100.124.139 = the service's ClusterIP address, reachable on port 80/TCP
Install the Horizontal Pod Autoscaler
We now have the sample application running as part of our deployment, and the service is accessible on port 80. To scale our resources, we will use the HPA to scale up when traffic increases and scale down when it decreases.
Let’s create the HPA configuration file as shown below:
$ cd /Users/bob/hpa/
$ cat hpa.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
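Note that `autoscaling/v1` supports only average CPU utilization. On clusters that serve the `autoscaling/v2` API, the same policy can be written with the `metrics` field, which also opens the door to memory and custom metrics. A sketch of the equivalent v2 manifest:

```
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # same 50% CPU target as the v1 manifest
```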
Apply the changes:
$ kubectl apply -f hpa.yaml
horizontalpodautoscaler.autoscaling/hpa-demo-deployment created
Verify the HPA deployment:
$ kubectl get hpa
NAME                  REFERENCE                        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo-deployment   Deployment/hpa-demo-deployment   0%/50%    1         10        0          8s
The above output shows that the HPA maintains between 1 and 10 replicas of the pods controlled by the hpa-demo-deployment deployment. In the column titled "TARGETS", 50% is the average CPU utilization that the HPA needs to maintain, while 0% is the current usage.
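Under the hood, the HPA derives the replica count from the formula documented for the autoscaler: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick shell sketch with hypothetical numbers (the 100% current utilization is made up for illustration):

```shell
# HPA scaling formula: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
current_replicas=4
current_cpu=100   # hypothetical observed average CPU utilization (%)
target_cpu=50     # targetCPUUtilizationPercentage from hpa.yaml

# Integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # prints: desired replicas: 8
```

With CPU running at double the target, the HPA doubles the replica count (capped by maxReplicas).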
If we want to change the MIN and MAX values, we can update them in hpa.yaml and re-apply the file with kubectl apply -f hpa.yaml.
📝Note: Because the HPA object already exists, running kubectl create -f hpa.yaml instead would fail with an error saying the resource already exists; kubectl apply updates it in place.
Increase the load
So far, we have set up our EKS cluster, installed the Metrics Server, deployed a sample application, and created an associated Kubernetes service for the application. We also deployed HPA, which will monitor and adjust our resources.
To test HPA in real-time, let’s increase the load on the cluster and check how HPA responds in managing the resources.
First, let’s check the current status of the deployment:
$ kubectl get deploy
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
hpa-demo-deployment   1/1     1            1           23s
Next, we will start a container that sends an infinite loop of queries to the hpa-demo-deployment service, which is listening on port 80. Open a new terminal and run the command below:
$ kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-demo-deployment; done"
📝Note: The command above relies on cluster DNS to resolve the service name. If DNS is not available in your cluster, use the service's ClusterIP address in place of the name.
To view the service name:
$ kubectl get svc
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
hpa-demo-deployment   ClusterIP   10.100.95.188   <none>        80/TCP    10m
Before we increase the load, the HPA status will look like this:
$ kubectl get hpa
NAME                  REFERENCE                        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo-deployment   Deployment/hpa-demo-deployment   0%/50%    1         10        1          12m