
Kubespray

The most common question with Kubernetes is: “Where do I begin?” In nearly every case, the answer is “with a Kubernetes cluster.” The Kubernetes ecosystem has come a long way since its initial release in 2014. Back then, you were on your own. Little assistance was available, and few specialists knew how to set up a Kubernetes cluster and administer it correctly. Fortunately, there are several convenient options to help you today.

Primarily, we can categorize deployed Kubernetes clusters into two groups: fully-managed and self-managed. Amazon Web Services EKS, Google Cloud GKE, and Azure AKS are some of the most popular fully-managed offerings, whereas the second category, naturally, relies on personal setup and maintenance.

Why would you need to create and manage your own Kubernetes clusters? Well, it mainly depends on your needs and requirements. You might want to run a Kubernetes cluster using on-premise infrastructure, use a version of Kubernetes not supported by your cloud vendor, or you may have specific security requirements.

So, how do you start? You could manually install all of the Kubernetes components onto your compute nodes yourself, but this can be complex, error-prone, and time-consuming. Simpler, more scalable, and more robust solutions are available!

Automation tools such as Kubespray make deploying a Kubernetes cluster almost effortless. Kubespray is an open-source tool that allows for the automated deployment of Kubernetes clusters across computing resources, such as virtual machines. Built to be configurable, fast, and lightweight, Kubespray meets most needs.

Kubespray: features and components

Before we begin this tutorial on Kubespray, let's quickly summarize the main features and components we will cover later in this article.

High availability: Kubespray has built-in support for provisioning and managing a highly available cluster.
Upgrades: Kubespray significantly simplifies upgrading a self-managed Kubernetes cluster.
Maintenance & scaling: Kubespray includes additional Ansible playbooks that administrators can use to manage the cluster, for instance to easily increase the number of nodes in a cluster.
Configurations: An essential aspect of Kubespray is the ability to configure almost every component of the deployment, such as DNS, CRI, and CNI.
Kubespray vs. other tools: The main alternatives to Kubespray are Kubeadm and Kops.

Using Kubespray

First, we want to point out that Kubespray doesn’t provision resources like nodes or virtual machines; that’s your responsibility. Instead, Kubespray performs specific operations against those resources. For instance, it prepares the OS and installs packages, downloads and configures a container runtime (such as containerd or Docker), creates the infrastructure for secure connections between Kubernetes components, and more. Be aware that the machine you run Kubespray commands from needs SSH access and network connectivity to all nodes.
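If you want to sanity-check that connectivity before running any playbooks, one option is an ad-hoc Ansible ping against your inventory file (we generate one later in this tutorial); the user name and key path below are placeholders, and hosts behind a bastion may also need the SSH agent setup described further down.

# Hedged example: verify SSH connectivity to every host in an existing inventory
$ ansible -i ./inventory/cluster-1/hosts all -m ping -e ansible_user=admin -e ansible_ssh_private_key_file=<PATH_TO_YOUR_KEY_PAIR_PEM>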


Kubespray Tutorial

This tutorial explains how to provision a set of EC2 instances on AWS and deploy a Kubernetes cluster using Kubespray. To save time, we will use Terraform (an open-source tool for provisioning cloud and on-premise infrastructure) for the first section. Then, we will walk through how to use Kubespray to deploy a Kubernetes cluster using EC2 instances.

Requirements

Before we start, ensure you have the tools used throughout this tutorial installed on your workstation: Git, Python 3 with pip (Ansible is installed from Kubespray’s requirements.txt in a moment), Terraform, kubectl, and an SSH client. You will also need an AWS account with credentials that can create EC2 resources.

Clone the Kubespray GitHub repository and check out the latest stable release. At the time of writing, that is 2.19. Kubespray is under constant development, and you can run into problems if you don’t use a stable release.

$ git clone git@github.com:kubernetes-sigs/kubespray.git

$ cd kubespray

$ git checkout release-2.19

Let’s install the required Python dependencies using pip:

$ pip3 install -r ./requirements.txt
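Optionally, to keep these dependencies isolated from your system Python packages, you can create a virtual environment before running the pip install above; the environment name here is just an example.

# Optional: create and activate a virtual environment first
$ python3 -m venv kubespray-venv
$ source kubespray-venv/bin/activate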

Infrastructure

Next, we’re going to use the Terraform template provided in the Kubespray repository to deploy a set of AWS EC2 instances. After that, we should have a cluster of EC2 instances configured as pictured below.

The AWS cloud infrastructure created by Terraform (source)

In the Kubespray repository, navigate to the AWS terraform directory.

$ cd contrib/terraform/aws

Run Terraform initialization to download the dependencies.

$ terraform init

Initializing modules...

Initializing the backend...

Initializing provider plugins...
- Reusing previous version of hashicorp/template from the dependency lock file
- Reusing previous version of hashicorp/null from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Using previously-installed hashicorp/template v2.2.0
- Using previously-installed hashicorp/null v3.1.1
- Using previously-installed hashicorp/aws v4.22.0

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Set AWS credentials for Terraform to be able to interact with your AWS account.

$ export TF_VAR_AWS_ACCESS_KEY_ID="<YOUR_ACCESS_KEY_HERE>"
$ export TF_VAR_AWS_SECRET_ACCESS_KEY="<YOUR_SECRET_KEY_HERE>"
$ export TF_VAR_AWS_DEFAULT_REGION="us-west-2" # Or any other AWS region

If you don’t have an AWS EC2 key pair, create one by following this guide before proceeding. We will use this key pair later to allow Kubespray to connect to our EC2 instances. We have already created and downloaded a key pair called cluster-1.

The “cluster-1” EC2 key pair used for SSH access to EC2 instances

Set the following variable so that Terraform uses your key pair.

$ export TF_VAR_AWS_SSH_KEY_NAME="cluster-1"

During this tutorial, you may experience connectivity issues to either the Bastion instance or any of the EC2 instances. In that case, you can try to add the EC2 key pair to your SSH Agent using the following commands.

$ eval $(ssh-agent)

$ ssh-add <PATH_TO_YOUR_KEY_PAIR_PEM>

Next, we will use Terraform to create our instances and other resources. The ‘apply’ command will kick off the creation process, but you could also use the ‘plan’ command to view an execution plan before applying it. If everything is set up correctly, Terraform will create the resources within minutes.

$ terraform apply

# or

$ terraform plan -out cluster-1.plan
$ terraform apply cluster-1.plan

.
.
.

Apply complete! Resources: 42 added, 0 changed, 0 destroyed.

Outputs:

aws_nlb_api_fqdn = "kubernetes-nlb-cluster-1-9380bb7099c714a4.elb.us-west-2.amazonaws.com:6443"
bastion_ip = "35.89.103.85"
default_tags = tomap({})
etcd = <<EOT
10.250.193.75
10.250.212.67
10.250.198.201
EOT
inventory = <<EOT
[all]
ip-10-250-193-75.us-west-2.compute.internal ansible_host=10.250.193.75
ip-10-250-212-67.us-west-2.compute.internal ansible_host=10.250.212.67
ip-10-250-198-201.us-west-2.compute.internal ansible_host=10.250.198.201
ip-10-250-203-209.us-west-2.compute.internal ansible_host=10.250.203.209
ip-10-250-221-239.us-west-2.compute.internal ansible_host=10.250.221.239
ip-10-250-196-81.us-west-2.compute.internal ansible_host=10.250.196.81
ip-10-250-223-226.us-west-2.compute.internal ansible_host=10.250.223.226

bastion ansible_host=35.89.103.85

[bastion]
bastion ansible_host=35.89.103.85

[kube_control_plane]
ip-10-250-193-75.us-west-2.compute.internal
ip-10-250-212-67.us-west-2.compute.internal
ip-10-250-198-201.us-west-2.compute.internal

[kube_node]
ip-10-250-203-209.us-west-2.compute.internal
ip-10-250-221-239.us-west-2.compute.internal
ip-10-250-196-81.us-west-2.compute.internal
ip-10-250-223-226.us-west-2.compute.internal

[etcd]
ip-10-250-193-75.us-west-2.compute.internal
ip-10-250-212-67.us-west-2.compute.internal
ip-10-250-198-201.us-west-2.compute.internal

[calico_rr]

[k8s_cluster:children]
kube_node
kube_control_plane
calico_rr

[k8s_cluster:vars]
apiserver_loadbalancer_domain_name="kubernetes-nlb-cluster-1-9380bb7099c714a4.elb.us-west-2.amazonaws.com"

EOT
masters = <<EOT
10.250.193.75
10.250.212.67
10.250.198.201
EOT
workers = <<EOT
10.250.203.209
10.250.221.239
10.250.196.81
10.250.223.226
EOT

The output above shows the resources created by Terraform. Terraform also generates an inventory/hosts file, which Kubespray and Ansible use to manage the nodes and their Kubernetes roles; later, we’ll point the Kubespray playbooks at it to install the Kubernetes components onto the nodes. See the Ansible documentation to learn more about the use cases and syntax of the inventory file. The code block below shows the content of the inventory/hosts file generated for this tutorial, and it will be referenced throughout the post.

[all]
ip-10-250-193-75.us-west-2.compute.internal ansible_host=10.250.193.75
ip-10-250-212-67.us-west-2.compute.internal ansible_host=10.250.212.67
ip-10-250-198-201.us-west-2.compute.internal ansible_host=10.250.198.201
ip-10-250-203-209.us-west-2.compute.internal ansible_host=10.250.203.209
ip-10-250-221-239.us-west-2.compute.internal ansible_host=10.250.221.239
ip-10-250-196-81.us-west-2.compute.internal ansible_host=10.250.196.81
ip-10-250-223-226.us-west-2.compute.internal ansible_host=10.250.223.226

bastion ansible_host=35.89.103.85

[bastion]
bastion ansible_host=35.89.103.85

[kube_control_plane]
ip-10-250-193-75.us-west-2.compute.internal
ip-10-250-212-67.us-west-2.compute.internal
ip-10-250-198-201.us-west-2.compute.internal

[kube_node]
ip-10-250-203-209.us-west-2.compute.internal
ip-10-250-221-239.us-west-2.compute.internal
ip-10-250-196-81.us-west-2.compute.internal
ip-10-250-223-226.us-west-2.compute.internal

[etcd]
ip-10-250-193-75.us-west-2.compute.internal
ip-10-250-212-67.us-west-2.compute.internal
ip-10-250-198-201.us-west-2.compute.internal

[calico_rr]

[k8s_cluster:children]
kube_node
kube_control_plane
calico_rr

[k8s_cluster:vars]
apiserver_loadbalancer_domain_name="kubernetes-nlb-cluster-1-9380bb7099c714a4.elb.us-west-2.amazonaws.com"

As you can see, this file lists all the EC2 instances and their roles in the cluster. For example, the content above shows that we will install the Kubernetes control plane components and etcd onto ip-10-250-193-75.us-west-2.compute.internal.


Deploy Kubernetes cluster using Kubespray

Now that our EC2 instances are available, we can use Kubespray to deploy a Kubernetes cluster. Kubespray uses Ansible under the hood to accomplish this task. Ansible allows you to define a set of procedures or commands that should run on one or more machines. Tasks are defined using a human-readable YAML format that Ansible uses when connecting to remote machines.
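To give a feel for that format, here is a minimal, illustrative Ansible playbook (not taken from Kubespray itself; the package and play name are just examples) that ensures a package is present on every target host:

# Illustrative example only; Kubespray’s real playbooks are far more elaborate
- name: Ensure the chrony time service is installed
  hosts: all
  become: true
  tasks:
    - name: Install chrony
      ansible.builtin.package:
        name: chrony
        state: present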

Navigate to the root of the Kubespray repository and follow the steps below to set up the inventory directory for cluster-1.

$ mkdir inventory/cluster-1

# Copy the inventory file
$ cp inventory/hosts inventory/cluster-1

# Copy the configurations
$ cp -r inventory/sample/* inventory/cluster-1/

Run the Ansible playbook cluster.yml to deploy the cluster. This playbook connects to all nodes listed in the inventory, installs the components, and then starts the cluster. The ansible_user is the user ID Ansible uses to SSH into the instances, and ansible_ssh_private_key_file is the path to the EC2 key pair PEM file you downloaded earlier.

$ ansible-playbook -i ./inventory/cluster-1/hosts ./cluster.yml -e ansible_user=admin -e ansible_ssh_private_key_file=<PATH_TO_YOUR_KEY_PAIR_PEM> -b --become-user=root --flush-cache

The Ansible playbook successfully created the Kubernetes cluster

This command will take some time, but you will have a highly available and self-managed Kubernetes cluster once it has completed! Compared to other methods, Kubespray has made the entire process relatively painless.

To access the cluster, we’ll copy the kubectl configuration created by Kubespray from one of the control plane nodes. In our case, let’s retrieve it from the first Kubernetes control node in the inventory file, ip-10-250-193-75.us-west-2.compute.internal.

The IP address of the host, specified by the ansible_host parameter, is 10.250.193.75. In addition, note the load balancer address for the Kube API server; in our case, it is kubernetes-nlb-cluster-1-9380bb7099c714a4.elb.us-west-2.amazonaws.com.

Based on this information, let’s create the kubectl configuration.

$ HOSTNAME=<HOST_NAME_HERE>
$ IP=<YOUR_IP_HERE>
$ SERVER=<LOAD_BALANCER_ADDRESS>

# Download the kubeconfig file
$ ssh -F ssh-bastion.conf admin@${IP} "sudo chmod 644 /etc/kubernetes/admin.conf"
$ scp -F ssh-bastion.conf admin@${IP}:/etc/kubernetes/admin.conf ./kubeconfig

# Update the kubeconfig to use the load balancer address
$ kubectl --kubeconfig kubeconfig config set-cluster cluster.local --server https://${SERVER}:6443

# Reset the permissions to the original one
$ ssh -F ssh-bastion.conf admin@${IP} "sudo chmod 600 /etc/kubernetes/admin.conf"

We can now use this kubeconfig file to interact with the Kubernetes cluster, as shown below.
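For example, listing the cluster nodes (which is what the screenshot below shows):

$ kubectl --kubeconfig ./kubeconfig get nodes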

The list of nodes, using kubectl, available in the Kubernetes cluster after creation.

Congratulations! You have successfully set up and accessed a Kubernetes cluster.

High availability

A highly available Kubernetes cluster has 3 (or more) control-plane and etcd instances running on separate nodes, and the cluster we created earlier via Kubespray was an example of this. To allow Kubespray to create highly available clusters, you must provide multiple instances under the kube_control_plane and etcd sections.

[kube_control_plane]
ip-10-250-193-75.us-west-2.compute.internal
ip-10-250-212-67.us-west-2.compute.internal
ip-10-250-198-201.us-west-2.compute.internal

[etcd]
ip-10-250-193-75.us-west-2.compute.internal
ip-10-250-212-67.us-west-2.compute.internal
ip-10-250-198-201.us-west-2.compute.internal

In terms of Kubernetes worker node numbers, you can theoretically have as many as you wish. However, the more nodes and machines Kubespray has to deploy to, the longer it takes. Check here for more information on tuning your deployment for larger clusters. As a general rule, for clusters of more than 1,000 nodes, separate the control plane nodes from the etcd servers.
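As a rough sketch (with hypothetical hostnames), separating etcd from the control plane simply means listing different hosts under each inventory group:

[kube_control_plane]
control-plane-1.example.internal
control-plane-2.example.internal
control-plane-3.example.internal

[etcd]
etcd-1.example.internal
etcd-2.example.internal
etcd-3.example.internal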

Upgrades

Regularly upgrading your cluster to the latest stable release is essential to security and functionality. But from our experience, manual upgrades can be painful and error-prone, so automation tools like Kubespray enhance the upgrade process.

In addition, Kubespray has extensive support for upgrading clusters. You can upgrade a specific cluster component, such as Docker or etcd, or upgrade the cluster one node at a time. Although not recommended, you can even upgrade the entire cluster simultaneously.
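As an illustration (we don’t run it in this tutorial), a graceful node-by-node upgrade with Kubespray’s upgrade-cluster.yml playbook could look like the following against our inventory; treat the kube_version value as a placeholder that must match a version supported by your Kubespray release.

$ ansible-playbook -i ./inventory/cluster-1/hosts ./upgrade-cluster.yml -e ansible_user=admin -e ansible_ssh_private_key_file=<PATH_TO_YOUR_KEY_PAIR_PEM> -e kube_version=<TARGET_VERSION> -b --become-user=root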

Maintenance & scaling

Kubespray comes with Ansible playbooks for adding, replacing, or removing nodes. These are extraordinarily useful if a node has an unrecoverable issue such as a hardware failure. These playbooks apply to the Kubernetes control plane and worker nodes, but each procedure is slightly different. Make sure to read the detailed documentation here for more information.

In the previous example, we could remove a single worker node (ip-10-250-203-209.us-west-2.compute.internal) by running the playbook remove-node.yml.

$ ansible-playbook -i ./inventory/cluster-1/hosts ./remove-node.yml -e ansible_user=admin -e node='ip-10-250-203-209.us-west-2.compute.internal' -b --become-user root

kubectl listing all nodes after one was removed

To simulate adding a new node to the cluster, we will append the same node (ip-10-250-203-209.us-west-2.compute.internal) back to the inventory/cluster-1/hosts file and then run the command below.

[kube_node]
ip-10-250-221-239.us-west-2.compute.internal
ip-10-250-196-81.us-west-2.compute.internal
ip-10-250-223-226.us-west-2.compute.internal
ip-10-250-203-209.us-west-2.compute.internal

$ ansible-playbook -i ./inventory/cluster-1/hosts ./scale.yml -e ansible_user=admin -e node='ip-10-250-203-209.us-west-2.compute.internal' -b --become-user root --limit='ip-10-250-203-209.us-west-2.compute.internal'

Once completed, you should see the node added to the Kubernetes cluster.

The list of nodes, using kubectl, available in the Kubernetes cluster after addition of one node.

Unfortunately, Kubespray doesn’t integrate with cluster autoscalers. Since Kubespray can’t provision nodes by itself, it cannot add them to the cluster dynamically. To add more nodes, you must provision them using another mechanism and then instruct Kubespray to bootstrap them into the cluster.


Configurations

In the following sections, we’ll discuss other cluster components that Kubespray can deploy and configure. Although not an exhaustive list, it’s enough to give you a good idea of what can be done, and you can learn more via the Kubespray docs.

DNS

Domain Name System (DNS) is vital to any Kubernetes cluster. The Kubernetes model dictates that each pod has a unique IP address used to communicate with other pods. However, IP addresses are not static and change as pods are created and destroyed. DNS lets workloads address each other by name instead; in particular, a service’s DNS name stays stable even as the pods behind it come and go. To learn more about pod and service DNS names, refer to the Kubernetes documentation.
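For instance, assuming you have a running pod named my-pod (a hypothetical name) whose image includes nslookup, you could resolve the built-in kubernetes service name from inside it:

# Service DNS names follow the <service>.<namespace>.svc.<cluster-domain> pattern
$ kubectl --kubeconfig ./kubeconfig exec -it my-pod -- nslookup kubernetes.default.svc.cluster.local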

You can use Kubespray to configure the DNS server of a cluster. It installs CoreDNS, the most commonly used Kubernetes DNS server, but you can use a custom DNS solution if you wish. In the configuration file, change the dns_mode to manual and provide a value for manual_dns_server. For example, if the custom DNS server were at address 10.0.0.1, you would update the configuration file to the following:

# file: inventory/cluster-1/group_vars/k8s_cluster/k8s-cluster.yml

# Can be coredns, coredns_dual, manual or none
dns_mode: manual
# Set manual server if using a custom cluster DNS server
manual_dns_server: 10.0.0.1

CRI

Container Runtime Interface (CRI) is the abstraction Kubernetes uses to talk to the software that actually runs containers. A container runtime must exist on every Kubernetes node and is responsible for downloading and executing container images. Docker is commonly used with Kubernetes, although other options are available: Kubespray also supports containerd (the default) and CRI-O. To change the container runtime installed in your cluster, change the container_manager parameter. For example, you could update the value to docker.

# file: inventory/cluster-1/group_vars/k8s_cluster/k8s-cluster.yml

## Container runtime
## docker for docker, crio for cri-o and containerd for containerd.
## Default: containerd
container_manager: docker
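After redeploying with a different runtime, one way to confirm what each node actually reports is a standard kubectl query (not specific to Kubespray); the CONTAINER-RUNTIME column shows values such as containerd://, docker://, or cri-o://.

$ kubectl --kubeconfig ./kubeconfig get nodes -o wide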

CNI

Container Network Interface (CNI) is another abstraction, this time for simplifying containers’ network connectivity. For example, CNI plugins configure network interfaces when a container is created or removed.

Kubernetes takes advantage of CNI plugins to set up container connectivity. Networking is complicated, and every organization has different requirements. Thus, Kubespray is a great choice since it supports a variety of CNIs, including Calico, Flannel, Kube Router, Kube OVN, and more. By default, Kubespray installs the Calico plugin. If you’d like to use Flannel instead of Calico, update the configuration file as below:

# file: inventory/cluster-1/group_vars/k8s_cluster/k8s-cluster.yml

# Choose network plugin (cilium, calico, kube-ovn, weave or flannel. Use cni for generic cni plugin)
# Can also be set to 'cloud', which lets the cloud provider setup appropriate routing
kube_network_plugin: flannel
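Once the cluster comes up with the new network plugin, a quick sanity check is to look for its pods; depending on the Kubespray version, the exact pod names and namespace may differ, so the filter below is only an example.

$ kubectl --kubeconfig ./kubeconfig get pods -A | grep -i flannel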

Kubespray vs. other tools

Other options are available if you want to deploy a self-managed Kubernetes cluster, including Kubeadm and Kops.

The main advantage of Kubespray over Kops is that Kops is tightly coupled with particular cloud providers, especially AWS. If you wish to deploy a cluster to other cloud providers or on-premises, Kubespray is easier and faster.

Interestingly, Kubespray uses Kubeadm under the hood, but Kubespray also handles tasks that Kubeadm does not, such as preparing the operating system on each node and orchestrating the installation across the whole inventory.

Conclusion

In a matter of minutes, with only a few commands, we used Kubespray to deploy a fully functional, highly available Kubernetes cluster. Kubespray is a powerful automation tool that is configurable, adaptable, and extensible. It incorporates operations and security practices that allow you to focus less on DevOps and more on building your application.

Compared to fully managed Kubernetes offerings such as EKS or GKE, it gives you far more control over configuration since you manage every component yourself. Where fully managed offerings are not feasible, such as on-premises environments, it provides the capability to deploy clusters quickly and cleanly. In short, Kubespray is the perfect solution for creating highly available, self-managed, and scalable Kubernetes clusters.

As a reminder, if you no longer wish to use this Kubernetes cluster, you should clean up the resources to avoid incurring costs. Use the following Terraform command to delete all the resources we created earlier.

$ cd contrib/terraform/aws

$ terraform destroy

