Kubernetes – Test Cluster

I had some time to spare whilst waiting for my access and laptop to be sorted for my new work role, so I decided to use the time to learn Kubernetes (K8s – "K", then eight letters, then "s").

Kubernetes is an orchestration tool for Docker containers that also ensures containers are automatically restarted elsewhere in the cluster if a node fails.

I had been looking into how to make my various websites more resilient from a database point of view and more scalable from an application point of view, and Kubernetes looked like it could be the way to go.

I watched Jeff Geerling’s Kubernetes 101 series of YouTube videos to obtain a basic understanding of the concepts; these gave me a good grounding in its history and how it all fits together.

Next I decided to try building a cluster on a few virtual machines.

I initially created 3 VMs, each with 4GB of RAM and 2 CPUs, as clones of my Ubuntu template that I keep shut down as a cloning source.

I set up a load balancer with HAProxy on port 6443 on my pfSense firewall, then installed Docker and the Kubernetes kubeadm tooling on the first node with apt (instructions here).
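
For anyone trying the same thing, the bootstrap step I was attempting looked roughly like the sketch below. The hostname k8s-api.example.lan is a placeholder for the HAProxy frontend on the pfSense box; kubeadm puts whatever you pass in --control-plane-endpoint into the API server certificate SANs, which is exactly where my mismatch came from. It assumes the Kubernetes apt repository has already been added as per the linked instructions.

```
# Placeholder: k8s-api.example.lan = the HAProxy frontend on pfSense (port 6443)
sudo apt-get update
sudo apt-get install -y docker.io kubelet kubeadm kubectl

# Initialise the first control-plane node, pointing all API traffic
# at the load balancer so further masters can join behind it later.
sudo kubeadm init \
  --control-plane-endpoint "k8s-api.example.lan:6443" \
  --upload-certs \
  --pod-network-cidr 10.244.0.0/16
```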

The only problem was that the bootstrap process did not like going through my load balancer and had major issues with the self-signed certificates: the local ones did not match the ones it had generated for the cluster via the load balancer.

After much playing around with attempting to generate my own certificates (I have a wildcard certificate for hslracing.com, so thought it was worth a try), I was still struggling to work out exactly how many certificates of what kind the cluster required, so I decided to watch some more videos.

This is when I stumbled across some videos by Adrian Goins, where he shows various installs of K3s (half the letters of Kubernetes), a much slimmer distribution of Kubernetes that is potentially far better suited to homelab use.

I reset and cleared down my VMs, set up three master nodes and three worker nodes, and then followed his video on HA K3s with embedded etcd, Kube-VIP, MetalLB and Rancher to build a cluster. This had none of the certificate issues I had previously encountered and seemed to be going well.
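
The K3s part of that build boils down to a couple of one-liners. The 192.168.1.50 address below is a placeholder for the Kube-VIP virtual IP; passing it as a TLS SAN is what avoids the certificate mismatch I had with kubeadm. Kube-VIP, MetalLB and Rancher are then layered on top with their own manifests and Helm charts, as per the video.

```
# First master: start the cluster with embedded etcd and add the
# virtual IP (placeholder 192.168.1.50) to the API server cert SANs.
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --tls-san 192.168.1.50

# Remaining masters join via the VIP using the token found in
# /var/lib/rancher/k3s/server/node-token on the first master.
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://192.168.1.50:6443 \
  --token <token>

# Workers join as agents with the same URL and token.
curl -sfL https://get.k3s.io | sh -s - agent \
  --server https://192.168.1.50:6443 \
  --token <token>
```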

The first problem came as soon as I installed Rancher: the CPU on the various nodes went haywire and the entire cluster seemed to be stuck in some sort of never-ending death spiral.

After much searching for suitable commands with which to interrogate the cluster, I discovered that it was complaining about disk pressure on ephemeral storage. That meant next to nothing to me, and all of the searches I did implied that the solution was faster SSD storage.
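
For anyone hitting the same wall, these are the sort of commands that eventually pointed me at the real problem (the node condition to look for is DiskPressure):

```
# Node conditions - DiskPressure=True is the smoking gun
kubectl describe nodes | grep -A 7 "Conditions:"

# Eviction events and evicted pods mentioning ephemeral storage
kubectl get events -A --sort-by=.lastTimestamp | grep -i -e evict -e disk
kubectl get pods -A | grep -i evicted

# On the node itself: the filesystem the kubelet is actually watching
df -h /var/lib
```

The roughly 85% figure mentioned below lines up with the kubelet's default image garbage-collection and eviction thresholds, which is why a nearly full root filesystem surfaces as ephemeral-storage disk pressure rather than an obvious "disk full" error.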

The real issue, I eventually realised, is where containerd (the K3s equivalent of Docker) keeps its data: it installs under /var/lib and proceeds to generate loopback mounts and log files there.

I had only provisioned a 10GB OS filesystem, and once the installs were complete and the cluster attempted to create a new container for Rancher, the filesystem showed as more than 85% full, so the container was killed off and scheduling was retried on the next worker node. This led to high CPU on that worker node, which in turn affected the underlying physical server, impacted the responsiveness of the active database node and caused it to fail over to the next node, generating yet more CPU load – as mentioned above, a never-ending death spiral.

Once I realised what was happening, I cleared down the cluster, presented an additional 10GB of disk to each cluster node, migrated the /var filesystem onto this new storage and then started from scratch once again. This time I had a stable cluster.
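
The migration itself was nothing exotic. Done with K3s stopped on the node, it looked roughly like the sketch below; /dev/sdb is a placeholder for whatever device the new 10GB disk appears as (check with lsblk), and a UUID in /etc/fstab would be more robust than the raw device name.

```
# Placeholder device name - confirm with lsblk before touching anything
sudo mkfs.ext4 /dev/sdb
sudo mkdir /mnt/newvar
sudo mount /dev/sdb /mnt/newvar

# Copy the existing /var across, preserving ownership and permissions
sudo rsync -aHAX /var/ /mnt/newvar/

# Move the old copy aside and mount the new disk over /var
echo '/dev/sdb  /var  ext4  defaults  0 2' | sudo tee -a /etc/fstab
sudo umount /mnt/newvar
sudo mv /var /var.old && sudo mkdir /var
sudo mount /var
```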

After further reading I discovered that ideally Rancher should be installed on its own cluster and then used to remotely manage the rest of your Kubernetes clusters.

At this point I decided to simply install the Rancher Docker container and use that to manage my cluster, instead of having Rancher installed on the cluster itself. I also found a more recent video by Adrian that skips MetalLB and simply uses Kube-VIP as the load balancer (simpler is always better in my opinion).
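
The standalone Rancher install really is just one container on a separate Docker host; this is essentially the command from the Rancher docs (recent versions need --privileged):

```
docker run -d --restart=unless-stopped \
  --name rancher \
  -p 80:80 -p 443:443 \
  --privileged \
  rancher/rancher:latest
```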

I then decided to install Longhorn to manage dedicated local filesystems for stateful workloads like MySQL. I added an additional 40GB of local storage to each of the worker nodes and mounted it under /var/lib/longhorn. I then found a bug in Longhorn whereby it attempts to use the wrong arguments when mounting NFS for its backup storage.
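
Longhorn went in via Helm and, by default, uses /var/lib/longhorn on each node as its data path, which is why the extra disks are mounted there:

```
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace
```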

I also installed the NFS Subdirectory External Provisioner for shared volume claims within the cluster, so that I can have multiple WordPress instances running for the same website.
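
The provisioner is another Helm chart; the server address and export path below are placeholders for my own NAS export:

```
helm repo add nfs-subdir-external-provisioner \
  https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner \
  nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=192.168.1.10 \
  --set nfs.path=/export/k8s
```

The point of this is that NFS-backed volumes support the ReadWriteMany access mode, so several WordPress pods can mount the same persistent volume claim at once.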

After adding the Bitnami repository, I installed the Bitnami MySQL cluster via Helm within Rancher as a backend for my WordPress sites, and then installed Bitnami WordPress, again from the Apps catalogue in Rancher.
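
Outside of the Rancher UI, the equivalent Helm commands look something like the sketch below. The values are illustrative only (passwords, release names and the generated service names will differ, and the chart options do change between releases), so treat it as a shape rather than a recipe:

```
helm repo add bitnami https://charts.bitnami.com/bitnami

# MySQL in replication mode: one primary plus replicated secondaries
helm install mysql bitnami/mysql \
  --set architecture=replication \
  --set auth.rootPassword=changeme

# WordPress pointed at the external MySQL service instead of its bundled MariaDB
helm install mysite bitnami/wordpress \
  --set mariadb.enabled=false \
  --set externalDatabase.host=mysql-primary.default.svc.cluster.local \
  --set externalDatabase.user=wordpress \
  --set externalDatabase.password=changeme \
  --set externalDatabase.database=wordpress
```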

This has all been good testing and I may or may not keep this test cluster, but I have decided to build a bare-metal Kubernetes cluster running K3s on a bunch of Lenovo Tiny desktop machines.
