
Yet More Network Upgrades

Since the last network update post, I have added a couple of additional items to my office rack.

I moved the Unifi US-16-XG switch from the garage to the office and removed the Unifi USW-Flex-XG switch, which will move out to the garage when I finally configure the new storage server.

The Home Assistant installation was moved from a Dell Wyse 5020 thin client to a Lenovo m93p server.

I temporarily installed the first 3 nodes of a new Proxmox cluster and sat them on top of the Unifi US-16-150W switch. The 4th node sat just outside, connected via a longer set of cables.

Having migrated all of the workloads from my old 12 node Lenovo m92p cluster to my new 4 node Lenovo m920q cluster, I then set about switching off the old nodes.

I had switched off 4 of them a few weeks ago to free up a couple of power connectors for the new cluster, but now that all of the workloads had moved, I proceeded to shut down the rest.

This resulted in a relatively significant drop in power usage on the Office UPS. The new m920q cluster was already up and running at this point and connected to the same UPS, so the 100W drop is almost entirely due to the old cluster being switched off.

This also allowed me to remove the Unifi US-24 non-PoE switch and replace it with a Unifi USW-Aggr 10Gb switch, as I want to provide a dedicated 10Gb network for my Ceph cluster to talk over.

I have bought a number of Unifi switches over the years that I swap in and out depending on how I am configuring my cabling at the time. I don’t tend to sell them on; I just set them aside until I next need them.

The next challenge was moving the nodes of the new cluster without major disruption to the running services.

I had recently bought the parts to make 2 more cluster nodes, so after installing Proxmox, joining them to the cluster, setting up Ceph and then creating additional K3s nodes, I could start moving the first 4 servers.

Step 1 was to prevent Ceph from moving any data whilst a given server was offline. This simply involved setting the noout and norebalance flags, then shutting down one of the physical servers, moving and recabling it, and starting it back up again.

Once the server was back up and had rejoined the Proxmox cluster, I unset the flags and allowed any resyncs to happen before moving the next node.
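
For anyone wanting to script the flag handling described above, here is a minimal sketch in Python. It assumes the standard `ceph` CLI is available on the node where it runs; the function names `ceph_flag_commands` and `apply_flags` are made up for illustration. It is not what I ran during the move, just one way of wrapping the `ceph osd set` and `ceph osd unset` commands.

```python
import subprocess

# The two flags mentioned above: noout stops Ceph marking the offline
# OSDs as "out", norebalance stops data being shuffled in the meantime.
FLAGS = ("noout", "norebalance")


def ceph_flag_commands(action, flags=FLAGS):
    """Build the ceph CLI commands that set or unset each flag."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [["ceph", "osd", action, flag] for flag in flags]


def apply_flags(action, run=subprocess.run):
    """Run each command in turn, stopping at the first failure.

    `run` is injectable (hypothetical convenience) so the function can
    be exercised without a real Ceph cluster.
    """
    for cmd in ceph_flag_commands(action):
        run(cmd, check=True)
```

Calling `apply_flags("set")` before shutting a node down and `apply_flags("unset")` once it has rejoined the cluster mirrors the manual steps. Checking `ceph status` before moving on to the next node is still worthwhile.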

Any K3s workloads affected by the server outage simply migrated themselves to a different node of the K3s cluster while the server was down.

Rinse and repeat for the remaining 3 servers.

I also decided to move my Home Assistant server and then shuffle a couple of the switches around. I moved the Unifi US-16-150W switch up a slot and moved the Unifi US-16-XG switch down to the now vacant slot below the 16 port switch, as I currently only have 0.5m SFP+ DAC cables, which won’t reach up that extra slot from the cluster nodes.