Lab Architecture
Note
The lab hosts most of the services deployed in the Tatsu landscape: home automation services, the observability stack, even these docs.
Summary
The lab architecture is based on Kubernetes, provided by RKE2, with storage within Kubernetes provided by Rook Ceph. The lab machines run a minimal install of Debian (see OS install notes), within which RKE2 runs as a service.
RKE2
RKE2 (Rancher Kubernetes Engine 2) is an open-source distribution of Kubernetes that runs from a single binary. This makes it easy to manage and deploy, which we do via Ansible. RKE2 installations are either “servers” or “agents” - both will run normal workloads, but the server nodes also run the control plane that manages Kubernetes itself. We run three server nodes for redundancy.
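To show why three server nodes is a sensible number, here is a minimal sketch of the majority arithmetic. It assumes the server nodes use RKE2's default embedded etcd datastore, which needs a majority of its members to stay available; the function names are ours and purely illustrative.

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a cluster with `servers` voting members."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    # Compare a few cluster sizes; the lab runs three servers.
    for n in (1, 2, 3, 5):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under that assumption, two servers tolerate no more failures than one, while three servers let us lose a single node and keep the control plane running.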
Rook Ceph
Ceph is a widely used and extremely reliable storage provider that is designed to avoid any single point of failure. It is famously complicated to set up and configure, so we use the Rook operator to abstract away that complexity.
Rook is a Kubernetes-native orchestrator for Ceph that allows us to declare what the Ceph cluster should look like and leave it to handle how that should be done.
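As a rough illustration of what "declaring" a Ceph cluster means, the sketch below builds a small CephCluster resource as a Python dictionary and prints it as JSON. The field names follow Rook's CephCluster custom resource, but every value here (resource name, namespace, image tag, monitor count, storage selection) is a placeholder we chose for the example and not the lab's actual configuration.

```python
import json


def ceph_cluster_manifest(mon_count: int = 3) -> dict:
    """Build an illustrative CephCluster resource; all values are assumptions."""
    return {
        "apiVersion": "ceph.rook.io/v1",
        "kind": "CephCluster",
        "metadata": {"name": "rook-ceph", "namespace": "rook-ceph"},
        "spec": {
            # Placeholder image tag, not the version the lab runs.
            "cephVersion": {"image": "quay.io/ceph/ceph:v18"},
            # Host directory where Rook keeps Ceph configuration and state.
            "dataDirHostPath": "/var/lib/rook",
            # An odd number of monitors, spread across separate nodes.
            "mon": {"count": mon_count, "allowMultiplePerNode": False},
            # Let Rook consume every available node and raw device.
            "storage": {"useAllNodes": True, "useAllDevices": True},
        },
    }


if __name__ == "__main__":
    print(json.dumps(ceph_cluster_manifest(), indent=2))
```

The point is that the resource only describes the desired end state; the operator works out the steps needed to reach it.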
Ceph can provide three different types of storage:
Block devices (RADOS Block Devices, or RBD) - these are presented as raw devices, comparable to a brand new hard drive with no file system applied.
A file system (CephFS) - this is a fully-featured file system that implements the POSIX standard (i.e. the same as most Unix systems).
Object storage (Ceph Object Gateway, RADOS Gateway, or RGW) - this is an S3-compatible API for “simple” object storage.
Note
RADOS stands for Reliable Autonomic Distributed Object Store. It is one of the core technologies behind Ceph, which is why it appears in some of the names above. The other acronym you’ll see is CRUSH, or Controlled Replication Under Scalable Hashing, which determines how data is replicated and assigned to physical nodes.
Only CephFS is used at the moment; it backs every persistent volume in the Kubernetes cluster. If we adopt either of the other two storage types in the future, all three will share the same underlying storage.
Ceph includes a dashboard, which is available here - the credentials are in the password manager.
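For a sense of how a workload ends up on CephFS, here is a hedged sketch that builds a PersistentVolumeClaim as a Python dictionary. The storage class name `rook-cephfs` is the one used in Rook's example manifests and is an assumption here, as are the claim name, namespace and size; check the cluster's real storage classes before relying on any of them.

```python
import json


def cephfs_pvc(name: str, namespace: str, size: str,
               storage_class: str = "rook-cephfs") -> dict:
    """Build a PVC that requests shared storage from a CephFS-backed class.

    `storage_class` defaults to an assumed name, not a confirmed one.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # CephFS volumes can be mounted read-write by many pods at once.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical claim for an example service.
    print(json.dumps(cephfs_pvc("example-data", "example", "5Gi"), indent=2))
```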
Workload Deployment
Workloads are not deployed manually to the cluster; they are defined in a Git repo (this one) and continuously applied to the cluster by ArgoCD (this practice is known as GitOps). This helps to prevent config drift within the cluster, and means that service upgrades can be rolled out automatically just by merging PRs in the repo.
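To make the GitOps flow more concrete, the sketch below builds an ArgoCD Application resource as a Python dictionary. The schema (`argoproj.io/v1alpha1`, `source`, `destination`, `syncPolicy.automated`) is ArgoCD's own, but the repository URL, path, application name and namespaces are hypothetical placeholders, and we are assuming automated sync with pruning and self-healing rather than describing how the lab's applications are actually configured.

```python
import json


def argocd_application(name: str, repo_url: str, path: str,
                       dest_namespace: str, revision: str = "HEAD") -> dict:
    """Build an Application that keeps `path` in `repo_url` applied to the cluster."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumes ArgoCD is installed in the conventional "argocd" namespace.
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "path": path,
                "targetRevision": revision,
            },
            "destination": {
                # The in-cluster API server address, i.e. the cluster ArgoCD runs in.
                "server": "https://kubernetes.default.svc",
                "namespace": dest_namespace,
            },
            # Sync automatically, delete resources removed from Git,
            # and revert manual changes made in the cluster.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    # Hypothetical repository and path, for illustration only.
    app = argocd_application(
        name="example-service",
        repo_url="https://example.com/lab/gitops.git",
        path="apps/example-service",
        dest_namespace="example",
    )
    print(json.dumps(app, indent=2))
```

With a policy like this, the `selfHeal` option is what counters config drift, and merging a PR that changes the manifests under `path` is enough to roll out an upgrade.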
ArgoCD itself is deployed in Kubernetes, so what if you need to start from scratch? The Git repo linked above contains a readme that explains how to bootstrap the cluster manually up to the point that ArgoCD can run, after which it will take over.