โ—index ๐Ÿ”’luks-dropbear-raid6.md ๐Ÿท๏ธtags ๐Ÿ‘คabout

🔒 Encrypting Everything: LUKS + Dropbear + RAID6 on a Headless Cluster

Second post in the k3s homelab series. If you missed the first one, it's about tunneling 40 services through a VPS to escape CGNAT.

So: corellia, my control plane node (an 8-core N305 mini PC), has six SATA drives hanging off it. Five are active in a RAID6 array, one is a hot spare. That's ~33 TB of usable storage, serving the entire cluster over NFS. Media, backups, databases, git repos... everything lives on /share.

But here's the thing: every amd64 node in the cluster has full-disk encryption. Root partitions on corellia, mandalore, and tatooine are all LUKS-encrypted. The Raspberry Pis are the exception: they boot from SD cards, and the threat model is different (and honestly, encrypting a Pi SD card is more pain than it's worth).

This is great until one of the machines reboots and nobody is home to type the passphrase 😅.

The problem: encrypted disks on headless machines

LUKS full-disk encryption means the kernel can't mount the root filesystem without a passphrase at boot time. On a desktop you type it in. On a headless server in a closet? You're stuck.

The common solutions are:

  • Don't encrypt: fine, if you don't care about someone walking off with your drives
  • TPM auto-unlock: works great on modern hardware, but my mini PCs don't have TPM chips
  • Tang/Clevis: network-bound encryption, elegant but requires running a Tang server somewhere
  • Dropbear in initramfs: SSH into the machine during boot and type the passphrase remotely

I went with Dropbear. It's the simplest thing that works, and I already SSH into everything anyway.

How Dropbear initramfs works

The idea is beautifully stupid. Linux initramfs (the tiny filesystem that loads before the real root) can run a minimal SSH server. You SSH in, pipe the passphrase to cryptsetup, and boot continues normally.

Here's what happens when corellia reboots:

  1. BIOS → GRUB → kernel loads initramfs
  2. Initramfs brings up the network interface with a static IP
  3. Dropbear SSH server starts on port 22
  4. I SSH in and unlock the root LUKS volume
  5. Root mounts, RAID auto-unlocks using a keyfile on root (/etc/luks-md0)
  6. RAID assembles, NFS starts, k3s joins the cluster
โฏ_bashโ€บ2 lines
  1# The unlock command, pipe passphrase to cryptsetup
  2โฏโฏโฏ ssh root@corellia "echo -n 'hunter2' > /lib/cryptsetup/passfifo"

That one command is all it takes; the passphrase goes to a named pipe that cryptsetup reads from during initramfs, and the machine finishes booting on its own.
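
If you'd rather not put the passphrase on the ssh command line, Debian's cryptsetup-initramfs also ships a small helper inside the initramfs that prompts for it interactively. A minimal sketch, assuming a reasonably recent Debian:

❯ bash
  # Interactive alternative: SSH into the initramfs, then run the unlock helper
  ❯❯❯ ssh root@corellia
  # ...now inside the BusyBox shell Dropbear drops you into
  ❯❯❯ cryptroot-unlock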

The clever part is the RAID encryption. Corellia has two LUKS layers: the root partition (nvme0n1p3) unlocked by Dropbear, and the RAID array (md0) unlocked automatically by a keyfile stored on the now-decrypted root. So I only type one passphrase, but both the OS and the 33 TB array end up encrypted. mandalore and tatooine only have the root partition to unlock, no RAID.
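
In crypttab terms the two layers look roughly like this. The mapping name for root and the UUID are placeholders; only the md0_crypt name and the /etc/luks-md0 keyfile path come from the actual setup:

❯ bash
  # /etc/crypttab on corellia (sketch, illustrative names)
  nvme0n1p3_crypt  UUID=<root-partition-uuid>  none           luks,discard
  md0_crypt        /dev/md0                    /etc/luks-md0  luks

The first entry is unlocked in the initramfs (passphrase via Dropbear); the second is processed once the root filesystem is mounted, by which point the keyfile is readable.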

The RAID6 array

[storage stack diagram]

The storage stack on corellia is layered like a cake:

📄 text
  6 × SATA drives → RAID6 (md0) → LUKS (md0_crypt) → ext4 → /share → NFS

RAID6 gives me two-disk fault tolerance. Any two drives can fail simultaneously and I lose nothing. The hot spare kicks in automatically on the first failure, so in practice three drives would have to die before I lose data. With 8,000+ power-on hours on most drives and no SMART errors to speak of (except sdb, with 68 read errors that I'm keeping an eye on 👀), I sleep fine.

Current status from Prometheus:

📄 text
  RAID6 (md0): HEALTHY, 5/5 active, 0 failed, 1 spare
  Used: 9,794 GB / 33,393 GB (29.3%)
  Drives: sda 44°C, sdb 41°C, sdc 45°C, sdd 42°C, sde 43°C, sdf 42°C
  NVMe boot: 40°C, wear 3%, power-on 2,523h

Why RAID6 and not ZFS?

I know, I know. Every r/homelab post will tell you to use ZFS. But RAID6 via mdadm is:

  • Dead simple. mdadm --detail /dev/md0 tells me everything
  • No special kernel modules or memory requirements
  • Works with any filesystem on top (I use ext4 because it's boring and reliable)
  • Been around for decades; I trust it with my data

ZFS is great software. But I don't need snapshots, dedup, or inline compression badly enough to take on the operational complexity. mdadm + LUKS + ext4 is a stack I understand completely, and when something breaks at 3 AM I want simple.
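
Checking on the array is correspondingly boring. These are stock mdadm commands, nothing specific to this setup:

❯ bash
  # The kernel's view of all md arrays, then the detailed per-array report
  ❯❯❯ cat /proc/mdstat
  ❯❯❯ mdadm --detail /dev/md0 | grep -E 'State|Devices'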

Ansible: 16 roles, 7 nodes, one command

The entire bare-metal provisioning, from fresh Debian install to k3s-ready node, is handled by Ansible. 16 roles, applied in order:

📋 yaml
  # site.yml, the full playbook structure
  - Base provisioning (all hosts):       packages, sysctl
  - Static networking (home + parents):  networking
  - LUKS + Dropbear remote unlock:       luks_dropbear    # corellia, mandalore, tatooine
  - RAID6 array (corellia):              raid
  - NFS server (corellia):               nfs_server
  - NFS client mounts:                   nfs_client       # mandalore, tatooine, kamino, jakku
  - Tailscale:                           tailscale        # all nodes
  - SD card wear minimization:           sdcard           # RPi nodes only
  - Scarif firewall:                     scarif_firewall
  - Headscale:                           headscale        # scarif only
  - Startup scripts:                     startup          # per-node tuning

The inventory is grouped by function, not just location:

📋 yaml
  # Functional groups in hosts.yml
  luks_hosts:    [corellia, mandalore, tatooine]   # encrypted amd64 nodes
  nfs_server:    [corellia]                        # the one true NFS server
  nfs_clients:   [mandalore, tatooine, kamino, jakku]
  rpi_hosts:     [kamino, jakku, dagobah]          # SD card wear tricks

This means I can provision a single host (ansible-playbook site.yml -l corellia), a single role (--tags sysctl), or the entire cluster in one shot. Dry-run with --check --diff before anything destructive.
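
Spelled out, those invocations look like this (same flags as above, nothing hidden):

❯ bash
  # One host, one role, or a dry run of everything
  ❯❯❯ ansible-playbook site.yml -l corellia
  ❯❯❯ ansible-playbook site.yml --tags sysctl
  ❯❯❯ ansible-playbook site.yml --check --diff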

The RAID role is intentionally defensive

The RAID Ansible role does not create the array. It only verifies it exists and prints instructions if it doesn't:

📋 yaml
  # raid/tasks/main.yml, verify, don't create
  - name: Fail if RAID array does not exist
    ansible.builtin.fail:
      msg: |
        RAID array {{ raid_array }} does not exist.
        To create it manually, run:
          mdadm --create {{ raid_array }} --level=6 \
            --raid-devices=5 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sdf \
            --spare-devices=1 /dev/sde
        Then set up LUKS:
          cryptsetup luksFormat {{ raid_array }}
          cryptsetup luksOpen {{ raid_array }} md0_crypt
    when: raid_status.rc != 0

Creating a RAID array is a one-time, destructive operation. I don't want Ansible doing that automatically. The role verifies the array, checks mdadm.conf and /etc/crypttab, and warns if anything is missing. The actual creation was done by hand, once, with me staring at the terminal making sure I had the right drives.
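
The check that registers raid_status isn't shown above; a sketch of what it plausibly looks like (task name assumed):

📋 yaml
  # raid/tasks/main.yml (sketch), feeds the fail task above
  - name: Check whether the RAID array exists
    ansible.builtin.command: mdadm --detail {{ raid_array }}
    register: raid_status
    changed_when: false
    failed_when: false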

Dropbear setup with Ansible

The Dropbear role deploys three things:

  1. SSH authorized keys into the initramfs
  2. Dropbear config with hardened options
  3. Static IP for the initramfs network
โฏ_bashโ€บ2 lines
  1# Dropbear options: no password auth, no forwarding, 10 min idle timeout
  2DROPBEAR_OPTIONS="-I 600 -j -k -p 22 -s"

The static IP is the interesting part. Each encrypted host needs network configured inside initramfs, before the real OS loads. The format is a single string that the kernel's ip= parameter parses:

📄 text
  IP::GATEWAY:NETMASK:HOSTNAME:INTERFACE

For corellia that's 10.0.1.252::10.0.1.254:255.255.254.0:corellia:enp4s0. Every change triggers update-initramfs -u to rebuild the boot image.
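
Wiring that string in is two steps with initramfs-tools. The conf.d filename below is arbitrary; the IP string is corellia's real one from above:

❯ bash
  # Drop the initramfs network config, then rebuild the boot image
  ❯❯❯ echo 'IP=10.0.1.252::10.0.1.254:255.255.254.0:corellia:enp4s0' > /etc/initramfs-tools/conf.d/network
  ❯❯❯ update-initramfs -u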

NFS: serving 33 TB to the cluster

Once corellia's RAID is unlocked and mounted at /share, it's exported over NFS to every worker node:

โฏ_bashโ€บ3 lines
  1# /etc/exports, two subnets: LAN + Tailscale overlay
  2/share    10.0.0.0/23(rw,async,no_root_squash,no_subtree_check,wdelay)
  3/share    100.64.0.0/10(rw,async,no_root_squash,no_subtree_check,wdelay)

Two subnets because worker nodes reach corellia either via the LAN (10.0.x.x) or via the Tailscale overlay (100.64.x.x). The no_root_squash is required for k3s: containers run as root and need to create files on NFS volumes.
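
After editing /etc/exports, the standard exportfs dance applies:

❯ bash
  # Re-export everything, then verify what's actually being served
  ❯❯❯ exportfs -ra
  ❯❯❯ exportfs -v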

The client mount options are tuned for k3s workloads:

โฏ_bashโ€บ3 lines
  1# NFS client mount on worker nodes
  2โฏโฏโฏ mount | grep share
  3nfs.crisidev.lan:/share on /share type nfs4 (rw,vers=4.2,hard,nconnect=4,rsize=1048576,wsize=1048576)

The key options:

  • vers=4.2: NFSv4.2, the modern protocol with better locking and performance
  • hard: retry forever if the server is unreachable (don't fail pods just because NFS hiccupped)
  • nconnect=4: four parallel TCP connections per mount (a huge throughput improvement)
  • rsize=1048576,wsize=1048576: 1 MB I/O buffers (the default is 32 KB)
  • x-systemd.automount: lazy mount on first access, plays nice with boot ordering (see the fstab sketch below)
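
Put together as a single fstab line, a sketch (the exact line on the workers isn't shown in the mount output above):

❯ bash
  # /etc/fstab on a worker node (sketch)
  nfs.crisidev.lan:/share  /share  nfs4  rw,vers=4.2,hard,nconnect=4,rsize=1048576,wsize=1048576,x-systemd.automount  0  0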

Keeping Raspberry Pis alive: the SD card war

kamino and jakku are Raspberry Pi 4Bs. They're great little arm64 workers, but they boot from SD cards. SD cards have limited write endurance. Every log entry, every journal write, every atime update is slowly killing the card.

The sdcard Ansible role is ruthlessly focused on minimizing writes:

Disable atime: one of the biggest wins. By default, Linux updates the "last accessed" timestamp on every file read. That's a write for every read. Insane on flash storage.

โฏ_bashโ€บ2 lines
  1# /etc/fstab, noatime on root
  2/dev/mmcblk0p2  /  ext4  defaults,noatime  0  1

Volatile journal: the systemd journal writes to RAM only and never touches the SD card:

📄 text
  [Journal]
  Storage=volatile
  RuntimeMaxUse=50M

50 MB of logs in RAM. When the Pi reboots, logs are gone. That's fine, I have Loki collecting everything anyway.

Kill swap, use zram: disk-based swap on an SD card is murder. Instead, systemd-zram-generator creates compressed in-memory swap. The RAM compresses roughly 2:1, so the 8 GB Pi effectively gets ~12 GB of memory without touching the card.
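
The generator is driven by a tiny config file. A sketch, with the sizing chosen to match the ~2:1 math above (an 8 GB zram device held in roughly 4 GB of compressed RAM); the exact values on the Pis are an assumption:

📄 text
  # /etc/systemd/zram-generator.conf (sketch)
  [zram0]
  zram-size = ram
  compression-algorithm = zstd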

Performance tuning: the startup script

Every node runs a startup script via a systemd oneshot service. The default script enables GRO (Generic Receive Offload) forwarding for NFS performance:

โฏ_bashโ€บ4 lines
  1# default-startup.sh, all nodes
  2โฏโฏโฏ cat /usr/local/bin/startup.sh
  3NETDEV=$(ip -o route get 8.8.8.8 | cut -f 5 -d " ")
  4ethtool -K "$NETDEV" rx-udp-gro-forwarding on rx-gro-list off
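
The oneshot wrapper around that script is about as small as systemd units get. A sketch; the unit name is an assumption:

📄 text
  # /etc/systemd/system/startup.service (sketch)
  [Unit]
  Description=Per-node startup tuning
  After=network-online.target

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStart=/usr/local/bin/startup.sh

  [Install]
  WantedBy=multi-user.target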

Corellia gets extra tuning for the RAID array:

โฏ_bashโ€บ8 lines
  1# corellia-startup.sh, RAID performance
  2โฏโฏโฏ cat /usr/local/bin/startup.sh
  3NETDEV=$(ip -o route get 8.8.8.8 | cut -f 5 -d " ")
  4ethtool -K "$NETDEV" rx-udp-gro-forwarding on rx-gro-list off
  5
  6echo "max_performance" | tee /sys/class/scsi_host/host*/link_power_management_policy
  7echo 256 | tee /sys/block/sd*/queue/read_ahead_kb
  8echo 32768 | tee /sys/block/md0/md/stripe_cache_size

Three tuning knobs:

  • Link power management: disable aggressive power saving on SATA links (latency vs power)
  • Read-ahead: 256 KB per disk (better sequential throughput for media streaming)
  • Stripe cache: 32,768 cache entries for RAID6 parity calculations (the default of 256 is absurdly small)

The startup role looks for {hostname}-startup.sh first and falls back to default-startup.sh. Simple convention, no conditional logic needed.
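
That convention maps onto a single Ansible task via the first_found lookup. A sketch, with the task name and file layout assumed:

📋 yaml
  # startup role (sketch): prefer {hostname}-startup.sh, fall back to the default
  - name: Install the node startup script
    ansible.builtin.copy:
      src: "{{ lookup('ansible.builtin.first_found', [inventory_hostname ~ '-startup.sh', 'default-startup.sh']) }}"
      dest: /usr/local/bin/startup.sh
      mode: "0755"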

The 3 AM reboot scenario

Let's walk through what happens when corellia reboots unexpectedly:

  1. T+0s: Power comes back, BIOS → GRUB → kernel → initramfs
  2. T+15s: Dropbear starts, static IP configured on enp4s0
  3. T+15s: Alertmanager fires NodeDown → Gotify notification on my phone 📱
  4. T+?: I wake up, see the notification, grab my phone
  5. T+?: ssh root@corellia "echo -n '<passphrase>' > /lib/cryptsetup/passfifo"
  6. T+30s: Root unlocks, RAID keyfile available, md0 auto-decrypts
  7. T+45s: RAID assembles, NFS starts, k3s agent rejoins, pods reschedule
  8. T+60s: Cluster healthy, go back to sleep

The gap between step 3 and step 5 is the problem. If I'm traveling, asleep with my phone on silent, or just not paying attention, the cluster runs without its storage node. Worker nodes with hard NFS mounts will hang waiting for /share to come back; they won't crash, but they'll be useless until corellia unlocks.

Is this acceptable? For a homelab, yes. I've considered Tang/Clevis for automated unlock, but that means the encryption key is recoverable from the network, which defeats part of the purpose. For now, Dropbear + a phone notification gets me unlocked within minutes on a normal day.

Temperatures and health

The whole cluster runs cool. Corellia, with six spinning drives, maxes out at 45°C. The RPis run warmer (kamino at 59°C, jakku at 64°C) because they're passively cooled. mandalore hits 71°C under load; it's the busiest worker node and could use better ventilation.

📄 text
  corellia:  45°C (8-core N305 + 6 SATA + NVMe)
  mandalore: 71°C (4-core N100, busiest worker)
  tatooine:  55°C (4-core N100)
  kamino:    59°C (RPi 4B, passive cooled)
  jakku:     64°C (RPi 4B, passive cooled)
  scarif:    cloud VPS, no sensors

All within spec, but mandalore is on my radar. A 3D-printed fan duct is in the future.

The result

The RAID6 array is LUKS-encrypted at rest and served over NFSv4.2 to the rest of the cluster, unlockable remotely via SSH in the initramfs 🔐. The amd64 nodes all have encrypted root partitions, with corellia's RAID auto-unlocking via a keyfile on the decrypted root. Every node provisions from bare Debian to k3s-ready with a single Ansible command, and the Raspberry Pis run on SD cards that should last years instead of months thanks to aggressive write minimization.

The whole provisioning repo is a modest pile of Ansible roles, and that's the pleasant thing about mature infrastructure work: boring code that reliably does its job tends to be the code worth keeping 💗.

Next up: 150 Pods on 32 Cores, multi-arch scheduling across x86 and ARM, priority classes, and why my cluster uses less power than a gaming PC.

Discuss: share / comment on Mastodon →