🐺 MrWolf: I Gave an AI the Tools to Run My Homelab
"I'm Mr. Wolf. I solve problems." โ Harvey Keitel, Pulp Fiction
That's the vibe. You call Mr. Wolf when something's wrong and you don't want to think about it. He shows up, assesses the situation, tells everyone what to do, and the problem disappears.
I built a thing called MrWolf. It's a Rust server that gives Claude direct access to my k3s homelab 🐺. Not "generate some YAML and I'll apply it" access: real, live, query-Prometheus-and-restart-pods access. Dozens of tools spread across a handful of sub-servers, covering monitoring, Kubernetes, media, security, and notifications.
And it's absolutely wild 🔥.
Wait, what?
Yeah. I can sit on my couch, open Claude Code on my phone, and say "hey, something seems off with the cluster" and Claude will:
- Query Prometheus for CPU, memory, disk, temperatures across all nodes
- Check Alertmanager for firing alerts
- Pull logs from Loki if something looks wrong
- Restart a pod or scale a deployment if needed
- Tell me what it did and why
No Grafana dashboards. No terminal. No remembering PromQL syntax. Just... a conversation.
Last week it found 16 unhealthy resources in my reverse proxy, diagnosed the root cause from tunnel logs, fixed everything with one API call, then wrote a CronJob to prevent it from happening again. I didn't even get up from the couch.
I'll tell that story properly in a minute. First, some context.
MCP: the protocol that makes this possible
MCP (Model Context Protocol) is an open protocol that lets an AI call tools instead of just generating text. You define typed functions with parameters and descriptions, the AI decides when to call them, reads the results, and chains them together to solve problems.
Without MCP, you're the middleman: copying terminal output into a chat window, pasting YAML back. With MCP, the AI does the thing directly. Think → call tool → read result → think → call next tool → done.
MrWolf runs as a pod in the cluster, exposes an MCP endpoint, and Claude Code connects to it. From Claude's perspective, it's just functions. From my perspective, it's an operator that never sleeps 🤖.
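Concretely, an MCP tool isn't much more than a name, a description the model reads, and a JSON schema for its arguments. Here's a rough sketch of what a tool definition and its handler could look like on the server side. This is illustrative only; the struct and field names are mine, not MrWolf's actual code:

```rust
use serde_json::{json, Value};

/// Illustrative shape of an MCP tool: a name, a human-readable description
/// the model reads when deciding what to call, and a JSON schema for the
/// arguments it accepts.
struct ToolDef {
    name: &'static str,
    description: &'static str,
    input_schema: Value,
}

fn cluster_health_tool() -> ToolDef {
    ToolDef {
        name: "get_cluster_health",
        description: "Summarize CPU, memory, disk, temperatures and alerts across all nodes",
        input_schema: json!({
            "type": "object",
            "properties": {
                "node": { "type": "string", "description": "Optional node name filter" }
            },
            "required": []
        }),
    }
}

// The server advertises definitions like this to the client; when the model
// decides to call one, the server receives the arguments as JSON and returns
// the result as text for the model to read and reason about.
fn handle_call(tool: &str, _args: Value) -> String {
    match tool {
        "get_cluster_health" => "CPU Usage: ...".to_string(), // query Prometheus here
        _ => format!("unknown tool: {tool}"),
    }
}

fn main() {
    let tool = cluster_health_tool();
    println!("{}: {}", tool.name, tool.description);
    println!("schema: {}", tool.input_schema);
    println!("{}", handle_call("get_cluster_health", json!({})));
}
```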
The tools
82 of them, across 11 sub-servers. Here's the quick tour:
Monitoring – Prometheus queries for everything: cluster health, CPU/memory/disk per node, network throughput, temperatures, RAID status, database health, Traefik request rates. Alertmanager for active alerts and silence management. Loki for log queries.
Kubernetes – list pods, describe pods, get logs, restart pods, scale deployments, check CronJob status, events, node management, MetalLB IP pools, certificate status. Roughly the stuff I used to do with kubectl.
Media – library management, calendars, download queues, watch history, transcoding status, subtitles, indexer search, media requests. Claude can tell me what's downloading, what's stuck, and what finished overnight.
Security – CrowdSec for intrusion detection (blocked IPs, attack patterns). GeoBlock for country-level IP filtering. AdGuard for DNS blocking stats and query logs.
Infrastructure – Pangolin reverse proxy management. ArgoCD deployment status. Gotify push notifications.
The optional servers auto-enable when their API credentials are configured. There's no feature flag or toggle for them; if the key is in the environment, the server starts; otherwise it doesn't register. I'll explain how that works in the next post.
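The gist is easy to sketch, though. Assume each optional sub-server needs one credential from the environment; the variable names below are made up for illustration, not MrWolf's real configuration:

```rust
use std::env;

/// Hypothetical registry step: a sub-server only registers its tools
/// if the credential it needs is present in the environment.
fn maybe_enable(name: &str, env_key: &str, enabled: &mut Vec<String>) {
    // Treat "unset or empty" as "don't start this sub-server".
    match env::var(env_key) {
        Ok(key) if !key.is_empty() => {
            enabled.push(name.to_string());
            println!("{name}: enabled (credential found)");
        }
        _ => println!("{name}: skipped (no {env_key} set)"),
    }
}

fn main() {
    let mut enabled = Vec::new();
    // Variable names are illustrative only.
    maybe_enable("adguard", "ADGUARD_API_KEY", &mut enabled);
    maybe_enable("gotify", "GOTIFY_TOKEN", &mut enabled);
    maybe_enable("crowdsec", "CROWDSEC_API_KEY", &mut enabled);
    println!("active sub-servers: {enabled:?}");
}
```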
So what does it actually feel like?
Let me show you some real scenarios.
Cluster health in one breath
I say "check the cluster" and Claude calls get_cluster_health. One tool call, every node, every metric:
Cluster Health Summary
──────────────────────
CPU Usage:
  corellia: 23.4%   mandalore: 18.2%   tatooine: 15.7%
  kamino: 8.1%      scarif: 12.3%

Memory:
  corellia: 67.8%   mandalore: 52.1%   tatooine: 48.9%
  kamino: 41.2%     scarif: 35.6%

RAID6 (/share): 42.3% used – 13.8 TB free
Firing Alerts: none
Pod Restarts (1h): none
Claude reads this and either says "all good" or starts digging into whatever looks off. The beautiful thing is it decides what to check next based on the data. High CPU on corellia? It'll check what pods are running there. Alert firing? It'll pull details and suggest actions.
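Under the hood this is just the Prometheus HTTP API: one instant query per metric, stitched into a summary. A minimal sketch of the kind of call a tool like get_cluster_health makes, using reqwest; the URL and the exact PromQL are examples, not necessarily what MrWolf runs:

```rust
use serde_json::Value;

/// Run one instant query against the Prometheus HTTP API and return the
/// parsed JSON body. `base` is something like "http://prometheus:9090".
async fn promql(base: &str, query: &str) -> Result<Value, reqwest::Error> {
    let client = reqwest::Client::new();
    let body: Value = client
        .get(format!("{base}/api/v1/query"))
        .query(&[("query", query)])
        .send()
        .await?
        .json()
        .await?;
    Ok(body)
}

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Per-node CPU usage: the same kind of query behind the summary above.
    let cpu = promql(
        "http://prometheus:9090",
        "100 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100",
    )
    .await?;
    println!("{}", cpu["data"]["result"]);
    Ok(())
}
```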
Tool chaining is where it gets wild
Claude doesn't have a "diagnose Jellyfin buffering" tool. It has building blocks. When I asked why Jellyfin was stuttering, it:
- Checked active Jellyfin sessions → found a 1080p transcode
- Checked the transcode queue → other jobs competing for CPU
- Checked pod resource usage → Jellyfin at 89% CPU
- Checked node pressure → corellia overloaded
- Told me: "Jellyfin is competing with transcoding workloads. Pause the transcode queue or scale it down."
Four tool calls, correct diagnosis. I would have spent 10 minutes clicking through Grafana to reach the same conclusion 😠.
Country blocking with a safety net
Every destructive operation has a confirmation gate:
Me: Block Russia and China from my services
Claude (calls tool with confirmed=false): CONFIRMATION REQUIRED: Remove RU and CN from the allowed countries list. Currently 28 countries allowed, will become 26. Call again with confirmed=true to execute.
Me: Do it
Claude (calls with confirmed=true): Done. 26 countries allowed. Russia and China blocked.
Scale a deployment? It shows current replicas first. Delete an alert silence? It shows what would become unsilenced. Every destructive call has a preview, and nothing executes until I confirm; the AI proposes, the human approves.
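The pattern behind that exchange is a single confirmed parameter on every mutating tool: the first call returns a preview, only the second call executes. A stripped-down sketch of the idea (illustrative, not MrWolf's actual code):

```rust
/// Every mutating tool takes `confirmed`. The first call returns a preview
/// instead of acting; only a second call with `confirmed = true` executes.
fn set_allowed_countries(new_list: Vec<&str>, current: &[&str], confirmed: bool) -> String {
    if !confirmed {
        return format!(
            "CONFIRMATION REQUIRED: allowed countries {} -> {}. \
             Call again with confirmed=true to execute.",
            current.len(),
            new_list.len()
        );
    }
    // ...this is where the real call to the GeoBlock API would go...
    format!("Done. {} countries allowed.", new_list.len())
}

fn main() {
    let current = ["US", "DE", "NL"]; // illustrative data
    let proposed = vec!["US", "DE"];
    println!("{}", set_allowed_countries(proposed.clone(), &current, false));
    println!("{}", set_allowed_countries(proposed, &current, true));
}
```

The nice side effect of this shape is that the preview and the execution live in the same function, so what Claude shows me and what it actually does can't drift apart.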
The story that made it all worth it 🐺
This is the one. The Pangolin healthcheck saga.
My reverse proxy (Pangolin) has a health checker that monitors ~34 services exposed through a WireGuard tunnel. Every few days, it gets globally stuck: all health checks fail after a tunnel reconnection. The fix is beautifully stupid: toggle the health check off and on for any single resource and the whole thing unsticks.
I'd been doing this manually through the web dashboard. Click, uncheck, save, wait, check, save. Every. Few. Days.
One evening I asked Claude to check on things. In a single conversation, it:
- Listed Pangolin resources → 16 of 34 unhealthy
- Pulled tunnel logs from Loki → found reconnection events
- Retriggered one health check via the API → all 16 went healthy instantly
- I said "can we automate this?"
- Claude wrote an hourly Kubernetes CronJob that checks for unhealthy resources and toggles one to unstick the checker (roughly the logic sketched after this list)
- I reviewed the CronJob and deployed it
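For the curious, the logic behind that CronJob boils down to something like the sketch below. The Pangolin endpoints, paths, and field names here are placeholders; I'm showing the shape of the fix, not the real API:

```rust
use serde_json::Value;

// Placeholder endpoints and fields: the real Pangolin API differs. This only
// sketches the "toggle one health check to unstick all of them" fix.
async fn unstick(base: &str, token: &str) -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    let resources: Value = client
        .get(format!("{base}/api/resources")) // placeholder path
        .bearer_auth(token)
        .send().await?
        .json().await?;

    // Find any resource reporting unhealthy...
    let unhealthy: Vec<&Value> = resources["items"]
        .as_array()
        .map(|a| a.iter().filter(|r| r["healthy"].as_bool() == Some(false)).collect())
        .unwrap_or_default();
    if unhealthy.is_empty() {
        println!("all healthy, nothing to do");
        return Ok(());
    }

    // ...and toggle its health check off and back on. One toggle is enough
    // to unstick the global checker.
    let id = &unhealthy[0]["id"];
    for enabled in [false, true] {
        client
            .put(format!("{base}/api/resources/{id}/healthcheck")) // placeholder path
            .bearer_auth(token)
            .json(&serde_json::json!({ "enabled": enabled }))
            .send().await?;
    }
    println!("toggled health check on resource {id} ({} were unhealthy)", unhealthy.len());
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    unstick("http://pangolin.internal", "example-token").await
}
```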
The bigger picture
Here's what actually changed in how I operate the cluster.
Before MrWolf, debugging looked like this: notice something wrong → open Grafana → remember the PromQL query → read the result → form a hypothesis → open a terminal → kubectl into the right namespace → check logs → restart things → repeat until fixed.
After MrWolf: "Hey Claude, something seems off with [thing]" → Claude chains tools → "here's what's wrong and what to do" → "fix it" → confirmed → done.
The time savings are nice, but that's not the real win. The real win is that Claude remembers how to check everything. I don't need to remember the PromQL query for disk pressure. I don't need to remember which namespace a service runs in. I don't need to remember the kubectl logs incantation for a specific container in a multi-container pod. MrWolf gives Claude the tools and Claude figures out which ones to use.
A pile of API wrappers is just a pile of API wrappers, but a pile an AI can chain together in any combination starts to look like an operator, which is what Mr. Wolf is supposed to be.
Quick stats
Because I can't help myself:
- 82 tools across 11 sub-servers
- A Rust codebase that's chunky but not huge
- Optional servers that auto-enable from credentials
- Every write operation goes through a confirmation gate
- Exponential-backoff retries on every HTTP call (sketched after this list)
- Per-tool Prometheus metrics, so I can see how Claude uses the cluster
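The retry wrapper is nothing exotic; it's roughly this shape (the limits and the endpoint are illustrative, not MrWolf's actual settings):

```rust
use std::time::Duration;

/// Retry an HTTP GET with exponential backoff: 1s, 2s, 4s... between
/// attempts. Limits here are illustrative.
async fn get_with_retry(url: &str, max_attempts: u32) -> Result<String, reqwest::Error> {
    let client = reqwest::Client::new();
    let mut attempt = 0;
    loop {
        match client.get(url).send().await {
            Ok(resp) => return resp.text().await,
            Err(err) if attempt + 1 < max_attempts => {
                let delay = Duration::from_secs(1u64 << attempt); // 1s, 2s, 4s, ...
                eprintln!("attempt {} failed ({err}), retrying in {delay:?}", attempt + 1);
                tokio::time::sleep(delay).await;
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

#[tokio::main]
async fn main() {
    match get_with_retry("http://prometheus:9090/-/healthy", 4).await {
        Ok(body) => println!("{body}"),
        Err(err) => eprintln!("gave up: {err}"),
    }
}
```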
Is it overkill for a homelab? Absolutely. But it's mine, and when something breaks at 3 AM and I can fix it from my phone, it doesn't feel like overkill at all.