🐺 MrWolf: I Gave an AI the Tools to Run My Homelab
"I'm Mr. Wolf. I solve problems." โ Harvey Keitel, Pulp Fiction
That's the vibe. You call Mr. Wolf when something's wrong and you don't want to think about it. He shows up, assesses the situation, tells everyone what to do, and the problem disappears.
I built a thing called MrWolf. It's a Rust server that gives Claude direct access to my k3s homelab 🐺. Not "generate some YAML and I'll apply it" access: real, live, query-Prometheus-and-restart-pods access. Dozens of tools spread across a handful of sub-servers, covering monitoring, Kubernetes, media, security, and notifications.
And it's absolutely wild 🔥.
Wait, what?
Yeah. I can sit on my couch, open Claude Code on my phone, and say "hey, something seems off with the cluster" and Claude will:
- Query Prometheus for CPU, memory, disk, temperatures across all nodes
- Check Alertmanager for firing alerts
- Pull logs from Loki if something looks wrong
- Restart a pod or scale a deployment if needed
- Tell me what it did and why
No Grafana dashboards. No terminal. No remembering PromQL syntax. Just... a conversation.
Last week it found 16 unhealthy resources in my reverse proxy, diagnosed the root cause from tunnel logs, fixed everything with one API call, then wrote a CronJob to prevent it from happening again. I didn't even get up from the couch.
I'll tell that story properly in a minute. First, some context.
MCP: the protocol that makes this possible
MCP (Model Context Protocol) is an open protocol that lets an AI call tools instead of just generating text. You define typed functions with parameters and descriptions, the AI decides when to call them, reads the results, and chains them together to solve problems.
Without MCP, you're the middleman: copying terminal output into a chat window, pasting YAML back. With MCP, the AI does the thing directly. Think → call tool → read result → think → call next tool → done.
MrWolf runs as a pod in the cluster, exposes an MCP endpoint, and Claude Code connects to it. From Claude's perspective, it's just functions. From my perspective, it's an operator that never sleeps 🤖.
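Concretely, an MCP tool isn't much more than a name, a description the model reads, and a JSON schema for its arguments. Here's a rough sketch of what a tool definition and its handler could look like on the server side. This is illustrative only; the struct and field names are mine, not MrWolf's actual code:

```rust
use serde_json::{json, Value};

/// Illustrative shape of an MCP tool: a name, a human-readable description
/// the model reads when deciding what to call, and a JSON schema for the
/// arguments it accepts.
struct ToolDef {
    name: &'static str,
    description: &'static str,
    input_schema: Value,
}

fn cluster_health_tool() -> ToolDef {
    ToolDef {
        name: "get_cluster_health",
        description: "Summarize CPU, memory, disk, temperatures and alerts across all nodes",
        input_schema: json!({
            "type": "object",
            "properties": {
                "node": { "type": "string", "description": "Optional node name filter" }
            },
            "required": []
        }),
    }
}

// The server advertises definitions like this to the client; when the model
// decides to call one, the server receives the arguments as JSON and returns
// the result as text for the model to read and reason about.
fn handle_call(tool: &str, _args: Value) -> String {
    match tool {
        "get_cluster_health" => "CPU Usage: ...".to_string(), // query Prometheus here
        _ => format!("unknown tool: {tool}"),
    }
}

fn main() {
    let tool = cluster_health_tool();
    println!("{}: {}", tool.name, tool.description);
    println!("schema: {}", tool.input_schema);
    println!("{}", handle_call("get_cluster_health", json!({})));
}
```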
The tools
82 of them, across 11 sub-servers. Here's the quick tour:
Monitoring – Prometheus queries for everything: cluster health, CPU/memory/disk per node, network throughput, temperatures, RAID status, database health, Traefik request rates. Alertmanager for active alerts and silence management. Loki for log queries.
Kubernetes – list pods, describe pods, get logs, restart pods, scale deployments, check CronJob status, events, node management, MetalLB IP pools, certificate status. Roughly the stuff I used to do with kubectl.
Media – library management, calendars, download queues, watch history, transcoding status, subtitles, indexer search, media requests. Claude can tell me what's downloading, what's stuck, and what finished overnight.
Security – CrowdSec for intrusion detection (blocked IPs, attack patterns). GeoBlock for country-level IP filtering. AdGuard for DNS blocking stats and query logs.
Infrastructure – Pangolin reverse proxy management. ArgoCD deployment status. Gotify push notifications.
The optional servers auto-enable when their API credentials are configured. There's no feature flag or toggle for them; if the key is in the environment, the server starts; otherwise it doesn't register. I'll explain how that works in the next post.
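The gist is easy to sketch, though. Assume each optional sub-server needs one credential from the environment; the variable names below are made up for illustration, not MrWolf's real configuration:

```rust
use std::env;

/// Hypothetical registry step: a sub-server only registers its tools
/// if the credential it needs is present in the environment.
fn maybe_enable(name: &str, env_key: &str, enabled: &mut Vec<String>) {
    // Treat "unset or empty" as "don't start this sub-server".
    match env::var(env_key) {
        Ok(key) if !key.is_empty() => {
            enabled.push(name.to_string());
            println!("{name}: enabled (credential found)");
        }
        _ => println!("{name}: skipped (no {env_key} set)"),
    }
}

fn main() {
    let mut enabled = Vec::new();
    // Variable names are illustrative only.
    maybe_enable("adguard", "ADGUARD_API_KEY", &mut enabled);
    maybe_enable("gotify", "GOTIFY_TOKEN", &mut enabled);
    maybe_enable("crowdsec", "CROWDSEC_API_KEY", &mut enabled);
    println!("active sub-servers: {enabled:?}");
}
```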
So what does it actually feel like?
Let me show you some real scenarios.
Cluster health in one breath
I say "check the cluster" and Claude calls get_cluster_health. One tool call, every node, every metric:
Cluster Health Summary
──────────────────────
CPU Usage:
  corellia: 23.4%   mandalore: 18.2%   tatooine: 15.7%
  kamino: 8.1%      scarif: 12.3%

Memory:
  corellia: 67.8%   mandalore: 52.1%   tatooine: 48.9%
  kamino: 41.2%     scarif: 35.6%

RAID6 (/share): 42.3% used – 13.8 TB free
Firing Alerts: none
Pod Restarts (1h): none
Claude reads this and either says "all good" or starts digging into whatever looks off. The beautiful thing is it decides what to check next based on the data. High CPU on corellia? It'll check what pods are running there. Alert firing? It'll pull details and suggest actions.
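Under the hood this is just the Prometheus HTTP API: one instant query per metric, stitched into a summary. A minimal sketch of the kind of call a tool like get_cluster_health makes, using reqwest; the URL and the exact PromQL are examples, not necessarily what MrWolf runs:

```rust
use serde_json::Value;

/// Run one instant query against the Prometheus HTTP API and return the
/// parsed JSON body. `base` is something like "http://prometheus:9090".
async fn promql(base: &str, query: &str) -> Result<Value, reqwest::Error> {
    let client = reqwest::Client::new();
    let body: Value = client
        .get(format!("{base}/api/v1/query"))
        .query(&[("query", query)])
        .send()
        .await?
        .json()
        .await?;
    Ok(body)
}

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Per-node CPU usage: the same kind of query behind the summary above.
    let cpu = promql(
        "http://prometheus:9090",
        "100 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100",
    )
    .await?;
    println!("{}", cpu["data"]["result"]);
    Ok(())
}
```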
Tool chaining is where it gets wild
Claude doesn't have a "diagnose Jellyfin buffering" tool. It has building blocks. When I asked why Jellyfin was stuttering, it:
- Checked active Jellyfin sessions → found a 1080p transcode
- Checked the transcode queue → other jobs competing for CPU
- Checked pod resource usage → Jellyfin at 89% CPU
- Checked node pressure → corellia overloaded
- Told me: "Jellyfin is competing with transcoding workloads. Pause the transcode queue or scale it down."
Four tool calls, correct diagnosis. I would have spent 10 minutes clicking through Grafana to reach the same conclusion 😠.
Country blocking with a safety net
Every destructive operation has a confirmation gate:
Me: Block Russia and China from my services
Claude (calls tool with confirmed=false): CONFIRMATION REQUIRED: Remove RU and CN from the allowed countries list. Currently 28 countries allowed, will become 26. Call again with confirmed=true to execute.
Me: Do it
Claude (calls with confirmed=true): Done. 26 countries allowed. Russia and China blocked.
Scale a deployment? It shows current replicas first. Delete an alert silence? It shows what would become unsilenced. Every destructive call has a preview, and nothing executes until I confirm; the AI proposes, the human approves.
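The pattern behind that exchange is a single confirmed parameter on every mutating tool: the first call returns a preview, only the second call executes. A stripped-down sketch of the idea (illustrative, not MrWolf's actual code):

```rust
/// Every mutating tool takes `confirmed`. The first call returns a preview
/// instead of acting; only a second call with `confirmed = true` executes.
fn set_allowed_countries(new_list: Vec<&str>, current: &[&str], confirmed: bool) -> String {
    if !confirmed {
        return format!(
            "CONFIRMATION REQUIRED: allowed countries {} -> {}. \
             Call again with confirmed=true to execute.",
            current.len(),
            new_list.len()
        );
    }
    // ...this is where the real call to the GeoBlock API would go...
    format!("Done. {} countries allowed.", new_list.len())
}

fn main() {
    let current = ["US", "DE", "NL"]; // illustrative data
    let proposed = vec!["US", "DE"];
    println!("{}", set_allowed_countries(proposed.clone(), &current, false));
    println!("{}", set_allowed_countries(proposed, &current, true));
}
```

The nice side effect of this shape is that the preview and the execution live in the same function, so what Claude shows me and what it actually does can't drift apart.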
The story that made it all worth it 🐺
This is the one. The Pangolin healthcheck saga.
My reverse proxy (Pangolin) has a health checker that monitors ~34 services exposed through a WireGuard tunnel. Every few days, it gets globally stuck: all health checks fail after a tunnel reconnection. The fix is beautifully stupid: toggle the health check off and on for any single resource and the whole thing unsticks.
I'd been doing this manually through the web dashboard. Click, uncheck, save, wait, check, save. Every. Few. Days.
One evening I asked Claude to check on things. In a single conversation, it:
- Listed Pangolin resources → 16 of 34 unhealthy
- Pulled tunnel logs from Loki → found reconnection events
- Retriggered one health check via the API → all 16 went healthy instantly
- I said "can we automate this?"
- Claude wrote an hourly Kubernetes CronJob that checks for unhealthy resources and toggles one to unstick the checker (roughly the logic sketched after this list)
- I reviewed the CronJob and deployed it
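For the curious, the logic behind that CronJob boils down to something like the sketch below. The Pangolin endpoints, paths, and field names here are placeholders; I'm showing the shape of the fix, not the real API:

```rust
use serde_json::Value;

// Placeholder endpoints and fields: the real Pangolin API differs. This only
// sketches the "toggle one health check to unstick all of them" fix.
async fn unstick(base: &str, token: &str) -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    let resources: Value = client
        .get(format!("{base}/api/resources")) // placeholder path
        .bearer_auth(token)
        .send().await?
        .json().await?;

    // Find any resource reporting unhealthy...
    let unhealthy: Vec<&Value> = resources["items"]
        .as_array()
        .map(|a| a.iter().filter(|r| r["healthy"].as_bool() == Some(false)).collect())
        .unwrap_or_default();
    if unhealthy.is_empty() {
        println!("all healthy, nothing to do");
        return Ok(());
    }

    // ...and toggle its health check off and back on. One toggle is enough
    // to unstick the global checker.
    let id = &unhealthy[0]["id"];
    for enabled in [false, true] {
        client
            .put(format!("{base}/api/resources/{id}/healthcheck")) // placeholder path
            .bearer_auth(token)
            .json(&serde_json::json!({ "enabled": enabled }))
            .send().await?;
    }
    println!("toggled health check on resource {id} ({} were unhealthy)", unhealthy.len());
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    unstick("http://pangolin.internal", "example-token").await
}
```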
The bigger picture
Here's what actually changed in how I operate the cluster.
Before MrWolf, debugging looked like this: notice something wrong → open Grafana → remember the PromQL query → read the result → form a hypothesis → open a terminal → kubectl into the right namespace → check logs → restart things → repeat until fixed.
After MrWolf: "Hey Claude, something seems off with [thing]" → Claude chains tools → "here's what's wrong and what to do" → "fix it" → confirmed → done.
The time savings are nice, but that's not the real win. The real win is that Claude remembers how to check everything. I don't need to remember the PromQL query for disk pressure. I don't need to remember which namespace a service runs in. I don't need to remember the kubectl logs incantation for a specific container in a multi-container pod. MrWolf gives Claude the tools and Claude figures out which ones to use.
A pile of API wrappers is just a pile of API wrappers, but a pile an AI can chain together in any combination starts to look like an operator, which is what Mr. Wolf is supposed to be.
Quick stats
Because I can't help myself:
- 82 tools across 11 sub-servers
- A Rust codebase that's chunky but not huge
- Optional servers that auto-enable from credentials
- Every write operation goes through a confirmation gate
- Exponential-backoff retries on every HTTP call (sketched after this list)
- Per-tool Prometheus metrics, so I can see how Claude uses the cluster
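The retry wrapper is nothing exotic; it's roughly this shape (the limits and the endpoint are illustrative, not MrWolf's actual settings):

```rust
use std::time::Duration;

/// Retry an HTTP GET with exponential backoff: 1s, 2s, 4s... between
/// attempts. Limits here are illustrative.
async fn get_with_retry(url: &str, max_attempts: u32) -> Result<String, reqwest::Error> {
    let client = reqwest::Client::new();
    let mut attempt = 0;
    loop {
        match client.get(url).send().await {
            Ok(resp) => return resp.text().await,
            Err(err) if attempt + 1 < max_attempts => {
                let delay = Duration::from_secs(1u64 << attempt); // 1s, 2s, 4s, ...
                eprintln!("attempt {} failed ({err}), retrying in {delay:?}", attempt + 1);
                tokio::time::sleep(delay).await;
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

#[tokio::main]
async fn main() {
    match get_with_retry("http://prometheus:9090/-/healthy", 4).await {
        Ok(body) => println!("{body}"),
        Err(err) => eprintln!("gave up: {err}"),
    }
}
```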
Is it overkill for a homelab? Absolutely. But it's mine, and when something breaks at 3 AM and I can fix it from my phone, it doesn't feel like overkill at all.