🦀 MrWolf: Rust, Macros, and Zero Boilerplate
Second post in the MrWolf series. Previously: I Gave an AI Tools to Run My Homelab.
This is the Rust nerd post 🦀. If you're here for the "what can it do" story, go read the first post. This one is about macros, middleware, and the patterns that keep the codebase manageable by one person on evenings and weekends.
Before you think I'm insane: I didn't hand-write the whole thing from scratch. I wrote most of MrWolf with Claude's help. I designed the architecture, defined the tool interfaces, and did all the QA, testing, and code review. Claude wrote most of the implementation. It's a Rust MCP server that was largely built by the same AI that uses it. There's something beautifully recursive about that.
The architecture
MrWolf is a single binary that spawns two HTTP servers: a metrics endpoint and the MCP endpoint.
Main itself is tiny: spawn both servers, then a pending() blocks the main task while they run. All config comes from environment variables via envy::prefixed("MRWOLF_"), which makes the binary a natural fit for a Kubernetes pod, where env vars are the native config mechanism anyway.
The interesting bit is how the optional servers auto-enable.
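A hedged sketch of the pattern; the config field and type names here are assumed, not MrWolf's actual ones:

```rust
// Sketch of the auto-enable pattern: a server exists iff its API key does.
struct MediaServer {
    api_key: String,
}

struct Config {
    media_api_key: Option<String>,
}

fn build_media(config: &Config) -> Option<MediaServer> {
    // Present key => server starts. Absent key => None, and None means
    // the server's tools never show up in the MCP tool list.
    config
        .media_api_key
        .clone()
        .map(|api_key| MediaServer { api_key })
}
```

The original code uses .then, which reads the same way when the gate is a bool rather than an Option: configured.then(|| MediaServer::new(...)).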
No feature flags. No MRWOLF_MEDIA_ENABLED=true. If the API key exists, the server starts. If it doesn't, that server is None and its tools don't appear in the MCP tool list. One less thing to configure, one less thing to forget.
The macro that killed boilerplate
Every MCP tool needs the same ceremony: create a tracing span, wrap the async body in instrumentation, catch errors, convert them to user-friendly messages. Without a macro, every tool would have more scaffolding than actual logic.
All of that ceremony lives in one macro: tool_body!.
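A simplified, synchronous sketch of the idea; the real macro opens a tracing span and wraps an async block, and everything here (names, messages) is illustrative rather than MrWolf's actual code:

```rust
// Run the body, pass Ok through, turn Err into a friendly message
// instead of propagating it.
macro_rules! tool_body {
    ($tool:expr, $body:block) => {{
        let result: Result<String, String> = (|| $body)();
        match result {
            Ok(out) => out,
            Err(e) => {
                // tracing::error! in the real implementation
                eprintln!("tool {} failed: {}", $tool, e);
                format!("Error: {}. The upstream service may be unavailable.", e)
            }
        }
    }};
}

// A tool body is then just its logic:
fn get_decisions() -> String {
    tool_body!("get_decisions", {
        // real tool logic (the HTTP call) would go here
        Err("CrowdSec API returned HTTP 503".to_string())
    })
}
```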
What it does:
- Creates a tracing span named after the tool (shows up in structured logs)
- Wraps the body in an async block with a typed Result
- If the body returns Ok, passes it through
- If the body returns Err, logs the error and returns a friendly message instead of propagating it
That last point is critical. An LLM seeing a Rust stack trace is worse than useless. Every error becomes "Error: CrowdSec API returned HTTP 503. The upstream service may be restarting. Try again in a moment." Claude reads that and either retries or tells the user what happened. No panics, no cryptic errors.
The actual logic is a few lines; the macro handles tracing, error conversion, and type annotations. Once you multiply that across every tool in the codebase, the savings add up fast.
There's also a variant that accepts span fields for richer traces. Now I get structured log lines like update_allowed_countries count=26 confirmed=true without any manual span construction.
The composite server
Each sub-server needs to merge into a single MCP endpoint, so the composite server builds a HashMap<String, SubServer> dispatch table at startup.
At construction, it iterates every server's tool list and inserts each tool name into the table; optional servers contribute their tools only if they were actually created.
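A std-only sketch of the construction and lookup; the SubServer variants and the media tool name are assumed (query_prometheus and get_cluster_health are mentioned later in this post):

```rust
use std::collections::HashMap;

// Which sub-server owns a tool. Variants are illustrative.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum SubServer {
    Prometheus,
    Media,
}

// In MrWolf this iterates each server's tool_router.list_all(); here the
// tool lists are hard-coded to show the shape.
fn build_dispatch(media: Option<&[&str]>) -> HashMap<String, SubServer> {
    let mut dispatch = HashMap::new();
    for tool in ["query_prometheus", "get_cluster_health"] {
        dispatch.insert(tool.to_string(), SubServer::Prometheus);
    }
    // Optional servers only contribute tools if they exist.
    if let Some(tools) = media {
        for tool in tools {
            dispatch.insert(tool.to_string(), SubServer::Media);
        }
    }
    dispatch
}

// Call-time dispatch: one lookup, then a match on the variant.
fn route(dispatch: &HashMap<String, SubServer>, tool: &str) -> Option<SubServer> {
    dispatch.get(tool).copied()
}
```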
Dispatch at call time is O(1): a HashMap lookup, then a match on the owning sub-server.
Every tool call gets three Prometheus metrics automatically: call count, duration, and response size. I have a Grafana dashboard that shows which tools Claude uses most, how long they take, and how large the responses are. It's fascinating to watch.
The HTTP middleware sandwich 🥪
MrWolf talks to ~15 upstream services, and every request needs retries with backoff plus metrics, so each HTTP client is built from the same middleware stack.
The same metrics middleware is used twice with different configs:
- Outer (before retry): sees one event per logical request. Records final status and duration, and sets the mrwolf_upstream_up gauge. This is what I alert on.
- Inner (after retry): sees every attempt, including retries. Records mrwolf_http_attempts_total. This tells me whether a service is flaky (lots of retries) or down (outer failures).
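The outer/inner split can be sketched without any HTTP at all; the two counters stand in for the two metrics layers, and everything here is illustrative:

```rust
use std::cell::Cell;

// A flaky upstream: fails the first `failures_left` calls, then succeeds.
struct Upstream {
    failures_left: Cell<u32>,
}

impl Upstream {
    fn call(&self) -> Result<u16, String> {
        if self.failures_left.get() > 0 {
            self.failures_left.set(self.failures_left.get() - 1);
            Err("HTTP 503".to_string())
        } else {
            Ok(200)
        }
    }
}

// Stand-ins for the two metrics layers.
struct Counters {
    requests: Cell<u32>, // outer: one per logical request
    attempts: Cell<u32>, // inner: one per attempt, retries included
}

fn request(upstream: &Upstream, metrics: &Counters, max_attempts: u32) -> Result<u16, String> {
    metrics.requests.set(metrics.requests.get() + 1); // outer layer fires once
    let mut last = Err("no attempts made".to_string());
    for _ in 0..max_attempts {
        metrics.attempts.set(metrics.attempts.get() + 1); // inner layer fires per attempt
        last = upstream.call();
        if last.is_ok() {
            break;
        }
        // the real retry middleware backs off (exponentially, with jitter) here
    }
    last
}
```

A service that is flaky shows attempts much higher than requests; a service that is down shows outer failures.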
The middleware extracts the service name from the URL hostname automatically.
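A hedged sketch of that extraction; the real code presumably uses a proper URL parser, but the string version shows the idea:

```rust
// Derive a metrics label from a request URL's hostname.
fn service_label(url: &str) -> String {
    let after_scheme = url.split("://").nth(1).unwrap_or(url);
    let host = after_scheme
        .split(|c| c == '/' || c == ':')
        .next()
        .unwrap_or("");
    // "prometheus.monitoring.svc" -> "prometheus"
    host.split('.').next().unwrap_or("unknown").to_string()
}
```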
So every HTTP request gets labeled with the service it's talking to. Zero manual instrumentation in the tool code.
Dealing with MCP client quirks
MCP clients sometimes send numbers as strings: {"limit": "10"} instead of {"limit": 10}. Strict serde deserialization rejects that, so the parameter structs use lenient deserializers that accept either form.
Small thing, but without it half the tools would randomly fail. Defensive coding for AI clients 🤷.
The ServiceClient abstraction
The media stack talks to a handful of services with slightly different APIs: different auth headers, different URL prefixes. Instead of duplicating the HTTP scaffolding for each one, a single ServiceClient struct captures the differences as data.
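A hedged sketch of the shape; field names and the example values are assumed:

```rust
// Per-service quirks live in data; the request-building logic exists once.
struct ServiceClient {
    base_url: String,
    api_prefix: &'static str,  // e.g. "/api/v3" for one service, "/api" for another
    auth_header: &'static str, // e.g. "X-Api-Key" vs. "Authorization"
    api_key: String,
}

impl ServiceClient {
    fn url(&self, path: &str) -> String {
        format!("{}{}{}", self.base_url.trim_end_matches('/'), self.api_prefix, path)
    }

    // Every request attaches auth the same way; the shared HTTP client
    // underneath already carries tracing, retries, and metrics.
    fn auth(&self) -> (&'static str, &str) {
        (self.auth_header, &self.api_key)
    }
}
```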
All the media-stack tools share one HTTP client struct, so every call picks up tracing, retries, and metrics without any per-tool plumbing.
JWT caching with Arc<RwLock<>>
CrowdSec uses machine-login authentication: POST credentials, get a JWT, reuse it until it expires. MrWolf caches the token in an Arc<RwLock<>> shared by every clone of the server struct.
On a 401 from the API, the cached token is invalidated and the request is retried with a fresh login.
Drop read lock before write lock. Invalidate on 401 and retry. The Arc<RwLock<>> makes it safe to clone the server struct (the MCP handler needs Clone) while sharing the token cache.
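A synchronous sketch of the cache (the real code is async and also checks expiry; the fake login counter and all names are illustrative):

```rust
use std::sync::{Arc, RwLock};

#[derive(Clone)]
struct CrowdsecAuth {
    token: Arc<RwLock<Option<String>>>,
    login_count: Arc<RwLock<u32>>, // stands in for the credentials POST
}

impl CrowdsecAuth {
    fn new() -> Self {
        CrowdsecAuth {
            token: Arc::new(RwLock::new(None)),
            login_count: Arc::new(RwLock::new(0)),
        }
    }

    fn get_token(&self) -> String {
        // Fast path: read lock only.
        if let Some(t) = self.token.read().unwrap().clone() {
            return t;
        }
        // Read lock is dropped here; now take the write lock and log in.
        let mut slot = self.token.write().unwrap();
        let mut n = self.login_count.write().unwrap();
        *n += 1;
        let fresh = format!("jwt-{}", *n);
        drop(n);
        *slot = Some(fresh.clone());
        fresh
    }

    // Called when an upstream request comes back 401.
    fn invalidate(&self) {
        *self.token.write().unwrap() = None;
    }
}
```

Clones share the same cache, which is exactly what the Clone-requiring MCP handler needs.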
Pre-built PromQL queries
The Prometheus server has ~30 queries as constants: a curated catalog that Claude can browse.
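A sketch of what such a catalog looks like; these are generic node-exporter queries, not MrWolf's actual list:

```rust
// Pre-built PromQL as constants.
const NODE_CPU_PERCENT: &str =
    "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)";
const NODE_MEM_AVAILABLE_RATIO: &str =
    "node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes";

// A tool like get_cluster_health fires a fixed set of these in one call.
fn cluster_health_queries() -> Vec<(&'static str, &'static str)> {
    vec![
        ("cpu_percent", NODE_CPU_PERCENT),
        ("mem_available_ratio", NODE_MEM_AVAILABLE_RATIO),
    ]
}
```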
Claude doesn't need to know PromQL. It calls get_cluster_health and MrWolf fires the right queries. But if Claude wants a custom query, query_prometheus accepts raw PromQL too. Best of both worlds.
Node names across 3 subnets
My nodes are reachable over LAN, Tailscale, and WiFi, and Prometheus labels use whatever IP it happened to scrape from. A small lookup function turns any of those addresses into the node's Star Wars planet name.
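The shape of that function, with made-up addresses and planet names:

```rust
// All three addresses of a node collapse to one canonical name.
pub fn node_name(ip: &str) -> &'static str {
    match ip {
        // LAN           | Tailscale     | WiFi
        "192.168.1.10" | "100.64.0.10" | "192.168.50.10" => "dagobah",
        "192.168.1.11" | "100.64.0.11" | "192.168.50.11" => "hoth",
        _ => "unknown",
    }
}
```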
Beautiful? No. Works perfectly? Yes. Sometimes the right code is the boring code.
Confirmation gates for destructive ops
Every write tool follows the same pattern: preview, then execute. Unless the call explicitly sets confirmed, the tool describes what it would do and touches nothing.
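A hedged sketch of the gate; the confirmed parameter matches the log line shown earlier, everything else is illustrative:

```rust
// Preview unless confirmed; only a confirmed call performs the write.
fn update_allowed_countries(countries: &[&str], confirmed: bool) -> String {
    if !confirmed {
        // Preview: describe the change, touch nothing.
        return format!(
            "Would replace the geoblock allow-list with {} countries: {}. \
             Call again with confirmed=true to apply.",
            countries.len(),
            countries.join(", ")
        );
    }
    // ...apply the change via the upstream API here...
    format!("Allow-list updated: {} countries.", countries.len())
}
```

Claude sees the preview, relays it to the user, and only repeats the call with confirmed=true after an explicit go-ahead.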
Simple pattern, but it's the difference between "useful tool" and "unsupervised chaos".
Testing with WireMock
Every server has tests that mock the upstream APIs with WireMock: start a local mock server, stub the endpoints, point the client at its URL. No real services needed.
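Assuming the suite uses the wiremock crate, a test's shape looks roughly like this (the tool, endpoint, and body are illustrative, and the client wiring is elided):

```rust
use wiremock::matchers::{method, path};
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::test]
async fn lists_decisions_against_a_mocked_crowdsec() {
    // Spin up a local HTTP server that plays the role of CrowdSec.
    let mock = MockServer::start().await;
    Mock::given(method("GET"))
        .and(path("/v1/decisions"))
        .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!([])))
        .mount(&mock)
        .await;

    // Build the server's client against mock.uri(), call the tool,
    // and assert on the friendly-text output.
}
```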
Fast, deterministic, runs in CI. cargo nextest run finishes the whole suite in seconds.
The dependency stack
```toml
[dependencies]
rmcp = { version = "0.16", features = ["server", "macros", "transport-streamable-http-server"] }
kube = { version = "3.0", features = ["runtime", "client"] }
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
# ...

[dev-dependencies]
wiremock = "0.6"
# ...
```
rmcp does the MCP protocol. kube does the Kubernetes API. Everything else is the standard Rust ecosystem. No frameworks, no code generation.
All of it is fairly straightforward async Rust. A lot of it written by Claude, all of it reviewed by me 🦀.
Done
MrWolf isn't clever 🧰. It's a pile of small, tested functions, each doing one thing, wrapped in macros that handle the plumbing. The Rust compiler does the rest: if it compiles, the dispatch table is correct, the error handling is complete, and the metrics are wired up.
That's how Mr. Wolf solves problems. Not with cleverness, but with a system 🐺.