🦀 MrWolf: Rust, Macros, and Zero Boilerplate
Second post in the MrWolf series. Previously: I Gave an AI Tools to Run My Homelab.
This is the Rust nerd post 🦀. If you're here for the "what can it do" story, go read the first post. This one is about macros, middleware, and the patterns that keep the codebase manageable by one person on evenings and weekends.
Before you think I'm insane: I didn't hand-write the whole thing from scratch. I wrote most of MrWolf with Claude's help. I designed the architecture, defined the tool interfaces, and did all the QA, testing, and code review. Claude wrote most of the implementation. It's a Rust MCP server that was largely built by the same AI that uses it. There's something beautifully recursive about that.
The architecture
MrWolf is a single binary that spawns two HTTP servers: a metrics endpoint and the MCP endpoint.
Main itself is tiny: spawn both servers, then a pending() blocks the main task while they run. All config comes from environment variables via envy::prefixed("MRWOLF_"), which makes the binary a natural fit for a Kubernetes pod, where env vars are the native config mechanism anyway.
The interesting bit is how the optional servers auto-enable.
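A hedged sketch of the pattern; the config field and type names here are assumed, not MrWolf's actual ones:

```rust
// Sketch of the auto-enable pattern: a server exists iff its API key does.
struct MediaServer {
    api_key: String,
}

struct Config {
    media_api_key: Option<String>,
}

fn build_media(config: &Config) -> Option<MediaServer> {
    // Present key => server starts. Absent key => None, and None means
    // the server's tools never show up in the MCP tool list.
    config
        .media_api_key
        .clone()
        .map(|api_key| MediaServer { api_key })
}
```

The original code uses .then, which reads the same way when the gate is a bool rather than an Option: configured.then(|| MediaServer::new(...)).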
No feature flags. No MRWOLF_MEDIA_ENABLED=true. If the API key exists, the server starts. If it doesn't, that server is None and its tools don't appear in the MCP tool list. One less thing to configure, one less thing to forget.
The macro that killed boilerplate
Every MCP tool needs the same ceremony: create a tracing span, wrap the async body in instrumentation, catch errors, convert them to user-friendly messages. Without a macro, every tool would have more scaffolding than actual logic.
All of that ceremony lives in one macro: tool_body!.
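A simplified, synchronous sketch of the idea; the real macro opens a tracing span and wraps an async block, and everything here (names, messages) is illustrative rather than MrWolf's actual code:

```rust
// Run the body, pass Ok through, turn Err into a friendly message
// instead of propagating it.
macro_rules! tool_body {
    ($tool:expr, $body:block) => {{
        let result: Result<String, String> = (|| $body)();
        match result {
            Ok(out) => out,
            Err(e) => {
                // tracing::error! in the real implementation
                eprintln!("tool {} failed: {}", $tool, e);
                format!("Error: {}. The upstream service may be unavailable.", e)
            }
        }
    }};
}

// A tool body is then just its logic:
fn get_decisions() -> String {
    tool_body!("get_decisions", {
        // real tool logic (the HTTP call) would go here
        Err("CrowdSec API returned HTTP 503".to_string())
    })
}
```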
What it does:
- Creates a tracing span named after the tool (shows up in structured logs)
- Wraps the body in an async block with a typed Result
- If the body returns Ok, passes it through
- If the body returns Err, logs the error and returns a friendly message instead of propagating it
That last point is critical. An LLM seeing a Rust stack trace is worse than useless. Every error becomes "Error: CrowdSec API returned HTTP 503. The upstream service may be restarting. Try again in a moment." Claude reads that and either retries or tells the user what happened. No panics, no cryptic errors.
The actual logic is a few lines; the macro handles tracing, error conversion, and type annotations. Once you multiply that across every tool in the codebase, the savings add up fast.
There's also a variant that accepts span fields for richer traces. Now I get structured log lines like update_allowed_countries count=26 confirmed=true without any manual span construction.
The composite server
Each sub-server needs to merge into a single MCP endpoint, so the composite server builds a HashMap<String, SubServer> dispatch table at startup.
At construction, it iterates every server's tool list and inserts each tool name into the table; optional servers contribute their tools only if they were actually created.
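A std-only sketch of the construction and lookup; the SubServer variants and the media tool name are assumed (query_prometheus and get_cluster_health are mentioned later in this post):

```rust
use std::collections::HashMap;

// Which sub-server owns a tool. Variants are illustrative.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum SubServer {
    Prometheus,
    Media,
}

// In MrWolf this iterates each server's tool_router.list_all(); here the
// tool lists are hard-coded to show the shape.
fn build_dispatch(media: Option<&[&str]>) -> HashMap<String, SubServer> {
    let mut dispatch = HashMap::new();
    for tool in ["query_prometheus", "get_cluster_health"] {
        dispatch.insert(tool.to_string(), SubServer::Prometheus);
    }
    // Optional servers only contribute tools if they exist.
    if let Some(tools) = media {
        for tool in tools {
            dispatch.insert(tool.to_string(), SubServer::Media);
        }
    }
    dispatch
}

// Call-time dispatch: one lookup, then a match on the variant.
fn route(dispatch: &HashMap<String, SubServer>, tool: &str) -> Option<SubServer> {
    dispatch.get(tool).copied()
}
```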
Dispatch at call time is O(1): a HashMap lookup, then a match on the owning sub-server.
Every tool call gets three Prometheus metrics automatically: call count, duration, and response size. I have a Grafana dashboard that shows which tools Claude uses most, how long they take, and how large the responses are. It's fascinating to watch.
The HTTP middleware sandwich 🥪
MrWolf talks to ~15 upstream services, and every request needs retries with backoff plus metrics, so each HTTP client is built from the same middleware stack.
The same metrics middleware is used twice with different configs:
- Outer (before retry): sees one event per logical request. Records final status and duration, and sets the mrwolf_upstream_up gauge. This is what I alert on.
- Inner (after retry): sees every attempt, including retries. Records mrwolf_http_attempts_total. This tells me whether a service is flaky (lots of retries) or down (outer failures).
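The outer/inner split can be sketched without any HTTP at all; the two counters stand in for the two metrics layers, and everything here is illustrative:

```rust
use std::cell::Cell;

// A flaky upstream: fails the first `failures_left` calls, then succeeds.
struct Upstream {
    failures_left: Cell<u32>,
}

impl Upstream {
    fn call(&self) -> Result<u16, String> {
        if self.failures_left.get() > 0 {
            self.failures_left.set(self.failures_left.get() - 1);
            Err("HTTP 503".to_string())
        } else {
            Ok(200)
        }
    }
}

// Stand-ins for the two metrics layers.
struct Counters {
    requests: Cell<u32>, // outer: one per logical request
    attempts: Cell<u32>, // inner: one per attempt, retries included
}

fn request(upstream: &Upstream, metrics: &Counters, max_attempts: u32) -> Result<u16, String> {
    metrics.requests.set(metrics.requests.get() + 1); // outer layer fires once
    let mut last = Err("no attempts made".to_string());
    for _ in 0..max_attempts {
        metrics.attempts.set(metrics.attempts.get() + 1); // inner layer fires per attempt
        last = upstream.call();
        if last.is_ok() {
            break;
        }
        // the real retry middleware backs off (exponentially, with jitter) here
    }
    last
}
```

A service that is flaky shows attempts much higher than requests; a service that is down shows outer failures.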
The middleware extracts the service name from the URL hostname automatically.
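A hedged sketch of that extraction; the real code presumably uses a proper URL parser, but the string version shows the idea:

```rust
// Derive a metrics label from a request URL's hostname.
fn service_label(url: &str) -> String {
    let after_scheme = url.split("://").nth(1).unwrap_or(url);
    let host = after_scheme
        .split(|c| c == '/' || c == ':')
        .next()
        .unwrap_or("");
    // "prometheus.monitoring.svc" -> "prometheus"
    host.split('.').next().unwrap_or("unknown").to_string()
}
```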
So every HTTP request gets labeled with the service it's talking to. Zero manual instrumentation in the tool code.
Dealing with MCP client quirks
MCP clients sometimes send numbers as strings: {"limit": "10"} instead of {"limit": 10}. Strict serde deserialization rejects that, so the parameter structs use lenient deserializers that accept either form.
Small thing, but without it half the tools would randomly fail. Defensive coding for AI clients 🤷.
The ServiceClient abstraction
The media stack talks to a handful of services with slightly different APIs: different auth headers, different URL prefixes. Instead of duplicating the HTTP scaffolding for each one, a single ServiceClient struct captures the differences as data.
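A hedged sketch of the shape; field names and the example values are assumed:

```rust
// Per-service quirks live in data; the request-building logic exists once.
struct ServiceClient {
    base_url: String,
    api_prefix: &'static str,  // e.g. "/api/v3" for one service, "/api" for another
    auth_header: &'static str, // e.g. "X-Api-Key" vs. "Authorization"
    api_key: String,
}

impl ServiceClient {
    fn url(&self, path: &str) -> String {
        format!("{}{}{}", self.base_url.trim_end_matches('/'), self.api_prefix, path)
    }

    // Every request attaches auth the same way; the shared HTTP client
    // underneath already carries tracing, retries, and metrics.
    fn auth(&self) -> (&'static str, &str) {
        (self.auth_header, &self.api_key)
    }
}
```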
All the media-stack tools share one HTTP client struct, so every call picks up tracing, retries, and metrics without any per-tool plumbing.
JWT caching with Arc<RwLock<>>
CrowdSec uses machine-login authentication: POST credentials, get a JWT, reuse it until it expires. MrWolf caches the token in an Arc<RwLock<>> shared by every clone of the server struct.
On a 401 from the API, the cached token is invalidated and the request is retried with a fresh login.
Drop read lock before write lock. Invalidate on 401 and retry. The Arc<RwLock<>> makes it safe to clone the server struct (the MCP handler needs Clone) while sharing the token cache.
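A synchronous sketch of the cache (the real code is async and also checks expiry; the fake login counter and all names are illustrative):

```rust
use std::sync::{Arc, RwLock};

#[derive(Clone)]
struct CrowdsecAuth {
    token: Arc<RwLock<Option<String>>>,
    login_count: Arc<RwLock<u32>>, // stands in for the credentials POST
}

impl CrowdsecAuth {
    fn new() -> Self {
        CrowdsecAuth {
            token: Arc::new(RwLock::new(None)),
            login_count: Arc::new(RwLock::new(0)),
        }
    }

    fn get_token(&self) -> String {
        // Fast path: read lock only.
        if let Some(t) = self.token.read().unwrap().clone() {
            return t;
        }
        // Read lock is dropped here; now take the write lock and log in.
        let mut slot = self.token.write().unwrap();
        let mut n = self.login_count.write().unwrap();
        *n += 1;
        let fresh = format!("jwt-{}", *n);
        drop(n);
        *slot = Some(fresh.clone());
        fresh
    }

    // Called when an upstream request comes back 401.
    fn invalidate(&self) {
        *self.token.write().unwrap() = None;
    }
}
```

Clones share the same cache, which is exactly what the Clone-requiring MCP handler needs.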
Pre-built PromQL queries
The Prometheus server has ~30 queries as constants: a curated catalog that Claude can browse.
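A sketch of what such a catalog looks like; these are generic node-exporter queries, not MrWolf's actual list:

```rust
// Pre-built PromQL as constants.
const NODE_CPU_PERCENT: &str =
    "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)";
const NODE_MEM_AVAILABLE_RATIO: &str =
    "node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes";

// A tool like get_cluster_health fires a fixed set of these in one call.
fn cluster_health_queries() -> Vec<(&'static str, &'static str)> {
    vec![
        ("cpu_percent", NODE_CPU_PERCENT),
        ("mem_available_ratio", NODE_MEM_AVAILABLE_RATIO),
    ]
}
```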
Claude doesn't need to know PromQL. It calls get_cluster_health and MrWolf fires the right queries. But if Claude wants a custom query, query_prometheus accepts raw PromQL too. Best of both worlds.
Node names across 3 subnets
My nodes are reachable over LAN, Tailscale, and WiFi, and Prometheus labels use whatever IP it happened to scrape from. A small lookup function turns any of those addresses into the node's Star Wars planet name.
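The shape of that function, with made-up addresses and planet names:

```rust
// All three addresses of a node collapse to one canonical name.
pub fn node_name(ip: &str) -> &'static str {
    match ip {
        // LAN           | Tailscale     | WiFi
        "192.168.1.10" | "100.64.0.10" | "192.168.50.10" => "dagobah",
        "192.168.1.11" | "100.64.0.11" | "192.168.50.11" => "hoth",
        _ => "unknown",
    }
}
```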
Beautiful? No. Works perfectly? Yes. Sometimes the right code is the boring code.
Confirmation gates for destructive ops
Every write tool follows the same pattern: preview, then execute. Unless the call explicitly sets confirmed, the tool describes what it would do and touches nothing.
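A hedged sketch of the gate; the confirmed parameter matches the log line shown earlier, everything else is illustrative:

```rust
// Preview unless confirmed; only a confirmed call performs the write.
fn update_allowed_countries(countries: &[&str], confirmed: bool) -> String {
    if !confirmed {
        // Preview: describe the change, touch nothing.
        return format!(
            "Would replace the geoblock allow-list with {} countries: {}. \
             Call again with confirmed=true to apply.",
            countries.len(),
            countries.join(", ")
        );
    }
    // ...apply the change via the upstream API here...
    format!("Allow-list updated: {} countries.", countries.len())
}
```

Claude sees the preview, relays it to the user, and only repeats the call with confirmed=true after an explicit go-ahead.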
Simple pattern, but it's the difference between "useful tool" and "unsupervised chaos".
Testing with WireMock
Every server has tests that mock the upstream APIs with WireMock: start a local mock server, stub the endpoints, point the client at its URL. No real services needed.
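Assuming the suite uses the wiremock crate, a test's shape looks roughly like this (the tool, endpoint, and body are illustrative, and the client wiring is elided):

```rust
use wiremock::matchers::{method, path};
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::test]
async fn lists_decisions_against_a_mocked_crowdsec() {
    // Spin up a local HTTP server that plays the role of CrowdSec.
    let mock = MockServer::start().await;
    Mock::given(method("GET"))
        .and(path("/v1/decisions"))
        .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!([])))
        .mount(&mock)
        .await;

    // Build the server's client against mock.uri(), call the tool,
    // and assert on the friendly-text output.
}
```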
Fast, deterministic, runs in CI. cargo nextest run finishes the whole suite in seconds.
The dependency stack
```toml
[dependencies]
rmcp = { version = "0.16", features = ["server", "macros", "transport-streamable-http-server"] }
kube = { version = "3.0", features = ["runtime", "client"] }
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
# ...

[dev-dependencies]
wiremock = "0.6"
# ...
```
rmcp does the MCP protocol. kube does the Kubernetes API. Everything else is the standard Rust ecosystem. No frameworks, no code generation.
All of it is fairly straightforward async Rust. A lot of it written by Claude, all of it reviewed by me 🦀.
Done
MrWolf isn't clever 🧰. It's a pile of small, tested functions, each doing one thing, wrapped in macros that handle the plumbing. The Rust compiler does the rest: if it compiles, the dispatch table is correct, the error handling is complete, and the metrics are wired up.
That's how Mr. Wolf solves problems. Not with cleverness, but with a system 🐺.