How I Tunnel 40 Services Through a VPS (CGNAT Is Not My Boss)
This is the first post in a series about my k3s homelab. All nodes are named after Star Wars planets because of course they are.
So, I run a 7-node k3s cluster at home. 150 pods, 40 ingresses, 32 namespaces. It does monitoring, media streaming, file storage, git hosting, DNS, CI/CD.. the whole thing. It's been running for years and I love it.
One day my ISP decided to put me behind CGNAT. Well, technically they offered to keep the public IP.. for £5/month. Five quid a month for something that used to be free? No thanks.
Every service I was exposing to the internet, gone.
I had to do something!
What is CGNAT and why it ruins your day
CGNAT (Carrier-Grade NAT) means your ISP gives you a private IP and NATs your traffic through a shared public one. You can browse the web fine, but nobody on the internet can reach you. Port forwarding? Dead. DynDNS? Useless. Your homelab just became invisible.
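A quick way to check whether it's happened to you: compare the WAN address your router reports with what the internet sees. A rough sketch:

```bash
# What the internet sees
curl -s https://ifconfig.me

# Compare with your router's WAN address: if that one sits in
# 100.64.0.0/10 (the shared CGNAT range) and differs from the output
# above, your ISP is NATing you.
```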
The common advice on r/selfhosted is "just use Cloudflare Tunnel" or "get a business line." But I'm the kind of person who runs their own git server, so you can imagine how I feel about routing all my traffic through Cloudflare 🤮.
The solution: a cheap VPS as a relay station
The idea is simple. You rent a small VPS with a public IP (mine is called scarif because it's the Outer Rim relay), and you tunnel all traffic from the internet through it back to your home cluster.
Here's the full chain:
```
Internet → scarif (:443) → traefik-edge → pangolin → newt (WireGuard) → home traefik → services
```
Let me break this down.
The stack
I use Pangolin with its companion tools Gerbil and Newt. Pangolin is a self-hosted reverse proxy manager with built-in SSO, zero-trust networking, and WireGuard tunnels. Think of it as a self-hosted alternative to Cloudflare Tunnel + Cloudflare Access, but you control the whole thing.
On scarif (the VPS):
- Pangolin – The brains. API server, management dashboard, SSO authentication (Google OIDC in my case). Manages resources, sites, and tunnel configuration. Every public service is a Pangolin "resource" with health checks pushed to the newt agent via WebSocket
- Gerbil – WireGuard tunnel endpoint. Creates and manages the wg0 interface, handles peer registration, reports bandwidth back to Pangolin
- Traefik-edge – Reverse proxy that terminates TLS from the internet, routes traffic to the right backend through the tunnel
- CrowdSec – WAF sidecar that reads traefik access logs in real time
Gerbil, traefik-edge, and CrowdSec run in a single pod with hostNetwork: true, the same pattern as Docker's network_mode: service:. They share the network namespace so traefik can bind to gerbil's WireGuard interface and CrowdSec can tail traefik's log file via an emptyDir volume.
```yaml
# The gerbil-traefik pod: 3 containers, 1 network namespace
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: gerbil     # WireGuard tunnel management
- name: traefik    # TLS termination + routing
- name: crowdsec   # WAF, reads traefik access logs
volumes:
- name: traefik-logs
  emptyDir: {}     # shared log volume between traefik and crowdsec
```
At home:
- Newt – WireGuard tunnel client running in kube-system (sketched below). Connects to gerbil on scarif and exposes the home cluster's services through the tunnel
- Traefik (home) – Standard k3s ingress controller behind MetalLB at lb.crisidev.lan
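For reference, here's roughly what the newt side looks like. This is a sketch, not my exact manifest: the image tag and secret names are illustrative, and the env var names follow Pangolin's documented Docker setup.

```yaml
# Sketch of the newt deployment (names and values are illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: newt
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels: { app: newt }
  template:
    metadata:
      labels: { app: newt }
    spec:
      containers:
      - name: newt
        image: fosrl/newt:latest        # pin a real version in practice
        env:
        - name: PANGOLIN_ENDPOINT       # Pangolin API on scarif
          value: https://pangolin.example.org
        - name: NEWT_ID                 # site credentials issued by Pangolin
          valueFrom: { secretKeyRef: { name: newt-credentials, key: id } }
        - name: NEWT_SECRET
          valueFrom: { secretKeyRef: { name: newt-credentials, key: secret } }
```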
When a request hits blog.crisidev.org, DNS resolves to scarif's public IP. Traefik-edge receives it, authenticates through Pangolin SSO, and forwards it through the WireGuard tunnel to home traefik, which routes it to the right pod.
Why Pangolin over the alternatives?
I evaluated a few options before settling on Pangolin:
- Cloudflare Tunnel – Works great, but I don't want Cloudflare inspecting all my traffic. I run my own git server, my own DNS, my own certificate authority.. I'm not about to hand the keys to a corporation. Supply chain risk, proprietary code, and an "ammerregaaaa" company that can enshittify the product whenever they want? No thanks 🤮
- Tailscale Funnel – I already use Tailscale extensively across the cluster, but with a self-hosted Headscale control plane for exactly the same reason I don't use Cloudflare: I don't trust proprietary coordination servers. Funnel itself is limited to 3 hosts on the free tier and adds latency through their relay network
- frp / ngrok – frp is solid but doesn't have SSO or a management UI. ngrok is SaaS with limits
- WireGuard + manual setup – Too much operational overhead. Pangolin handles peer registration, health checks, and DNS automatically
Pangolin won because it's fully self-hosted, has built-in SSO, includes a nice management dashboard, and the WireGuard tunnel is rock solid.. most of the time.
Preserving real client IPs
One thing that drove me crazy was losing the real client IP through the proxy chain. By default, every hop sees the IP of the previous hop, not the actual client.
The fix is X-Forwarded-For headers with trusted proxy CIDRs. Both traefik-edge on scarif and home traefik trust the same ranges.
Home traefik also uses externalTrafficPolicy: Local on the LoadBalancer service to prevent Kubernetes from SNATing the traffic. The result: every request shows the real client IP in ClientHost, X-Real-Ip, and X-Forwarded-For.
Health check probes from Pangolin itself show the newt pod IP (10.42.x.x), but that's expected, since those requests originate inside the tunnel.
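For the curious, the home side boils down to two settings. A sketch using the k3s traefik helm values; the CIDRs are placeholders for your tunnel and edge ranges:

```yaml
# Home traefik helm values (sketch; CIDRs are placeholders)
additionalArguments:
  # trust X-Forwarded-For only when the request comes from the tunnel/edge ranges
  - "--entryPoints.websecure.forwardedHeaders.trustedIPs=100.89.0.0/16,10.42.0.0/16"
service:
  spec:
    externalTrafficPolicy: Local   # don't SNAT; keep the real client source IP
```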
The debugging story from hell: stale WireGuard keys 🔥
This one took me days to figure out.
After reinstalling Pangolin (new database, new keypairs), the newt tunnels appeared connected. Logs showed "Peer added successfully" on gerbil, and newt kept trying to establish the tunnel. But pings failed with "i/o timeout" forever.
No errors. No authentication failures. Just.. silence.
I ran tcpdump on scarif and saw the problem:
```
# HandshakeInitiation packets arriving from home every ~5 seconds
# ...zero HandshakeResponse packets going back
```
Home was knocking. Scarif was ignoring it.
The root cause
The wg0 interface persists in scarif's host network namespace across pod restarts (because hostNetwork: true). When gerbil starts and wg0 already exists, it reconfigures peers and the listen port but does not replace the private key.
After a Pangolin reinstall, the database has a fresh keypair. The new public key gets distributed to newt (home). But wg0 on scarif still has the old private key. When home sends a WireGuard handshake encrypted with the new public key, wg0 tries to decrypt it with the old private key: silent drop. No error, no log, nothing.
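In hindsight the mismatch is trivial to confirm: compare the key actually loaded on the interface with the one Pangolin distributed. Something like:

```bash
# Public key derived from the private key currently loaded on wg0
wg show wg0 public-key

# If this differs from the gerbil public key Pangolin handed to newt,
# every handshake from home is silently undecryptable.
```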
The fix
Delete wg0 and let gerbil recreate it from scratch:
```bash
# on scarif: drop the stale interface, then bounce gerbil so it recreates wg0
ip link delete wg0
kubectl rollout restart deployment/gerbil-traefik   # deployment name illustrative
```
After the restart, gerbil logs "Created WireGuard interface wg0" (not just "configured") and newt reports "Tunnel connection to server established successfully!"
I wrote a diagnostic script that checks for this exact failure mode.
The script samples wg0 rx packets over 5 seconds and cross-references with iptables counters on UDP 51821. If packets arrive at the port but wg0 receives zero decrypted traffic, it's a stale key. Simple, but it would have saved me two days of staring at logs.
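A minimal sketch of the core check (it assumes an iptables rule on INPUT matching UDP 51821, e.g. an ACCEPT rule, whose packet counter we can sample):

```bash
#!/usr/bin/env bash
# Stale-key detector: packets reach UDP 51821 but wg0 decrypts nothing.

# packets accepted on UDP 51821 (assumes an INPUT rule matching that port)
sample_port() { iptables -L INPUT -v -n -x | awk '/udp dpt:51821/ {c=$1} END {print c+0}'; }
# packets successfully decrypted and delivered on wg0
sample_wg() { cat /sys/class/net/wg0/statistics/rx_packets; }

port_before=$(sample_port); wg_before=$(sample_wg)
sleep 5
port_after=$(sample_port); wg_after=$(sample_wg)

# Encrypted traffic arriving, zero decrypted packets on wg0 = stale key
if [ $((port_after - port_before)) -gt 0 ] && [ $((wg_after - wg_before)) -eq 0 ]; then
  echo "stale key: UDP 51821 sees packets but wg0 decrypted none; recreate wg0"
else
  echo "tunnel looks healthy (or simply idle)"
fi
```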
The second failure mode: Tailscale vs WireGuard
There's a bonus failure mode I discovered later. Gerbil's WireGuard subnet (100.89.x.x) falls within Tailscale's CGNAT range (100.64.0.0/10). Tailscale's ts-input iptables chain drops traffic from that range unless it arrives on the tailscale0 interface.
After running Ansible on scarif (which restarts Tailscale), the ts-input chain gets rebuilt and starts dropping gerbil's WireGuard traffic. The fix is an iptables rule that accepts wg0 traffic before ts-input can drop it:
```bash
# accept decrypted tunnel traffic arriving on wg0 before ts-input can drop it
iptables -I INPUT 1 -i wg0 -j ACCEPT
```
This runs as an init container on the gerbil-traefik pod, so it's applied on every pod start. Another fun one that took a while to track down.
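The init container itself is a one-liner; a sketch where the image choice and rule position are illustrative, and the -C check keeps it idempotent across restarts:

```yaml
# Init container sketch: re-apply the wg0 ACCEPT rule on every pod start
initContainers:
- name: fix-ts-input
  image: nicolaka/netshoot   # any image that ships iptables works
  securityContext:
    privileged: true         # needs to modify the host's netfilter tables
  command:
  - sh
  - -c
  - iptables -C INPUT -i wg0 -j ACCEPT || iptables -I INPUT 1 -i wg0 -j ACCEPT
```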
The result
40 services exposed to the internet, all through a single 4-core VPS running a WireGuard tunnel. TLS terminated on the edge, SSO authentication through Pangolin, real client IPs preserved end to end. The whole thing adds maybe 10-15ms of latency on top of the direct path.
Is it overengineered? Probably. But it's mine, I understand every piece of it, and when something breaks at 3 AM I know exactly where to look.