How I Tunnel 40 Services Through a VPS (CGNAT Is Not My Boss)
This is the first post in a series about my k3s homelab. All nodes are named after Star Wars planets because of course they are.
So, I run a 7-node k3s cluster at home. 150 pods, 40 ingresses, 32 namespaces. It does monitoring, media streaming, file storage, git hosting, DNS, CI/CD.. the whole thing. It's been running for years and I love it.
One day my ISP decided to put me behind CGNAT. Well, technically they offered to keep the public IP.. for £5/month. Five quid a month for something that used to be free? No thanks.
Every service I was exposing to the internet, gone.
I had to do something!
What is CGNAT and why it ruins your day
CGNAT (Carrier-Grade NAT) means your ISP gives you a private IP and NATs your traffic through a shared public one. You can browse the web fine, but nobody on the internet can reach you. Port forwarding? Dead. DynDNS? Useless. Your homelab just became invisible.
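A quick way to check whether it's happened to you: compare the WAN address your router reports with what the internet sees. A rough sketch:

```bash
# What the internet sees
curl -s https://ifconfig.me

# Compare with your router's WAN address: if that one sits in
# 100.64.0.0/10 (the shared CGNAT range) and differs from the output
# above, your ISP is NATing you.
```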
The common advice on r/selfhosted is "just use Cloudflare Tunnel" or "get a business line." But I'm the kind of person who runs their own git server, so you can imagine how I feel about routing all my traffic through Cloudflare 🤮.
The solution: a cheap VPS as a relay station
The idea is simple. You rent a small VPS with a public IP (mine is called scarif because it's the Outer Rim relay), and you tunnel all traffic from the internet through it back to your home cluster.
Here's the full chain:
```
Internet → scarif (:443) → traefik-edge → pangolin → newt (WireGuard) → home traefik → services
```
Let me break this down.
The stack
I use Pangolin with its companion tools Gerbil and Newt. Pangolin is a self-hosted reverse proxy manager with built-in SSO, zero-trust networking, and WireGuard tunnels. Think of it as a self-hosted alternative to Cloudflare Tunnel + Cloudflare Access, but you control the whole thing.
On scarif (the VPS):
- Pangolin – The brains. API server, management dashboard, SSO authentication (Google OIDC in my case). Manages resources, sites, and tunnel configuration. Every public service is a Pangolin "resource" with health checks pushed to the newt agent via WebSocket
- Gerbil – WireGuard tunnel endpoint. Creates and manages the wg0 interface, handles peer registration, reports bandwidth back to Pangolin
- Traefik-edge – Reverse proxy that terminates TLS from the internet, routes traffic to the right backend through the tunnel
- CrowdSec – WAF sidecar that reads traefik access logs in real time
Gerbil, traefik-edge, and CrowdSec run in a single pod with hostNetwork: true, the same pattern as Docker's network_mode: service:. They share the network namespace so traefik can bind to gerbil's WireGuard interface and CrowdSec can tail traefik's log file via an emptyDir volume.
```yaml
# The gerbil-traefik pod: 3 containers, 1 network namespace
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: gerbil     # WireGuard tunnel management
- name: traefik    # TLS termination + routing
- name: crowdsec   # WAF, reads traefik access logs
volumes:
- name: traefik-logs
  emptyDir: {}     # shared log volume between traefik and crowdsec
```
At home:
- Newt – WireGuard tunnel client running in kube-system (sketched below). Connects to gerbil on scarif and exposes the home cluster's services through the tunnel
- Traefik (home) – Standard k3s ingress controller behind MetalLB at lb.crisidev.lan
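For reference, here's roughly what the newt side looks like. This is a sketch, not my exact manifest: the image tag and secret names are illustrative, and the env var names follow Pangolin's documented Docker setup.

```yaml
# Sketch of the newt deployment (names and values are illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: newt
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels: { app: newt }
  template:
    metadata:
      labels: { app: newt }
    spec:
      containers:
      - name: newt
        image: fosrl/newt:latest        # pin a real version in practice
        env:
        - name: PANGOLIN_ENDPOINT       # Pangolin API on scarif
          value: https://pangolin.example.org
        - name: NEWT_ID                 # site credentials issued by Pangolin
          valueFrom: { secretKeyRef: { name: newt-credentials, key: id } }
        - name: NEWT_SECRET
          valueFrom: { secretKeyRef: { name: newt-credentials, key: secret } }
```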
When a request hits blog.crisidev.org, DNS resolves to scarif's public IP. Traefik-edge receives it, authenticates through Pangolin SSO, and forwards it through the WireGuard tunnel to home traefik, which routes it to the right pod.
Why Pangolin over the alternatives?
I evaluated a few options before settling on Pangolin:
- Cloudflare Tunnel – Works great, but I don't want Cloudflare inspecting all my traffic. I run my own git server, my own DNS, my own certificate authority.. I'm not about to hand the keys to a corporation. Supply chain risk, proprietary code, and an "ammerregaaaa" company that can enshittify the product whenever they want? No thanks 🤮
- Tailscale Funnel – I already use Tailscale extensively across the cluster, but with a self-hosted Headscale control plane for exactly the same reason I don't use Cloudflare: I don't trust proprietary coordination servers. Funnel itself is limited to 3 hosts on the free tier and adds latency through their relay network
- frp / ngrok – frp is solid but doesn't have SSO or a management UI. ngrok is SaaS with limits
- WireGuard + manual setup – Too much operational overhead. Pangolin handles peer registration, health checks, and DNS automatically
Pangolin won because it's fully self-hosted, has built-in SSO, includes a nice management dashboard, and the WireGuard tunnel is rock solid.. most of the time.
Preserving real client IPs
One thing that drove me crazy was losing the real client IP through the proxy chain. By default, every hop sees the IP of the previous hop, not the actual client.
The fix is X-Forwarded-For headers with trusted proxy CIDRs. Both traefik-edge on scarif and home traefik trust the same ranges.
Home traefik also uses externalTrafficPolicy: Local on the LoadBalancer service to prevent Kubernetes from SNATing the traffic. The result: every request shows the real client IP in ClientHost, X-Real-Ip, and X-Forwarded-For.
Health check probes from Pangolin itself show the newt pod IP (10.42.x.x), but that's expected, since those requests originate inside the tunnel.
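For the curious, the home side boils down to two settings. A sketch using the k3s traefik helm values; the CIDRs are placeholders for your tunnel and edge ranges:

```yaml
# Home traefik helm values (sketch; CIDRs are placeholders)
additionalArguments:
  # trust X-Forwarded-For only when the request comes from the tunnel/edge ranges
  - "--entryPoints.websecure.forwardedHeaders.trustedIPs=100.89.0.0/16,10.42.0.0/16"
service:
  spec:
    externalTrafficPolicy: Local   # don't SNAT; keep the real client source IP
```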
The debugging story from hell: stale WireGuard keys 🔥
This one took me days to figure out.
After reinstalling Pangolin (new database, new keypairs), the newt tunnels appeared connected. Logs showed "Peer added successfully" on gerbil, and newt kept trying to establish the tunnel. But pings failed with "i/o timeout" forever.
No errors. No authentication failures. Just.. silence.
I ran tcpdump on scarif and saw the problem:
```
# HandshakeInitiation packets arriving from home every ~5 seconds
# ...zero HandshakeResponse packets going back
```
Home was knocking. Scarif was ignoring it.
The root cause
The wg0 interface persists in scarif's host network namespace across pod restarts (because hostNetwork: true). When gerbil starts and wg0 already exists, it reconfigures peers and the listen port but does not replace the private key.
After a Pangolin reinstall, the database has a fresh keypair. The new public key gets distributed to newt (home). But wg0 on scarif still has the old private key. When home sends a WireGuard handshake encrypted with the new public key, wg0 tries to decrypt it with the old private key: silent drop. No error, no log, nothing.
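In hindsight the mismatch is trivial to confirm: compare the key actually loaded on the interface with the one Pangolin distributed. Something like:

```bash
# Public key derived from the private key currently loaded on wg0
wg show wg0 public-key

# If this differs from the gerbil public key Pangolin handed to newt,
# every handshake from home is silently undecryptable.
```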
The fix
Delete wg0 and let gerbil recreate it from scratch:
```bash
# on scarif: drop the stale interface, then bounce gerbil so it recreates wg0
ip link delete wg0
kubectl rollout restart deployment/gerbil-traefik   # deployment name illustrative
```
After the restart, gerbil logs "Created WireGuard interface wg0" (not just "configured") and newt reports "Tunnel connection to server established successfully!"
I wrote a diagnostic script that checks for this exact failure mode.
The script samples wg0 rx packets over 5 seconds and cross-references with iptables counters on UDP 51821. If packets arrive at the port but wg0 receives zero decrypted traffic, it's a stale key. Simple, but it would have saved me two days of staring at logs.
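A minimal sketch of the core check (it assumes an iptables rule on INPUT matching UDP 51821, e.g. an ACCEPT rule, whose packet counter we can sample):

```bash
#!/usr/bin/env bash
# Stale-key detector: packets reach UDP 51821 but wg0 decrypts nothing.

# packets accepted on UDP 51821 (assumes an INPUT rule matching that port)
sample_port() { iptables -L INPUT -v -n -x | awk '/udp dpt:51821/ {c=$1} END {print c+0}'; }
# packets successfully decrypted and delivered on wg0
sample_wg() { cat /sys/class/net/wg0/statistics/rx_packets; }

port_before=$(sample_port); wg_before=$(sample_wg)
sleep 5
port_after=$(sample_port); wg_after=$(sample_wg)

# Encrypted traffic arriving, zero decrypted packets on wg0 = stale key
if [ $((port_after - port_before)) -gt 0 ] && [ $((wg_after - wg_before)) -eq 0 ]; then
  echo "stale key: UDP 51821 sees packets but wg0 decrypted none; recreate wg0"
else
  echo "tunnel looks healthy (or simply idle)"
fi
```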
The second failure mode: Tailscale vs WireGuard
There's a bonus failure mode I discovered later. Gerbil's WireGuard subnet (100.89.x.x) falls within Tailscale's CGNAT range (100.64.0.0/10). Tailscale's ts-input iptables chain drops traffic from that range unless it arrives on the tailscale0 interface.
After running Ansible on scarif (which restarts Tailscale), the ts-input chain gets rebuilt and starts dropping gerbil's WireGuard traffic. The fix is an iptables rule that accepts wg0 traffic before ts-input can drop it:
```bash
# accept decrypted tunnel traffic arriving on wg0 before ts-input can drop it
iptables -I INPUT 1 -i wg0 -j ACCEPT
```
This runs as an init container on the gerbil-traefik pod, so it's applied on every pod start. Another fun one that took a while to track down.
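The init container itself is a one-liner; a sketch where the image choice and rule position are illustrative, and the -C check keeps it idempotent across restarts:

```yaml
# Init container sketch: re-apply the wg0 ACCEPT rule on every pod start
initContainers:
- name: fix-ts-input
  image: nicolaka/netshoot   # any image that ships iptables works
  securityContext:
    privileged: true         # needs to modify the host's netfilter tables
  command:
  - sh
  - -c
  - iptables -C INPUT -i wg0 -j ACCEPT || iptables -I INPUT 1 -i wg0 -j ACCEPT
```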
The result
40 services exposed to the internet, all through a single 4-core VPS running a WireGuard tunnel. TLS terminated on the edge, SSO authentication through Pangolin, real client IPs preserved end to end. The whole thing adds maybe 10-15ms of latency on top of the direct path.
Is it overengineered? Probably. But it's mine, I understand every piece of it, and when something breaks at 3 AM I know exactly where to look.