
⚡ Where the Single-Binary Dream Breaks: Async Rust, `epoll`, and Why `xh` Only Runs on Linux

Fifth post in the one-bin-to-rule-them-all series. Previously: intro, probe matrix, extern-static pattern, ripgrep and dog.

Three sync workloads into the series and the scoreboard still reads six out of six 🦀. Single fat APE, six targets, the binary works out what kernel it's really on and behaves correctly. The previous post closed with the observation that the pattern scales to real tools with small, bounded patches, and that the only thing it can't solve is "this OS uses a fundamentally different API" situations like Windows DNS, which cosmo-sysconf handles with runtime dispatch.

This post is where that story stops being that tidy 🤒. The workload is xh, a Rust httpie-equivalent, and it exists to answer one question: does async Rust survive the jump to cosmo the same way sync Rust did? The short answer: "on Linux hosts, yes; on everything else, no; and the reason it fails is structural in a way the extern-static trick can't fix".

Why xh

xh is a deliberate escalation from dog. It's still a CLI, but underneath the command parsing it drags in what is effectively the whole modern async Rust stack: reqwest for the client API, hyper for the HTTP wire protocol, tokio for the runtime, mio for the IO reactor. A single HTTP request exercises all four: a connection pool, a non-blocking socket, an epoll-based waker, a ready loop. If async Rust under cosmo is going to work, this is the kind of binary that has to work.

I forked ducaale/xh, wired it up the same way as dog (target JSONs, build-fat.sh, [patch.crates-io] pointing at libc-cosmo and getrandom-cosmo), and dropped TLS features for the moment because that's a rabbit hole of its own. More on that below.

The fork cascade

Porting ripgrep changed nothing in the src/ tree and one line in Cargo.toml. Porting dog changed a few cfg(cosmo) lines plus the new cosmo-sysconf crate. Porting xh needed three more small crate forks, one each for socket2, mio, and tokio. The reason, in every case, is the same.

Once libc::SOCK_NONBLOCK, libc::SOCK_CLOEXEC, libc::SOL_SOCKET, and libc::O_NONBLOCK become extern static under cfg(cosmo), you cannot read them outside an unsafe block without a compiler error, and you cannot use them inside a const fn at all. Upstream socket2, mio, and tokio all touch these constants in places that do exactly that, because for a normal build where they're compile-time integers there's no reason not to. So each fork does the same mechanical thing: wrap the read site in unsafe { }, drop const from the handful of helpers that now transitively depend on a runtime value, and leave everything else alone. The diffs are small, the review is boring, and none of the changes do anything post 3 didn't already cover.
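To make the shape of that change concrete, here is a minimal sketch. It is not the diff from any of the three forks. The helper names (`fill_from_host`, `socket_flags`) are made up for illustration, and `static mut` stands in for the real `extern static` purely so the example links without the cosmo runtime. The octal values are the Linux x86_64 ones, used only as sample input.

```rust
use std::os::raw::c_int;

// Stand-ins for libc::SOCK_NONBLOCK / libc::SOCK_CLOEXEC under cfg(cosmo).
// In the patched libc these would be `extern` statics filled in at startup;
// `static mut` is an assumption made here so the sketch is self-contained.
// Either way, reading the value requires `unsafe`.
static mut SOCK_NONBLOCK: c_int = 0;
static mut SOCK_CLOEXEC: c_int = 0;

/// Hypothetical helper playing the role of the loader that supplies
/// the host's real values at runtime.
fn fill_from_host(nonblock: c_int, cloexec: c_int) {
    unsafe {
        SOCK_NONBLOCK = nonblock;
        SOCK_CLOEXEC = cloexec;
    }
}

// Upstream shape, valid while the flags are compile-time integers:
//     const fn socket_flags() -> c_int { libc::SOCK_NONBLOCK | libc::SOCK_CLOEXEC }
// Forked shape: `const` is dropped and the read is wrapped in `unsafe`.
fn socket_flags() -> c_int {
    unsafe { SOCK_NONBLOCK | SOCK_CLOEXEC }
}

fn main() {
    // Sample input only: the Linux x86_64 values for the two flags.
    fill_from_host(0o4000, 0o2000000);
    println!("socket flags: {:#o}", socket_flags());
}
```

Every caller of `socket_flags` that was itself a `const fn` has to lose `const` too, which is why the change ripples through a handful of helpers rather than staying at a single read site.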

Tokio was the one that fell out differently: I ended up forking it rather than patching around it downstream, because its pipe and UDS paths read libc::O_NONBLOCK in spots that aren't reachable via a simple cfg(cosmo) switch in user code. Keeping the fork on upstream's master and carrying the unsafe { } wraps as a branch was cleaner than any downstream workaround I could find.

With the three forks wired in via [patch.crates-io], xh compiled.

About TLS

I want to flag this before the results table, because it changes what "xh works" actually means. I disabled TLS for the xh port.

reqwest 0.13's default rustls backend pulls in aws-lc-rs as its crypto provider, which in turn depends on aws-lc-sys, a sizeable BoringSSL fork. cosmocc's linker doesn't resolve some glibc-2.38-only string helpers that aws-lc's C sources call, so the link fails outright. The plausible alternative, ring, compiles its C portions through cosmocc fine, but it ships hand-written .S assembly files for curve25519 and friends, and cosmocc's driver refuses .S inputs with "assembler input files not supported." There's no no_asm feature on the version of ring that reqwest 0.13 pulls in.

Rather than keep chasing providers, I shipped xh as HTTP-only for this series. default = [] in Cargo.toml, a small local crate::tls shim that makes the TLS-touching types resolve so cli.rs and to_curl.rs still build, and plain HTTP requests for the test. The async stack, reqwest all the way down to mio, is still fully exercised against plain HTTP; it's just that the test endpoint has to be http:// not https://.

Bringing TLS back is a separate follow-up that needs a pure-Rust provider compilable without asm. For this post, "xh works" means "HTTP requests work end to end." That's still the whole async stack.

Linux host, and the aarch64 shim

xh GET http://httpbin.org/get on Linux x86_64 returned the expected JSON response on the first run 🎉. Full async runtime, connection pool, hyper wire protocol, all under cosmo. That's the payoff: a single binary built against target_os = "linux" through cosmocc is running non-trivial async Rust on a Linux host. Encouraging.

Linux aarch64 needed one small shim. Linux aarch64 doesn't have the original epoll_wait(4) or eventfd(1) syscalls, only the newer epoll_pwait(5) and eventfd2(2). cosmo's libcosmo.a ships sys_epoll_wait and sys_eventfd for x86_64 but they return ENOSYS on aarch64. The fix was the same shape as the waitid / __xpg_strerror_r shims in the probe and ripgrep: a one-line epoll_wait shim in xh/main.rs that wraps sys_epoll_pwait with a NULL sigmask, and a libc-cosmo change to link libc::eventfd at sys_eventfd2 (whose two-arg flag ABI matches). With those in, aarch64 Linux works the same as x86_64 Linux.
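The epoll half of that shim has roughly the following shape. This is a sketch, not the code in xh/main.rs: the `sys_epoll_pwait` below is a mock with a made-up return convention so the example runs anywhere, and the event buffer is typed as an opaque pointer rather than the real `epoll_event` struct. The only point it illustrates is the forwarding: `epoll_wait` is `epoll_pwait` with a NULL signal mask.

```rust
use std::os::raw::{c_int, c_void};
use std::ptr;

/// Mock standing in for the runtime's five-argument `sys_epoll_pwait`.
/// It only reports what the shim handed it: 0 ("timed out, no events")
/// when the signal mask is NULL, -1 otherwise. The real function would
/// make the syscall.
unsafe fn sys_epoll_pwait(
    _epfd: c_int,
    _events: *mut c_void,
    _maxevents: c_int,
    _timeout: c_int,
    sigmask: *const c_void,
) -> c_int {
    if sigmask.is_null() { 0 } else { -1 }
}

/// Four-argument `epoll_wait` expressed through `epoll_pwait`.
/// Passing a NULL mask leaves the thread's signal mask untouched,
/// which is the behaviour `epoll_wait` is documented to have.
/// The real shim would also be exported with C linkage.
pub unsafe fn epoll_wait(
    epfd: c_int,
    events: *mut c_void,
    maxevents: c_int,
    timeout: c_int,
) -> c_int {
    unsafe { sys_epoll_pwait(epfd, events, maxevents, timeout, ptr::null()) }
}

fn main() {
    // The fd and buffer are dummies; the mock never dereferences them.
    let rc = unsafe { epoll_wait(3, ptr::null_mut(), 8, 0) };
    println!("shim returned {rc}");
}
```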

That's where the good news ends 😬.

Non-Linux hosts: immediate failure

I copied the same xh.com to the other four targets (macOS arm64, FreeBSD 14, OpenBSD 7.4, Windows Server 2022) and ran the same command. Every one of them failed the same way, before any network traffic, before any DNS, before anything. The failure is at tokio builder time, roughly:

```bash
❯❯❯ ./xh.com GET http://httpbin.org/get
Error: Unsupported
  caused by: Function not implemented (os error 78)
```

The errno number varies per host (macOS and FreeBSD both return 78 for ENOSYS; OpenBSD its own; Windows has a different passthrough), but the kind is always Unsupported. The call that's failing is mio's reactor setup, inside the tokio runtime builder.

The stack trace below that points at a call that looks something like this, inside mio's Linux backend:

```rust
// simplified, from mio
let fd = syscall!(eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK))?;
```

On Linux x86_64 and aarch64, eventfd is a real syscall and the call returns a file descriptor. On macOS, BSD and Windows, cosmo's libcosmo.a does not know how to implement eventfd, because it is a Linux-specific syscall with no native equivalent on those hosts. The cosmo runtime returns -1 with errno ENOSYS, mio propagates that as an IO error, tokio's builder turns it into ErrorKind::Unsupported, and the whole program exits before anything useful happens.

Why the extern-static pattern doesn't help here

Every finding until now has fit the same mould: a Linux numeric value baked at compile time that should have been different at runtime. F-001 was a struct layout, F-002 was errno numbers, F-005 was a CLOCK_MONOTONIC ID, F-015 was MSG_NOSIGNAL. The fix in each case was to turn the value into an extern static and let the cosmo loader fill it in from the host's real value.

F-016 is not that. There is no value to fix. eventfd(0, flags) is a syscall. On Linux it exists, on macOS / BSD / Windows it doesn't. You cannot make "this syscall exists" into an extern static. The problem isn't "the number is wrong," it's "the function is absent." That is a categorically different class of problem, and the pattern from post 3 cannot reach it.

The deeper reason this shows up inside mio and not somewhere earlier is that mio's design assumes one IO backend per build. Its source tree has sys/unix/selector/epoll.rs, sys/unix/selector/kqueue.rs, and sys/windows/selector.rs, and cfg attributes pick exactly one at compile time based on target_os. We compile with target_os = "linux", so we get the epoll path, for every host we deploy to. Every host that isn't Linux is then missing the syscalls that path depends on.
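The one-backend-per-build design can be sketched with plain cfg attributes. This is not mio's source, only the same selection mechanism in miniature; the `selector` module and `BACKEND` constant are names invented for the example.

```rust
// Exactly one `selector` module survives compilation. The others are
// removed before type checking, so their code never reaches the binary.
#[cfg(any(target_os = "linux", target_os = "android"))]
mod selector {
    pub const BACKEND: &str = "epoll";
}

#[cfg(any(
    target_os = "macos",
    target_os = "ios",
    target_os = "freebsd",
    target_os = "openbsd",
    target_os = "netbsd",
    target_os = "dragonfly"
))]
mod selector {
    pub const BACKEND: &str = "kqueue";
}

#[cfg(windows)]
mod selector {
    pub const BACKEND: &str = "iocp";
}

// Fallback so the sketch still compiles on targets not listed above.
#[cfg(not(any(
    target_os = "linux",
    target_os = "android",
    target_os = "macos",
    target_os = "ios",
    target_os = "freebsd",
    target_os = "openbsd",
    target_os = "netbsd",
    target_os = "dragonfly",
    windows
)))]
mod selector {
    pub const BACKEND: &str = "none";
}

fn main() {
    // The answer is fixed when the binary is built, not when it runs.
    println!("reactor backend chosen at compile time: {}", selector::BACKEND);
}
```

A build with target_os = "linux" prints "epoll" no matter which kernel ends up executing the binary, and that is the situation the APE is in on macOS, BSD, and Windows.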

Tokio inherits the problem transitively. It never gets a chance to do anything non-Linux-specific because the reactor below it never came up.

Why this is where I stop

I thought about this one for a while and couldn't come up with a path I was confident enough in to write code against. From the outside looking in, the two places a fix could conceivably live are inside mio (somehow linking all of the epoll, kqueue, and IOCP backends together and picking between them at runtime based on cosmo's __hostos bitmask) or inside cosmopolitan (some kind of emulation of eventfd and epoll_* on non-Linux hosts, built on top of whatever the host actually provides). Both of those sound, to my non-expert ear, like serious pieces of work in codebases I don't know well enough to have an opinion on what the right design would be. So I'm noting this as where my investigation hit its wall, not prescribing anything to the maintainers of either project.

The scoreboard

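| Workload | Kind | Linux x64 | Linux arm64 | FreeBSD 14 | OpenBSD 7.4 | Windows 2022 | macOS arm64 |
|---|---|---|---|---|---|---|---|
| probe | sync | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| ripgrep | sync | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| dog | sync | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| xh | async (HTTP) | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |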

Sync Rust works everywhere. Async Rust works on Linux. That's the honest state.

The "async works on Linux" row is still a real result. It means reqwest, hyper, tokio, and mio can all ride the Strategy 3 pattern the other posts in this series have been building, and a single fat APE carrying the full async stack runs correctly on both Linux arches. If your target is "Linux distros with different libc flavours" or "Linux x86_64 and Linux aarch64 from one binary," you can ship async Rust under cosmo today.

The "async doesn't work elsewhere" row is what it is. Under cosmo, async Rust is not portable to macOS, BSD, or Windows, not because the language can't do it, but because the current reactor architecture in mio picks its backend at compile time and cosmo can't undo that choice from the outside.

So what now

For sync Rust, the story the series has been telling holds. One binary, six targets, three real workloads, every time.

For async Rust, the pragmatic answer I landed on is two binaries: an APE for the sync parts if the single-binary story matters for distribution, and a normal cross-compile matrix for the async parts on the OSes I care about. That's an annoying place to end up after five posts of "one binary, every OS," but it's where the evidence points, and pretending otherwise would be worse than admitting it.

Next post: the final one in the series 🏁. Pulling the whole investigation together, walking through the finding disposition, and being honest about what I got wrong going in.
