Why You Should Avoid Async Rust
A lot has been written on this subject already, so if you just want the key points, here they are.

You should avoid async because:
- Async is a leaky abstraction which leads to “async contamination”.
- Leakiness leads to violation of the zero-cost abstractions principle.
- Async inevitably leads to severe degradation of developer productivity.
- Async fragments the ecosystem of available code, forcing developers to maintain sync and async APIs.
- Async leads to long compile times.
- Satisfying borrow checker requirements in async code is very hard.
- Most advertised performance benefits are negligible, too expensive or can be achieved without async.
You might benefit from async if you run at truly massive scale, where microscopic performance gains still save you lots of dollars, or if nanoseconds of latency are critical to your business.
Alternatives to consider before using async:
- Specialized event loop.
- Threads and thread pools.
- Golang / Erlang
The Long Story
But before we begin, a small intro to Async Rust is needed.
Official Rust documentation defines Async Rust as:
concurrent programming model that lets you run a large number of concurrent tasks on a small number of OS threads … through the async/await syntax.
Async provides significantly reduced CPU and memory overhead, especially for workloads with a large amount of IO-bound tasks, such as servers and databases. All else equal, you can have orders of magnitude more tasks than OS threads, because an async runtime uses a small amount of (expensive) threads to handle a large amount of (cheap) tasks.
In other words, it is a form of M:N cooperative scheduling in userspace, where the developer is responsible for cooperating with the scheduler. Most of the time this means correctly yielding execution control back to the scheduler. If this rings a bell, here is where you’ve seen it before: in the C programming language the developer is responsible for manually managing memory. Everything will be fine, iff you do it correctly :)
This is an attractive sales pitch but every engineer needs to ask several important questions:
- What is the overhead exactly and how “significant” is its reduction?
- Is it free? What price do I pay for this?
- In what situations will I observe the advertised benefits?
- In what situations will the benefits be worth the cost?
- Am I in such situation?
If you are new to async in Rust, I can recommend a good deep dive here.
The C10K Problem
In 1999 Dan Kegel coined the term “C10K Problem”. Ten thousand concurrent connections may not seem like a big deal today, but the number itself is somewhat arbitrary. Indeed, the hardware of 1999 was much weaker than today’s. More important is the idea that hardware is capable of delivering better performance on IO-bound tasks if special care is taken to avoid the overhead. The article looks at multiple solutions to the problem, and here is what it says about cooperative M:N models:
There is a choice when implementing a threading library: you can either put all the threading support in the kernel (this is called the 1:1 threading model), or you can move a fair bit of it into userspace (this is called the M:N threading model). At one point, M:N was thought to be higher performance, but it’s so complex that it’s hard to get right, and most people are moving away from it.
Many operating systems have tried M:N scheduling models, and all of them use the 1:1 model today. It is tempting to reach a premature conclusion that the M:N model is inferior to 1:1, but then how come the M:N model is used in Golang and Erlang, two languages known for their superior concurrency features?
The Coloring Problem
In 2015 Robert Nystrom wrote his famous What Color Is Your Function blog post, where he explains a fundamental problem with cooperative scheduling models. He used colors to represent two different types of functions that can’t interact with each other normally. This article deserves a place in the basic programming curriculum; go read it now, if you haven’t already.
The function coloring problem is an example of an extremely leaky abstraction. Async effectively “infects” your code such that everything needs to be aware of async. This effect naturally follows from the rules of interaction between sync and async functions. The only way to “plug” the leak is to tell the async runtime to emulate sync behaviour by calling block_on (or a similar function) so that it polls all the futures until the one you are interested in returns. This is what the postgres crate and many others have done.
This helps to stop the “infection”, but introduces another problem: a violation of the zero-cost abstractions promise. Users who don’t need async still pay the price. A good example of this is the postgres crate. It comes with the Tokio runtime and there is no way to disable it: you still pay with longer compile times and the performance overhead of running a userland scheduler. Only this time you won’t know whom you are paying until you open the hood and look inside your dependency graph.
But are colors the only reason that makes async a leaky abstraction?
The Human Problem
Golang and Erlang successfully employ M:N models, and there are no coloring problems in those languages. The leakiness of async comes from the “cooperative” design choice. One might be tempted to ask: who is failing to cooperate with whom?
Humans are bad at performing tedious tasks, especially if the task is irrelevant to the main goal (like writing business logic and shipping code). We have decades of CVEs from the C world to back this statement up. This is the problem between a human and a scheduler.
But there is an even bigger problem: cooperation between one developer and another. Humans need money to exist, and money is usually made by doing useful work. Helping a computer do what it could do without you is not what many consider “useful work”. [1]
C has a similar human-to-human problem, only with memory management. There is no good way to negotiate the ownership of memory between one developer and another. Similarly, it is very difficult to negotiate a reasonable scope for async in Rust.
One developer decides to use an infectious, leaky API to solve their problem, and now everybody who touches this code with a 10ft pole is paying the price. Since application developers are paid to solve business problems, not scheduler problems, this immediately translates into questions like:
- Is the async helping me do what I need to do?
- Is the benefit of async in my codebase worth the price?
And if the answer happens to be negative then the next question immediately follows:
- Why is that library or person affecting my productivity?
As a result, both async and manual memory management compose poorly, and there is no good way to plug the leak. The only option is to abandon the leaky ship and build another one.
I believe this is the worst effect of Async Rust: community and ecosystem fragmentation. Now every API needs to be implemented twice. Even features like keyword generics are not going to save the day. To preserve backwards compatibility the feature will be opt-in, and there will always be that guy who wrote his opinionated library in a chosen color and actively ignores the rest.
The Rust Problem
On top of the general design problems there are Rust-specific problems.
Matt Kline brilliantly captured the gist of it: Async Rust Is A Bad Language
pain.await
On one hand, futures in Rust are exceedingly small and fast, thanks to their cooperatively scheduled, stackless design. But unlike other languages with userspace concurrency, Rust tries to offer this abstraction while also promising the programmer total low-level control.
There’s a fundamental tension between the two, and the poor async Rust programmer is perpetually caught in the middle, torn between the language’s design goals and the massively-concurrent world they’re trying to build. Rust attempts to statically verify the lifetime of every object and reference in your program, all at compile time. Futures promise the opposite: that we can break code and the data it references into thousands of little pieces, runnable at any time, on any thread, based on conditions we can only know once we’ve started! A future that reads data from a client should only run when that client’s socket has data to read, and no lifetime annotation will tell us when that might be.
General design problems do not come alone; they are accompanied by immature implementations. Here is what @WormRabbit writes on Reddit:
- Still no async traits.
- Still no async closures. Quite painful when you need to move stuff into them.
- Still no async iterators. Working with Streams is painful, the terminology is inconsistent, many iterator methods are missing.
- Pin is a huge ball of complexity dumped into the language, and it’s basically useless outside of writing async (i.e. if you think it will help with your self-referential/non-movable type, think again). Anything meaningful done with it requires unsafe. At least there are pin and pin_project macros which automate some of it.
- Basically all fundamental async stuff is still in crates and not in libstd.
- No way to abstract over executors, leading to ecosystem split and a de-facto monopoly of Tokio. If you aren’t Google, writing a new executor isn’t worth the hell of rewriting the whole ecosystem around it, so Rust could just go with a built-in executor to the same effect, saving people from a lot of pain.
- No way to abstract over sync/async, leading to ecosystem split and infectious async.
- Yes, basically the whole ecosystem from libstd upwards needs to be rewritten for async. Even bloody serde and rand.
- The select! macro is a mess.
- Debugging and stacktraces are useless for async.
- Generators are still not stable. Personally, for me the pretty state-machine syntax is like 95% of the benefits of async, but I’m forced to drag all the mess of executors and async IO with it.
- Implicit Send/Sync propagation of async fn types is a mess.
- Lack of async Drop is a huge pain point.
- Future cancellation in general is a mess.
Or here Tima Kinsart shows you yet more problems: Rust Is Hard, Or: The Misery of Mainstream Programming
There is more, but I think you get the idea. Let’s ask another question instead: is the pain worth it?
The (Absence Of) Performance Problem
One of the main reasons why people think they need to use async in Rust is to make their I/O bound application go fast. But they rarely ask themselves these questions:
- Do I have performance problem?
- Did async solve my problem?
If you haven’t answered the first question, the second becomes impossible to answer. An even more interesting question to ask is:
- Do I have C10K problem?
Rust gives you plenty of performance to begin with, and you need to push modern hardware pretty far before context switching or process control block (PCB) size becomes your problem. Modern hardware is capable of running a lot more than 10k threads. My computer has 2045 threads running right now, and this is just my browser, a terminal and a handful of system services. As you can see, the system is 99.2% idle (it is an average 8-core AMD Zen processor).
$ procstat -t -a | wc -l
2045
$ top
last pid: 45104; load averages: 0.81, 0.46, 0.29 up 3+12:24:47 20:23:34
101 processes: 1 running, 92 sleeping, 4 stopped, 4 zombie
CPU: 0.1% user, 0.0% nice, 0.0% system, 0.7% interrupt, 99.2% idle
But we need to be more specific to support this claim.
This GitHub project did some benchmarking on the subject:
A context switch takes around 0.2µs between async tasks, versus 1.7µs between kernel threads. But this advantage goes away if the context switch is due to I/O readiness: both converge to 1.7µs.
IO-bound workloads are the main reason for async to exist, yet here we see that async provides no performance improvement over threads when the context switch happens due to IO. I will leave the question “What is the value of a userspace context switch that was not due to IO? Are you just artificially slowing down your program?” to the reader.
Creating a new task takes ~0.3µs for an async task, versus ~17µs for a new kernel thread.
Ok, spawning tasks is faster, but unless you spawn hundreds of thousands of tasks per second it is not going to be your problem. I will go out on a limb and say that if you do need to continuously spawn this many tasks, you may have a design problem, not a performance problem.
And yes, 0.2µs vs 1.7µs is an order of magnitude difference. But don’t over-index on ratios: 2µs is pretty damn fast. There are 1000 microseconds in one millisecond. How much money are those extra 2 microseconds going to make you? How much time are you going to spend writing and debugging async code?
Memory consumption per task (i.e. for a task that doesn’t do much) starts at around a few hundred bytes for an async task, versus around 20KiB (9.5KiB user, 10KiB kernel) for a kernel thread. This is a minimum: more demanding tasks will naturally use more.
It’s no problem to create 250,000 async tasks, but I was only able to get my laptop to run 80,000 threads (4 core, two way HT, 32GiB).
A 4-core laptop runs 80K threads. We don’t have C10K problems anymore. It is not a problem at all to get an 8 or 16 core server these days; it is low-tier commodity hardware. We maybe have a C200K problem. Do you really have over a hundred thousand concurrent connections to a single server? What will you do if AWS schedules a maintenance or your hardware dies? At this point you need to start thinking about horizontal scaling and load balancing, because losing 100k clients is going to hurt. You don’t want to keep all your eggs in one basket.
20KiB vs 200-400 bytes is also an orders-of-magnitude difference. But even 100,000 threads will take about 1953MiB, which pales in comparison to how much memory your browser consumes. If you really need to go from 80K to 200K threads, you will need about 2GiB more RAM. On AWS the price difference from t3.small to t3.medium is about $15/mo, which gives you the desired upgrade along with an extra CPU core. Is $15 going to break your bank? You likely spend more on coffee each day. And you will need a lot more coffee once you add async to your codebase :)
How many servers do you need to run before an extra 2GB per server becomes a meaningful sum? How much larger does this sum need to get before you decide to allocate a couple of developers to fix it? I don’t know exactly, but it is much larger than any single developer makes in a year.
So who does benefit from async? I really don’t know. I can speculate: large companies that have economies of scale, or businesses where 2 microseconds make or break the bank.
Non-Existing Problems
Is there anything else to async in Rust? Not really.
Yoshua Wuyts argues here that async enables timers, signals and cancellation as forms of structured concurrency. But he gracefully omits the fact that none of these features are unique to async Rust. They are available in Golang and Erlang, and the primitives needed to build them have been part of POSIX for decades now.
See the man pages for signal, timerfd, epoll, kqueue and pthreads.
If you have a lot of IO going on, building a special-purpose event loop for your problem instead of using a general-purpose runtime (like Tokio) is the right approach.
Conclusion
In the beginning I asked these questions:
- What is the overhead exactly and how “significant” is its reduction?
- Is it free? What price do I pay for this?
- In what situations will I observe the advertised benefits?
- In what situations will the benefits be worth the cost?
- Am I in such situation?
The overhead of the traditional threads approach is not significant for most people. You need to be an Amazon/Google-sized company or do something really special to reap the benefits of async. Chances are you are not that kind of business. And chances are you don’t have performance problems that async can fix for you.
Async comes with a heavy price that, all things considered, is not worth paying.