Design of the uvco async library

Tue, Jul 30, 2024 tags: [ Programming C++ Async Uvco ]

New article, discussing recent developments and improvements: News in uvco

I’ve already presented my “uvco” library in a previous post. Here I want to go deeper on the underlying design ideas and mechanics of using C++ coroutines to make using libuv a lot more ergonomic than otherwise.

The outcome is similar to e.g. Node.js concurrency - not surprising, because that’s still libuv’s main use case - or Python’s asyncio library.

Getting Started with C++ coroutines

When coroutines were first standardized, I was looking forward to using them. Having used Go before, I hoped that writing concurrent and parallel code would finally become more accessible in C++. However, if you read references for C++ coroutines, you’ll notice that they never talk about concurrency at all. That was initially a bit difficult to understand, and I wasn’t quite sure what coroutines would be good for then.

After getting back to this topic later, I managed to switch my perspective, which made it a lot easier to grasp coroutines: The key is that goroutines are merely one special case of coroutines; and while I had used continuations in functional languages, and was aware of how goroutines are implemented, it took me this insight to make progress.

The insight, in few words, is:

Coroutines are a generalized concept of functions, and C++ coroutines provide callbacks to every stage of function execution.

You can read all about the details on cppreference, but uvco only uses a subset of all available hooks and interfaces - so let’s focus on the important parts.

Coroutine Execution

C++ now introduces a few occasionally overcomplicated mechanisms in order to provide these callbacks; I’m sure someone sitting on the committee had already written a library for this, and a very similar design therefore ended up in the standard (although I’m not aware of specifically which code base that would be).

A function becomes a coroutine if it contains a co_await or co_return statement.

Whereas a conventional function is called, runs, and then returns, a coroutine allows having multiple entry and exit points. Each such point is called a “suspension point”, and either implicit or explicit by use of co_await.

Promise objects

The execution of a coroutine is determined by its promise_type. The promise type is determined by the return type of the coroutine, and must implement a certain interface which I call the “coroutine protocol”. As usual in C++, this is achieved by convention instead of inheritance: if you implement the methods in a way that they can be called as the compiler plans to, it will work (and of course, if your code is working).

The promise_type must fulfill the following interface:

initial_suspend(): This method is called when the coroutine is started. It returns an “awaitable” - see below - which is “awaited” - see below - right after the coroutine is started.
final_suspend(): This method is called when the coroutine is finished. Just like initial_suspend() it returns an awaitable that determines what happens after the function body has executed and is about to return.
return_value(): This method is called when the coroutine is about to return a value using co_return. If the coroutine is supposed to have a void result type, the method’s name is return_void().
get_return_object(): This method is called when the coroutine is started, and must return an object that is returned from the coroutine to the caller. Typically, the promise object and the return object must be linked in a way that the caller can eventually retrieve the returned result of the coroutine.
unhandled_exception(): This method is called when an exception is thrown in the coroutine. It can choose to rethrow the exception, or handle it in some other way; for example, store it in the return object for later rethrowing.

As a coroutine starts running, a “coroutine frame” is allocated on the heap - it contains all necessary state, including the promise object. Then, the coroutine starts running until the first suspension point is reached. This is illustrated by the following snippet:

Coroutine “hooks” as used while running a coroutine. The non-linear flow makes it difficult to present.

Awaitables

The second important protocol is the combination of “awaitable” and “awaiter” interfaces.

An awaitable is an object that can be awaited by using co_await on it.

The awaitable interface is simple: either it is an awaiter (see below), or it produces an awaiter:

Awaiter operator co_await(): This method is called when the co_await operator is used on the awaitable. It returns an awaiter.

An awaiter is used to suspend a coroutine.

The awaiter interface is a bit more complex; it consists of the following three methods:

bool await_ready(): This method is called when the coroutine is about to be suspended. It returns true if the awaiter object is already in a state that allows the coroutine to continue immediately. Otherwise, it returns false.
void await_suspend(std::coroutine_handle<promise_type> handle): If await_ready() has returned false, this method is called right after suspending the coroutine. The handle is a special object that mostly only allows the coroutine to be resumed later, at exactly the point it was suspended. The awaiter is now free to store the handle in a place where it can be retrieved later. It is also valid for this method to take a std::coroutine_handle<>, if you don’t care which type of coroutine you are suspending.
T await_resume(): This method is called when the coroutine is resumed by calling handle.resume(). The value it returns becomes the result of the co_await expression.

Coroutine flow Flow diagram of a coroutine’s execution - compare to previous figure.

There are a few things to note here, and to give you an initial orientation around the basic types in uvco:

The promise type of a coroutine is not what’s returned by the function! In uvco it used to be the same, but now it isn’t; the Coroutine<T> class template manages the coroutine execution, and the Promise<T> class template is used to return values to callers.

The reason I changed this after some time is an intricacy in the C++ standard that says: if coroutine arguments can be used as constructor arguments to the promise object, then the promise object will be constructed from those arguments. That is a regular annoyance if the promise type has a copy constructor, and you want to write coroutines taking promises as arguments (because unexpected things will happen). Separating the two types solves this.
An awaitable is not an awaiter, but an awaiter is an awaitable… this is just legalese but allows you to await types that themselves don’t implement the awaiter protocol (I say “protocol” instead of “interface” because the compiler just expects you to implement certain methods, as opposed to inheriting and overriding them)
A coroutine is not necessarily awaitable, and an awaitable is not necessarily backed by a coroutine; but oftentimes it makes sense to return an awaitable from a coroutine (in uvco that’s called a Promise<T>) to allow for efficient waiting on a coroutine result. Specifically, Promise<T> is an awaitable that returns an awaiter of type Promise<T>::PromiseAwaiter_.

This introduction skips some intricacies, but most of these are not required to understand the concept and design decisions of uvco. For example, there are also generators (coroutines using suspension points that return values), but those work similarly to what I’ve briefly described before.

If this has been overwhelming, and you still feel like this is way overcomplicated - that’s normal. If you want to see code, check out the Bonus section at the end, which shows equivalent C++ code to what the compiler transforms the coroutine code into.

libuv

libuv is a C library implementing an event loop and platform-independent I/O primitives, like sockets, files, thread pools, process access, timers, and so on. The main concept in libuv is the handle; each type of I/O comes with its own handle type, and a handle - for the library user - usually at least contains a void* data field for storing arbitrary data, and often enough a loop field as well. The uv_loop_t* loop is a pointer to an event loop object, the second important concept. The event loop is started by calling uv_run() and continuously monitors sockets, timers, and triggers callbacks when needed. For example, when an application registered its interest in reading data from a socket, the event loop monitors the socket using a mechanism like epoll(2) or io_uring (and comparable facilities on other platforms than Linux), and calls the supplied callback function when data is available.

The libuv interface is very easy to understand and I consider it quite well-designed except for some minor warts. In general, libuv is callback-based. For example, in order to connect to a remote TCP host, the uv_tcp_connect function is used:

typedef void (*uv_connect_cb)(uv_connect_t *req, int status);

int uv_tcp_connect(uv_connect_t *req, uv_tcp_t *handle, const struct sockaddr *addr, uv_connect_cb cb);

where req is a request object tracking the connection attempt, handle is the TCP handle to connect, addr is the remote address, and cb is the callback function to be called when the connection attempt is finished.

If you managed to understand the basic mechanism of coroutine suspension and resumption above, then you may already suspect how coroutines can play together with libuv. In principle, suspending a running coroutine on e.g. a socket read must store the coroutine_handle<> in an appropriate place for a callback on that socket to find it, which can then resume() the coroutine - either directly from the callback or later on, by scheduling the coroutine with some kind of task queue.

uvco

To understand uvco’s architecture, in my opinion it’s better to start with the specific and then see how specific interactions generalize across all sorts of I/O, from TCP traffic to process management.

Note that the Doxygen documentation is fairly rich for uvco, with class diagrams and even more importantly, the source code in-lined. This is a great way to understand the library in depth.

Example: Standard I/O

The first feature implemented for uvco, as a proof of concept, was a basic class for asynchronously reading and writing from and to a TTY. libuv provides the uv_tty_t handle type for this. Nowadays, the TtyStream class lives in the uvco/stream.h header and inherits most functionality from the StreamBase class.

To continue digging into the specific, let’s consider the use case of reading a line from stdin, a very common operation. There’s an overloaded read() method, with an alternative implementation writing the read data to a pre-allocated std::span<char> - but let’s consider the simple approach for now.

// A Loop instance is required to create libuv handles on the right loop.
// There's one single Loop object for each thread that uses uvco.

// uvco::Loop loop;
auto stdin = uvco::TtyStream::stdin(loop);
// if std::nullopt is returned, the stream has reached EOF (Ctrl-D)
std::optional<std::string> stdinput = co_await stdin.read();

Before going into the detail of this interaction - for which we need to understand general Promise suspension and resumption, let’s check the implementation of TtyStream::read():

Promise<std::optional<std::string>> TtyStream::read() {
  // This is a promise root function, i.e. origin of a promise.
  std::string buf(maxSize, '\0');
  InStreamAwaiter_ awaiter{*this, buf};
  const size_t nRead = co_await awaiter;
  if (nRead == 0) {
    // EOF.
    co_return std::nullopt;
  }
  buf.resize(nRead);
  co_return buf;
}

This functionality is implemented as a coroutine - analogous to what is shown above - but can be seen as the “root” of a coroutine tree. Why “root”? This coroutine is the first one that doesn’t co_await yet another coroutine, but instead a mysterious InStreamAwaiter_ object. This object is the “awaiter” for uvco Stream operations - it is also used for socket reads, for example.

Now the action follows exactly the same pattern as described above: the InStreamAwaiter_ object is awaited, by first calling its await_ready() method:

bool InStreamAwaiter_::await_ready() const {
  uv_status state = uv_is_readable(&stream_.stream());
  if (state == 1) {
    // If data is available, the callback onInStreamRead will be called
    // immediately. In that case we don't have to wait.
    start_read();
    stop_read();
  }
  return status_.has_value();
}

Even if you don’t know libuv yet, it should be quite clear what happens here: the uv_is_readable() function is called, and if data is available, we immediately read from the socket. We know that data is there, and can therefore stop reading immediately after. Afterwards, through the underlying mechanics, the status_ field is set. Only if it contains a result status, we return true from await_ready(). In that case we get away without suspending the coroutine. In the more common case there is no data to be read, and await_ready() returns false.

In the latter case the await_suspend() method is called next; the handle argument is supplied by the runtime (respectively, the compiler’s transformed code):

bool uvco::StreamBase::InStreamAwaiter_::await_suspend(std::coroutine_handle<> handle) 	
  BOOST_ASSERT(uv_handle_get_data((uv_handle_t *)&stream_.stream()) == nullptr);
  uv_handle_set_data((uv_handle_t *)&stream_.stream(), this);
  handle_ = handle;
  stream_.reader_ = handle;
  start_read();
  return true;
}

void uvco::StreamBase::InStreamAwaiter_::start_read() {
  uv_read_start(&stream_.stream(), StreamBase::InStreamAwaiter_::allocate,
                StreamBase::InStreamAwaiter_::onInStreamRead);
}

In here, we store a pointer to the current InStreamAwaiter_ instance, representing a pending stream read, in the libuv handle. Then the start_read() function is called, which made its appearance in the happy case in await_ready() before. It actually calls into libuv, and supplies the onInStreamRead() callback. That callback function is a static member - because libuv only understands C function pointers - and is responsible for resuming the coroutine. As with every libuv callback, it receives a pointer to the handle, allowing us to refer back to the awaiter, which contains the handle to the suspended coroutine.

So given that the callback is the pivot point of all the action, what’s going on inside of it?

void uvco::StreamBase::InStreamAwaiter_::onInStreamRead(uv_stream_t *stream,
		ssize_t nread,
		const uv_buf_t *buf) 		
{
  auto *awaiter = (InStreamAwaiter_ *)stream->data;
  BOOST_ASSERT(awaiter != nullptr);
  awaiter->stop_read();
  awaiter->status_ = nread;
 
  if (awaiter->handle_) {
    auto handle = awaiter->handle_.value();
    awaiter->handle_.reset();
    Loop::enqueue(handle);
  }
  stream->data = nullptr;
}

It looks fairly complex, but all it does is

recover the InStreamAwaiter_ instance from the libuv handle,
ensure that libuv stops trying to read from the stream,
store the number of bytes read in the status_ field (which is checked by await_ready() and await_resume()),
if a coroutine handle is stored in the handle_ field, enqueue it for execution. The Loop is described in more detail below, when I discuss the higher-level aspects of uvco.

The last check is necessary because of a fundamental design decision in uvco: Unlike many other concurrency frameworks like Python’s asyncio or Rust’s Future trait, coroutines are run in uvco as soon as they are intialized. Meaning, the following line will actually already try to read in the background and be ready if you ask for a result later (using co_await):

auto stdin = TtyStream::stdin(loop);
uvco::Promise<std::optional<std::string>> stdinput = stdin.read();

// You don't know the result yet but can continue doing other things.
// Input may arrive in the meantime.

// and later come back to fetch the result:
auto input = co_await stdinput;

This is obvious from how the callback and await_ready() are implemented: when ready, the bytes read are stored in the InStreamAwaiter_, which has a std::span<char> buffer_ field pointing to either a user-allocated buffer or the std::string allocated by TtyStream::read(). If you call await_ready() at any later point, no suspension/resumption dance takes place, because the data is already there!

As a very half-baked analogy taken from the Go programming language, you could say that uvco coroutines are like goroutines that are automatically spawned in the background as soon as they are created. Note however that uvco is always single-threaded (except for a small exception), so there’s no parallelism involved.

One downside of libuv’s design is that even closing handles is asynchronous. This means we can’t do it in destructors of uvco classes! Instead, uvco’s users must remember to always call co_await obj.close() where available before dropping obj. The reason for this is that we want a deterministic shutdown of all libuv handles.

Generalizing

If you look at other classes responsible for I/O or asynchronous operations, like CurlRequest, Channel, Udp, you will find an inner struct or class being used as awaiter. The actual I/O methods are expressed in terms of normal coroutines, the same way an application using uvco would implement it.

While in each case, the awaiter is shared between calling code and callback, and contains a field destined to hold the result, there is some variation in the division of work: More recently, I’ve written the awaiters to be very simple. That means that await_suspend() only stores the coroutine handle, and await_resume() resumes it; for example: Udp::RecvAwaiter.

In earlier implementations like StreamBase, the awaiter’s await_suspend() also was responsible for starting the actual read operation (e.g. StreamBase::InStreamAwaiter_. The actual difference in behavior is negligible, because the execution is the same.

Scheduling

What you now know covers essentially the entire functionality of uvco. Every kind of I/O or other interaction with the world outside the process is modeled as a class with some coroutine methods and one or more awaiter classes.

What happens when a callback runs and wants to resume a coroutine, though? In the callback we’re usually in the luxurious situation of having the std::coroutine_handle<> object available. So what if we just call resume() on it?

This is in fact how earlier versions of uvco worked, and it does work just fine. You can even make the argument that it reduces latency between a triggering I/O event and the resumption of the coroutine. The main reason it was replaced by a different solution is, however, that it makes the control flow very difficult to debug. It was not infrequent for me to debug an issue, and being in a stack that would cross three active coroutines, as a coroutine resumed from a callback can easily result in the creation of even more coroutines which can resume other coroutines; this is because not all coroutines are resumed from libuv, as some types like Channel<T> work without involvement of libuv. And not only does this make our life harder in debugging sessions, it can also cause them - because control flow may end up where you don’t suspect it, and you can easily find what looks like race conditions in single-threaded code (only that they’re mostly reproducible, phew).

So, in order to flatten those callstacks, the simplest idea is to only resume coroutines from one place. If coroutines need to be resumed, they will be scheduled to do so - you’ve already seen Loop::enqueue() - and once libuv has finished all the event processing, we can conveniently resume all coroutines after one another.

This diagram, taken from docs.libuv.org, shows the individual stages of the event loop.

In the figure you can see how the libuv event loop progresses through its discrete steps; in uvco, we use a customization by only starting the event loop using the UV_RUN_ONCE flag. This causes libuv to return right after having processed one iteration. After we obtain control, we can resume all coroutines, which then schedule new I/O on the libuv loop, and so on.

The loop itself runs as part of the Loop::run() method, The scheduler is obviously implemented in the Scheduler class, and currently only has one complication beyond running all coroutines after another: in order to implement SelectSet, which is a way to wait for multiple promises at once, we need to be able for a coroutine to be resumed from multiple places, but only actually resume it once. Therefore the scheduler keeps track of which coroutines are currently running, and only resumes coroutines which have not been resumed before (due to how SelectSet is implemented, we only need to track duplicates within one turn of the event loop).

In order to make it easier to callbacks to schedule coroutines, the Loop class has a static method Loop::enqueue() which uses a static field defaultLoop set to the currently running loop. This both ensures that only one loop is running at a time, and that coroutines can be resumed from anywhere in the code.

Bootstrapping

The last aspect to mention is how we go from int main() to having coroutines run. Uvco makes this very simple for any user of the library:

int main() {
    uvco::runMain([](const uvco::Loop& loop) -> uvco::Promise<void> {
        // Your code here.
        co_return;
    });
}

Many uvco types, like TcpClient, require a const Loop& in their constructor - now you have one, and can supply it to start running I/O.

Shutdown

Finally, what happens at the end of a Loop’s life: how does it know when to stop? uv_run() usually blocks as long as there are active sockets or other handles registered on the loop, and in the single-step mode that uvco uses, the function uv_loop_alive() tells us if libuv thinks there’s more work to do, e.g. a scheduled callback on a socket. The mere presence of open handles doesn’t make the loop alive though - and closing a loop with open sockets is something that libuv will complain about (in the form of an error code). We use this behavior to automatically shut down the loop once an application has finished with everything: if the loop is not alive, and no more coroutines are scheduled in the uvco scheduler, Loop::run() will return.

This principle is used in every single unit test case, and part of the test: only if all handles are closed correctly will the test finish without exception. It can happen that there is still an open handle on the loop, e.g. a socket; if not closed, the loop will not be alive, but runMain() will eventually throw an exception (“there were still resources on the loop”).

It is possible that the loop doesn’t finish despite you expecting that it should; or the other way around, that Loop::run() decides to finish despite there being open handles on the loop. The reason for this is a bug in your code or in uvco; for example, if you’re not awaiting a coroutine that is supposed to close a handle, the loop will never do so. Take a look at the README.md file for a specific explanation. (For the record, I haven’t managed to reproduce this kind of error on purpose with just “client” code, i.e. without modifying uvco; usually uvco will do the right thing even if you forget to await a promise or close an object!)

However, if not closed, the libuv handle will still be closed asynchronously, but at the cost of leaking a small amount of memory (and a complaint from uvco on stderr). Here’s what happens:

You forget to close a socket (in this case a UnixStreamServer) at all, in which case address sanitizer will complain (if enabled):

[ RUN      ] UdsTest.UnixStreamNotClosed
StreamServerBase::~StreamServerBase(): closing server in dtor; this will leak memory. Please co_await server.close() if possible.
[       OK ] UdsTest.UnixStreamNotClosed (1 ms)
[----------] 4 tests from UdsTest (9 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (10 ms total)
[  PASSED  ] 4 tests.

=================================================================
==979677==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 264 byte(s) in 1 object(s) allocated from:
    #0 0x7f92422f84b8 in operator new(unsigned long) (/lib64/libasan.so.8+0xf84b8) (BuildId: c1431025b5d8af781c22c9ceea71f065c547d32d)
    #1 0x26e4dd in std::__detail::_MakeUniq<uv_pipe_s>::__single_object std::make_unique<uv_pipe_s>() /usr/include/c++/14/bits/unique_ptr.h:1076
    #2 0x287064 in uvco::UnixStreamServer::UnixStreamServer(uvco::Loop const&, std::basic_string_view<char, std::char_traits<char> >, int) /home/lbo/dev/cc/uvco/uvco/uds.cc:27
    [...]

You call close() but don’t await it - this is the worst case, as it will either crash (if libuv is built in Debug mode) or hang (otherwise), because the loop is closed at the same time as a handle is being closed:

[ RUN      ] UdsTest.UnixStreamNotClosed
StreamServerBase::~StreamServerBase(): closing server in dtor; this will leak memory. Please co_await server.close() if possible.
uds-test: /home/lbo/test/libuv/src/unix/core.c:148: uv_close: Assertion `!uv__is_closing(handle)' failed.
Aborted (core dumped)

If combined with a bug in uvco - most classes should not be affected by this - and the asynchronous close in the destructor doesn’t work, an early return or an exception (or otherwise forgetting to close an object) will result in the following error:
```
[ RUN      ] UdpTest.udpNoClose
Udp::~Udp(): closing UDP socket in dtor; this will leak memory. Please co_await udp.close() if possible.
Loop::~Loop(): uv_loop_close() failed; there were still resources on the loop: resource busy or locked
```
Essentially your top-level coroutine supplied to runMain() has finished, but the libuv loop still contains open handles. It’s not a serious bug, given the application is shutting down, but points to unclean error handling, and - as mentioned - a bug in uvco. Please report if you come across this behavior!

Another type of error that points to a bug in uvco is the following:

C++ exception with description "UV error EAGAIN (unwrap called on unfulfilled promise)" thrown in the test body.

This originates in runMain(), which assumes that after Loop::run() returns, your top-level coroutine has finished. However, if there is a bug in uvco, then the loop again finishes before all coroutines have been resumed. Specifically, it can be provoked like this:

In fs.cc: File::close(), remove the call to uv_fs_close().
In your application code, call co_await file.close() as always.
When passing this point, your coroutine will be suspended.
But as libuv never was informed of your intent to close the file, no callback will be scheduled.
Therefore the loop will be empty, and no coroutines are left to be resumed at the end of your application code.
Thus the loop returns, even though there is still at least one un-resumed coroutine: the top-level one supplied to runMain().
When calling Promise<T>::unwrap() in order to obtain its result, the exception shown above is thrown - as that coroutine never finished. If you see this error, it means not all the code you expected to run actually did run! The fun with asynchronous programming…

Other features

There are some more features that I have not (yet) described in greater detail here, for example:

Channel which is an asynchronous bounded queue of items that can be awaited, and doesn’t actually involve libuv in its operation, making it fairly fast (~ 20ns per operation).
SelectSet was mentioned earlier; it allows for syntactically elegant “racing” of Promise objects.
MultiPromise works just like a promise, except that it yields more than one value. In that, it is essentially a generator type. This increases performance when e.g. reading large numbers of UDP packets from an Udp instance, or from a Channel. Inside a generator coroutine, i.e. a coroutine returning a MultiPromise, you can use the co_yield keyword to return values to the caller while suspending the generator itself. This also doesn’t have to involve libuv; you could also implement toy features like Python’s range() or enumerate().
Timers like Ticker and sleep(), which do what their names say - again making use of the efficient promise handling to avoid busy-waiting or other inefficient methods.

In addition, many high-level facilities from libuv are implemented, mostly using consistent patterns.

Start using uvco

uvco is currently packaged as CMake package, and can be easily imported by other CMake projects. Once you’ve built and installed uvco like this:

$ mkdir build && cd build && cmake -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo .. && cmake --build . && sudo cmake --install .

using it in your own project is as simple as using a CMakeLists.txt like this:

cmake_minimum_required(VERSION 3.20)
project(test-uvco)

find_package(Uvco)

set(CMAKE_CXX_STANDARD 23)

# This is your code:
add_executable(test-uvco test-uvco.cc)
target_link_libraries(test-uvco PRIVATE Uvco::uv-co-lib)

test-uvco.cc is typically structured like this:

// The includes come from your system directories:
#include "uvco/promise/promise.h"
#include "uvco/run.h"
#include "uvco/tcp.h"
#include "uvco/tcp_stream.h"

#include <fmt/core.h>
#include <optional>
#include <string>
#include <sys/socket.h>

using namespace uvco;

void run_loop() {
  uvco::runMain<void>([](const Loop &loop) -> uvco::Promise<void> {
    // Use the uvco features here!
  });
}

int main() {
  run_loop();
  return 0;
}

Bonus: Coroutine Transformation

cppinsights.io is a great tool for getting a feel for how coroutine code is transformed into “plain” C++. For example, if we have a super-barebones coroutine implementation:

#include <coroutine>

struct Promise;

struct Coroutine {
  Promise get_return_object();
  void unhandled_exception() {}
  std::suspend_never initial_suspend() { return {}; }
  std::suspend_never final_suspend()  noexcept { return {}; }  
};

struct Promise {
  using promise_type = Coroutine;
  
  bool await_ready() { return false; }
  void await_suspend(std::coroutine_handle<>) {}
  void await_resume() {}
};

Promise Coroutine::get_return_object() { return {}; }

Promise f(Promise x) {
  int testVariable = 3;
  co_await x;
}

int main() {
  Promise x;
  Promise y = f(x);
  return 0;
}

it will show us, with the correct option set, that the definition of Promise f(Promise) is transformed into:

/*************************************************************************************
 * NOTE: The coroutine transformation you've enabled is a hand coded transformation! *
 *       Most of it is _not_ present in the AST. What you see is an approximation.   *
 *************************************************************************************/
#include <coroutine>

struct Promise;

struct Coroutine
{
  Promise get_return_object();
  
  inline void unhandled_exception()
  {
  }
  
  inline std::suspend_never initial_suspend()
  {
    return {};
  }
  
  inline std::suspend_never final_suspend() noexcept
  {
    return {};
  }
  
  // inline constexpr Coroutine() noexcept = default;
};


struct Promise
{
  using promise_type = Coroutine;
  inline bool await_ready()
  {
    return false;
  }
  
  inline void await_suspend(std::coroutine_handle<void>)
  {
  }
  
  inline void await_resume()
  {
  }
  
  // inline constexpr Promise() noexcept = default;
  // inline constexpr Promise(const Promise &) noexcept = default;
  // inline constexpr Promise(Promise &&) noexcept = default;
};


Promise Coroutine::get_return_object()
{
  return {};
}


struct __fFrame
{
  void (*resume_fn)(__fFrame *);
  void (*destroy_fn)(__fFrame *);
  std::__coroutine_traits_impl<Promise>::promise_type __promise;
  int __suspend_index;
  bool __initial_await_suspend_called;
  Promise x;
  int testVariable;
  std::suspend_never __suspend_23_9;
  Promise __suspend_25_12;
  std::suspend_never __suspend_23_9_1;
};

Promise f(Promise x)
{
  /* Allocate the frame including the promise */
  /* Note: The actual parameter new is __builtin_coro_size */
  __fFrame * __f = reinterpret_cast<__fFrame *>(operator new(sizeof(__fFrame)));
  __f->__suspend_index = 0;
  __f->__initial_await_suspend_called = false;
  __f->x = std::forward<Promise>(x);
  
  /* Construct the promise. */
  new (&__f->__promise)std::__coroutine_traits_impl<Promise>::promise_type{};
  
  /* Forward declare the resume and destroy function. */
  void __fResume(__fFrame * __f);
  void __fDestroy(__fFrame * __f);
  
  /* Assign the resume and destroy function pointers. */
  __f->resume_fn = &__fResume;
  __f->destroy_fn = &__fDestroy;
  
  /* Call the made up function with the coroutine body for initial suspend.
     This function will be called subsequently by coroutine_handle<>::resume()
     which calls __builtin_coro_resume(__handle_) */
  __fResume(__f);
  
  
  return __f->__promise.get_return_object();
}

/* This function invoked by coroutine_handle<>::resume() */
void __fResume(__fFrame * __f)
{
  try 
  {
    /* Create a switch to get to the correct resume point */
    switch(__f->__suspend_index) {
      case 0: break;
      case 1: goto __resume_f_1;
      case 2: goto __resume_f_2;
    }
    
    /* co_await insights.cpp:23 */
    __f->__suspend_23_9 = __f->__promise.initial_suspend();
    if(!__f->__suspend_23_9.await_ready()) {
      __f->__suspend_23_9.await_suspend(std::coroutine_handle<Coroutine>::from_address(static_cast<void *>(__f)).operator std::coroutine_handle<void>());
      __f->__suspend_index = 1;
      __f->__initial_await_suspend_called = true;
      return;
    } 
    
    __resume_f_1:
    __f->__suspend_23_9.await_resume();
    __f->testVariable = 3;
    
    /* co_await insights.cpp:25 */
    __f->__suspend_25_12 = x;
    if(!__f->__suspend_25_12.await_ready()) {
      __f->__suspend_25_12.await_suspend(std::coroutine_handle<Coroutine>::from_address(static_cast<void *>(__f)).operator std::coroutine_handle<void>());
      __f->__suspend_index = 2;
      return;
    } 
    
    __resume_f_2:
    __f->__suspend_25_12.await_resume();
    goto __final_suspend;
  } catch(...) {
    if(!__f->__initial_await_suspend_called) {
      throw ;
    } 
    
    __f->__promise.unhandled_exception();
  }
  
  __final_suspend:
  
  /* co_await insights.cpp:23 */
  __f->__suspend_23_9_1 = __f->__promise.final_suspend();
  if(!__f->__suspend_23_9_1.await_ready()) {
    __f->__suspend_23_9_1.await_suspend(std::coroutine_handle<Coroutine>::from_address(static_cast<void *>(__f)).operator std::coroutine_handle<void>());
    return;
  } 
  
  __f->destroy_fn(__f);
}

/* This function invoked by coroutine_handle<>::destroy() */
void __fDestroy(__fFrame * __f)
{
  /* destroy all variables with dtors */
  __f->~__fFrame();
  /* Deallocating the coroutine frame */
  /* Note: The actual argument to delete is __builtin_coro_frame with the promise as parameter */
  operator delete(static_cast<void *>(__f));
}


int main()
{
  Promise x;
  Promise y = f(Promise(x));
  return 0;
}

That’s probably not great for the build times! I find most of it quite self-explanatory: if you follow the __fResume() function you can see how those magic co_await suspension points are transformed into a switch statement. You can also see the coroutine frame, stack variables in that frame, and its allocation using operator new. Understanding this is a huge help for a good mental model that you can refer to when debugging something tricky.