Playing with OCaml Effects

Sun, Jul 23, 2023 tags: [ ocaml programming ]

A few weeks ago I finally started seriously working with OCaml, after having ignored it for a long time. The main reason was the lack for multithreading, which I felt is essential these days (even though most code, of course, doesn’t make use of it.) But OCaml 5 was released a few months ago, and has removed this barrier – along with introducing Effect handlers, a smooth way for handling all kinds of ways to hack a program’s control flow.

One common application exploiting Effect handlers is asynchronous I/O, which is now implemented in the eio framework. The functionality and mechanics are very similar to how Go or Rust implement asynchronous I/O: using “green threads”, “fibers”, respectively “tasks” which are managed by a central event loop monitoring sources of events, like sockets, files, or locks. One difference I noticed was that in eio the underlying implementation is a lot more accessible than in other languages; I have followed the development of Rust’s tokio from back in 2016, and while it was fairly transparent back then, it has exploded in complexity since then. The rapid pace of development hasn’t helped either, with features being scattered across the standard library and different crates. Although fortunately, the different tokio crates appear to have been merged by now…

Due to this transparency, it has been fairly simple (although not quick) to implement select functionality from the ground up for non-blocking and blocking “streams” (analogous to Go’s channels or Rust’s mpsc). Of course, atomic and lock-free programming has its pitfalls, but at no point was there a need to drop to unsafe code or hack on the runtime. I don’t quite expect those PRs to be ultimately merged, but it was a nice exercise anyway.

Generator functions

All this is fairly complicated though, so a small self-contained feature I wanted to implement based on the fundamental Effect handler framework is Python-style generator functions:

def generate_numbers(start=0):
    # Also known as "range()"
    i = start
    while True:
        yield i
        i += 1

def main():
    # Play fizz-buzz for all eternity (or until ints run out, whichever comes first)
    for i in generate_numbers():
        fizzbuzz = ""
        if i % 3 == 0:
            fizzbuzz = "fizz"
        if i % 5 == 0:
            fizzbuzz += "buzz"
        if not fizzbuzz:
            print(i)
        else:
            print(fizzbuzz)

But first, a quick overview of what Effect handlers can do for you.

The module signature of the Effect module is one of the few places with magic included:

module Effect :
  sig
    type _ t = ..
    exception Unhandled : 'a t -> exn
    exception Continuation_already_resumed
    external perform : 'a t -> 'a = "%perform"
    module Deep : sig ... end
    module Shallow : sig ... end
  end

The only function available is perform, which… well, performs an Effect. The effect is a variant of type t, and typically defined in a library. After performing an effect, control is transferred to an Effect handler. This is actually a lot like raising an exception! In fact, exceptions can be implemented using Effect handlers, and can be considered a special case of Effects. To my knowledge, perform is implemented in the runtime, as it will switch between execution stacks. That’s also where you can already see “fibers” peeking from behind the corner – fibers in eio (respectively OCaml) are independent execution stacks, and user code can switch between them.

The other module needed to implement the simple Python-style generators is the Effect.Shallow module, already contained in the previous snippet:

module Shallow :
  sig
    type ('a, 'b) continuation
    val fiber : ('a -> 'b) -> ('a, 'b) continuation
    type ('a, 'b) handler = {
      retc : 'a -> 'b;
      exnc : exn -> 'b;
      effc : 'c. 'c Effect.t -> (('c, 'a) continuation -> 'b) option;
    }
    val continue_with : ('c, 'a) continuation -> 'c -> ('a, 'b) handler -> 'b
    val discontinue_with :
      ('c, 'a) continuation -> exn -> ('a, 'b) handler -> 'b
    val discontinue_with_backtrace :
      ('a, 'b) continuation ->
      exn -> Printexc.raw_backtrace -> ('b, 'c) handler -> 'c
    external get_callstack :
      ('a, 'b) continuation -> int -> Printexc.raw_backtrace
      = "caml_get_continuation_callstack"
  end

Personally, the Deep module is a bit more intuitive, but Shallow is more appropriate for this case. The central function is continue_with, which runs a function (its first argument) until the function either returns, raises an exception, or performs an effect. For each case, one of the respective handler functions from the supplied handler will be invoked. This is roughly analogous to the except clause when handling exceptions.

What does the Python example above do?

  1. Run generate_numbers(), until yield is encountered. At that point, control is transferred back to the caller.
  2. Continue running the caller.
  3. When asking for the next value, continue running generate_numbers() where it yielded before.
  4. Once yield is encountered again, control is transferred to the caller yet again.

It took me some thinking on how to best translate this into Effect handler logic, but it is actually not very complex. First, obviously, we can define the types. coro represents what Python calls a Generator object, i.e. a suspended coroutine which can be run, yield values, suspended, and continued. Note that Python (and other implementations) allow bidirectional communication between a caller and a called coroutine – for simplicity and laziness, I’ve only worked out the generator use case.

Uninit_called will never be thrown unless we made a mistake in the implementation. corostate is only used internally, and stores if a coroutine has finished or is still active. The coro type is what actually matters for users of this code, and allows extracting values from the generator.

exception Uninit_called

type 'a corostate = Uninit | Finished | Suspended of (unit, unit) continuation
type 'a coro = { awaiter : 'a corostate -> 'a option; st : 'a corostate ref }

The implementation looks a bit convoluted at first, but is straight-forward after some following of the control flow:

let spawn_generator (type elt) (f : (elt -> unit) -> unit) : elt coro =
  let module M = struct
    type _ Effect.t += Yield : elt -> unit Effect.t
  end in
  let state = ref Uninit in
  let yielder a = Effect.perform (M.Yield a) in
  state := Suspended (fiber (fun () -> f yielder));
  let awaiter st =
    match st with
    | Uninit -> raise Uninit_called
    | Finished -> None
    | Suspended cont ->
        continue_with cont ()
          {
            retc =
              (fun () ->
                state := Finished;
                None);
            exnc = (fun e -> raise e);
            effc =
              (fun (type b) (eff : b Effect.t) ->
                match eff with
                | M.Yield e ->
                    Some
                      (fun (k : (b, _) continuation) ->
                        state := Suspended k;
                        Some e)
                | _ -> assert false);
          }
  in
  { awaiter; st = state }

spawn_generator takes a function with one argument, the “yielder”. The yielder allows the function to yield values – analogous to Python’s yield. As a result, a coro value is returned, which allows for collecting the yielded values:

let await cst = cst.awaiter !(cst.st)

So before examining the spawn_generator function more closely, we can replicate the Python example above - I’ve written it as inline-test which makes it easier to run (even though it always succeeds):

let%test "coro-fizzbuzz" =
  let f yield =
    for i = 1 to 100 do
      yield i
    done
  in
  let gen = spawn_generator f in
  let rec fb gen =
    match await gen with
    | Some i ->
        if Int.rem i 3 == 0 then (
          Printf.printf "fizz";
          if Int.rem i 5 == 0 then Printf.printf "buzz";
          Printf.printf "\n")
        else if Int.rem i 5 == 0 then
          Printf.printf "buzz\n"
        else Printf.printf "%d\n" i;
        fb gen
    | None -> ()
  in
  fb gen;
  true

f yield is the analogous function to generate_numbers(), and with the right function naming, even the syntax feels similar. Otherwise, the function should be fairly obvious, at least if you’ve worked with generators in Python before.

But how does the coroutine mechanics work behind the scenes? Let’s go through spawn_generator, step by step:

  1. A local module is defined with the right effect. This is necessary so that the Effect is specialized to the element type elt. In fact this trick was a major hurdle when playing with some of my own ideas, until I found it in the nicely written Effects documentation. The Effect.t extensible variant type is extended with the required variant. The GADT has type unit Effect.t because the generator won’t receive values from the caller.

    The Effect’s type parameter ends up as the argument type of the continuation – see the definition of the Effect.Shallow module above! It is not trivial to grasp the type interactions at first; for shallow handlers an intuitive understanding might go like this:

    • The effect handler is only called once; it returns a value that goes back to the original caller. (As opposed to Deep handlers, which once installed also handle effects from calling continuations later on)
    • In ('a, 'b) handler, 'a is the return type of the continuation supplied to continue_with. retc maps from 'a to 'b. 'b is the ultimate return type of continue_with, i.e. what is returned to the original caller, whether from effc or retc.
    • 'c as appears in effc is the GADT type parameter of the effect. Through perform, it determines the value returned by perform to the code performing an effect.
    • Note the difference to the Deep Effects API – there, continuations have the original handler attached, which changes the continuation’s type – from what I see, by mapping the continuation’s original result using retc.
  2. A (mutable) state variable is initialized; it will hold the continuation, which allows resuming the generator function at the point of its last yield.

  3. The yielder function is defined, which as promised simply performs the Yield effect. It will be called by the generator.

  4. The state is initialized by setting the initial continuation. fiber is a helper function in the Shallow module wrapping a normal function in a continuation, and giving it its own stack.

  5. Then, the awaiter function is defined, which will actually orchestrate the coroutine. If the generator were already finished or not yet set up, we could return early, but for now we are on the happy path. First, the continuation cont is called. At first, this is simply the top-level generator function set in step 4. At some point, the function will call yielder. Then, the effc handler is called, which is only set up to handle a single effect for now, Yield. The handler returns another function taking a continuation (this is where the brain starts knotting itself up); this function is called by the Effects runtime, allowing us to store the current generator state, and return the yielded element to the calling function.

  6. Next time the awaiter is called, the continuation cont has been replaced with one that will resume the generator. After resumption, it doesn’t take long for it to yield the next value.

  7. At some point, the generator runs out of values. Then the retc handler is called, which sets the state to Finished and will return None. At that point, the calling function (like our fizzbuzz) knows not to continue trying.

A small helper is defined finally:

let await cst = cst.awaiter !(cst.st)

All this comes together to a fairly small interface:

(* Generator.mli *)
type 'a coro

val spawn_generator : (('a -> unit) -> unit) -> 'a coro
val await : 'a coro -> 'a option

All in all, this example is a bit forced. The way Python uses generators is more suited to an imperative language; this can be done in OCaml too, but is not the most natural way to go about it, and Seq is already fairly close in terms of emulating coroutines – but also more explicit.

Iterator/Generator conversion

A classic example I’ve reimplemented, and which actually helped a lot in arriving at the implementation of generator functions above, is the automatic conversion between an iterator function

type ('el, 'container) iterator = ('el -> unit) -> 'container -> unit

to a generator

type 'el generator = unit -> 'el option

An iterator calls your supplied code, while a generator lets you call it to yield value by value. This is very similar to the first example, and the “generator function” in this case is the iterator type. Making a generator out of an iterator is unsurprisingly very similar to the generator function example above, with just a small difference in the API:

let generator_of_iterator (type el c) (iter : (el, c) iterator) (container : c)
    =
  let module M = struct
    type _ Effect.t += Generator_yield : el -> unit Effect.t

    type ('e, 'c) generator_state =
      | Uninitialized of (el, c) iterator
      | In_progress of (unit, unit) continuation
  end in
  let yielder e = ignore (Effect.perform (M.Generator_yield e)) in
  let state = ref (M.Uninitialized iter) in
  let effect_handler (type b) (eff : b Effect.t) =
    match eff with
    | M.Generator_yield e ->
        Some
          (fun (k : (b, _) continuation) ->
            state := M.In_progress k;
            Some e)
    | _ -> assert false
  in
  let generator () =
    match !state with
    | M.In_progress cont ->
        (* Continue iterator, with specified effect handler. *)
        continue_with cont ()
          {
            retc = (fun _ -> None);
            exnc = (fun e -> raise e);
            effc = effect_handler;
          }
    | M.Uninitialized iter ->
        (* Start running iterator. *)
        continue_with
          (fiber @@ iter yielder)
          container
          {
            retc = (fun _ -> None);
            exnc = (fun e -> raise e);
            effc = effect_handler;
          }
  in
  generator

With the previous explanation, you should be able to quickly make sense of this too. In fact, it boils down to quite the same behavior, and can be implemented using the first example too:

let () =
    let mylist = [1; 2; 3] in
    let gen = generator_of_iterator List.iter mylist;
    ...

(* is equivalent to *)
let () =
    let mylist = [1; 2; 3] in
    let iter yield = List.iter yield mylist in
    let coro_gen = spawn_generator iter in
    let gen () = await coro_gen in
    ...