Playing with OCaml Effects
A few weeks ago I finally started seriously working with OCaml, after having ignored it for a long time. The main reason was the lack for multithreading, which I felt is essential these days (even though most code, of course, doesn’t make use of it.) But OCaml 5 was released a few months ago, and has removed this barrier – along with introducing Effect handlers, a smooth way for handling all kinds of ways to hack a program’s control flow.
One common application exploiting Effect handlers is asynchronous I/O, which is now implemented in the
eio framework. The functionality and mechanics are very similar to how Go or
Rust implement asynchronous I/O: using “green threads”, “fibers”, respectively “tasks” which are managed by a central
event loop monitoring sources of events, like sockets, files, or locks. One difference I noticed was that in eio the
underlying implementation is a lot more accessible than in other languages; I have followed the development of Rust’s
tokio
from back in 2016, and while it was fairly transparent back then, it has exploded in complexity since then. The
rapid pace of development hasn’t helped either, with features being scattered across the standard library and different
crates. Although fortunately, the different tokio crates appear to have been merged by now…
Due to this transparency, it has been fairly simple (although not quick) to implement select
functionality from the
ground up for non-blocking and
blocking “streams” (analogous to Go’s channels or Rust’s mpsc). Of
course, atomic and lock-free programming has its pitfalls, but at no point was there a need to drop to unsafe code or
hack on the runtime. I don’t quite expect those PRs to be ultimately merged, but it was a nice exercise anyway.
Generator functions
All this is fairly complicated though, so a small self-contained feature I wanted to implement based on the fundamental Effect handler framework is Python-style generator functions:
def generate_numbers(start=0):
# Also known as "range()"
i = start
while True:
yield i
i += 1
def main():
# Play fizz-buzz for all eternity (or until ints run out, whichever comes first)
for i in generate_numbers():
fizzbuzz = ""
if i % 3 == 0:
fizzbuzz = "fizz"
if i % 5 == 0:
fizzbuzz += "buzz"
if not fizzbuzz:
print(i)
else:
print(fizzbuzz)
But first, a quick overview of what Effect handlers can do for you.
The module signature of the Effect
module is one of the few places with magic included:
module Effect :
sig
type _ t = ..
exception Unhandled : 'a t -> exn
exception Continuation_already_resumed
external perform : 'a t -> 'a = "%perform"
module Deep : sig ... end
module Shallow : sig ... end
end
The only function available is perform
, which… well, performs an Effect. The effect is a variant of type t
, and
typically defined in a library. After performing an effect, control is transferred to an Effect handler. This is
actually a lot like raising an exception! In fact, exceptions can be implemented using Effect handlers, and can be
considered a special case of Effects. To my knowledge, perform
is implemented in the runtime, as it will switch
between execution stacks. That’s also where you can already see “fibers” peeking from behind the corner – fibers in eio
(respectively OCaml) are independent execution stacks, and user code can switch between them.
The other module needed to implement the simple Python-style generators is the Effect.Shallow
module, already
contained in the previous snippet:
module Shallow :
sig
type ('a, 'b) continuation
val fiber : ('a -> 'b) -> ('a, 'b) continuation
type ('a, 'b) handler = {
retc : 'a -> 'b;
exnc : exn -> 'b;
effc : 'c. 'c Effect.t -> (('c, 'a) continuation -> 'b) option;
}
val continue_with : ('c, 'a) continuation -> 'c -> ('a, 'b) handler -> 'b
val discontinue_with :
('c, 'a) continuation -> exn -> ('a, 'b) handler -> 'b
val discontinue_with_backtrace :
('a, 'b) continuation ->
exn -> Printexc.raw_backtrace -> ('b, 'c) handler -> 'c
external get_callstack :
('a, 'b) continuation -> int -> Printexc.raw_backtrace
= "caml_get_continuation_callstack"
end
Personally, the Deep
module is a bit more intuitive, but Shallow
is more appropriate for this case. The central
function is continue_with
, which runs a function (its first argument) until the function either returns, raises an
exception, or performs an effect. For each case, one of the respective handler functions from the supplied handler
will be invoked. This is roughly analogous to the except
clause when handling exceptions.
What does the Python example above do?
- Run
generate_numbers()
, untilyield
is encountered. At that point, control is transferred back to the caller. - Continue running the caller.
- When asking for the next value, continue running
generate_numbers()
where it yielded before. - Once
yield
is encountered again, control is transferred to the caller yet again.
It took me some thinking on how to best translate this into Effect handler logic, but it is actually not very complex.
First, obviously, we can define the types. coro
represents what Python calls a Generator object
, i.e. a suspended
coroutine which can be run, yield values, suspended, and continued. Note that Python (and other implementations) allow
bidirectional communication between a caller and a called coroutine – for simplicity and laziness, I’ve only worked out
the generator use case.
Uninit_called
will never be thrown unless we made a mistake in the implementation. corostate
is only used
internally, and stores if a coroutine has finished or is still active. The coro
type is what actually matters for
users of this code, and allows extracting values from the generator.
exception Uninit_called
type 'a corostate = Uninit | Finished | Suspended of (unit, unit) continuation
type 'a coro = { awaiter : 'a corostate -> 'a option; st : 'a corostate ref }
The implementation looks a bit convoluted at first, but is straight-forward after some following of the control flow:
let spawn_generator (type elt) (f : (elt -> unit) -> unit) : elt coro =
let module M = struct
type _ Effect.t += Yield : elt -> unit Effect.t
end in
let state = ref Uninit in
let yielder a = Effect.perform (M.Yield a) in
state := Suspended (fiber (fun () -> f yielder));
let awaiter st =
match st with
| Uninit -> raise Uninit_called
| Finished -> None
| Suspended cont ->
continue_with cont ()
{
retc =
(fun () ->
state := Finished;
None);
exnc = (fun e -> raise e);
effc =
(fun (type b) (eff : b Effect.t) ->
match eff with
| M.Yield e ->
Some
(fun (k : (b, _) continuation) ->
state := Suspended k;
Some e)
| _ -> assert false);
}
in
{ awaiter; st = state }
spawn_generator
takes a function with one argument, the “yielder”. The yielder allows the function to yield values –
analogous to Python’s yield
. As a result, a coro
value is returned, which allows for collecting the yielded values:
let await cst = cst.awaiter !(cst.st)
So before examining the spawn_generator
function more closely, we can replicate the Python example above - I’ve
written it as inline-test which makes it easier to run (even though it always succeeds):
let%test "coro-fizzbuzz" =
let f yield =
for i = 1 to 100 do
yield i
done
in
let gen = spawn_generator f in
let rec fb gen =
match await gen with
| Some i ->
if Int.rem i 3 == 0 then (
Printf.printf "fizz";
if Int.rem i 5 == 0 then Printf.printf "buzz";
Printf.printf "\n")
else if Int.rem i 5 == 0 then
Printf.printf "buzz\n"
else Printf.printf "%d\n" i;
fb gen
| None -> ()
in
fb gen;
true
f yield
is the analogous function to generate_numbers()
, and with the right function naming, even the syntax feels
similar. Otherwise, the function should be fairly obvious, at least if you’ve worked with generators in Python before.
But how does the coroutine mechanics work behind the scenes? Let’s go through spawn_generator
, step by step:
-
A local module is defined with the right effect. This is necessary so that the Effect is specialized to the element type
elt
. In fact this trick was a major hurdle when playing with some of my own ideas, until I found it in the nicely written Effects documentation. TheEffect.t
extensible variant type is extended with the required variant. The GADT has typeunit Effect.t
because the generator won’t receive values from the caller.The Effect’s type parameter ends up as the argument type of the continuation – see the definition of the
Effect.Shallow
module above! It is not trivial to grasp the type interactions at first; for shallow handlers an intuitive understanding might go like this:- The effect handler is only called once; it returns a value that goes back to the original caller. (As opposed to Deep handlers, which once installed also handle effects from calling continuations later on)
- In
('a, 'b) handler
,'a
is the return type of the continuation supplied tocontinue_with
.retc
maps from'a
to'b
.'b
is the ultimate return type ofcontinue_with
, i.e. what is returned to the original caller, whether fromeffc
orretc
. 'c
as appears ineffc
is the GADT type parameter of the effect. Throughperform
, it determines the value returned byperform
to the code performing an effect.- Note the difference to the
Deep
Effects API – there, continuations have the original handler attached, which changes the continuation’s type – from what I see, by mapping the continuation’s original result usingretc
.
-
A (mutable) state variable is initialized; it will hold the continuation, which allows resuming the generator function at the point of its last yield.
-
The
yielder
function is defined, which as promised simply performs theYield
effect. It will be called by the generator. -
The state is initialized by setting the initial continuation.
fiber
is a helper function in theShallow
module wrapping a normal function in a continuation, and giving it its own stack. -
Then, the
awaiter
function is defined, which will actually orchestrate the coroutine. If the generator were already finished or not yet set up, we could return early, but for now we are on the happy path. First, the continuationcont
is called. At first, this is simply the top-level generator function set in step 4. At some point, the function will callyielder
. Then, theeffc
handler is called, which is only set up to handle a single effect for now,Yield
. The handler returns another function taking a continuation (this is where the brain starts knotting itself up); this function is called by the Effects runtime, allowing us to store the current generator state, and return the yielded element to the calling function. -
Next time the
awaiter
is called, the continuationcont
has been replaced with one that will resume the generator. After resumption, it doesn’t take long for it to yield the next value. -
At some point, the generator runs out of values. Then the
retc
handler is called, which sets the state toFinished
and will returnNone
. At that point, the calling function (like our fizzbuzz) knows not to continue trying.
A small helper is defined finally:
let await cst = cst.awaiter !(cst.st)
All this comes together to a fairly small interface:
(* Generator.mli *)
type 'a coro
val spawn_generator : (('a -> unit) -> unit) -> 'a coro
val await : 'a coro -> 'a option
All in all, this example is a bit forced. The way Python uses generators is more suited to an imperative language; this
can be done in OCaml too, but is not the most natural way to go about it, and Seq
is already fairly close in terms of
emulating coroutines – but also more explicit.
Iterator/Generator conversion
A classic example I’ve reimplemented, and which actually helped a lot in arriving at the implementation of generator functions above, is the automatic conversion between an iterator function
type ('el, 'container) iterator = ('el -> unit) -> 'container -> unit
to a generator
type 'el generator = unit -> 'el option
An iterator calls your supplied code, while a generator lets you call it to yield value by value. This is very similar
to the first example, and the “generator function” in this case is the iterator
type. Making a generator out of an
iterator is unsurprisingly very similar to the generator function example above, with just a small difference in the
API:
let generator_of_iterator (type el c) (iter : (el, c) iterator) (container : c)
=
let module M = struct
type _ Effect.t += Generator_yield : el -> unit Effect.t
type ('e, 'c) generator_state =
| Uninitialized of (el, c) iterator
| In_progress of (unit, unit) continuation
end in
let yielder e = ignore (Effect.perform (M.Generator_yield e)) in
let state = ref (M.Uninitialized iter) in
let effect_handler (type b) (eff : b Effect.t) =
match eff with
| M.Generator_yield e ->
Some
(fun (k : (b, _) continuation) ->
state := M.In_progress k;
Some e)
| _ -> assert false
in
let generator () =
match !state with
| M.In_progress cont ->
(* Continue iterator, with specified effect handler. *)
continue_with cont ()
{
retc = (fun _ -> None);
exnc = (fun e -> raise e);
effc = effect_handler;
}
| M.Uninitialized iter ->
(* Start running iterator. *)
continue_with
(fiber @@ iter yielder)
container
{
retc = (fun _ -> None);
exnc = (fun e -> raise e);
effc = effect_handler;
}
in
generator
With the previous explanation, you should be able to quickly make sense of this too. In fact, it boils down to quite the same behavior, and can be implemented using the first example too:
let () =
let mylist = [1; 2; 3] in
let gen = generator_of_iterator List.iter mylist;
...
(* is equivalent to *)
let () =
let mylist = [1; 2; 3] in
let iter yield = List.iter yield mylist in
let coro_gen = spawn_generator iter in
let gen () = await coro_gen in
...