Functional JSON Access

Sat, Aug 26, 2023 tags: [ programming ocaml ]

While working on typst_of_jupyter, I came across a very common problem: efficiently and reliably accessing nested JSON structures. This is an age-old problem that every programmer will have come across, and it’s easier in some languages – dynamically typed ones like Python or JavaScript – and more annoying in others.

typst_of_jupyter is written in OCaml, which – being statically typed – rather falls into the second category. In order to generate typst code for a Jupyter notebook, the JSON structure representing the notebook must be processed. It is not a very complicated structure, but enough to be annoying to deal with in a plain way.

First Approach

My first approach was a small utility library; I would cast JSON dicts into associative lists (alists) of type (string * Json.t) list, where Json.t is the Yojson.Basic data type representing a JSON value:

type t =
        [ `Assoc of (string * t) list
        | `Bool of bool
        | `Float of float
        | `Int of int
        | `List of t list
        | `Null
        | `String of string ]

On top of it, I defined some ad-hoc functions, like

module JUtil = Yojson.Basic.Util

let find_assoc ~default l key = Option.value ~default (find_assoc_opt l key)

let cast_assoc = function `Assoc l -> l | x -> raise_type_error "dict" x
let cast_string = function `String s -> s | x -> raise_type_error "string" x
let cast_string_list = function
  | `List l -> List.map ~f:cast_string l
  | `String s -> [ s ] (* lenient... *)
  | x -> raise_type_error "string list" x
let cast_int = JUtil.to_int
let cast_list = JUtil.to_list

let rec recursive_find obj path =
  match (obj, path) with
  | _, [] -> None
  | `Assoc obj, x :: [] -> find_assoc_opt obj x
  | `Assoc obj, x :: xs -> (
      match find_assoc_opt obj x with
      | Some o -> recursive_find o xs
      | None -> None)
  | _ -> None

That was kind of alright, but not satisfying. It looked crufty and always felt a bit improvised at the site of use.

Yojson Util

If I had done a bit of research, or remembered the appropriate section from Real World OCaml, I would have used the Yojson.Basic.Util module:

module Util : sig
    val member : string -> Yojson.Basic.t -> Yojson.Basic.t
    val path : string list -> Yojson.Basic.t -> Yojson.Basic.t option
    val index : int -> Yojson.Basic.t -> Yojson.Basic.t
    val to_assoc : Yojson.Basic.t -> (string * Yojson.Basic.t) list
    val to_option : (Yojson.Basic.t -> 'a) -> Yojson.Basic.t -> 'a option
    val to_bool : Yojson.Basic.t -> bool
end

which allows writing nice access code such as

let get_nested_member k1 k2 j = j |> member k1 |> member k2

to extract a value from a nested dict such as

{
    "foo": {
        "bar": 123
    }
}
using get_nested_member "foo" "bar", etc. That’s definitely a fairly elegant solution.

But I noticed some similarity to monads – although this pattern doesn’t fit monads, for all I know! – and thought of some more composable access methods. They amount to the same, but also enable more useful error messages for type errors or missing elements.
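To make the point about error messages concrete, here is a small sketch of how a member-style lookup behaves on missing keys. It does not use the yojson package: the json type and the member function below are stand-ins that mimic the documented behaviour of Yojson.Basic.Util.member (missing key gives `Null, non-object raises a type error), and the Type_error exception is a local assumption.

```ocaml
(* Stand-in for Yojson.Basic.t, so the sketch runs without the yojson package. *)
type json =
  [ `Assoc of (string * json) list
  | `Bool of bool
  | `Float of float
  | `Int of int
  | `List of json list
  | `Null
  | `String of string ]

(* Local stand-in for Yojson's type error exception. *)
exception Type_error of string * json

(* Mimics Yojson.Basic.Util.member: `Null for a missing key,
   an exception when the value is not an object. *)
let member (name : string) (j : json) : json =
  match j with
  | `Assoc l -> ( match List.assoc_opt name l with Some v -> v | None -> `Null)
  | other ->
      raise (Type_error ("Can't get member '" ^ name ^ "' of non-object", other))

let get_nested_member k1 k2 j = j |> member k1 |> member k2

let () =
  let doc : json = `Assoc [ ("foo", `Assoc [ ("bar", `Int 123) ]) ] in
  (* The happy path works as expected. *)
  assert (get_nested_member "foo" "bar" doc = `Int 123);
  (* A misspelled inner key silently turns into `Null. *)
  assert (get_nested_member "foo" "baz" doc = `Null);
  (* A misspelled outer key only fails one step later, on the `Null value. *)
  match get_nested_member "fooo" "bar" doc with
  | exception Type_error (msg, _) -> print_endline msg
  | _ -> print_endline "no error"
```

In both failure cases the information about which step of the path went wrong has to be reconstructed by hand, which is the gap the combinators below try to close.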

The Complex (but Elegant) Way

After trying to fit the problem into a monad type, and failing, I took a step back to think about the right type for this kind of problem. I arrived at the following:

module Json = Yojson.Basic

(* A Json document. *)
type doc = Json.t

module JR = struct
  type op = Key of string | As_assoc | As_int | [...]
  type 'a t = Value of 'a | Error of op list
end

(* Type [t] represents an operation on a Json (or other kind of) value that can
 potentially fail (not found, wrong type, etc.). The failure path can be
 tracked. Operations can be composed. *)
type ('a, 'b) t = { f : 'a -> 'b JR.t; op : JR.op }

This looks more complicated than it is. At its core, it is a potentially failing function, with JR.t basically being Result.t: either a successful value or an error.

f: 'a -> 'b JR.t

Most often, 'a is a doc, i.e. a JSON value, and 'b is either a doc (if dealing with nested values) or a primitive type (int, string, float, …). So bear with me here: this basic type enables nice composition for peeking into complex JSON structures.

What we can do now is start with basic definitions for extractors, as we can call values of type t:

(* Map a JSON value to an int, or fail. *)
let int : (doc, int) t =
  let op = JR.As_int in
  let f = function `Int i -> JR.value i | _ -> JR.error_root op in
  { f; op }
let string = ...
let float = ...

Given a JSON value 123 a.k.a. `Int 123, the integer can be extracted as such:

let j = `Int 123
let i = extract_exn int j

This is not spectacular.
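For anyone who wants to run the snippet above, here is a minimal self-contained sketch. The helpers JR.value and JR.error_root are used in the listing but their definitions are not shown, so the versions below are assumptions, as are the As_string case, the body of extract_exn, and its exception message. The doc type is a stand-in for Yojson.Basic.t so that no external library is needed.

```ocaml
(* Stand-in for Yojson.Basic.t, so the sketch runs without external libraries. *)
type doc =
  [ `Assoc of (string * doc) list
  | `Bool of bool
  | `Float of float
  | `Int of int
  | `List of doc list
  | `Null
  | `String of string ]

module JR = struct
  (* Only a few operations; As_string is an assumed name. *)
  type op = Key of string | As_assoc | As_int | As_string
  type 'a t = Value of 'a | Error of op list

  (* Assumed helpers: wrap a success, or start an error path with one op. *)
  let value v = Value v
  let error_root op = Error [ op ]
end

type ('a, 'b) t = { f : 'a -> 'b JR.t; op : JR.op }

exception Json_object_error of string

(* Map a JSON value to an int, or fail. *)
let int : (doc, int) t =
  let op = JR.As_int in
  let f = function `Int i -> JR.value i | _ -> JR.error_root op in
  { f; op }

(* Map a JSON value to a string, or fail. *)
let string : (doc, string) t =
  let op = JR.As_string in
  let f = function `String s -> JR.value s | _ -> JR.error_root op in
  { f; op }

(* Run an extractor, raising if it failed. The message is a placeholder. *)
let extract_exn (t : ('a, 'b) t) (j : 'a) : 'b =
  match t.f j with
  | JR.Value v -> v
  | JR.Error _ -> raise (Json_object_error "json error")

let () =
  let j : doc = `Int 123 in
  Printf.printf "%d\n" (extract_exn int j)
(* prints 123 *)
```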

However, the power of this approach lies in combining these extractors: For example, to obtain a dictionary entry:

val key: string -> (doc, 'a) t -> (doc, 'a) t

The extractor key accesses a dict element of that name and converts it into the given type. For example, assuming a JSON value

{
  "hello": "world",
  "foo": [ "bar", "baz" ],
  "a": 1,
  "b": {},
  "dict": { "inner": { "key": "value" }, "second": 123 }
}

let test_doc = Json.from_string {| ... json from above ... |}

the value "world" can be described by an extractor like this:

let access_world = key "hello" string

and the value 123 like this:

let access_123 = key "dict" dict >> key "second" int

where >> is the crucial chaining operator

(* Compose two operations, with the result of the first feeding the second. *)
val ( >> ) : ('a, 'b) t -> ('b, 'c) t -> ('a, 'c) t

This operator was actually the first piece I came up with, as it seemed like a suitable way to express paths into a JSON object together with conversions. The remaining operations present themselves very easily, such as combinators for extracting more than one element from a dict (both, either, alternative), directly descending into a nested dict (path, similar to Yojson.Basic.Util.path), reading lists (list_of, list_index), functional combinators (map, lift) etc.
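As a rough sketch (not the exact code from typst_of_jupyter), key and >> could be written as follows. Several choices here are assumptions: the doc type is a trimmed stand-in for Yojson.Basic.t, As_string is an assumed op name, and the way >> extends the error path (it prepends only the left operand's own op) is a simplification.

```ocaml
(* Trimmed stand-in for Yojson.Basic.t, so the sketch needs no external library. *)
type doc = [ `Assoc of (string * doc) list | `Int of int | `String of string ]

module JR = struct
  type op = Key of string | As_assoc | As_int | As_string
  type 'a t = Value of 'a | Error of op list
end

type ('a, 'b) t = { f : 'a -> 'b JR.t; op : JR.op }

let int : (doc, int) t =
  let op = JR.As_int in
  { f = (function `Int i -> JR.Value i | _ -> JR.Error [ op ]); op }

let string : (doc, string) t =
  let op = JR.As_string in
  { f = (function `String s -> JR.Value s | _ -> JR.Error [ op ]); op }

(* Succeeds only on dicts and passes the dict through unchanged. *)
let dict : (doc, doc) t =
  let op = JR.As_assoc in
  { f = (function `Assoc _ as d -> JR.Value d | _ -> JR.Error [ op ]); op }

(* Look up [k], run [inner] on the value, and prefix any inner error with [Key k]. *)
let key (k : string) (inner : (doc, 'a) t) : (doc, 'a) t =
  let op = JR.Key k in
  let f = function
    | `Assoc l -> (
        match List.assoc_opt k l with
        | None -> JR.Error [ op ]
        | Some v -> (
            match inner.f v with
            | JR.Value _ as ok -> ok
            | JR.Error rest -> JR.Error (op :: rest)))
    | _ -> JR.Error [ JR.As_assoc ]
  in
  { f; op }

(* Feed the result of [a] into [b]. If [b] fails, [a]'s op is prepended
   so the error shows where [b] was applied (a simplification). *)
let ( >> ) (a : ('a, 'b) t) (b : ('b, 'c) t) : ('a, 'c) t =
  let f x =
    match a.f x with
    | JR.Error path -> JR.Error path
    | JR.Value v -> (
        match b.f v with
        | JR.Value _ as ok -> ok
        | JR.Error path -> JR.Error (a.op :: path))
  in
  { f; op = a.op }

let () =
  let test_doc : doc =
    `Assoc
      [ ("hello", `String "world"); ("dict", `Assoc [ ("second", `Int 123) ]) ]
  in
  let access_world = key "hello" string in
  let access_123 = key "dict" dict >> key "second" int in
  (match access_world.f test_doc with
  | JR.Value s -> print_endline s
  | JR.Error _ -> print_endline "error");
  match access_123.f test_doc with
  | JR.Value i -> Printf.printf "%d\n" i
  | JR.Error _ -> print_endline "error"
(* prints world, then 123 *)
```

Keeping the extractor as a plain record around a function means composition is ordinary function composition plus some bookkeeping on the error branch.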

The extractor returned by these primitives and combinators is typically a value of type (doc, 'a) t, describing how to extract a value of type 'a from a JSON value. The extractor can be run on a JSON document using one of the extract* functions:

(* raises exception if access failed (type error, key error, ...) *)
let string_world = extract_exn access_world test_doc
(* returns [Result.t] *)
let int_123 = match extract access_123 test_doc with
  | Ok v -> v
  | Error e -> printf "%s" e; assert false
(* etc. *)

Error Handling

I had promised that error reporting is better than for most improvised solutions. The cost paid for tracking the operations with their types (which involves a cell allocation and a list append) has the upside of seeing a full path for any error. In addition, the extractor itself doesn’t specify how the error is handled; the same extractor can be used with extract_exn, extract, or extract_or and result in different behaviors based on that.

A simple but common example: a dict key is not found. Still using the test JSON document introduced above, the extractor logic does its job:

let test_doc = Json.from_string {| (* ... as above ... *) |}
let () = match (extract (key "helloo" string) test_doc) with
  | Ok _ -> assert false
  | Error e -> printf "%s" e

will result in

json error: ((Key helloo))

The error is structured as a printed sexp, which in this case only consists of one element; the element (Key helloo) (one o too many!) says that the error occurred during a dict access for key helloo. Given the sexp’s flexibility, you can already predict more complex failures; for example a type error – expecting an int but getting a string:

let () = match (extract (path [ "dict"; "inner"; "key" ] int) test_doc) with
  | Ok _ -> assert false
  | Error e -> printf "%s" e

will result in

json error: ((Key dict)(Key inner)(Key key)As_int)

Here we see that the extractor succeeded in accessing the nested keys “dict”, then “inner”, then “key”, but eventually failed during a cast to integer (because the accessed value is a string). An obvious future improvement would be to also include the value that was being accessed, but the message shown today is already helpful if you have the JSON document in front of you and are trying to reconstruct what went wrong.
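The two error messages above can be reproduced with a small sketch. Everything here is an assumption about how such a library might work rather than the actual code: path is written as a right fold over key, the printer string_of_path imitates the compact sexp output by hand instead of using a sexp library, and doc is a trimmed stand-in for Yojson.Basic.t.

```ocaml
(* Trimmed stand-in for Yojson.Basic.t, so the sketch needs no external library. *)
type doc = [ `Assoc of (string * doc) list | `Int of int | `String of string ]

module JR = struct
  type op = Key of string | As_assoc | As_int | As_string
  type 'a t = Value of 'a | Error of op list
end

type ('a, 'b) t = { f : 'a -> 'b JR.t; op : JR.op }

let int : (doc, int) t =
  let op = JR.As_int in
  { f = (function `Int i -> JR.Value i | _ -> JR.Error [ op ]); op }

let string : (doc, string) t =
  let op = JR.As_string in
  { f = (function `String s -> JR.Value s | _ -> JR.Error [ op ]); op }

(* Look up [k], run [inner] on the value, and prefix any inner error with [Key k]. *)
let key (k : string) (inner : (doc, 'a) t) : (doc, 'a) t =
  let op = JR.Key k in
  let f = function
    | `Assoc l -> (
        match List.assoc_opt k l with
        | None -> JR.Error [ op ]
        | Some v -> (
            match inner.f v with
            | JR.Value _ as ok -> ok
            | JR.Error rest -> JR.Error (op :: rest)))
    | _ -> JR.Error [ JR.As_assoc ]
  in
  { f; op }

(* path ["a"; "b"] x is key "a" (key "b" x). *)
let path (keys : string list) (inner : (doc, 'a) t) : (doc, 'a) t =
  List.fold_right key keys inner

let string_of_op = function
  | JR.Key k -> "(Key " ^ k ^ ")"
  | JR.As_assoc -> "As_assoc"
  | JR.As_int -> "As_int"
  | JR.As_string -> "As_string"

(* Hand-rolled compact printer: a space is only needed between two bare atoms. *)
let string_of_path ops =
  let rec join = function
    | [] -> ""
    | [ x ] -> x
    | x :: (y :: _ as rest) ->
        let tight = x.[String.length x - 1] = ')' || y.[0] = '(' in
        x ^ (if tight then "" else " ") ^ join rest
  in
  "(" ^ join (List.map string_of_op ops) ^ ")"

let extract (t : ('a, 'b) t) (j : 'a) : ('b, string) result =
  match t.f j with
  | JR.Value v -> Ok v
  | JR.Error ops -> Error ("json error: " ^ string_of_path ops)

let () =
  let test_doc : doc =
    `Assoc
      [ ("hello", `String "world");
        ("dict", `Assoc [ ("inner", `Assoc [ ("key", `String "value") ]) ]) ]
  in
  let report = function
    | Ok _ -> print_endline "unexpected success"
    | Error e -> print_endline e
  in
  report (extract (key "helloo" string) test_doc);
  report (extract (path [ "dict"; "inner"; "key" ] int) test_doc)
(* prints:
   json error: ((Key helloo))
   json error: ((Key dict)(Key inner)(Key key)As_int) *)
```

Because each key wraps its inner extractor, the path is built up on the way out of the failure, so the successful case pays nothing beyond the pattern match.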

The >>? operator can be used together with the default primitive, providing a default value if the previous operation failed for some reason:

(* key not-exist does not exist. *)
let () = match (extract (key "not-exist" int >>? default 1234) test_doc) with
    | Ok 1234 -> printf "ok\n"
    | _ -> assert false
(* key b is a dict. *)
let () = match (extract (key "b" int >>? default 1234) test_doc) with
    | Ok 1234 -> printf "ok\n"
    | _ -> assert false
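A possible implementation of >>? and default, again only as a sketch: the Default op is a hypothetical constructor (the real op type is not fully shown), doc is a trimmed stand-in for Yojson.Basic.t, and extract_exn uses a placeholder message. The idea is that >>? hands the whole JR.t result to the second extractor instead of short-circuiting on failure.

```ocaml
(* Trimmed stand-in for Yojson.Basic.t, so the sketch needs no external library. *)
type doc = [ `Assoc of (string * doc) list | `Int of int | `String of string ]

module JR = struct
  (* Default is a hypothetical op used to label the [default] extractor. *)
  type op = Key of string | As_assoc | As_int | Default
  type 'a t = Value of 'a | Error of op list
end

type ('a, 'b) t = { f : 'a -> 'b JR.t; op : JR.op }

exception Json_object_error of string

let int : (doc, int) t =
  let op = JR.As_int in
  { f = (function `Int i -> JR.Value i | _ -> JR.Error [ op ]); op }

let key (k : string) (inner : (doc, 'a) t) : (doc, 'a) t =
  let op = JR.Key k in
  let f = function
    | `Assoc l -> (
        match List.assoc_opt k l with
        | None -> JR.Error [ op ]
        | Some v -> (
            match inner.f v with
            | JR.Value _ as ok -> ok
            | JR.Error rest -> JR.Error (op :: rest)))
    | _ -> JR.Error [ JR.As_assoc ]
  in
  { f; op }

(* Unlike (>>), pass the full result of [a] on, whether it succeeded or not. *)
let ( >>? ) (a : ('a, 'b) t) (b : ('b JR.t, 'c) t) : ('a, 'c) t =
  { f = (fun x -> b.f (a.f x)); op = b.op }

(* Keep a successful value, replace any error by [d]. *)
let default (d : 'a) : ('a JR.t, 'a) t =
  let f = function JR.Value v -> JR.Value v | JR.Error _ -> JR.Value d in
  { f; op = JR.Default }

let extract_exn (t : ('a, 'b) t) (j : 'a) : 'b =
  match t.f j with
  | JR.Value v -> v
  | JR.Error _ -> raise (Json_object_error "json error")

let () =
  let test_doc : doc = `Assoc [ ("a", `Int 1); ("b", `Assoc []) ] in
  (* Missing key: falls back to the default. *)
  assert (extract_exn (key "not-exist" int >>? default 1234) test_doc = 1234);
  (* Wrong type (b is a dict): falls back as well. *)
  assert (extract_exn (key "b" int >>? default 1234) test_doc = 1234);
  (* Present and well-typed: the default is ignored. *)
  assert (extract_exn (key "a" int >>? default 1234) test_doc = 1);
  print_endline "ok"
```

Note that this sketch of default swallows the error path entirely; whether that is desirable depends on whether missing values are expected in the input.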

Another improvement whose usefulness I can’t judge yet is returning an sexp as error instead of a plain string. Handling errors programmatically may be interesting, but I expect that ultimately most JSON access errors are to be consumed by humans who will then fix the program (or the JSON).


So far, I have come up with the following module signature, exposing a number of primitives and combinators. With the examples above, you are probably able to understand what these functions are doing:

(* A Json Result type [t] along with error indications provided as [op list]. *)
module JR : sig
  type op
  type 'a t
end

(* A Json document. *)
type doc = Json.t

(* Type [t] represents an operation on a Json document that can
   potentially fail (not found, wrong type, etc.). The failure
   path can be tracked. Operations can be composed. *)
type ('a, 'b) t

exception Json_object_error of string

(* Run the given operation on the supplied Json. *)
val extract_exn : (doc, 'a) t -> doc -> 'a

(* Run the given operation on the supplied Json, returning an error string if it failed. *)
val extract : (doc, 'a) t -> doc -> ('a, string) result

(* Extract or return provided default value.*)
val extract_or : default:'a -> (doc, 'a) t -> doc -> 'a

(* Compose two operations, with the result of the first feeding the second. *)
val ( >> ) : ('a, 'b) t -> ('b, 'c) t -> ('a, 'c) t

(* Compose two operations, with the result of the first feeding the second. In case of an error during the first operation,
   the second will still be called. Use with [default]. *)
val ( >>? ) : ('a, 'b) t -> ('b JR.t, 'c) t -> ('a, 'c) t

(* Lift a function into [t] *)
val lift : ('a -> 'b) -> ('a, 'b) t

(* Transform the result of an operation. *)
val map : ('a, 'b) t -> f:('b -> 'c) -> ('a, 'c) t

(* Convert a Json value into an integer. *)
val int : (doc, int) t

(* Convert a Json value into a float. *)
val float : (doc, float) t

(* Convert a Json value into a string. *)
val string : (doc, string) t

(* Convert a Json value into a boolean. *)
val bool : (doc, bool) t

(* Convert a Json value into an alist *)
val assoc : (doc, (string, doc) Base.List.Assoc.t) t

(* Extract keys from a dict. *)
val keys : (doc, string list) t

(* Extract values from a dict *)
val values : (doc, doc list) t

(* Assert that a Json value is a dict. *)
val dict : (doc, doc) t

(* Extract the list element at the given index and convert it to the given type. *)
val list_index : int -> (doc, 'a) t -> (doc, 'a) t

(* Convert a Json value into a list of the given type. For example [list_of int]. An error is returned if any conversion fails. *)
val list_of : (doc, 'a) t -> (doc, 'a list) t

(* Convert a Json value into a list of the given type. Like [list_of]. Values failing conversion will be ignored. *)
val list_filtered_of : (doc, 'a) t -> (doc, 'a list) t

(* Extract a dict entry with the given key from the specified object. *)
val key : string -> (doc, 'a) t -> (doc, 'a) t

(* Extract a dict entry of type dict with the given key. Shortcut for [key "..." dict]. *)
val inner : string -> (doc, doc) t

(* Run both operations or fail. *)
val both : ('i, 'a) t -> ('i, 'b) t -> ('i, 'a * 'b) t

(* Run either of two operations, preferring the first and attempting the second otherwise. Useful if e.g. there are two keys to a dict entry of interest. *)
val either : ('i, 'a) t -> ('i, 'b) t -> ('i, ('a, 'b) Base.Either.t) t

(* Try both operations. *)
val alternative : ('i, 'a) t -> ('i, 'a) t -> ('i, 'a) t

(* Operator for [both] *)
val ( <+> ) : ('a, 'b) t -> ('a, 'c) t -> ('a, 'b * 'c) t

(* Operator for [either] *)
val ( <|*> ) : ('a, 'b) t -> ('a, 'c) t -> ('a, ('b, 'c) Base.Either.t) t

(* Operator for [alternative] *)
val ( <|> ) : ('a, 'b) t -> ('a, 'b) t -> ('a, 'b) t

(* If a previous extractor failed, use a default instead. Use with [(>>?)] *)
val default : 'a -> ('a JR.t, 'a) t

(* Quickly traverse a nested dict by specifying a path of keys and the type of the final value. *)
val path : string list -> (doc, 'a) t -> (doc, 'a) t
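To give a feel for how some of these combinators might look inside, here is a sketch of both, alternative, and list_of against the same record type. The behaviour follows the comments in the signature, with two assumptions of mine: alternative tries the first extractor and falls back to the second, and the ops As_list and As_string are made-up names. The doc type is a trimmed stand-in for Yojson.Basic.t.

```ocaml
(* Trimmed stand-in for Yojson.Basic.t, so the sketch needs no external library. *)
type doc =
  [ `Assoc of (string * doc) list
  | `Int of int
  | `List of doc list
  | `String of string ]

module JR = struct
  type op = Key of string | As_assoc | As_int | As_string | As_list
  type 'a t = Value of 'a | Error of op list
end

type ('a, 'b) t = { f : 'a -> 'b JR.t; op : JR.op }

let int : (doc, int) t =
  let op = JR.As_int in
  { f = (function `Int i -> JR.Value i | _ -> JR.Error [ op ]); op }

let string : (doc, string) t =
  let op = JR.As_string in
  { f = (function `String s -> JR.Value s | _ -> JR.Error [ op ]); op }

let key (k : string) (inner : (doc, 'a) t) : (doc, 'a) t =
  let op = JR.Key k in
  let f = function
    | `Assoc l -> (
        match List.assoc_opt k l with
        | None -> JR.Error [ op ]
        | Some v -> (
            match inner.f v with
            | JR.Value _ as ok -> ok
            | JR.Error rest -> JR.Error (op :: rest)))
    | _ -> JR.Error [ JR.As_assoc ]
  in
  { f; op }

(* Run both extractors on the same input; fail with the first error. *)
let both (a : ('i, 'a) t) (b : ('i, 'b) t) : ('i, 'a * 'b) t =
  let f x =
    match a.f x with
    | JR.Error p -> JR.Error p
    | JR.Value va -> (
        match b.f x with
        | JR.Error p -> JR.Error p
        | JR.Value vb -> JR.Value (va, vb))
  in
  { f; op = a.op }

(* Assumed semantics: try [a]; only if it fails, try [b]. *)
let alternative (a : ('i, 'a) t) (b : ('i, 'a) t) : ('i, 'a) t =
  let f x = match a.f x with JR.Value _ as ok -> ok | JR.Error _ -> b.f x in
  { f; op = a.op }

(* Convert every list element with [inner]; stop at the first failure. *)
let list_of (inner : (doc, 'a) t) : (doc, 'a list) t =
  let op = JR.As_list in
  let f = function
    | `List l ->
        let rec go acc = function
          | [] -> JR.Value (List.rev acc)
          | x :: rest -> (
              match inner.f x with
              | JR.Value v -> go (v :: acc) rest
              | JR.Error p -> JR.Error (op :: p))
        in
        go [] l
    | _ -> JR.Error [ op ]
  in
  { f; op }

let () =
  let test_doc : doc =
    `Assoc
      [ ("hello", `String "world");
        ("foo", `List [ `String "bar"; `String "baz" ]);
        ("a", `Int 1) ]
  in
  (match (both (key "a" int) (key "hello" string)).f test_doc with
  | JR.Value (i, s) -> Printf.printf "%d %s\n" i s
  | JR.Error _ -> print_endline "error");
  (match (key "foo" (list_of string)).f test_doc with
  | JR.Value l -> print_endline (String.concat "," l)
  | JR.Error _ -> print_endline "error");
  match (alternative (key "missing" int) (key "a" int)).f test_doc with
  | JR.Value i -> Printf.printf "%d\n" i
  | JR.Error _ -> print_endline "error"
(* prints:
   1 world
   bar,baz
   1 *)
```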

Current Use

The combinators are currently in light use for parsing Jupyter notebooks in the typst_of_jupyter project, as described initially. That project can be found on GitHub at dermesser/typst_of_jupyter.

For example, a function to convert a code cell into a native record works as follows:

let cell_of_json j =
  let cell_type = extract_exn (key "cell_type" string) j in
  if not (String.equal cell_type "code") then
    (* Assumption: the raising call is missing from this listing; Base's [raise_s] stands in. *)
    raise_s
      [%sexp
        "Code.cell_of_json only handles code cells but got",
          (cell_type : string)];
  {
    execount = extract_exn (key "execution_count" int) j;
    meta = cast_assoc @@ extract_exn (key "metadata" dict) j;
    source = String.concat @@ extract_exn (key "source" (list_of string)) j;
    outputs = parse_outputs (extract_exn (key "outputs" (list_of dict)) j);
  }


If you would like to add functionality or send helpful feedback, please use the GitHub project linked above.

If the approach continues proving to be useful, I plan on creating a standalone opam package; for now, it feels that the functionality is not sufficient to do so.