Add documentation for mixed blocks #2667

Merged
merged 5 commits into from
Jun 7, 2024
146 changes: 145 additions & 1 deletion ocaml/jane/doc/extensions/unboxed-types/index.md
@@ -284,4 +284,148 @@ Here's the list of primitives that currently support `[@layout_poly]`:
* `%array_safe_get`
* `%array_safe_set`
* `%array_unsafe_get`
* `%array_unsafe_set`

# Using unboxed types in structures

Unboxed types can usually be put in structures, though there are some restrictions.

These structures may contain unboxed types, subject to restrictions on field
ordering:
* Records
* Constructors

Unboxed numbers can't be put in these structures:
* Constructors with inline record fields
* Exceptions
* Extensible variant constructors
* Top-level fields of modules
* Tuples

The unsupported structures don't pose fundamental problems. Supporting them
will just take some implementation work.

Here's an example of a record with an unboxed field. We call such a record
a "mixed record".

```ocaml
type t =
  { str : string;
    i : int;
    f : float#;
  }
```

## Restrictions on field ordering

The following is written in terms of record fields but applies equally to
constructor arguments.

Suppose a record contains any unboxed field `fld` whose layout is not
`value`[^or-combination-of-values]. Then the following restriction applies: all
fields occurring after `fld` in the record must be "flat", i.e. the GC can
skip looking at them. The only options for flat fields are immediates (i.e. things
represented as ints at runtime) and other unboxed numbers.

[^or-combination-of-values]: Technically, there are some non-value layouts that don't hit this restriction, like unboxed products and unboxed sums consisting only of values.

The following definition is rejected, as the boxed field `s : string` appears
after the unboxed float field `f`:

```ocaml
type t_rejected =
  { f : float#;
    s : string;
  }
(* Error: Expected all flat fields after non-value field, f,
          but found boxed field, s. *)
```

The only relaxation of the above restriction is for records that consist
solely of `float` and `float#` fields. Any ordering of `float` and `float#`
fields is permitted. The "flat float record optimization" applies to any
such record—all of the fields are stored flat, even the `float` ones
that will require boxing upon projection. The ordering restriction is relaxed
in this case to provide a better migration story for all-`float` records
to which the flat float record optimization currently applies.

```ocaml
type t_flat_float =
  { x1 : float;
    x2 : float#;
    x3 : float;
  }
```
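
The ordering rule and its all-float relaxation can be sketched as a small checker. This models the rule as described in this section, not the compiler's implementation: the `demo_field_kind` enum and the `demo_ordering_ok` function are hypothetical names invented for illustration, and the sketch only covers unboxed numbers (it ignores the non-value layouts mentioned in the footnote).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of record fields, for illustration only. */
typedef enum {
  DEMO_POINTER,       /* boxed value such as string: the GC must scan it */
  DEMO_BOXED_FLOAT,   /* float */
  DEMO_IMMEDIATE,     /* e.g. int: represented as an int at runtime */
  DEMO_UNBOXED_FLOAT, /* float# */
  DEMO_UNBOXED_OTHER  /* another unboxed number, e.g. int32# */
} demo_field_kind;

/* Flat fields are those the GC can skip: immediates and unboxed numbers. */
static bool demo_is_flat(demo_field_kind k) {
  return k == DEMO_IMMEDIATE || k == DEMO_UNBOXED_FLOAT ||
         k == DEMO_UNBOXED_OTHER;
}

/* In this model, unboxed numbers are the fields that trigger the rule. */
static bool demo_is_unboxed_number(demo_field_kind k) {
  return k == DEMO_UNBOXED_FLOAT || k == DEMO_UNBOXED_OTHER;
}

/* Returns true if the field sequence satisfies the ordering restriction. */
bool demo_ordering_ok(const demo_field_kind *fields, size_t n) {
  /* Relaxation: records made solely of float and float# fields. */
  bool only_floats = true;
  for (size_t i = 0; i < n; i++) {
    if (fields[i] != DEMO_BOXED_FLOAT && fields[i] != DEMO_UNBOXED_FLOAT) {
      only_floats = false;
    }
  }
  if (only_floats) {
    return true;
  }

  /* General rule: once an unboxed number appears, every later field
     must be flat. */
  bool seen_unboxed = false;
  for (size_t i = 0; i < n; i++) {
    if (seen_unboxed && !demo_is_flat(fields[i])) {
      return false;
    }
    if (demo_is_unboxed_number(fields[i])) {
      seen_unboxed = true;
    }
  }
  return true;
}
```

Under this model, the field sequence of `t` (pointer, immediate, `float#`) is accepted, that of `t_rejected` is not, and that of `t_flat_float` is accepted only because of the relaxation.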

The ordering restriction has to do with the "mixed block" runtime
representation. Read on for more detail about that.

## Generic operations aren't supported

Some operations built in to the OCaml runtime aren't supported for structures
containing unboxed types.

These operations aren't supported:
* polymorphic comparison and equality
* polymorphic hash
* marshaling

These operations raise an exception at runtime, similar to how polymorphic
comparison raises when called on a function.

You should use ppx-derived versions of these operations instead.

## Runtime representation: mixed blocks

As a general principle: The compiler should not change the user-specified
field ordering when deciding the runtime representation.

Abiding by this principle allows you to write C bindings and
predict hardware cache performance.

A structure containing unboxed types is represented at runtime as a "mixed
block". A mixed block always consists of fields the GC can-or-must scan followed by
fields the GC can-or-must skip[^can-or-must]. The garbage collector must be kept
informed of which fields of the block it should scan. A portion of the header
word is reserved to track the length of the prefix of the block that should be
scanned by the garbage collector.

[^can-or-must]: "Can-or-must" is a bit of a mouthful, but it captures the right nuance. Pointer values *must* be scanned, unboxed number fields *must* be skipped, and immediate values *can* be scanned or skipped.

The ordering constraint on structure fields is a reflection of the same
ordering restriction in the runtime representation.
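
As a rough model of this representation, the sketch below treats a block as an array of words plus a scannable prefix length, and shows a scan that never looks past the prefix. The `demo_mixed_block` struct and the `demo_scan_block` function are hypothetical. In particular, the real runtime keeps the prefix length in the header word, whereas this sketch stores it in a separate struct member for readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a mixed block: not the runtime's actual layout. */
typedef struct {
  size_t wosize;               /* total number of word-sized fields */
  size_t scannable_prefix_len; /* fields [0, len) are scanned by the GC */
  const intptr_t *fields;      /* the fields themselves */
} demo_mixed_block;

/* Copies the fields a GC would scan into `out` (which must have room for
   at least b->wosize words) and returns how many fields were scanned.
   Fields past the prefix are never read. */
size_t demo_scan_block(const demo_mixed_block *b, intptr_t *out) {
  size_t n = b->scannable_prefix_len;
  if (n > b->wosize) {
    n = b->wosize; /* defensive clamp for malformed input */
  }
  for (size_t i = 0; i < n; i++) {
    out[i] = b->fields[i];
  }
  return n;
}
```

Because the scan is driven purely by the prefix length, any field that must be skipped has to sit after every field that must be scanned, which is the ordering constraint seen at the source level.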

## C bindings for mixed blocks

The implementation of field layout in a mixed block is not finalized. For example, we'd like int32 fields to be packed efficiently (two to a word) on 64-bit platforms. Currently that's not the case: each one takes up a word.

Users who write C bindings might want to be notified when we change this layout. To ensure that your code must be updated whenever the layout changes, use the `Assert_mixed_block_layout_v#` family of macros. For example,

```c
Assert_mixed_block_layout_v1;
```

Write the above in statement context, i.e. either at the top level of a file or
within a function.
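
One way such a versioned macro could work is sketched below. This is not the definition from the OCaml runtime headers: `DEMO_LAYOUT_VERSION` and `Demo_assert_layout_v1` are hypothetical stand-ins. The idea is that the macro expands to a C11 `_Static_assert`, which is a declaration and is therefore accepted both at file scope and inside a function body.

```c
/* Hypothetical stand-in for the layout version the headers would expose. */
#define DEMO_LAYOUT_VERSION 1

/* Expands to a declaration; the user supplies the trailing semicolon. */
#define Demo_assert_layout_v1 \
  _Static_assert(DEMO_LAYOUT_VERSION == 1, \
                 "mixed block layout changed: review field accessors")

Demo_assert_layout_v1; /* at the top level of a file */

int demo_layout_version(void) {
  Demo_assert_layout_v1; /* within a function */
  return DEMO_LAYOUT_VERSION;
}
```

In this sketch, bumping `DEMO_LAYOUT_VERSION` to 2 would make both uses fail to compile until the bindings were reviewed and moved to a v2 macro.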

Here's a full example. Say you're writing C bindings against this OCaml type:

```ocaml
(** foo.ml *)
type t =
  { x : int32#;
    y : int32#;
  }
```

Here is the recommended way to access fields:

```c
Assert_mixed_block_layout_v1;
#define Foo_t_x(foo) (*(int32_t*)&Field(foo, 0))
#define Foo_t_y(foo) (*(int32_t*)&Field(foo, 1))
```

We would bump the version number in either of these cases, which would prompt you to think about the code:

* We change which half of the word the int32 is stored in
* We start packing int32s more efficiently
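
To see what the current one-int32-per-word layout means for accessors of this shape, here is a self-contained simulation that does not depend on the OCaml runtime headers. `demo_value`, `Demo_field`, and the `Demo_foo_t_*` macros are hypothetical stand-ins for `value`, `Field`, and the `Foo_t_*` accessors, and the block is ordinary `calloc`ed memory rather than an OCaml heap block.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the runtime's `value` type and `Field` macro. */
typedef intptr_t demo_value;
#define Demo_field(v, i) (((demo_value *)(v))[i])

/* Same shape as the recommended accessors: one int32 per word-sized field. */
#define Demo_foo_t_x(foo) (*(int32_t *)&Demo_field(foo, 0))
#define Demo_foo_t_y(foo) (*(int32_t *)&Demo_field(foo, 1))

/* Stores x and y through the accessors, reads them back, and returns their
   sum. Returns 0 if the allocation fails. */
int32_t demo_roundtrip_sum(int32_t x, int32_t y) {
  demo_value *words = calloc(2, sizeof(demo_value));
  if (words == NULL) {
    return 0;
  }
  demo_value foo = (demo_value)words;
  Demo_foo_t_x(foo) = x;
  Demo_foo_t_y(foo) = y;
  int32_t sum = Demo_foo_t_x(foo) + Demo_foo_t_y(foo);
  free(words);
  return sum;
}

/* Distance in bytes between the two int32 fields under this layout.
   Only addresses are computed; the fields are not read. */
size_t demo_field_stride(void) {
  demo_value words[2] = {0, 0};
  demo_value foo = (demo_value)words;
  return (size_t)((char *)&Demo_foo_t_y(foo) - (char *)&Demo_foo_t_x(foo));
}
```

The stride equals the word size, reflecting that each int32 currently occupies a full word. If int32s were packed two to a word, accessors written this way would address the wrong data, which is the situation the version assertion is meant to flag.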