bitvec Memory Model

bitvec addresses individual bits, while computer hardware addresses register elements. As a result, bitvec has a more complex memory model than the Rust language does. The library implementation strives to balance users’ expectations, the Rust language’s rules, and the performance of the produced artifact, with as few compromises for any party as possible. Unfortunately, this has the side effect of increasing the complexity of the codebase, both for maintainers and for readers.

This chapter explains the abstract concepts of the bitvec memory model and the means by which it is encoded in the Rust language without running afoul of Rust or LLVM rules that threaten undefined behavior.

The bitvec memory model is typed entirely within the store module’s BitStore trait definition and implementations. It utilizes the access module’s BitAccess trait to mediate memory access events through types that are known to be correct in the Rust and LLVM models, so that at any point in program execution, memory access will be consistent and sound.

In addition, the domain module’s types provide views which manipulate the store model to maximize performance. This chapter will discuss primarily store, and use domain only to provide examples of how store is used in practice to accomplish the library’s implementation.

Aliasing

To Rust and LLVM, “aliasing” occurs whenever there are two paths to a memory region, and at least one of them has write privileges. In Rust, this is represented by the &mut reference exclusion rule: it is always, always, always Undefined Behavior in the Rust memory model to have two references which can reach a memory element if at least one of them is marked &mut.
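
As a minimal illustration (ordinary Rust, not library code), the borrow checker rejects a second path to memory outright while a &mut is live:

let mut x = 0u8;
let exclusive = &mut x;
// let shared = &x; // error[E0502]: cannot borrow `x` as immutable
//                  // because it is also borrowed as mutable
*exclusive = 1;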

LLVM, which was created for, is written in, and culturally shaped by C++, takes a similar view with its noalias annotation, but struggles to enforce it as thoroughly as Rust does.

bitvec takes a similar view of the abstract meaning of the &mut reference type, but where the Rust memory model focuses on whole units T, and has no concept of subdivision from T into the bits that compose it, bitvec views each individual bit as a standalone, independent atom of memory. It forbids the creation of two &mut BitSlice reference handles that are capable of viewing the same bit, yet will happily produce two &mut BitSlice handles which are mutually exclusive in bits while referring to the same location in memory.
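
For example, a sketch using the public API: splitting a one-byte bit-slice yields two &mut BitSlice handles that partition the bits of a single u8 between them.

use bitvec::prelude::*;

let mut byte = 0u8;
let bits = byte.view_bits_mut::<Lsb0>();
// Both halves refer to the same `u8` element, but to disjoint bits of it.
let (low, high) = bits.split_at_mut(4);
low.fill(true);
high.fill(false);
assert_eq!(byte, 0x0F);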

Here we come to the first problem with the conflicting memory models: bitvec cannot ever create an &mut T through which it may write to memory, because it has no way of ensuring that no other &T or &mut T reference exists which is capable of viewing the memory region into which it writes.

Rust Shared Mutation

The problem isn’t just that the Rust standard library doesn’t offer any non-unsafe APIs to produce such references. The problem is that this is about the most illegal thing you can do in the eyes of the Rust compiler, and if it ever detects this transgression, it has full liberty to either reject, or worse, accept and miscompile, your program.

In Gankra’s immortal words on the topic of attempting to sneak past the compiler’s commandments,

  • Transmuting an & to &mut is always UB
  • No you can’t do it
  • No you’re not special

The solution is simple: Rust exposes a type, UnsafeCell, which is the axiomatic type for mutation through shared views. In the Rust memory model, any reference to a region of memory marked with UnsafeCell is permitted to write to it, and if any other references can view that region, then it is up to them to ensure consistent behavior.
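
In standard-library terms (an illustrative sketch, not bitvec code), UnsafeCell::get yields a raw pointer through which a shared reference may write, with the caller responsible for preventing conflicting access:

use core::cell::UnsafeCell;

let cell = UnsafeCell::new(0u8);
let shared: &UnsafeCell<u8> = &cell;
// Writing through a shared reference is permitted for `UnsafeCell`,
// but the caller must guarantee that no conflicting access occurs.
unsafe { *shared.get() = 5; }
assert_eq!(unsafe { *shared.get() }, 5);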

This is, of course, not quite true in LLVM’s memory model, but we are not there yet.

Since an &mut BitSlice<T, _> handle cannot produce an &mut T reference to perform writes to memory, it must instead either use a *mut T bare pointer, which has absolutely no checks or optimizations whatsoëver, or use an &UnsafeCell<T> shared reference, which has all the usual guarantees present on all[1] reference types.

All UnsafeCell does is instruct the Rust compiler to politely look the other way about your program’s memory accesses. It is somewhat like the C keyword volatile, in that the compiler no longer believes that reads are stateless, or freely reörderable, but entirely unlike that keyword in that the compiler retains no obligation to actually perform your reads from or writes to such regions.

Rust provides an additional type called Cell. This is a wrapper over UnsafeCell[2] that defines a more useful API, including the only safe and guaranteed way to write into memory through a shared reference: Cell::set.

And voilà: &mut BitSlice<T, _> simply constructs &Cell<T> when writing, the Rust compiler does not see a violation of the &mut exclusion principle, and we are done.
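
The resulting pattern, sketched here with plain Cell outside the library, is that any number of shared references may write to the same element, one at a time, on a single thread:

use core::cell::Cell;

let cell = Cell::new(0u8);
// Two shared paths to the same byte, both permitted to write.
let (a, b): (&Cell<u8>, &Cell<u8>) = (&cell, &cell);
a.set(0b1010_0000);
b.set(b.get() | 0b0000_0101);
assert_eq!(cell.get(), 0b1010_0101);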

LLVM Shared Mutation

No we’re not. If it were that easy, there wouldn’t be a trait system or this document dedicated specifically to dealing with this problem.

Cell is not thread-safe, because Cell does not modify the instructions used to access memory. It produces ordinary load and store instructions, carefree and ignorant of the bane of everyone who wants consistency and the delight of everyone who wants performance: concurrency.
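
This thread-unsafety is visible in the trait system: Cell<T> does not implement Sync, so a &Cell<T> can never cross a thread boundary. A quick (non-library) demonstration:

use core::cell::Cell;

fn requires_sync<T: Sync>(_: &T) {}

let cell = Cell::new(0u8);
// requires_sync(&cell); // error[E0277]: `Cell<u8>` cannot be
//                       // shared between threads safely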

Just as it is undefined behavior in Rust to manifest two &mut references that can view the same location, it is equally undefined behavior in LLVM to manifest two pointers into memory that ever, at all, no matter what, perform unsynchronized writes to the same location, at the same time, on multiple executors.

As above:

  • Unsynchronized writes are always UB.
  • No you can’t do it.
  • No you’re not special.

LLVM has an even more insidious punishment for this transgression, one that Rust does not directly express: unsynchronized reads from a data race produce poison. Poison is a nifty concept, because it’s not illegal to obtain one. When LLVM gives you a poison, your program continues undergoing compilation as if nothing had happened. You can pass it around. You can write it back into memory, and if you destroy it without ever reading it, you’re fine.

As soon as you attempt to read the bit-wise value of poison, your program is undefined[3].

So if bitvec wants to be thread-safe, which it does, and it wants to insist on its ability to safely alias the same memory location from multiple handles, which is non-negotiable, there’s only one avenue left to take.

Atomic Powered Microscopes

Rust doesn’t actually model atomics. It doesn’t have to do so: no harm can ever come from multiple handles reading out of the same immutable location; harm can only occur when writes are observable, and writes are not observable, thanks to the &mut exclusion rule. Well, except for UnsafeCell: everything that contains an UnsafeCell is marked as !Sync, & references to it can’t cross threads, and the whole problem is avoided.

This is, of course, not good enough. Concurrent, mutable access to a location is an important property in a computer. LLVM provides atomic types, which Rust transparently exports as wrappers over UnsafeCell that have their Sync implementation restored. Handles to a region marked as atomic will use some form of hardware-provided exclusion in order to preserve the one-writer-XOR-any-readers system behavior, and all is well.
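
For example (plain standard-library code, not bitvec internals), two threads may concurrently set disjoint bits of a single byte, because atomic read-modify-write operations preserve each other’s effects:

use std::sync::atomic::{AtomicU8, Ordering};
use std::thread;

let byte = AtomicU8::new(0);
thread::scope(|s| {
  // Each thread touches a different half of the same element.
  s.spawn(|| { byte.fetch_or(0b0000_1111, Ordering::Relaxed); });
  s.spawn(|| { byte.fetch_or(0b1111_0000, Ordering::Relaxed); });
});
assert_eq!(byte.load(Ordering::Relaxed), 0xFF);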

Hardware-level exclusion has the unfortunate penalty of being, to put it lightly, “slow”. So while it’s the safest choice to be correct, and was in fact the default universal choice for all memory access in bitvec for some time, its costs[4] called for a more prudent behavior.

It’s time for a new trick, something that the Rust compiler does at the region level, and that bitvec now does at the element level:

Run Time Alias Analysis

BitSlice handles can only be constructed from ordinary Rust references to memory, which Rust guarantees start out unaliased. The potential for aliasing only occurs when a &mut BitSlice is split into multiple subslices using any of the functions that eventually call .split_at_unchecked_mut(). Since that is the root function that introduces alias conditions, it returns subslices whose memory type parameters are tainted with the ::Alias marker. It has the following type signature:

impl<T, O> BitSlice<T, O>
where
  O: BitOrder,
  T: BitStore,
{
  pub fn split_at_unchecked_mut(&mut self, at: usize) -> (
    &mut BitSlice<T::Alias, O>,
    &mut BitSlice<T::Alias, O>,
  );
}

The BitStore trait defines an ::Alias associated type, which ensures that all memory accesses through it have appropriate aliasing guards. For builds which do not use the atomic feature (or where the target does not have an atomic variant of the integer), this is Cell, whose protection is the removal of thread-safety. For builds that do permit atomics, the marker enforces that all reads and writes use atomic instructions.
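
A simplified sketch of the shape this takes (heavily reduced; the real trait carries more associated types, such as Mem and Access, and stricter bounds):

// A rough sketch of the alias-marker round-trip, not bitvec's actual
// trait text.
pub trait BitStore {
  /// Marker for memory that may be witnessed by multiple handles.
  type Alias: BitStore;
  /// Marker for memory whose exclusivity has been re-established.
  type Unalias: BitStore;
}

/// Stand-in for the alias-safed wrapper over `AtomicU8`/`Cell<u8>`.
pub struct BitSafeU8;

impl BitStore for u8 {
  type Alias = BitSafeU8;
  type Unalias = u8;
}

impl BitStore for BitSafeU8 {
  type Alias = BitSafeU8; // already aliased; the marker is idempotent
  type Unalias = u8;      // removing the marker restores the bare integer
}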

The ::Alias marker is applied, at compile time, by operations that split &mut BitSlice references into multiple coëxisting subslices. This is a good first step to reducing unnecessary synchrony, but not good enough. Consider the following:

use bitvec::prelude::*;

let mut data = [0u8; 2];
let bits = data.view_bits_mut::<Lsb0>();
let (one, two) = bits.split_at_mut(8);

It so happens that this is equivalent to splitting data first, then viewing each subslice as bits, but this transformation can only be done when the partition point is known at compile time to fall on an element boundary. There is no need for the subslices one and two to use alias-safe operations, as accesses to memory through them do not conflict with each other.
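
The alias-free equivalent, written out for comparison (ordinary slice splitting followed by the bit view), looks like this:

use bitvec::prelude::*;

let mut data = [0u8; 2];
// Splitting the elements first keeps each bit view unaliased, because
// each resulting `&mut BitSlice` covers a distinct `u8`.
let (one, two) = data.split_at_mut(1);
let one_bits = one.view_bits_mut::<Lsb0>();
let two_bits = two.view_bits_mut::<Lsb0>();
one_bits.set(0, true);
two_bits.set(7, true);
assert_eq!(data, [0x01, 0x80]);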

Bit Slice Domains

The in-memory domain of any bit slice can be generalized to one of two formats: either the slice touches zero edge bits (bit indices 0 or T::BITS - 1), or it touches an edge bit in at least one element. Consider three bytes of memory (any element will do, but the extra width on this page is unnecessary), with some bitslice regions drawn within them:

│00000000│11111111│22222222│ Element
│76543210│76543210│76543210│ Bit
├────────┼────────┼────────┤
┆        ┆        ┆        ┆ 1
┆        ╞═════╡  ┆        ┆ 2
┆        ┆ ╞════╡ ┆        ┆ 3
┆        ┆  ╞═════╡        ┆ 4
┆   ╞═════════╡   ┆        ┆ 5
┆    ╞════════════╡        ┆ 6
┆        ╞═════════════╡   ┆ 7
┆     ╞═════════════════╡  ┆ 8
╞══════════════════════════╡ 9

There are nine example slices here, but they can be reduced into six specific categories, and two general ones:

  1. empty: row 1
  2. partially-spanning tail, no body: row 2
  3. minor (interior of an element, no edge indices): row 3
  4. partially-spanning head, no body: row 4
  5. major (partial head, partial tail, no body): row 5
  6. partially-spanning head, fully-spanning body: row 6
  7. partially-spanning tail, fully-spanning body: row 7
  8. major (partial head, partial tail, full body): row 8
  9. spanning: row 9

The minor slice (row 3) is irreducible; the rest can all be divided into three subcomponents:

  • zero or one partially-occupied head element, where the slice touches the last index in it but not the first
  • zero or more fully-occupied middle elements, where the slice touches all indices in each
  • zero or one partially-occupied tail element, where the slice touches the first index in it but not the last

We can break each row down into these components:

Row │ Head │ Body │ Tail
 1  │ None │  0   │ None
 2  │ None │  0   │ Some
 4  │ Some │  0   │ None
 5  │ Some │  0   │ Some
 6  │ Some │  1   │ None
 7  │ None │  1   │ Some
 8  │ Some │  1   │ Some
 9  │ None │  3   │ None

We can observe that where a slice fully-spans some elements, those elements cannot be mutated by any other reference. In the &BitSlice case, all &muts are forbidden by the compiler’s ordinary rules; in the &mut BitSlice case, bitvec’s obedience to those same rules forbids any other handle from observing the bits covered by the &mut BitSlice. As such, it is statically impossible for any alias to exist to the described memory in row 9, or for any alias to observe element 1 of rows 6 through 8.

bitvec happily permits element-aliasing &mut BitSlice references to observe the partially-filled elements in the outer columns of rows 5 through 8, and in the middle column of rows 2 through 5, so writes to them must remain synchronized through either single-threaded Cell or concurrency-safe atomics. These domain components can be calculated from the three components of a slice pointer: the base address, the starting bit index, and the bit count.
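
As a sketch of that arithmetic (plain integer math over 8-bit elements, not the library’s typed indices), the three components fall out as follows:

// `start` is the absolute index of the first live bit; `len` is the
// number of live bits. Returns (partial head?, full elements, partial tail?).
fn decompose(start: usize, len: usize) -> (bool, usize, bool) {
  const BITS: usize = 8;
  if len == 0 {
    return (false, 0, false);
  }
  let end = start + len; // one past the last live bit
  let head = start % BITS != 0; // first element only partially occupied
  let tail = end % BITS != 0; // last element only partially occupied
  // Count the elements whose every bit index lies inside the span.
  let body = (end / BITS).saturating_sub((start + BITS - 1) / BITS);
  (head, body, tail)
}

// Row 9: spanning. Row 5: major with no body.
assert_eq!(decompose(0, 24), (false, 3, false));
assert_eq!(decompose(3, 10), (true, 0, true));
// A minor slice (row 3) also reports two partial edges; the library
// detects that both fall within one element and produces an Enclave.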

This is expressed in the domain module’s two enums.

BitDomain

This enum splits any BitSlice region into its subcomponents, immutably or mutably according to its M: Mutability type parameter. It has the (rough) definition

pub enum BitDomain<'a, M, T, O>
where
  M: Mutability,
  T: BitStore,
  O: BitOrder,
{
  Enclave(&'a /* mut */ BitSlice<T, O>),
  Region {
    head: &'a /* mut */ BitSlice<T, O>,
    body: &'a /* mut */ BitSlice<T::Unalias, O>,
    tail: &'a /* mut */ BitSlice<T, O>,
  },
}

and, rather than granting direct memory access, merely removes any aliasing markers from as much memory as possible. The subslices that partially fill their base element do not need to add an additional aliasing marker, as the marker is only required when writes to the element may collide. If the slice is immutable, aliasing never occurs, so synchrony is never required. If the slice is mutable, then the only way to get a partial edge slice is to either forget about some bits from the main slice, which is not an alias event, or to split the slice, which is, and splitting already marks the alias.
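
In use, this looks roughly like the following (a sketch assuming the .bit_domain() accessor; the mutable form behaves identically over &mut slices):

use bitvec::prelude::*;
use bitvec::domain::BitDomain;

let data = [0u8; 3];
let bits = &data.view_bits::<Msb0>()[2 .. 22];
match bits.bit_domain() {
  // The whole slice lives inside one element.
  BitDomain::Enclave(_) => unreachable!("this slice crosses elements"),
  // `body` is known to be unaliased and can use element-wise fast paths.
  BitDomain::Region { head, body, tail } => {
    assert_eq!(head.len(), 6); // bits 2 .. 8 of element 0
    assert_eq!(body.len(), 8); // all of element 1
    assert_eq!(tail.len(), 6); // bits 16 .. 22 of element 2
  },
}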

Domain

The bit domains are still bit slices, and do not offer a way to access the backing memory. For operations where raw memory access is required, this enum produces the same domain definitions, but typed for the bare elements rather than their bits.

It has the (rough) definition

pub enum Domain<'a, M, T>
where
  M: Mutability,
  T: BitStore,
{
  Enclave(PartialElement<&'a T>),
  Region {
    head: Option<PartialElement<&'a T>>,
    body: &'a /* mut */ [T::Unalias],
    tail: Option<PartialElement<&'a T>>,
  },
}

(The PartialElement type prevents accessing bits that do not belong to the originating BitSlice.)

As with the bit domains, these domains will inherit any aliasing markers from their source bitslice. The ::Alias associated type enables the mutable domain to produce references that allow mutation without adding an unnecessary aliasing marker. Rust strongly forbids the production of &mut references to aliased memory elements, which is why the only &mut reference in these views is to memory that is fully known to be unaliased.

The Domain structure will produce bare atomic or Cell types in the alias condition. This is necessary in order to avoid production of &mut references which alias (as this is undefined in the Rust abstract machine, regardless of behavior), and safe because any other references to the same location will be similarly aliased and capable of handling external mutation.
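
As an example, a population count that takes the element-wise fast path wherever possible might look like this (a sketch assuming the .domain() accessor and the PartialElement::load_value reader; load_value masks away bits outside the originating slice):

use bitvec::prelude::*;
use bitvec::domain::Domain;

fn count_ones(bits: &BitSlice<u8, Msb0>) -> u32 {
  match bits.domain() {
    // The slice lives inside one element: read just its live bits.
    Domain::Enclave(elem) => elem.load_value().count_ones(),
    Domain::Region { head, body, tail } => {
      head.map_or(0, |e| e.load_value().count_ones())
        // `body` is bare, unaliased memory: plain integer popcount.
        + body.iter().map(|e| e.count_ones()).sum::<u32>()
        + tail.map_or(0, |e| e.load_value().count_ones())
    },
  }
}

let data = [0xA5u8; 2];
assert_eq!(count_ones(data.view_bits::<Msb0>()), 8);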

LLVM Suboptimizations

LLVM considers a “racing read”, that is, any read from memory that could occur contemporaneously with an atomic write, to be undefined behavior. This is a reasonable view to take, but a pessimistic one. bitvec has information that cannot be expressed to LLVM about which bits of an element it will observe or modify, and bitvec is capable of guaranteeing that two distinct access points will not be able to interfere with each other electrically. LLVM does not know this, so it considers any write to memory to touch all bits of the touched element, and any read from memory to view all bits of the fetched element.

bitvec exclusively[5] writes to contended memory with the Rust functions AtomicT::fetch_and and AtomicT::fetch_or, which are mapped to the LLVM instructions __atomic_fetch_and_N and __atomic_fetch_or_N. It uses bitmasks that bitvec can guarantee are non-intersecting, but this proof cannot be extended to even the Rust compiler, let alone LLVM. These bitmasks are applied to register values before using either of the fetch_op instructions, and after any reads that use AtomicT::load/__atomic_load_N.
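
As a sketch of that access pattern (with an invented mask constant; the real masks are derived from the slice’s bit indices), writing a group of bits into a shared element takes one fetch_and to clear the group and one fetch_or to deposit the new value:

use std::sync::atomic::{AtomicU8, Ordering};

// Illustrative mask: the bits this handle owns within the element.
const MASK: u8 = 0b0011_1100;

fn write_group(elem: &AtomicU8, value: u8) {
  // Clear only the owned bits, leaving other handles' bits untouched...
  elem.fetch_and(!MASK, Ordering::Relaxed);
  // ...then set the owned bits to the new, pre-masked value.
  elem.fetch_or(value & MASK, Ordering::Relaxed);
}

let elem = AtomicU8::new(0b1100_0011);
write_group(&elem, 0b0010_1000);
assert_eq!(elem.load(Ordering::Relaxed), 0b1110_1011);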

If the bitvec mask proofs were extendible to LLVM, and LLVM were to expand its tracking of which bits of a memory address became poisoned by a memory write, and which bits of a fetched memory value became un-poisoned by a masking operation, then the compiler would be more able to observe that bitvec memory accesses do not observably interfere with each other. This observation would then define the behavior in the compiler’s memory model of racing writes/reads, and permit an increased (possibly even complete) removal of synchrony guards.

I am not aware of any processor hardware which fails to guarantee that all bits of memory are fully defined at the clock edges of all instructions that use the location. To the full extent of my knowledge, all memory banks in all relevant processors have a stable bit-value at the start of a tick, when reads occur, and at the end of a tick, when writes commit. At no point does changing the value of one bit of a memory component affect the electrical value of other bits in the component.

This is not necessarily true of other storage devices, such as SSDs, but bitvec can only be used to access storage cells mapped in the RAM address space, which tend to all have this stability property.

Summary

The formal definition of the bitvec memory model extends the Rust mutable-exclusion rules by refining memory regions to have bit-precision instead of element-precision. The Rust compiler is only weakly capable of tracking the region status of individual bits, and only in compiler-internal structures. LLVM has a more robust arbitrary-bit-tracking capability, but similarly limits its interface to external code.

Barring any errors in the bitvec implementation, the bitvec memory model is fully sound in its behavior with regard to single-observer unsynchrony. Synchronization is only added in order to correctly interface with rustc and LLVM without causing either of them to introduce undefined behavior due to a lack of information.

Footnotes

[1]: all references to T where T is either !Sized, or mem::size_of::<T>() is non-zero, that is. A fun quirk of Rust’s first-class concept of zero-width types is that the only illegal value for a &Zst reference is null. Since there is nothing to load or store through a &Zst reference, the compiler doesn’t care what the reference value is, as it will never be used to perform memory access.

[3]: This is not absolutely true. Like we saw with UnsafeCell, the only immutable rule of compiler developers is that whenever they make an immutable rule, they also provide a way to sidestep it. If you freeze a poison, you are now free to read its value and act on it. LLVM just doesn’t make any guarantees about what you’ll see.

[4]: I’d feel a great deal more comfortable if I had firm knowledge of what those costs actually were. An atomic write always issues a lock instruction modifier on x86, and I have heard vastly different things about what that actually means, from “it’s free if no other cache holds that address” up to “it poisons the whole cacheline”, and have not had much luck producing a benchmark that firmly demonstrates that unneeded atomic access is a strict performance cost.

[5]: In multithreading environments. Disabling atomics also disables bitvec’s support for multithreading, so the penalty for aliasing is reduced to an inability to remove redundant reads.