Dedication
I began working on `bitvec` shortly before I was told that my father had been
diagnosed with cancer for the third time. Developing the library gave me
something into which to sink my attention and keep my mind from dwelling on his
rapidly-progressing illness. I wrote the core pointer representation that
enables `bitvec`’s modern behavior at his side, in the last days that he
remained conscious.
I never had the chance to explain to my dad what I was building. By the time it
was developed enough to be worth explaining, he had lost too much of his brain
to understand me. More than anything I’ve done for my employers, `bitvec` is the
work of which I’m most proud, and which I most wish he could have seen.

The `bitvec` project is dedicated to the memory of my father, George M. Payne,
who inspired me to build quality work, supported my search for my talents, and
taught me to labor for the benefit of others beyond myself.
Introduction
Programmers have sought to model memory as a one-dimensional stream of bits for
as long as hardware manufacturers have sought to chunk it into wider and wider
words. `bitvec` is far from the first library to provide this model, and it will
likely not be the last. It is, however, among the best (at least in my opinion).
`bitvec` was built out of my experience and frustration with performing I/O
buffer manipulation using C, C++, and Ruby. My work required programs capable of
dynamically selecting a region of a bitstream, a task to which C-style bitfield
declarations were unsuited, and it required those programs to be fast, which is
not an adjective one commonly associates with Ruby engines.
Furthermore, my work involved message schemata that were permitted to select a bit ordering at the packet and field level. This is not a behavior that any existing bitstream library or language feature provides. These experiences informed my goals and design choices from the very beginning.
`bitvec` matches, and exceeds, the functionality of every other bitstream
implementation I have found. It is also the only Rust crate whose types are
drop-in replacements for the standard-library types, and it manages this without
taking a performance loss. Thanks to excellent compiler engineering by the Rust
and LLVM teams, it often produces code that exactly matches, and sometimes
surpasses, the bit-shift/mask assembly logic you would write yourself.
Goals of This Book
I intend for this book to serve as an explanation of `bitvec`’s design choices
and philosophy. It is not a detailed exploration of the crate APIs – docs.rs
exists – but instead seeks to communicate how to think about `bitvec`, so that
you will know how to use the APIs it offers.
The best way I know to communicate this information is as a dialogue between me, the author, and you, the user. Since this is a book rather than a conversation, I actively encourage you to contact me with any questions through the channels listed in the repository’s CONTRIBUTING document. Throughout the book, I will periodically remind you that if a section is unclear, that is an error on my part, and I would appreciate an issue or other contact so that I can improve it.
Data Structures
`bitvec`’s primary exports are four data structures: `BitSlice`, `BitArray`,
`BitBox`, and `BitVec`. These correspond to the `[bool]`, `[bool; N]`,
`Box<[bool]>`, and `Vec<bool>` types in the Rust standard libraries. Unlike the
Rust types, the `bitvec` types are not composable, and cannot be mixed with any
other types, including the pointer types in the standard library.
The `bitvec` types implement the APIs of their standard-library counterparts to
the fullest extent possible. The only currently missing feature is
`IndexMut<usize>`, which prevents `collection[index] = bit;` assignment. This
means that, for users looking for compact `usize => bool` collections,
substituting the types in your project codebase ought to be enough to make your
project Just Work™️.
It is the fact that `BitSlice` acts exactly like `[bool]`, and can only be used
through `&BitSlice` and `&mut BitSlice` references, that makes `bitvec` unique.
No other Rust library has this capability.
Before we explore the data structures in more detail, there are three caveats I
must provide:

- `bitvec` uses an opaque, custom pointer representation for everything except
  `BitArray`. You may not inspect or modify this pointer, and you may not use it
  as a type or value parameter in any other types or functions. You will break
  your program if you try. `bitvec` ensures that this pointer encoding does not
  violate the compiler and language requirements for reference correctness. The
  rules used to do so are internal to the crate, and will not be present outside
  it. `bitvec` pointers are perfectly safe to use, as long as you treat them as
  completely opaque and use only the interfaces provided.
- These structures all have two type parameters, `<T: BitStore, O: BitOrder>`.
  These parameters are described in the next chapter. They govern the in-memory
  representation of data storage, but are not relevant to the general use of the
  handle types.
- `bitvec` trades increased memory-space efficiency for decreased
  instruction-size and speed efficiency. The compiler may optimize some of the
  costs away, but `bitvec` structures are not free to use. The “zero-cost” of
  the `bitvec` abstraction is that you cannot write a better bitset, not that it
  is equal in runtime performance to an ordinary `bool` sequence.
Now that the disclaimers are over, we can talk about how the types actually work.
Slices
The base type of the project is `BitSlice`. This is a region type, like
`[bool]`, and cannot be held directly. Instead, it is accessed through borrowed
references (`&BitSlice`, `&mut BitSlice`) or owning handles (`BitArray`,
`BitBox`, `BitVec`). The distinction between the handles and the region is the
same as it is for ordinary Rust types.
The `BitSlice` region is able to begin and end at any bit in memory, and is not
restricted to having one edge aligned to the edge of its initial element. This
restriction, present in all of its competitors, is removed through the use of a
special encoding in all pointers to the region, which stores the starting bit of
the base element in part of the slice pointer that describes the real memory.

There are eight bits to a byte on all systems Rust targets, and therefore the
index of a bit within a byte is itself three bits wide. These three bits are
taken from the length counter of a slice pointer, which means that `BitSlice`
is able to address only ⅛th of the indices that `[bool]` can.
This is 64 Mebibytes on a 32-bit system, and 256 Pebibytes on a 64-bit system. If you can even allocate that much real memory in one handle, then you have very different operating conditions than I can help you with.
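The arithmetic behind those numbers can be checked directly. Three bits of the pointer’s length counter are given up to the bit index, so the maximum length is `usize::MAX >> 3` bits. A worked sketch of that calculation, assuming a 64-bit target:

```rust
// Worked check of the addressing-capacity arithmetic on a 64-bit target.
fn main() {
    // Three bits of the length field are spent on the in-element bit index,
    // so a bit-slice length counter can reach usize::MAX >> 3 bits.
    let max_bits: u128 = (usize::MAX as u128) >> 3; // 2^61 - 1 bits
    let max_bytes = (max_bits + 1) / 8;             // 2^58 bytes of real memory
    let pebibytes = max_bytes >> 50;                // bytes / 2^50 = PiB
    assert_eq!(pebibytes, 256);
    println!("maximum addressable region: {} PiB", pebibytes);
}
```

The same arithmetic on a 32-bit target gives 2²⁹ bits, or 2²⁶ bytes: the 64 MiB figure quoted above.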
Getting a `BitSlice`
`BitSlice` is strictly a borrowed region. It can neither be created nor
destroyed; rather, views to it are acquired from a memory buffer that some other
binding owns.
The `BitStore` chapter covers this in more detail, but only slices of the
unsigned integers `u8`, `u16`, `u32`, `usize`, and (on 64-bit targets) `u64`
can be used as the source memory for a `BitSlice`. (You can also use their
`Cell<>` wrappers or atomic variants; this will be discussed later.)
Borrowing Constructors
The simplest way to create a `BitSlice` reference is to borrow it from ordinary
Rust data. The `BitView` trait, available in the prelude, implements methods
on the supported unsigned integers, all arrays of them, and their slices.
```rust
use bitvec::prelude::*;

let byte = 0u8;
let bits = byte.view_bits::<LocalBits>();

let array = [0u16; 2];
let bits = array.view_bits::<Lsb0>();

let mut array = [0u32; 3];
let slice = &mut array[..];
let bits = slice.view_bits_mut::<Msb0>();
```
The `.view_bits()` and `.view_bits_mut()` methods take the other type parameter
that `bitvec` requires. This is described in the `BitOrder` chapter. Use `Lsb0`
until you have a specific need for a more precise parameter.
In addition, `BitSlice` offers the constructor functions `::from_element()`,
`::from_slice()`, and their `_mut` variants, which borrow elements and slices,
respectively, and construct `&BitSlice`/`&mut BitSlice` references from them.
The trait methods are generally easier, and certainly shorter to write, but they
all do the same work.
Lastly, empty slices can be produced with the `::empty()` or `::empty_mut()`
functions, since there is no `&[]` or `&mut []` literal available.
Macro Constructor
In addition to these method constructors, you may also use the `bits!`
constructor macro. This macro runs at compile time to produce a buffer
containing the correct data values, then borrows it as a `BitSlice` reference.
It is a `macro_rules!` macro, not a procedural macro, and should not have a
significant impact on your compilation times.
By default, the produced buffer is a temporary that the compiler will then
extend to have the minimum lifetime of the produced reference handle. However,
you can use the `static` keyword to cause the macro to produce a hidden and
unnameable `static BitArray` backing buffer, which then provides the
`&'static BitSlice` lifetime. Since this `static` buffer cannot be named, it is
safe to use even when `mut`able, as the provided reference is the only handle to
it.
The macro syntax extends that of `vec!`. The simplest invocations are sequences
or repetitions of expressions, which can optionally be made `mut`able:
```rust
use bitvec::prelude::*;

let r = bits![0, 1, 0, 1];
let w = bits![mut 0, 1, 0, 1];
let r2 = bits![static 1; 4];
let w2 = bits![static mut 1; 4];
```
You are not required to use the literals `0` or `1`; you can use any expression
that is `const`-evaluable and can be placed into the expression `expr != 0`.
This means that you cannot use the names of runtime `let` bindings, but can use
the names of `const` bindings, or other literals. You probably do not want to do
this, but you can.
In addition, you can specify the bit-ordering parameter and the integer storage
parameter, for even more precise control over the memory layout. If you do not
specify them, the macro uses the default parameters of `usize` storage and
`Lsb0` ordering.
```rust
use bitvec::prelude::*;

let in_bytes = bits![u8, LocalBits; 0, 1, 0, 1];
let in_shorts = bits![u16, Lsb0; 0, 1, 0, 1];
let in_ints = bits![mut u32, Msb0; 0; 4];
```
To summarize the macro rules:
- If the first macro argument is `mut`, then the macro produces `&mut BitSlice`;
  otherwise, it produces `&BitSlice`. You do not need to bind the name as `mut`
  unless you want to reässign it to a different slice.
- You may then optionally provide the storage and ordering type parameters,
  followed by a semicolon. If you choose to add type parameters:
  - You must provide the bit-ordering parameter.
  - You may provide the storage parameter.
- The data input to the macro is one of the two `vec!` token lists:
  - One or more expressions that can be placed into `$expr != 0`, separated by
    commas. A trailing comma is permitted.
  - A single expression that can be placed into `$expr != 0`, followed by a
    semicolon and a repetition counter. The resulting `BitSlice` will be
    `counter` bits long, all set to `expression`.
Emulation tests indicate that `bitvec` correctly instructs the compiler to produce suitable buffers even when compiling for a target with a different byte-endianness than the host. However, I have not actually performed such cross-compilation and testing on real hardware. It should be correct; please file an issue if it is not.
What `BitSlice` Can Do
Now that you have acquired a `BitSlice` reference, either by borrowing memory
from elsewhere in your program or by creating a temporary, it is time to do some
actual work with it.
… That `[bool]` Can
Everything.[^1] I am not going to rewrite the standard library’s slice documentation here.
… That `[bool]` Cannot
In addition to the standard-library `[bool]` API, `BitSlice` offers some
inherent methods tailored to its specialization.
Set Queries
The five query methods `.any()`, `.all()`, `.not_any()`, `.not_all()`, and
`.some()` test how many bits in a region are set to `1`. As you can guess from
the names, these methods have the following truth table:
| Slice | `any` | `all` | `not_any` | `not_all` | `some` |
|-------|-------|-------|-----------|-----------|--------|
| `00`  | false | false | true      | true      | false  |
| `01`  | true  | false | false     | true      | true   |
| `11`  | true  | true  | false     | false     | false  |
`any` is the Boolean OR of all bits; `all` is the Boolean AND of all bits; and
`some` is true exactly when the slice contains both set and unset bits (for the
two-bit slices above, this is the Boolean XOR).
In addition, `.count_ones()` and `.count_zeros()` count how many bits of the
slice are set to one or zero, rather than merely indicating whether any exist.
These methods are slower than the Boolean queries, which are capable of
short-circuiting once satisfied. You can also use `.iter_ones()` and
`.iter_zeros()` to walk each index of bits with the specified value. These are
equivalent to running `.filter()` and `.enumerate()` calls on iterators of
`bool`, but are specialized to use dedicated bit-counting instructions where
processors provide them.
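These queries reduce to ordinary register arithmetic. The std-only sketch below mirrors their semantics over a single `u8`; the helper functions are local illustrations, not the `bitvec` API (in particular, this `all` only matches a fully-occupied register):

```rust
// Register-level equivalents of the BitSlice query methods, written
// against a single u8 rather than against the bitvec API itself.
fn any(byte: u8) -> bool { byte != 0 }                // Boolean OR of every bit
fn all(byte: u8) -> bool { byte == u8::MAX }          // Boolean AND of every bit
fn some(byte: u8) -> bool { any(byte) && !all(byte) } // set, but not entirely

fn main() {
    let byte = 0b0000_0101u8;
    assert!(any(byte));
    assert!(!all(byte));
    assert!(some(byte));

    // .count_ones()/.count_zeros() map to population-count instructions:
    assert_eq!(byte.count_ones(), 2);
    assert_eq!(byte.count_zeros(), 6);

    // .iter_ones() walks the indices of set bits; here with Lsb0-style indexing:
    let ones: Vec<u32> = (0u32 .. 8).filter(|&i| byte & (1 << i) != 0).collect();
    assert_eq!(ones, vec![0, 2]);
}
```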
Boolean Arithmetic
`bitvec` data structures all implement the Boolean operators (`&`, `|`, `^`, and
`!`) against each other.
In version 0, these operators accepted any `impl Iterator<Item = bool>`. This was changed for performance reasons: people never used the arbitrary-iterator support, but did require improved behavior when operating on two bit-slices.
```rust
use bitvec::prelude::*;

let or = bits![mut 0, 0, 1, 1];
*or |= bits![0, 1, 0, 1];
assert_eq!(or, bits![0, 1, 1, 1]);

let and = bits![mut 0, 0, 1, 1];
*and &= bits![0, 1, 0, 1];
assert_eq!(and, bits![0, 0, 0, 1]);

let xor = bits![mut 0, 0, 1, 1];
*xor ^= bits![0, 1, 0, 1];
assert_eq!(xor, bits![0, 1, 1, 0]);

let not = bits![mut 0, 1];
let not = !not;
assert_eq!(not, bits![1, 0]);
```
Writing To Memory
You can set all bits in a region to a new value by using the `.fill()` method,
or you can set one bit in a region to a new value by using either the `.set()`
or `.get_mut()` methods. `.get_mut()` produces a proxy type which acts roughly
like an `&mut bool` reference slot.
```rust
use bitvec::prelude::*;

let bits = bits![mut 0; 4];
assert!(bits.not_any());

bits[0 .. 1].set_all(true);
assert!(bits[0]);

bits.set(1, true);
assert!(bits[1]);

*bits.get_mut(2).unwrap() = true;
assert!(bits[2]);

let mut bit = bits.get_mut(3).unwrap();
assert!(!*bit);
*bit = true;
drop(bit); // release the proxy’s borrow before reading `bits` again
assert!(bits[3]);

assert!(bits.all());
```
The proxy type produced by `.get_mut()` implements `DerefMut<Target = bool>`, so
you can assign into it and read out of it. However, it does not commit the value
assigned into it back to its source `BitSlice` until it `Drop`s.
You can force the destruction of a named proxy reference by using its
`.commit()` method, which takes `self` by value, destroying it and releasing the
borrow.
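The write-on-drop proxy pattern itself is ordinary Rust, and can be sketched without `bitvec` at all: a small guard holds a mutable borrow of the storage element, buffers the new bit value, and writes it back when it is destroyed. Everything below (`BitProxy`, its `commit`) is a hypothetical illustration of the mechanism, not the crate’s actual type:

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for a mutable-bit proxy: it buffers a bool and
/// writes it into the borrowed byte only on Drop (or an explicit commit).
struct BitProxy<'a> {
    elem: &'a mut u8, // the storage element the bit lives in
    mask: u8,         // one-hot selector for that bit
    bit: bool,        // buffered value (a real proxy would read it on creation)
}

impl Deref for BitProxy<'_> {
    type Target = bool;
    fn deref(&self) -> &bool { &self.bit }
}

impl DerefMut for BitProxy<'_> {
    fn deref_mut(&mut self) -> &mut bool { &mut self.bit }
}

impl Drop for BitProxy<'_> {
    fn drop(&mut self) {
        // The buffered value reaches memory only here.
        if self.bit { *self.elem |= self.mask } else { *self.elem &= !self.mask }
    }
}

impl BitProxy<'_> {
    /// Consume the proxy, forcing the write-back immediately.
    fn commit(self) {} // dropping `self` here performs the store
}

fn main() {
    let mut byte = 0u8;
    {
        let mut bit = BitProxy { elem: &mut byte, mask: 1 << 3, bit: false };
        *bit = true;  // assignment goes into the buffer…
        bit.commit(); // …and reaches memory when the proxy dies
    }
    assert_eq!(byte, 0b0000_1000);
}
```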
Viewing the Underlying Memory
The memory underlying any bit-slice region is subject to some restrictions about
aliasing, documented more thoroughly in the `domain` module and the Memory Model
chapter. In short, borrowed `BitSlice` regions cannot view their underlying
memory directly without violating aliasing rules established by either the Rust
language or by `bitvec` itself. Instead, the `.domain()` and `.domain_mut()`
methods provide views that correctly handle aliasing and edge conditions, and
mediate access to the underlying memory.
The owning handles (`BitArray`, `BitVec`, and `BitBox`) do not have this
limitation, as they can guarantee unique access to a memory location without any
possibility of aliasing. As such, these types all have `.as_raw_slice()` and
`.as_raw_mut_slice()` methods that provide ordinary slice views of their storage
region.
Footnotes
[^1]: Except write-assignment through indexing. I am not going to keep mentioning this exception.
Arrays
While `BitSlice` describes a region of borrowed data, `BitArray` provides a
container that can hold and manage such a region.
It is most comparable to the C++ type `std::bitset<N>`. Unfortunately, Rust
support for type-level integers is still experimental, so `BitArray` is unable
to take the length of the `BitSlice` it contains as a type parameter. Instead,
it must take the entire region it contains as a type parameter. The full type
declaration is
```rust
use bitvec::prelude::*;
use core::marker::PhantomData;

pub struct BitArray<
    A: BitViewSized,
    O: BitOrder,
> {
    _ord: PhantomData<O>,
    data: A,
}
```
As described in the previous chapter, the `BitView` trait is implemented on the
unsigned integers, and on arrays of them. Currently, array support is limited to
arrays up to and including 32 elements long; as Rust type-level integers mature,
this will grow to include all arrays.
Once type-level computation stabilizes, `BitArray` will change to have the type parameters `<T: BitStore, O: BitOrder, const N: usize>`, matching the `std::bitset<N>` length parameter.
This array dereferences to a `BitSlice` region over its entire length. It does
not currently permit shortening its `BitSlice` from either end. If this is a
behavior you want, please file an issue.
Using a `BitArray`
The `::ZERO` constant is a blank `BitArray` with its memory completely zeroed.
The `::new()` function wraps an existing element or array into a `BitArray`. In
addition, the macro constructor `bitarr!` takes the exact same arguments as the
`bits!` constructor, except that it returns an array directly rather than a
reference to a buffer.
Furthermore, `BitArray` structures and references can be constructed from
`&BitSlice` references using the `TryFrom` trait, just as arrays can be
constructed in the standard library.
Once constructed, `BitArray` offers the `.as_bitslice()` and
`.as_mut_bitslice()` explicit methods, as well as all the standard traits, to
borrow its data as a `BitSlice`. The array has almost no functionality of its
own, and serves only to own a region used as a `BitSlice`.
Once you are done using `BitSlice` to manipulate the array, you can remove the
array with `.into_inner()` and regain the `A` memory within.
That’s everything that the array does! Like regular arrays, it is useful
primarily for its ability to move memory through a program, and has essentially
no behavior in its own right. It is useful for programs that do not have access
to a dynamic allocator and do not wish to use `static` buffers. However, if you
do have access to an allocator, you will probably prefer to use `BitVec`
instead.
Vectors
`bitvec` has two types that require compiling the crate with `feature = "alloc"`
(and linking against the Rust distribution crate `alloc`): `BitVec` and
`BitBox`. `BitVec` is a dynamically resizable vector, equivalent to the C++ type
`std::vector<bool>` and the Rust type `Vec<bool>`. `BitBox` is a frozen vector,
incapable of changing its length, equivalent to the Rust type `Box<[bool]>` or
the C++ type `std::unique_ptr<std::bitset<N>>`.
Since `BitBox` is a vector that cannot `.push()` or `.pop()`, I will not discuss
it in detail here. It is a heap-allocated `BitSlice`, and is otherwise
uninteresting.
Getting a `BitVec`
`BitVec` implements the constructors shown in the standard library: `::new()`
creates a handle without any allocation, and `BitSlice` provides the
`.to_bitvec()` method and the `ToOwned` trait to copy a bit-slice into a new
bit-vector. Additionally, the `bitvec!` macro takes all the `bits!` arguments
and produces an owned allocation instead of a stack temporary. You can also
construct a `BitVec` by `.collect()`ing any iterator of `bool`s.
```rust
use bitvec::prelude::*;

let a: BitVec = BitVec::new();
let b = bits![0, 1, 0, 1].to_bitvec();
let c = b.clone();
let d = bits![u8, Msb0; 1; 4].to_owned();
let e = bitvec![0, 1, 1, 0];
let f = bitvec![u16, Msb0; 0; 4];
let g = [true, false, true, false]
    .iter()   // &bool
    .copied() // bool
    .collect::<BitVec>();
```
Using a `BitVec`
Once constructed, `BitVec` implements the entire API that `Vec` does in the
standard library. This remains uninteresting to write out.
Like `BitSlice`, `BitVec` and `BitBox` are implemented as stack handles that use
the specially-encoded pointer to describe their region. This enables them to
remain the same size as their standard-library counterparts, while making them
completely opaque to inspection.
Because they are fully owned, `BitVec` and `BitBox` have some important
behavioral differences from `BitSlice`. Primarily, because they do not have to
worry about other handles viewing the memory under their control, they can
modify the contents of their buffers outside the `BitSlice` that is considered
live, and they do not exclude partial edge elements when viewing their buffer as
raw memory. If you are using `BitVec` to construct an I/O buffer, these two
facets can have surprising results if you are not careful to fully initialize
the memory span.
Vectors, like slices, do not need to begin at the zeroth index of the base
element. They can begin, and end, at any bit in any element. This will only
happen when copying a vector from a source slice that was misaligned. The
`.force_align()` method moves the vector’s live slice down to start at the zero
index. Once done, extending the live slice to reach the last index of an element
ensures that viewing the buffer as raw memory will have no uninitialized bits.
Type Parameters
`bitvec` uses type parameters to permit precise user control of its behavior and
in-memory representation. The Rust generic system permits `bitvec` to have a
more powerful and capable behavior than any other bitstream library yet
implemented in any language.

All `bitvec` types take two type parameters. The first denotes the storage type
being used: for everything but `BitArray`, this is an implementor of the
`BitStore` trait, and denotes the integer component of an underlying slice; for
`BitArray`, it is an implementor of `BitViewSized`, and is the entire storage
block. The second is an implementor of `BitOrder`, and informs how the structure
translates a semantic index into a memory access.

The combination of these two parameters governs how a `bitvec` type computes its
accesses to memory.
The next two chapters describe each trait and their implementors in more detail. You may be able to skip them with this sentence as a good-enough summary:
Use `<usize, Lsb0>` as the parameters if you are implementing a collection and
do not care about memory layout; if you are implementing an I/O protocol
specification, the specification document will tell you what ordering and unit
size it requires.
Rust syntax requires explicitly choosing type parameters when using generic
expressions, such as `BitVec::<Store, Order>::new()`, and will not substitute in
the default parameters when you attempt to elide them with `BitVec::new()`.
However, Rust will use the default type parameters in patterns:
`let bv: BitVec = BitVec::new();` will use the default type parameters in the
`: BitVec` type annotation, which then completes the type of the expression on
the right side of the `=` assignment.
Bit Ordering
`bitvec` expects user code to count semantic indices, and hides the actual
position of a bit within a memory element from users. This allows user code to
treat indices as a uniform domain of integers in `0 ..= !0 >> 3`, and not have
to remember whether `place - 1` means moving “forward” or “backward” in the
buffer.
Internally, each `*BitSlice` pointer contains an element address and a bit
index. The pointer uses its `BitOrder` type parameter to translate the bit index
into a one-hot selector mask that drives actual memory access.
`BitOrder` is open to user implementation, and implementations of it are trusted
to be sound in the `bitvec` memory model. For this reason, the trait is `unsafe`
to implement. Most users will not implement it; almost all users want one of the
two monotonic orderings provided for you. However, some specialized users may
have an ordering of their own, and they are still able to encode that ordering
away from their index logic.
Provided Orderings
`bitvec` provides two orderings: `Lsb0` and `Msb0`. These each refer to which
numeric bit in a register element is considered to be the zero index.
You can think of these as corresponding to the little-endian and big-endian processor byte orderings if you like, as long as you remember that your choice of bit-ordering is not at all related to the byte-ordering of your target processor.
Lsb0
The `Lsb0` type sets the zero index at the least significant bit of a register
(numeric value 2⁰ = 1), and each successive index selects the next more
significant bit in the register, until the most significant bit is at the final
index.

It is the expression `mask = 1 << index;`.
Msb0
The `Msb0` type sets the zero index at the most significant bit of a register
(numeric value 2ⁿ⁻¹ for an `n`-bit register), and each successive index selects
the next less significant bit in the register, until the least significant bit
is at the final index.

It is the expression `mask = (iN::MIN as uN) >> index;`.
LocalBits
The type alias `LocalBits` refers to the ordering that your target’s C compiler
will likely choose for its bitfield direction. This is `Lsb0` on little-endian
byte-ordered processors, and `Msb0` on big-endian byte-ordered processors.
Remember that the `BitOrder` implementations and processor byte orderings have
no relation to each other! This is only a custom, not a requirement of the
processor architecture.
Default Ordering Parameter
`Lsb0` is the default type parameter used by the sequence types, as it produces
selection masks using the starting value `1`, which encodes to smaller
instructions than the `Msb0` starting value.

On AMD64, the pairs `<u64, Lsb0>` and `<u64, Msb0>` produce the following object
code and disassembly:
```text
; Lsb0: movl + shlq
ba 01 00 00 00                 movl    $1, %edx
48 d3 e2                       shlq    %cl, %rdx

; Msb0: movabsq + shrq
48 ba 00 00 00 00 00 00 00 80  movabsq $-9223372036854775808, %rdx
48 d3 ea                       shrq    %cl, %rdx
```
The `Msb0` load requires an additional four bytes of zeros in its immediate, and
the 64-bit modifier prefix (`0x48`), in order to encode `movabsq` instead of
`movl`.
Implementing `BitOrder`
As stated above, this is an `unsafe trait` because `bitvec` relies on its
implementations to uphold certain mathematical invariants. Failure to do so will
result in memory unsafety and/or program crashes.
`BitOrder` has three functions: `at`, `select`, and `mask`. Implementors are
required to provide `at`; `select` and `mask` have default implementations that
rely on it. These functions are all generic over the `BitMemory` trait; this
trait describes the register types (unsigned integers) that can be used by
`bitvec`. It provides some useful associated constants, and is otherwise
uninteresting.
- `::at()` receives the semantic index of a bit within a register type, and must
  produce the concrete position corresponding to that semantic index. The input
  and output are both integers in the domain `[0, W)`, where `W` is the bit
  width of the register type being indexed. `at` must implement an exactly
  one-to-one mapping from all inputs to all outputs in the `[0, W)` domain, and
  this mapping must never observably change. These are strict requirements of
  the library, and failing to uphold either will break your program.
- `::select()` receives the semantic index of a bit within a register type, and
  must produce a value of that register type with exactly one bit set. The
  produced value is a mask that selects only the bit specified by the provided
  index, and will be used in Boolean arithmetic to manipulate memory. The
  default implementation is `1 << at(index)`. You are required to maintain this
  behavior in your override.
- `::mask()` receives an inclusive start index and an exclusive end index, and
  must produce a register value that selects every bit in the indexed range. The
  default implementation is `(start .. end).map(select).sum()`. You are required
  to maintain this behavior in your override.
Support Types
The `BitOrder` trait APIs use supporting types to enforce requirements on the
bare numbers being passed through it. These types are documented in the `index`
module. They all provide a `.value()` method that removes the wrapper and yields
the inner number, and a `::new()` constructor that ensures that values to be
wrapped uphold the type’s requirements.
- `at` and `select` receive a `BitIdx<M>` argument. This is a wrapper over `u8`
  that ensures that the contained value is in the domain `0 .. M::BITS`. It
  serves to indicate that a number is a semantic counter, not an electrical
  position.
- `at` returns a `BitPos<M>` value. This has the same behavior and restrictions
  as `BitIdx<M>`, and indicates that the number is an electrical position within
  a register.
- `select` returns a `BitSel<M>` value. This wraps an `M` register value, and
  ensures that the contained number is a power of two – exactly one bit is set,
  and all others are zero. This type indicates that the mask is guaranteed to
  select exactly one bit in a register.
- `mask` receives an inclusive `BitIdx<M>` and an exclusive `BitEnd<M>`
  argument, and returns a `BitMask<M>` value. `BitEnd<M>` ensures that the
  contained number is in the domain `0 ..= M::BITS`, including the final count,
  and marks a one-past-the-end exclusive boundary. `BitMask<M>` marks that the
  contained number may select any number of bits in a register.
New implementors of `BitOrder` will use these types to satisfy the behavior
requirements individually.
In addition, implementors’ test suites should call the function
`order::verify_for_type::<O, M>()` to check that an implementation `O` satisfies
the behavior requirements for a particular register type `M`, or call
`order::verify::<O>()` to check that an implementation satisfies the behavior
requirements for all supported register types. These functions are conditional
upon `cfg(test)`, and accept a `verbose: bool` parameter that allows the test
functions to print diagnostics to `stdout` during evaluation.

If the verification functions panic, your implementation is incorrect, and
cannot be safely used in `bitvec`.
BitStore
The `BitStore` trait governs the processor behaviors used to interact with the
memory of a `BitSlice` buffer. These include both the width of the processor
register used to contain a memory element, and the load/store instructions the
processor uses to move data across the memory bus.
`BitStore` has no behavior of its own, and serves only to collect associated
types and constants. It cannot be implemented outside `bitvec`, and is a closed
portion of the API. You can freely use the trait as a bound in types that
contain `bitvec` structures, but should not otherwise attempt to make use of it.
Implementations
`BitStore` is implemented on all unsigned integer types not larger than a target
processor’s word size, on all `Cell<T>` wrappers of them (as `Cell` is a
compiler directive), and on their `AtomicT` variants.
Not all processors have atomic instructions for all their scalar registers. The
compiler maintains a list of all atomics available on all supported targets, and
exposes this list as unstable attributes in the standard library, unavailable to
user code such as `bitvec`. `bitvec` uses the `radium` crate (which I also
maintain) to provide information about atomic support for a register width, and
`radium` maintains a best-effort guess at what atomics are available.
On architectures with missing atomics, `bitvec`’s default feature set will cause
a compiler error when you attempt to instantiate a `bitvec` structure with a
register that is missing an atomic variant. You can fix this by using a narrower
register that does have atomic instructions, or by disabling default features
and not enabling the `"atomic"` feature. Please file an issue with `radium`
listing your target and the failing type, so that we can improve our precision.
Associated Types
The `Mem` associated type names the scalar integer corresponding to the
`BitStore` type. `u8`, `Cell<u8>`, and `AtomicU8` all implement `BitStore` with
their `Mem` type assigned as `u8`; the same is true for the wider registers.
This type is used to create selection masks in the processor and permit access to unaliased memory.
The `Access` associated type names the type used to implement memory access. The
`BitAccess` trait is an internal bridge to `Radium` that allows a consistent
memory API, regardless of the instructions used. All reads from and writes to
memory route through this association and trait.
Lastly, the `Alias` associated type enables `bitvec` to gracefully and correctly
handle events that cause multiple handles to alias the same memory address. This
association is used in `.split_at_mut()` to select the alias-aware type used for
all subsequent accesses.
The Mem
and Alias
types are exposed in public APIs according to local alias
information. The Access
type is never publicly exposed, and only used for code
generation.
Practical Use
That’s enough theory; let’s talk about how to use the crate. This chapter is
divided into two subsections, one for use cases that want an ordinary bool
collection library with transparent memory compaction, and one for use cases
that want a convenient way to precisely sculpt memory. Before getting into
either, let’s quickly recap how the bitvec
types interact with memory
ownership.
Rustic Memory Management
The first and most important thing to remember is that, despite the extra
complexity just discussed about memory parameters and aliasing behavior,
bitvec
is just Rust. It obeys all of the rules and patterns that the rest of
Rust does.
BitSlice
s, like regular slices, are exclusively borrowed from owned memory
higher up the stack. Their references can be passed down the stack, and are
subject to the same lifetime and mutation-exclusivity rules.
The owned memory that creates a BitSlice
view can be an array, boxed slice, or
vector of either ordinary integers, or their wrapper equivalents provided by
bitvec
:
```rust
use bitvec::prelude::*;

let array = [0u8; 8];
let boxed: Box<[u16]> = Box::new([0u16; 4]);
let vec = vec![0u32; 2];

let bits_a = bitarr![u8, Msb0; 0; 64];
let bits_b = bitbox![u16, Lsb0; 0; 64];
let bits_v = bitvec![u32, LocalBits; 0; 32];
```
Once memory is bound, it can be borrowed as a BitSlice
by using the BitView
trait (imported in the prelude), or by using the fact that all bitvec
containers borrow themselves as BitSlices
just like standard-library
containers borrow themselves as slices:
```rust
use bitvec::prelude::*;

let array = [0u8; 8];
let boxed: Box<[u16]> = Box::new([0u16; 4]);
let vec = vec![0u32; 2];

let bits_a = bitarr![u8, Msb0; 0; 64];
let bits_b = bitbox![u16, Lsb0; 0; 64];
let bits_v = bitvec![u32, LocalBits; 0; 32];

let array_bits = array.view_bits::<Msb0>();
let boxed_bits = boxed.view_bits::<Lsb0>();
let vec_bits = vec.view_bits::<LocalBits>();

let bits_a_ref: &BitSlice<_, _> = &bits_a;
let bits_b_ref: &BitSlice<_, _> = &bits_b;
let bits_v_ref: &BitSlice<_, _> = &bits_v;
```
And, of course, mutability applies:
```rust
use bitvec::prelude::*;

let mut arr = bitarr![0; 10];
let arr_ref: &mut BitSlice = &mut arr;
arr_ref.set(1, true);
assert!(arr_ref[1]);
```
Just as with ordinary Rust code, BitSlice
is the type you want to use when
working with memory that you are not responsible for moving around or
destroying. However, when you do need to move memory around, you need to switch
to either a static
binding or a container: BitArray
is always available, and
BitBox
and BitVec
require an allocator.
I am spending this much time discussing the Rust memory management system
because this is a very common question I receive. No other bit-stream crate
has reference types, and therefore users coming from, e.g., bit-vec
see the
same BitSlice
name and attempt to use their habits from that crate.
bitvec
is not like any other bitstream library, in Rust, C++, or another
language. bitvec
is like ordinary Rust. I promise.
Bit Collections
As discussed in the Type Parameters chapter, you should use usize
as the
BitStore
parameter for optimal performance in the generated program.
Once you have created some memory that you can view as individual bits, it is
time to actually use it. Here is the one-sentence summary of what bitvec
can
do:
> Every stable API present in the standard library is replicated in `bitvec`,
> except for `BitSlice<T, O> as IndexMut<usize>`, because `bitvec` cannot
> produce `&mut bool`.
If you were using ordinary collections to manage sequences of `bool`s, then
every part of your code will continue to work on `bitvec` types except for
index-assignment: `slice[index] = value;`. Until and unless the `IndexMut`
trait is reshaped, you will need to use one of these two alternatives:

- `slice.set(index, value);`
- `*slice.get_mut(index).unwrap() = value;`
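The second alternative works because `.get_mut()` returns a proxy object rather than a true `&mut bool`. Here is a sketch of that proxy-reference pattern in plain Rust; the names are invented for illustration and this is not `bitvec`'s real `BitRef` implementation:

```rust
// Sketch: since no `&mut bool` can point into packed bits, a proxy
// carries the bit's value and commits it to the element when dropped.
struct BitProxy<'a> {
    elem: &'a mut u8,
    mask: u8,
    value: bool,
}

impl BitProxy<'_> {
    fn write(&mut self, value: bool) {
        self.value = value;
    }
}

impl Drop for BitProxy<'_> {
    fn drop(&mut self) {
        // commit the cached bit back into the backing element
        if self.value {
            *self.elem |= self.mask;
        } else {
            *self.elem &= !self.mask;
        }
    }
}

fn main() {
    let mut elem = 0u8;
    {
        let mut proxy = BitProxy { elem: &mut elem, mask: 0b0000_0010, value: false };
        proxy.write(true); // committed when `proxy` drops
    }
    assert_eq!(elem, 0b0000_0010);
}
```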
Subslicing works:
```rust
use bitvec::prelude::*;

let bits = bits![0, 0, 0, 0, 1, 1, 1, 1];
assert!(bits[.. 4].not_any());
assert!(bits[4 ..].all());
```
Incremental munching works:
```rust
use bitvec::prelude::*;

let mut bits = bits![0, 0, 1, 1, 1, 0, 0, 0];
//      ^^^ modify the slice handle, not the slice contents

while let Some((&false, rest)) = bits.split_first() {
    bits = rest;
}
assert_eq!(bits, bits![1, 1, 1, 0, 0, 0]);

while let Some((&false, rest)) = bits.split_last() {
    bits = rest;
}
assert_eq!(bits, bits![1; 3]);
```
Mutation works:
```rust
use bitvec::prelude::*;
use std::{iter, thread};

let bits: &'static mut BitSlice = bits![mut 0; 8];
{
    let (left, right) = bits.split_at_mut(4);
    // Pretend that better work is happening here
    let a = thread::spawn(|| left |= iter::repeat(true));
    let b = thread::spawn(|| right ^= iter::repeat(true));
    a.join().unwrap();
    b.join().unwrap();
}
assert_eq!(bits, bits![1; 8]);
```
Everything you can do with a slice, an array, or a vector of bits, you can do
with `bitvec`’s equivalent types, except for `IndexMut<usize>`. The only change
from the standard-library types is that you are now guaranteed to use one bit of
storage for each bit of information, rather than eight bits of storage per bit.

> Author’s note: Other than bragging about `bitvec`’s API fidelity, I don’t
> think this section is very useful or educational. If you want to read more
> about how to use `bitvec` for `usize => bool` collections, please let me know
> and I will expound!
Bitfields
bitvec
’s more technically-interesting capability is that it provides
load/store memory access behaviors that allow you to write values into, and read
them back out of, any BitSlice
in memory rather than being constrained to
well-formed references to well-typed memory.
This is useful for the de/construction of packed memory buffers, such as transporting data through I/O protocols.
> AUTHOR’S NOTE: If you are using `bitvec` to do anything related to the
> underlying memory representation of a bit-buffer, you must read this chapter,
> and all of the API docs of `bitvec::field` and its contents, in their
> entirety.
>
> I have written extensively, and yet still insufficiently, about the
> intricacies involved in operating the `BitField` trait correctly. If you skim
> this documentation, you will have unexpected behavior, you will get frustrated
> with me for writing a bad library, you will file an issue about it, and I will
> probably tell you that the behavior is correct and that I already addressed it
> in the documentation.
>
> It took me a long time to think about and a long time to write. It should
> take you a long time to read and think about, too.
All of this behavior is contained in the BitField
trait. Let’s explore that:
```rust
// bitvec::field

pub trait BitField {
    fn load<M>(&self) -> M;
    fn store<M>(&mut self, value: M);
}

impl<T> BitField for BitSlice<T, Lsb0> {
    fn load<M>(&self) -> M { /* snip */ }
    fn store<M>(&mut self, value: M) { /* snip */ }
}

impl<T> BitField for BitSlice<T, Msb0> {
    fn load<M>(&self) -> M { /* snip */ }
    fn store<M>(&mut self, value: M) { /* snip */ }
}
```
The actual trait is more complex than this, and will be visited later. The
important part, right now, is that BitField
allows you to load values out of
BitSlice
s and store values into them. Furthermore, it is implemented
specifically on BitSlices
that use the bit orderings provided by bitvec
,
and is not generic over all orderings.
While bitvec
could in theory provide a default implementation for all
<O: BitOrder>
, this would by necessity have the most pessimal possible
performance, and the lack of specialization for overlapping trait
implementations means that faster performance can never be written.
The downside of the two specific implementations is that Rust coherence rules
forbid implementation of a bitvec
trait, on a bitvec
type, parameterized
with a local, but non-bitvec
, ordering type. On the off chance that you find
yourself writing a new BitOrder
implementor, file an issue.
The `M` type parameter on the load and store methods is bounded by funty’s
`Integral` trait. It can store any unsigned or signed integer at any partial
width. This parameterization allows you to combine any integer type for transfer
with any integer type for storage, rather than being restricted to only
transferring T
data into and out of a BitSlice<T, _>
.
Unfortunately, adding a second integer type parameter is not the only
complication to the BitStore
memory model. There is also a second dimension of
segment ordering. bitvec
tries to make explicitly clear that the Lsb0
and
Msb0
types refer only to the ordering of bits within registers, and not to
the ordering of bytes within registers. However, when the value being
loaded or stored does not fit within one register of the storage `BitSlice`, it
must be split into multiple segments, and those segments must somehow be ordered
in memory.
Segment Orderings
> Author’s Note: READ THIS. I have received several issues about this exact concept. It is not obvious.
There are two segment orderings: little-endian and big-endian. You may select
the segment endianness you prefer by using the _le
or _be
suffix,
respectively, on the .load()
and .store()
methods. The unsuffixed method is
an alias for the endianness of your processor: _be
on big-endian targets, and
_le
on little-endian. This is a convenience only. If you are writing I/O
buffers, you should really use the explicitly-named methods.
Let us imagine a BitSlice<u8, Lsb0>
used to store a u16
that is misaligned,
and thus stored in three successive bytes. This algorithm is true for all
circumstances where the stored region occupies more than one register of the
backing slice, but smaller examples are simpler to draw.
This diagram uses 0
to refer to the least significant bit, and 7
to refer to
the most significant bit. The first row shows bytes of memory, the second row
shows the bit indices in memory used by .store_le()
, and the third row shows
the bit indices in memory used by .store_be()
.
[ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ]
3 2 1 0 b a 9 8 7 6 5 4 f e d c
f e d c b a 9 8 7 6 5 4 3 2 1 0
.store_le()
places the least significant segment in the low address, while
.store_be()
places the most significant segment in the low address. The
ordering of bits within a segment is always preserved, no matter which
ordering parameter is used by the BitSlice
.
Here is the same example, but using the Msb0
bit ordering instead. Again, the
second row uses .store_le()
, and the third row uses .store_be()
.
[ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ]
3 2 1 0 b a 9 8 7 6 5 4 f e d c
f e d c b a 9 8 7 6 5 4 3 2 1 0
The only change is in how the segments are placed into memory. The ordering of bits within a segment never changes, and is always the processor’s significance order as implemented in hardware.
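The placement rule can be sketched with plain integers. This sketch simplifies to whole-byte segments (real `bitvec` segments may be partial elements, and these helper names are invented here); the two orderings differ only in which segment lands at the lower address:

```rust
// Sketch: segment placement for a u16 split across two bytes.
fn store_le(buf: &mut [u8; 2], value: u16) {
    buf[0] = value as u8;        // least significant segment at low address
    buf[1] = (value >> 8) as u8;
}

fn store_be(buf: &mut [u8; 2], value: u16) {
    buf[0] = (value >> 8) as u8; // most significant segment at low address
    buf[1] = value as u8;
}

fn main() {
    let mut le = [0u8; 2];
    let mut be = [0u8; 2];
    store_le(&mut le, 0x1234);
    store_be(&mut be, 0x1234);
    assert_eq!(le, [0x34, 0x12]);
    assert_eq!(be, [0x12, 0x34]);
}
```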
How to Use BitField
You will probably find real use of the BitField
trait more educational than
the previous section. It has a very straightforward API, that you can combine
with println!
-debugging or your favorite means of viewing memory in order to
observe its actions.
Step one: create any BitSlice
-capable buffer. This can be any of the
Rust-native sequence types, or any of the bitvec
types.
```rust
use bitvec::prelude::*;

let mut data = [0u8; 4];
let bits = data.view_bits_mut::<Msb0>();
```
Then, narrow the BitSlice
to be the region you want to access as storage. It
must be no wider than the integer type you are transferring: BitSlice
s outside
the domain 1 ..= M::BITS
will panic during .load()
or .store()
. The
easiest way to narrow a BitSlice
(or buffer type that dereferences to it) is
by using range indexing, [start .. end]
.
```rust
use bitvec::prelude::*;

let bits = bits![mut u8, Msb0; 0; 32];

bits[10 ..][.. 13].store_be::<u16>(0x765);
assert_eq!(bits[10 .. 23].load_be::<u16>(), 0x765);

bits[10 .. 23].store_le::<u16>(0x432);
assert_eq!(bits[10 .. 23].load_le::<u16>(), 0x432);
```
That’s the entire API. .store()
truncates the stored value to the width of the
receiving BitSlice
, and .load()
zero-extends the loaded value to the width
of the destination register type.
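The truncate-and-extend semantics can be sketched in plain Rust for the unsigned case. This helper is an illustration, not `bitvec` API:

```rust
// Sketch: storing keeps only the low `width` bits of the value, which is
// what then comes back out on a zero-extending load.
fn store_into_width(value: u16, width: u32) -> u16 {
    value & ((1u32 << width) - 1) as u16
}

fn main() {
    // a 13-bit slice holds 0x765 losslessly...
    assert_eq!(store_into_width(0x0765, 13), 0x0765);
    // ...but a wider value is silently truncated on store
    assert_eq!(store_into_width(0xFFFF, 13), 0x1FFF);
}
```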
You can see an example that uses the BitField
trait to implement an I/O
protocol in the examples/ipv4.rs
program in the repository. Use
cargo run --example ipv4
to see it in action.
Runtime Performance
bitvec
increases the instruction cost of each access to a bool
in its data
structures. This is an inevitable consequence of the fact that, even on
architectures that have them, compilers typically do not emit object code
instructions that access individual bits within a memory location. Therefore,
each access in `bitvec` has, in addition to a memory operation, one or two
shift instructions and one or two `AND`/`OR`/`NOT` instructions.
This means that, inevitably, bitvec
is slower in CPU time than [bool]
is.
Measurements indicate roughly a factor of ten, though with about 10x more
variance. However, this cost is only apparent and meaningful when walking the
entirety of very large buffers, and tends to fade into noise on smaller buffers,
or be obviated by compile-time fixed accesses. As always, try it on a
representative workload.
Benchmarking
I have tried (with admittedly low priority) to have some benchmarks in the
project to back up my claims that bitvec
is fast. This has been difficult to
maintain for a few reasons, but I have at least a few that have stayed present
in order to demonstrate important claims, such as showing that specialization
for matching types does provide massive performance benefits (it does).
In particular, LLVM is very good at propagating compile-time constants through
the code, and bitvec
strives to maintain an internal implementation that is
easily accessible to optimizers. This means that basically any benchmark that
takes input from a source file that the compiler can see gets artificially
solved during codegen.
For instance, I don’t know how long it takes to construct a &BitSlice
view
over memory, because my benchmarks report 0ns: LLVM computes my pointer encoding
at compile time, and a consequence of the way I designed my pointer encoding is
that the only operation BitSlice::from_slice
actually performs is .len << 3
.
When LLVM can see the original length, it just does this itself, and emits an
immediate with the correct length instead of the constructor call.
Other constructor benchmarks are only showing me the time required to run
memcpy
, and arbitrary indexing just shows the time required to run three
instructions, because LLVM solved the shift/mask arguments ahead of time.
The important takeäway here is that if your code is at all dependent on
constants that the compiler can see, and is not exclusively performing indexing
based on runtime inputs, then bitvec
is going to be plenty fast.
Specialization
bitvec
strives to use its knowledge of the underlying memory representation
wherever possible. This means that operations between BitSlice
s with the same
type parameters can rely on an identical representation and use integer behavior
rather than walking each bit individually.
Try to do this wherever possible, especially in performance-sensitive code. You
typically should not be mixing bitvec
structures with different type
parameters anyway: use the representation that serves your needs, and keep using
it in all buffers that interact with it.
As an example, as of 1.0, walking two large BitSlice
s with incompatible type
parameters takes 3µs (microseconds), while walking the same sizes with
identical type parameters takes 100ns (nanoseconds). It’s roughly a 32x
performance difference, which is only half the speedup that I expected using
usize
on a 64-bit machine, but still quite stark.
Memory Representation
As discussed in the Type Parameters chapter, bitvec
allows users to select
the specific ordering of bits within a memory element when constructing a type.
This has consequences for how source code translates to an in-memory
representation.
To review: bitvec
provides two orderings of bits within a single memory
element (Lsb0
and Msb0
) and three or four types of memory elements (u8
,
u16
, u32
, and only on systems where it is 8-byte-aligned, u64
). The
usize
type is also supported, but it is not portable, and behaves exactly as
the named register of its width.
> The `Cell` and atomic integer variants are not interesting here, as they only affect how the memory bus operates, not the processor register.
Let us now examine how each possible combination of register width, bit-ordering, and processor byte endianness affects the placement of bits in memory.
Memory Layout
The BitOrder
and BitStore
traits combine with your target architecture’s
byte-ordering of register elements to create a matrix of memory traversals. This
matrix informs what is the appropriate choice of parameters to use for your
program whenever you are using bitvec
for precise memory control rather than
solely for a compact usize
-to-bool
data collection.
The tables below list bytes of a memory address space with addresses increasing to the right, and the bits within those bytes with numeric significance decreasing to the right. This is the ordering used in most debug-printing of memory, so hopefully the table contents should match up with your prior experience viewing memory bytes.
The L
and M
indicate the Lsb0
and Msb0
ordering parameters,
respectively, and xx
indicates that the row matches all register types.
Within each row, traversal begins at zero and follows the arrows to each
successive step. Boundaries between registers are marked with a vertical bar;
boundaries between bytes within the same register are marked with a space.
Little-Endian Byte-Ordered Machines
On little-endian machines, the least-significant byte of a register type is stored at the lowest memory address, and each byte higher is one step more numerically significant than the last.
byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777
bit ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210
═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════
Lxx ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E
─────╫─────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────
M8 ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F
M16 ║ 2 ---> 3 0 ---> 1│6 ---> 7 4 ---> 5│A ---> B 8 ---> 9│E ---> F C ---> D
M32 ║ 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1│E ---> F C ---> D A ---> B 8 ---> 9
M64 ║ E ---> F C ---> D A ---> B 8 ---> 9 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1
Big-Endian Byte-Ordered Machines
On big-endian machines, the most-significant byte of a register type is stored at the lowest memory address, and each byte higher is one step less numerically significant than the last.
byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777
bit ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210
═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════
L8 ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E
L16 ║ 3 <--- 2 1 <--- 0│7 <--- 6 5 <--- 4│B <--- A 9 <--- 8│F <--- E D <--- C
L32 ║ 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0│F <--- E D <--- C B <--- A 9 <--- 8
L64 ║ F <--- E D <--- C B <--- A 9 <--- 8 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0
─────╫─────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────
Mxx ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F
If you need to care about the memory representation, then you most likely want
to use the <u8, Msb0>
pair. This provides a consistent ordering on all
machines, and the numeric value of the underlying memory will probably match
your expectations about the semantic contents of a data structure.
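A sketch of why `<u8, Msb0>` is portable: the nth bit of the stream maps to the same byte address and mask on every machine, with no byte-endianness term in the computation. This helper is illustrative, not part of `bitvec`:

```rust
// Sketch: flat bit index -> (byte offset, selection mask) under <u8, Msb0>.
fn bit_addr(n: usize) -> (usize, u8) {
    (n / 8, 0x80 >> (n % 8))
}

fn main() {
    assert_eq!(bit_addr(0), (0, 0b1000_0000)); // first bit: high bit of byte 0
    assert_eq!(bit_addr(7), (0, 0b0000_0001));
    assert_eq!(bit_addr(8), (1, 0b1000_0000)); // ninth bit: high bit of byte 1
}
```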
This chapter, and the BitField
trait, are the two most common sources of
questions about how bitvec
operates. Their intersection is even more complex,
and the layout of numeric integers stored into a BitSlice
is an extremely
common point of confusion.
Read these chapters and the API documentation thoroughly, and experiment with placing data into memory and changing the type parameters to observe their effects on buffer representation.
bitvec
Memory Model
bitvec
addresses individual bits, while computer hardware addresses register
elements. As a result, bitvec
has a more complex memory model than the Rust
language does. The library implementation strives to satisfy users’
expectations, the Rust language’s rules, and performance in the produced
artifact to the best solution for all parties, with as few compromises as
possible. Unfortunately, this has the side effect of increasing the complexity
of the codebase, both for maintainers and for readers.
This chapter explains the abstract concepts of the bitvec
memory model and the
means by which it is encoded in the Rust language without running afoul of Rust
or LLVM rules that threaten undefined behavior.
The bitvec
memory model is typed entirely within the store
module’s
BitStore
trait definition and implementations. It utilizes the access
module’s BitAccess
trait to mediate memory access events through types that
are known to be correct in the Rust and LLVM models, so that at any point in
program execution, memory access will be consistent and sound.
In addition, the domain
module’s types provide views which manipulate the
store
model to maximize performance. This chapter will discuss primarily
store
, and use domain
only to provide examples of how store
is used in
practice to accomplish the library’s implementation.
Aliasing
To Rust and LLVM, “aliasing” occurs whenever there are two paths to a memory
region, and at least one of them has write privileges. In Rust, this is
represented by the &mut
reference exclusion rule: it is always, always,
always Undefined Behavior in the Rust memory model to have two
references which can reach a memory element if at least one of them is marked
&mut
.
LLVM, which was created for, is written in, and culturally shaped by C++, takes
a similar view with its noalias
annotation, but struggles to enforce it as
thoroughly as Rust does.
bitvec
takes a similar view of the abstract meaning of the &mut
reference
type, but where the Rust memory model focuses on whole units T
, and has no
concept of subdivision from T
into the bits that compose it, bitvec
views
each individual bit as a standalone, independent, atom of memory. It excludes
the creation of two &mut BitSlice
reference handles that are capable of
viewing the same bit, but will happily produce two &mut BitSlice
handles
which are mutually-exclusive in bits, but reference the same location in memory.
Here we come to the first problem with the conflicting memory models: bitvec
cannot ever create an &mut T
through which it may write to memory, because it
has no way of ensuring that no other &T
or &mut T
reference exists which is
capable of viewing the memory region into which it writes.
Rust Shared Mutation
The problem isn’t just that the Rust standard library doesn’t offer any
non-unsafe
APIs to produce such references. The problem is that this is about
the most illegal thing you can do in the eyes of the Rust compiler, and if it
ever detects this transgression, it has full liberty to either reject, or worse,
accept and miscompile, your program.
In Gankra’s immortal words on the topic of attempting to sneak past the compiler’s commandments,
> - Transmuting an `&` to `&mut` is always UB
> - No you can’t do it
> - No you’re not special
The solution is simple: Rust exposes a type, UnsafeCell
, which is the
axiomatic type for mutation through shared views. In the Rust memory model, any
reference to a region of memory marked with UnsafeCell
is permitted to write
to it, and if any other references can view that region, then it is up to them
to ensure consistent behavior.
This is, of course, not quite true in LLVM’s memory model, but we are not there yet.
Since an &mut BitSlice<T, _>
handle cannot produce an &mut T
reference to
perform writes to memory, it must instead either use a *mut T
bare pointer,
which has absolutely no checks or optimizations whatsoëver, or use an
&UnsafeCell<T>
shared reference, which has all the usual guarantees present on
all1 reference types.
All UnsafeCell
does is instruct the Rust compiler to politely look the other
way about your program’s memory accesses. It is somewhat like the C keyword
volatile
, in that the compiler no longer believes that reads are stateless, or
freely reörderable, but entirely unlike that keyword in that the compiler
doesn’t have any obligation to keep your reads from or writes to such regions.
Rust provides an additional type called Cell
. This is a wrapper over
UnsafeCell
2 that defines a more useful API, including the only safe and
guaranteed way to write into memory through a shared reference: Cell::set
.
And voilà: &mut BitSlice<T, _>
simply constructs &Cell<T>
when writing,
the Rust compiler does not see a violation of the &mut
exclusion principle,
and we are done.
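A minimal sketch of that pattern in plain Rust (not `bitvec` code, and `set_bit` is an invented helper): two shared references to one `Cell<u8>`, much like two bit-slice handles over one byte, can each write distinct bits with no `&mut` in sight:

```rust
use std::cell::Cell;

// Read-modify-write of a single bit through a shared reference.
fn set_bit(elem: &Cell<u8>, mask: u8, value: bool) {
    let old = elem.get();
    elem.set(if value { old | mask } else { old & !mask });
}

fn main() {
    let byte = Cell::new(0u8);
    // Two freely-aliasing shared references to the same element.
    let (low_half, high_half) = (&byte, &byte);
    set_bit(low_half, 0b0000_0001, true);
    set_bit(high_half, 0b1000_0000, true);
    assert_eq!(byte.get(), 0b1000_0001);
}
```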
LLVM Shared Mutation
No we’re not. If it was that easy, there wouldn’t be a trait system or this document dedicated specifically to dealing with this problem.
Cell
is not thread-safe, because Cell
does not modify the instructions used
to access memory. It produces ordinary load and store instructions, carefree and
ignorant of the bane of everyone who wants consistency and the delight of
everyone who wants performance: concurrency.
Just as it is undefined behavior in Rust to manifest two &mut
references that
can view the same location, it is equally undefined behavior in LLVM to manifest
two pointers into memory that ever, at all, no matter what, perform any
memory access to the same location, at the same time, on multiple executors.
As with above:
- Unsynchronized writes are always UB.
- No you can’t do it.
- No you’re not special.
LLVM has an even more insidious punishment for this transgression that Rust does
not directly express: unsynchronized reads from a data race produce poison
.
Poison is a nifty concept, because it’s not illegal to obtain one. When LLVM
gives you a poison
, your program continues undergoing compilation as if
nothing had happened. You can pass it around. You can write to it, and if you
destroy it before reading, you’re fine.
As soon as you attempt to read the bit-wise value of poison
, your program is
undefined3.
So if bitvec
wants to be threadsafe, which it does, and it wants to insist on
its ability to safely alias the same memory location from multiple handles,
which is non-negotiable, there’s only one avenue left to take.
Atomic Powered Microscopes
Rust doesn’t actually model atomics. It doesn’t have to do so: no harm can ever
come from multiple handles reading out of the same immutable location, harm can
only occur when writes are observable, and writes are not observable due to the
&mut
exclusion rule. Well, except for `UnsafeCell`: everything that has an
`UnsafeCell` in it gets marked as `!Sync`, `&` references to it can’t cross
threads, and the whole problem is avoided.
This is, of course, not good enough. Concurrent, mutable, access to a location
is an important property in a computer. LLVM provides atomic types, which Rust
transparently exports as wrappers over UnsafeCell
that have their Sync
implementation restored. Handles to a region marked as atomic will use some form
of hardware-provided exclusion in order to preserve the
one-writer-XOR-any-readers system behavior, and all is well.
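That behavior can be sketched with the standard library’s atomics directly (plain Rust, not `bitvec` internals): eight threads each set a different bit of one shared element, and the hardware exclusion keeps every update:

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let elem = Arc::new(AtomicU8::new(0));
    let handles: Vec<_> = (0..8)
        .map(|i| {
            let elem = Arc::clone(&elem);
            // Atomic read-modify-write: no unsynchronized data race.
            thread::spawn(move || {
                elem.fetch_or(1 << i, Ordering::SeqCst);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Every thread's bit survived the concurrent writes.
    assert_eq!(elem.load(Ordering::SeqCst), 0xFF);
}
```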
Hardware-level exclusion has the unfortunate penalty of being, to put it
lightly, “slow”. So while it’s the safest choice to be correct, and was in fact
the default universal choice for all memory access in bitvec
for some time,
its costs4 called for a more prudent behavior.
It’s time for a new trick, something that the Rust compiler does at the region
level, and that bitvec
now does at the element level:
Run Time Alias Analysis
BitSlice
handles can only be constructed from ordinary Rust references to
memory, which Rust guarantees start out unaliased. The potential for aliasing
only occurs when a &mut BitSlice
is split into multiple subslices using any
of the functions that eventually call .split_at_unchecked_mut()
. Since that is
the root function that introduces alias conditions, it returns subslices whose
memory type parameters are tainted with the ::Alias
marker. It has the
following type signature:
```rust
impl<T, O> BitSlice<T, O>
where
    O: BitOrder,
    T: BitStore,
{
    pub fn split_at_unchecked_mut(&mut self, at: usize) -> (
        &mut BitSlice<T::Alias, O>,
        &mut BitSlice<T::Alias, O>,
    );
}
```
The BitStore
trait defines an ::Alias
associated type, which ensures that
all memory accesses through it have appropriate aliasing guards. For builds
which do not use the atomic
feature (or where the target does not have an
atomic variant of the integer), this is Cell
, and its protections are the loss
of thread-safety. For builds that do permit atomics, the marker enforces that
all reads and writes use atomic instructions.
The ::Alias
marker is applied, at compile time, by operations that split
&mut BitSlice
references into multiple coëxisting subslices. This is a good
first step to reducing unnecessary synchrony, but not good enough. Consider the
following:
```rust
use bitvec::prelude::*;

let mut data = [0u8; 2];
let bits = data.view_bits_mut::<Lsb0>();
let (one, two) = bits.split_at_mut(8);
```
It so happens that this is equivalent to splitting data
first, then viewing
each subslice as bits, but this transformation can only be done if the partition
point is known at compile-time to fall on an element boundary. There is no need
for the subslices one
and two
to use alias-safe operations, as accesses to
memory through them do not conflict with each other.
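The element-boundary case can be sketched with ordinary byte slices: once the split point falls on an element boundary, the two subslices cover disjoint elements, and plain unsynchronized writes are already sound:

```rust
fn main() {
    let mut data = [0u8; 2];
    // Split at an element boundary: each handle owns whole, disjoint bytes,
    // so no alias guard (Cell or atomic) is needed.
    let (one, two) = data.split_at_mut(1);
    one[0] |= 0x0F;
    two[0] |= 0xF0;
    assert_eq!(data, [0x0F, 0xF0]);
}
```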
Bit Slice Domains
The in-memory domain of any bit slice can be generalized to one of two formats:
either the slice touches zero edge bits (`0` or `T::BITS - 1`), or it touches an
edge bit in at least one element. Consider three bytes of memory (any element
will do, but the extra width on this page is unnecessary), with some bit-slice
regions drawn within them:
|00000000│11111111│22222222| Element
|76543210│76543210│76543210│ Bit
├────────┼────────┼────────┤
┆ ┆ ┆ ┆ 1
┆ ╞═════╡ ┆ ┆ 2
┆ ┆ ╞════╡ ┆ ┆ 3
┆ ┆ ╞═════╡ ┆ 4
┆ ╞═════════╡ ┆ ┆ 5
┆ ╞════════════╡ ┆ 6
┆ ╞═════════════╡ ┆ 7
┆ ╞═════════════════╡ ┆ 8
╞══════════════════════════╡ 9
There are nine example slices here, but they can be reduced into six specific categories, and two general ones:
- empty: row 1
- partially-spanning tail, no body: row 2
- minor (interior of an element, no edge indices): row 3
- partially-spanning head, no body: rows 4
- major (partial head, partial tail, no body): row 5
- partially-spanning head, fully-spanning body: row 6
- partially-spanning tail, fully-spanning body: row 7
- major (partial head, partial tail, full body): row 8
- spanning: row 9
The minor slice (row 3) is irreducible; the rest can all be divided into three subcomponents:
- zero or one partially-occupied head element, where the slice touches the last index in it but not the first
- zero or more fully-occupied middle elements, where the slice touches all indices in each
- zero or one partially-occupied tail element, where the slice touches the first index in it but not the last
We can break each row down into these components:
| Row | Head | Body | Tail |
|:---:|:----:|:----:|:----:|
|  1  | None |  0   | None |
|  2  | None |  0   | Some |
|  4  | Some |  0   | None |
|  5  | Some |  0   | Some |
|  6  | Some |  1   | None |
|  7  | None |  1   | Some |
|  8  | Some |  1   | Some |
|  9  | None |  3   | None |
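The decomposition above can be sketched in plain Rust. This is an illustrative model over 8-bit elements with linear bit addressing, not `bitvec`’s actual `domain` code; the names are invented here:

```rust
use core::ops::Range;

#[derive(Debug, PartialEq)]
enum Domain {
    /// Interior of one element: touches neither edge index (row 3).
    Minor { elem: usize },
    /// Partial head/tail elements around a run of fully-spanned elements.
    Region {
        head: Option<usize>,
        body: Range<usize>,
        tail: Option<usize>,
    },
}

fn domain(start: usize, len: usize) -> Domain {
    if len == 0 {
        return Domain::Region { head: None, body: 0..0, tail: None };
    }
    let end = start + len; // one past the last live bit
    let first = start / 8;
    let last = (end - 1) / 8;
    // Confined to one element and touching neither edge: irreducible.
    if first == last && start % 8 != 0 && end % 8 != 0 {
        return Domain::Minor { elem: first };
    }
    let head = (start % 8 != 0).then_some(first); // enters mid-element
    let tail = (end % 8 != 0).then_some(last);    // exits mid-element
    let body = (first + head.is_some() as usize)..(last + tail.is_none() as usize);
    Domain::Region { head, body, tail }
}

fn main() {
    // row 9: full span of three bytes
    assert_eq!(domain(0, 24), Domain::Region { head: None, body: 0..3, tail: None });
    // row 5: major with no body
    assert_eq!(domain(6, 4), Domain::Region { head: Some(0), body: 1..1, tail: Some(1) });
    // row 3: minor
    assert_eq!(domain(3, 3), Domain::Minor { elem: 0 });
}
```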
We can observe that where a slice fully-spans some elements, those elements cannot be mutated by any other reference. In the `&BitSlice` case, all `&mut`s are forbidden by the compiler's ordinary rules; in the `&mut BitSlice` case, `bitvec`'s obedience to those same rules forbids any other handle from observing the bits covered by the `&mut BitSlice`. As such, it is statically impossible for any alias to exist to the described memory in row 9, or for any alias to observe element 1 of rows 6 through 8.
`bitvec` happily permits element-aliasing `&mut BitSlice` references to observe the partially-filled elements in the outer columns, and the middle column of rows 2 through 5, so writes to them must remain synchronized through either single-threaded `Cell` or concurrency-safe atomics. These domain components can be calculated from the three components of a slice pointer: the base address, the starting bit index, and the bit count.
This is expressed in the `domain` module's two enums.
BitDomain
This enum splits any `BitSlice` region into its subcomponents, either immutably or mutably. It has the (rough) definition
```rust
pub enum BitDomain<'a, M, T, O>
where
    M: Mutability,
    T: BitStore,
    O: BitOrder,
{
    Enclave(&'a /* mut */ BitSlice<T, O>),
    Region {
        head: &'a /* mut */ BitSlice<T, O>,
        body: &'a /* mut */ BitSlice<T::Unalias, O>,
        tail: &'a /* mut */ BitSlice<T, O>,
    },
}
```
and, rather than granting direct memory access, merely removes any aliasing markers from as much memory as possible. The subslices that partially fill their base element do not need an additional aliasing marker, as the marker is only required when writes to the element may collide. If the slice is immutable, aliased writes never occur, so synchrony is never required. If the slice is mutable, then the only way to get a partial edge slice is either to forget about some bits from the main slice, which is not an alias event, or to split the slice, which is, and splitting already marks the alias.
Domain
The bit domains are still bit slices, and do not offer a way to access the backing memory. For operations where raw memory access is required, this enum produces the same domain definitions, but typed for the bare elements rather than their bits.
It has the (rough) definition
```rust
pub enum Domain<'a, M, T>
where
    M: Mutability,
    T: BitStore,
{
    Enclave(PartialElement<&'a T>),
    Region {
        head: Option<PartialElement<&'a T>>,
        body: &'a /* mut */ [T::Unalias],
        tail: Option<PartialElement<&'a T>>,
    },
}
```
(The `PartialElement` type prevents accessing bits that do not belong to the originating `BitSlice`.)
As with the bit domains, these domains will inherit any aliasing markers from their source bit-slice. The `::Alias` associated type enables the mutable domain to produce references that allow mutation without adding an unnecessary aliasing marker. Rust strongly forbids the production of `&mut` references to aliased memory elements, which is why the only `&mut` reference in these views is to memory that is fully known to be unaliased.
The `Domain` structure will produce bare atomic or `Cell` types in the alias condition. This is necessary in order to avoid producing `&mut` references which alias (as this is undefined in the Rust abstract machine, regardless of behavior), and safe because any other references to the same location will be similarly aliased and capable of handling external mutation.
LLVM Suboptimizations
LLVM considers a "racing read", that is, any read from memory that could occur contemporaneously with an atomic write, to be undefined behavior. This is a reasonable view to take, but a pessimistic one. `bitvec` has information that cannot be expressed to LLVM about which bits of an element it will observe or modify, and `bitvec` is capable of guaranteeing that two distinct access points will not be able to interfere with each other electrically. LLVM does not know this, so it considers any write to memory to touch all bits of the touched element, and any read from memory to view all bits of the fetched element.
`bitvec` exclusively5 writes to contended memory with the Rust functions `AtomicT::fetch_and` and `AtomicT::fetch_or`, which are mapped to the LLVM instructions `__atomic_fetch_and_N` and `__atomic_fetch_or_N`. It uses bitmasks that `bitvec` can guarantee are non-intersecting, but this proof cannot be extended to even the Rust compiler, let alone LLVM. These bitmasks are applied to register values before using either of the `fetch_op` instructions, and after any reads that use `AtomicT::load`/`__atomic_load_N`.
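The masked-write pattern can be demonstrated with the standard library's atomics. This is an illustrative sketch, not `bitvec`'s code: the `set_bits`, `clear_bits`, and `racing_demo` names are inventions, but the `fetch_or`/`fetch_and` calls are the same primitives described above. Two handles share one byte yet own disjoint masks, so their racing writes cannot disturb each other's bits:

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;

// Set the bits selected by `mask`, leaving all other bits untouched.
fn set_bits(byte: &AtomicU8, mask: u8) {
    byte.fetch_or(mask, Ordering::SeqCst);
}

// Clear the bits selected by `mask`, leaving all other bits untouched.
fn clear_bits(byte: &AtomicU8, mask: u8) {
    byte.fetch_and(!mask, Ordering::SeqCst);
}

// Two threads race on the same byte but own disjoint masks; the atomic
// read-modify-write cycle guarantees neither write is lost.
fn racing_demo() -> u8 {
    let byte = Arc::new(AtomicU8::new(0));
    let handles: Vec<_> = [0b0000_1111u8, 0b1111_0000]
        .into_iter()
        .map(|mask| {
            let byte = Arc::clone(&byte);
            thread::spawn(move || set_bits(&byte, mask))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    byte.load(Ordering::SeqCst)
}
```

Without the atomic instructions, a plain read-modify-write of the whole byte could lose one thread's bits; the masks alone are not enough, which is exactly why the synchrony guard cannot currently be removed.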
If the `bitvec` mask proofs were extendible to LLVM, and LLVM were to expand its tracking of which bits of a memory address became `poison`ed by a memory write, and which bits of a fetched memory value became un-`poison`ed by a masking operation, then the compiler would be more able to observe that `bitvec` memory accesses do not observably interfere with each other. This observation would then define the behavior in the compiler's memory model of racing writes/reads, and permit an increased (possibly even complete) removal of synchrony guards.
I am not aware of any processor hardware which fails to guarantee that all bits of memory are fully defined at the clock edges of all instructions that use the location. To the full extent of my knowledge, all memory banks in all relevant processors have a stable bit-value at the start of a tick, when reads occur, and at the end of a tick, when writes commit. At no point does changing the value of one bit of a memory component affect the electrical value of other bits in the component.
This is not necessarily true of other storage devices, such as SSDs, but `bitvec` can only be used to access storage cells mapped into the RAM address space, which tend to all have this stability property.
Summary
The formal definition of the `bitvec` memory model extends the Rust mutable-exclusion rules by refining memory regions to have bit-precision instead of element-precision. The Rust compiler is only weakly capable of tracking the region status of individual bits, and only in compiler-internal structures. LLVM has a more robust arbitrary-bit-tracking capability, but similarly limits its interface to external code.
Barring any errors in the `bitvec` implementation, the `bitvec` memory model is fully sound in its behavior with regard to single-observer unsynchrony. Synchronization is only added in order to correctly interface with `rustc` and LLVM without causing either of them to introduce undefined behavior due to a lack of information.
Footnotes
All references to `T` where `T` is either `!Sized`, or where `mem::size_of::<T>()` is non-zero, that is. A fun quirk of Rust's first-class concept of zero-width types is that the only illegal value for a `&Zst` reference is null. Since there is nothing to load or store through a `&Zst` reference, the compiler doesn't care what the reference value is, as it will never be used to perform memory access.
This is not absolutely true. Like we saw with `UnsafeCell`, the only immutable rule of compiler developers is that whenever they make an immutable rule, they also provide a way to sidestep it. If you `freeze` a `poison`, you are now free to read its value and act on it. LLVM just doesn't make any guarantees about what you'll see.
I'd feel a great deal more comfortable if I had firm knowledge of what those costs actually were. An atomic write always issues a `lock` instruction modifier on x86, and I have heard vastly different things about what that actually means, from "it's free if no other cache holds that address" up to "it poisons the whole cacheline", and I have not had much luck producing a benchmark that firmly demonstrates that unneeded atomic access is a strict performance cost.
In multithreading environments. Disabling atomics also disables `bitvec`'s support for multithreading, so the penalty for aliasing is reduced to an inability to remove redundant reads.
Bit Slice Pointer Encoding
`bitvec`'s core value proposition rests on the fact that it is capable of defining an unsized slice type and controlling references to it. The Rust language rests heavily on the two reference types `&` and `&mut`, and does not ordinarily allow these to be faked or created by anything other than the compiler.
Rust Reference Rules
It so happens that not only does Rust strongly guarantee the in-memory layout of a reference to a slice, it also provides a stable API for constructing values of `&[T]` type without using `mem::transmute`. Subject to certain value requirements imposed by types, slice references can be constructed through these functions and the compiler will accept them as valid.
These requirements traditionally make it difficult to encode non-address information into a bare reference, since the compiler has a very firm expectation that a reference to a type is immediately dereferenceable to a value of that type. But if your type happens to be zero-sized, then it can never exist in memory, no loads or stores to it can ever be produced, and the compiler no longer concerns itself with the actual bit-pattern value of references to it.
This is why the definition of `BitSlice` is

```rust
// src/slice.rs
#[repr(transparent)]
pub struct BitSlice<T, O>
where
    T: BitStore,
    O: BitOrder,
{
    _mem: [()],
    _typ: PhantomData<T>,
    _ord: PhantomData<O>,
}
```
`BitSlice` is `[()]` with some markers that only the type-checker can see. `&BitSlice` is thus `&[()]`, and `&[()]` can have any values it wants (except, of course, null).
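This property is easy to demonstrate with the standard library alone. The sketch below (my illustration, not `bitvec` code; the `zst_slice` helper is invented) builds a `&[()]` from an arbitrary non-null address and an arbitrary length. Because `()` is zero-sized, `slice::from_raw_parts` never dereferences the pointer, so any non-null value is sound:

```rust
use core::slice;

// Build a `&[()]` from any non-null address and any length. The alignment
// of `()` is 1 and its size is 0, so no load or store is ever produced
// and the pointer value is never inspected by the hardware.
fn zst_slice<'a>(addr: usize, len: usize) -> &'a [()] {
    assert!(addr != 0, "null is the one forbidden reference value");
    let ptr = addr as *const ();
    // SAFETY: the pointer is non-null, trivially aligned, and the total
    // size `len * size_of::<()>()` is zero, so no memory is accessed.
    unsafe { slice::from_raw_parts(ptr, len) }
}
```

The address and length survive round-trips through the reference untouched, which is exactly the storage space the `&BitSlice` encoding exploits.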
Pointer Encoding
Slice references contain two pieces of information: the address of the base element, and the number of elements, starting at the base, contained in the slice region. Theoretically, bit-slice references have the same pair of information: the address of the first bit, and the number of bits in the region.
However, computers are byte-addressed, not bit-addressed, so we need to store three more bits (to select a bit in the base byte) somewhere in the reference. Since slice references are defined as `{ base: *T, elts: usize }`, and there are no1 spare bits in `*const _`, the bits to store the base bit are taken out of the length counter.
Reference address values are also required to be integer multiples of the alignment of the referent type `T`. This alignment is, on all supported targets, the width in bytes of the referent type. As a result, there are as many low bits in the address of any `T` that are guaranteed to be the `0` value as there are bits needed to select a byte within the element. The end result is that the length counter must always use three bits to store the starting bit, and the base address will be composed of an aligned `T` address and an index of the starting byte within it.
As Rust does not have bitfield syntax, a definition of the pointer structure in C++ looks like this:

```cpp
template <typename T>
struct BitSpan {
  static_assert(
    std::is_unsigned<T>::value
    && sizeof(T) <= sizeof(size_t)
    && sizeof(T) <= alignof(T)
  );
  uintptr_t ptr_head : __builtin_ctzll(alignof(T));
  uintptr_t ptr_addr : sizeof(uintptr_t) * CHAR_BIT
                       - __builtin_ctzll(alignof(T));
  size_t len_head : 3;
  size_t len_bits : sizeof(size_t) * CHAR_BIT - 3;
};
```
In Rust, the structure is declared as

```rust
// src/pointer.rs
#[repr(C)]
pub struct BitSpan<T, O>
where
    T: BitStore,
    O: BitOrder,
{
    ptr: NonNull<u8>,
    len: usize,
    _ty: PhantomData<T>,
    _or: PhantomData<O>,
}
```

and the logical components must be accessed through get/set functions, rather than through compiler-generated field stubs.
By marking the pointer as `NonNull`, `BitSpan` declares that it will never be a null pointer and becomes subject to the same peephole optimization that allows `mem::size_of::<Option<&T>>() == mem::size_of::<&T>()`. By marking it as unconditionally a pointer to `u8`, we declare that all low bits of the address value are in use, and none can be used as slots for anything else (since our encoding is using them to select a byte within the `T`).
Significant Values
The null value, `{ ptr: 0, len: 0 }`, is not valid in `BitSpan<T>`, but rather is used to mark `Option::<BitSpan<T>>::None`.
Empty Slices
All slices with a `bits` logical field of `0` are considered equally empty, and equal to each other. The `addr` and `head` logical fields can be anything. However, they are not required to be clamped to `1` and `0`, respectively, because they can contain important information about the region.
Specifically, the `BitVec` type is an owning type that manages a buffer assigned to it by the memory allocator. It is never allowed to forget the address of its buffer, even if the user has removed all live bits from the vector.
Summary
Rust requires that slice references have a specific ABI, but makes no requirements about the encoding of values of those references for certain types. We can supply our own ABI-equivalent structure, define functions that use the structural encoding to compute the information needed to actually interact with memory, and convert our structures into Rust-accepted slices through the provided compiler API in `core`.
Footnotes
On AMD64, pointers are actually aggregates of MMU translation pages, and processors only decode the low 48 or 57 bits of them, leaving the high 16 or 7 bits available for other information not part of the memory addressing system. However, these processors also trap when attempting to dereference a pointer whose high `[48:]` or `[57:]` bits do not have the same bit value as bit `[47]` or `[56]`, and that bit is typically used to differentiate unprivileged user memory from privileged kernel memory. Furthermore, this dead region does not exist on 32-bit architectures, x86 or otherwise, and since `bitvec` explicitly supports 32-bit systems, the use of dead bits only present on a subset of supported targets, and subject to their own extra rules, is not a workable strategy.
Miscellaneous Implementation Details
`bitvec` has a number of internal implementation details that facilitate development but are not part of the public API. While not user-facing, these details are nevertheless important to document as explanations of how the library is built.
Integer Refinement
`bitvec` offloads abstraction over the fundamental integers to the funty crate. It provides traits such as `Unsigned` that generalize over any unsigned integer and allow them to be used in generic code.
`funty` only unifies the standard-library APIs of the integers into trait-based code; `bitvec` further extends this with useful constants in the `BitRegister` trait. This trait delimits the integers that correspond to machine registers actually available for use as storage, and provides minor conveniences for working with them.
Specialization Hacks
`bitvec` is built to be agnostic to the ordering of bits within an element. This is an important aspect of its design, and even if no orderings other than the provided `Lsb0` and `Msb0` are used, it must remain so in order that these two orderings can be used on equal footing. However, the deferral of register operations to type parameters outside the data structures' control results in some unpleasant performance losses that would not occur in a hand-written equivalent.
Some operations, like copying between, or comparing, slices, can be accelerated with partial-element access, but require knowledge of the `O` ordering type to provide a semantic interpretation of register contents. Language-level specialization could allow writing override `impl` blocks, like this:
```rust
impl<T, O> BitSlice<T, O>
where
    T: BitStore,
    O: BitOrder,
{
    fn eq(&self, other: &Self) -> bool {
        todo!("baseline")
    }
}

impl<T> BitSlice<T, Lsb0>
where
    T: BitStore,
{
    fn eq(&self, other: &Self) -> bool {
        todo!("lsb0-accelerated version")
    }
}

impl<T> BitSlice<T, Msb0>
where
    T: BitStore,
{
    fn eq(&self, other: &Self) -> bool {
        todo!("msb0-accelerated version")
    }
}
```
While waiting on this feature, we can use the compiler's stable `TypeId` API to simulate access to specialization. By comparing the `TypeId` of the `O` type argument to the `TypeId`s of `bitvec`'s provided orderings, functions can detect when they are in a monomorphization with an `Lsb0` or `Msb0` ordering argument and specialize accordingly. The above block can be replaced with:
```rust
impl<T, O> BitSlice<T, O>
where
    T: BitStore,
    O: BitOrder,
{
    fn eq(&self, other: &Self) -> bool {
        if let (Some(this), Some(that)) = (
            self.coerce::<T, Lsb0>(),
            other.coerce::<T, Lsb0>(),
        ) {
            todo!("lsb0-accelerated version")
        } else if let (Some(this), Some(that)) = (
            self.coerce::<T, Msb0>(),
            other.coerce::<T, Msb0>(),
        ) {
            todo!("msb0-accelerated version")
        } else {
            todo!("baseline")
        }
    }
}
```
and, during monomorphization, only one branch of the `if` stack will be preserved. The `.coerce()` method is defined in `slice::specialization` and provides easy access to a fully-typed value only within the monomorphization that matches it.
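The core of the `TypeId` trick can be demonstrated in isolation. The sketch below is a stand-in, not `bitvec`'s code: the local `Lsb0`/`Msb0` marker types and the `ordering_name` function are inventions, but the comparison pattern is the same one described above, and monomorphization removes the untaken branches as dead code:

```rust
use core::any::TypeId;

// Local marker types standing in for bitvec's ordering types.
struct Lsb0;
struct Msb0;

/// Detect, inside a generic function, which ordering type parameter the
/// current monomorphization was built with, and branch accordingly.
fn ordering_name<O: 'static>() -> &'static str {
    if TypeId::of::<O>() == TypeId::of::<Lsb0>() {
        "lsb0-accelerated version"
    } else if TypeId::of::<O>() == TypeId::of::<Msb0>() {
        "msb0-accelerated version"
    } else {
        "baseline"
    }
}
```

Because `TypeId::of` is resolved per monomorphization, each instantiation collapses to exactly one branch at compile time, giving specialization-like dispatch with zero runtime cost beyond what the real feature would have.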
Afterword
Thanks for reading! I hope this book was useful to you.
Please feel both welcomed and encouraged to ask me any clarifying questions, to file an issue if any of it is confusing, or even to send a PR if you have improvements!
This has been a work of intensely deep focus and research for three and a half years. It is a complex and difficult topic, and I am absolutely certain that I have information and context that I haven’t sufficiently serialized for everyone else to see.
Thank you for using `bitvec`. I am delighted that other people have found my work both interesting and useful, and I'm glad that I've been able to help people like you write better programs.