Dedication

I began working on bitvec shortly before I was told that my father had been diagnosed with cancer for the third time. Developing the library gave me something into which to sink my attention and keep my mind from dwelling on his rapidly-progressing illness. I wrote the core pointer representation that enables bitvec’s modern behavior at his side, in the last days that he remained conscious.

I never had the chance to explain to my dad what I was building. By the time it was developed enough to be worth explaining, he had lost too much of his brain to understand me. More than anything I’ve done for my employers, bitvec is the work of which I’m most proud, and which I most wish he could have seen.

The Ferrilab project, and in particular bitvec, is dedicated to the memory of my father, George M. Payne, who inspired me to build quality work, supported my search for my talents, and taught me to labor for the benefit of others beyond myself.

—Alexander Payne (myrrlyn)

Introduction to Ferrilab

Ferrilab is an umbrella project for crates that experiment with reshaping the Rust data model around the primitive types. It began with the bitvec crate; as bitvec expanded, it eventually spun off its integer generalizations into funty, and radium was created independently but swiftly brought into the fold.

Ferrilab is currently focused on just these three crates, but may expand in the future as we discover new ideas to try out.

Behind the Name

The primary maintainer, myrrlyn, is from the Great Lakes region of the Americas, and the crates here all reshape the fundamental types in the Rust language. We looked for names that had to do with early modern physics, and settled on Enrico Fermi: he worked in atomic physics, he has an eponymous laboratory near Chicago, and “Fermilab” is a single edit step away from “Ferrilab”, a nod to Ferris, Rust’s mascot.

Plus, bitvec began while myrrlyn was working for the US Government in New Mexico…

Radium

The radium crate provides a unifying model for shared mutability over the primitives. It does not handle more complex shared-mutability topics such as mutexes or locks: if you need to manage large structured data, you will need to look elsewhere.

Radium Trait

This trait allows your code to generically accept either an atomic type or a Cell and interact with it through a unified API. It is implemented by all of the standard library atomics, Cell<{bool,{i,u}{8,16,32,64,128,size},*mut T}>, and the type families that Radium provides (described below).

The primitive type that a Radium implementor encloses is indicated by the Radium::Item associated type. You can use this as a constraint in your trait bounds (i.e. <R: Radium<Item = i32>>) in order to gain direct access to the primitive, or you can require that R::Item implement other traits and interact with it through them.

Type Aliases

Radium provides type aliases for each primitive which could be atomic in the standard library. The Radium{Bool,{I,U}{8,16,32,64,128,size},Ptr} symbols all forward to their corresponding AtomicType when that symbol exists, or to Cell<Type> when it does not.

Since these are type aliases rather than newtypes, you can globally replace the AtomicT symbols with their RadiumT equivalent without any other changes to your codebase.

Type Families

Radium provides three type families which accept fundamentals as type parameters. These families have no inherent API, and only implement Radium, Debug, Default, and From<T>. They may or may not have a Sync implementation, depending on whether they have atomic behavior.

  1. The Atom<T> family corresponds to (and wraps) the standard library’s core::sync::atomic::* types. This family only accepts T parameters where an equivalent AtomicT symbol exists for the target; this means that on targets which do not have, for instance, AtomicU64, Atom<u64> will fail to compile. These are always Sync.

    Atom requires that type arguments implement radium::marker::Atomic.

  2. The Isotope<T> family functions similarly to Atom<T>, except that it wraps Radium’s RadiumT type aliases. As such, Isotope is portable across targets with different atomic supports, and will never fail to compile; however, it will silently degrade from atomic to Cell behavior (including loss of Sync) when the requisite atomic types are missing.

    Isotope requires that type arguments implement radium::marker::Nuclear.

  3. The Radon<T> family is a wrapper over Cell<T>. Like Isotope, it requires that type arguments implement radium::marker::Nuclear. It is never Sync.

Examples

This contrived example is taken from radium/examples/schroedinger.rs. It shows how the Radium trait can be used by a worker function to manipulate data, and how the different types can be used to work in sequence or in parallel.

Note: Radium’s MSRV is 1.60, while the scoped-threads API used here stabilized in 1.63.

use radium::{Radium, types::{RadiumU64, Atom, Isotope, Radon}};
use std::{
  cell::Cell,
  sync::atomic::{AtomicU64, Ordering},
  thread,
  time::Duration,
};

fn do_work<R: Radium<Item = u64>>(this: &R, ident: u8) {
  let on_entry = this.load(Ordering::SeqCst);
  println!("{: >2} step 0 sees: {: >2}", ident, on_entry);

  let before_add = this.fetch_add(10, Ordering::SeqCst);
  println!("{: >2} step 1 sees: {: >2}", ident, before_add);

  let after_add = this.load(Ordering::SeqCst);
  println!("{: >2} step 2 sees: {: >2}", ident, after_add);

  thread::sleep(Duration::from_millis(after_add));

  let before_sub = this.fetch_sub(3, Ordering::SeqCst);
  println!("{: >2} step 3 sees: {: >2}", ident, before_sub);

  let on_exit = this.load(Ordering::SeqCst);
  println!("{: >2} step 4 sees: {: >2}", ident, on_exit);
}

static ATOM: AtomicU64 = AtomicU64::new(0);
static RADIUM: RadiumU64 = RadiumU64::new(0);

fn main() {
  let cell = Cell::new(0u64);

  let atom = Atom::new(0u64);
  let isotope = Isotope::new(0u64);
  let radon = Radon::new(0u64);

  println!("atoms");
  thread::scope(|s| for ident in 0 .. 3 {
    s.spawn(move || do_work(&ATOM, ident));
  });
  println!();
  thread::scope(|s| for ident in 3 .. 6 {
    let atom = &atom;
    s.spawn(move || do_work(atom, ident));
  });
  println!();

  println!("isotopes");
  thread::scope(|s| for ident in 6 .. 9 {
    s.spawn(move || do_work(&RADIUM, ident));
  });
  println!();
  for ident in 9 .. 12 {
    do_work(&isotope, ident);
  }
  println!();

  println!("cells");
  for ident in 12 .. 15 {
    do_work(&cell, ident);
  }
  println!();
  for ident in 15 .. 18 {
    do_work(&radon, ident);
  }
  println!();
}

funty

The funty crate (fundamental types) provides traits that unify the Rust non-pointer primitives. It also unifies the raw pointers and core::ptr::NonNull by lifting access permissions into the trait system.

Fundamental Unification

The Rust primitives implement the following trait hierarchy, which replicates their standard-library inherent APIs and trait implementations.

  • Fundamental: this is implemented by all primitives: bool, char, all integers, and both floats. It requires all traits that all primitives implement, and provides .as_other() methods that can replace as-casts.
    • Numeric: this is implemented by all integers and both floats. It adds the arithmetic operator traits, and methods for converting between the integer and its raw byte representation.
      • Integral: this is implemented by all integers. It adds bit-wise operator traits, attempted conversions between the other integers, and bit-shifts. It also provides most of the integer inherent API, as most of these methods are sign-agnostic.
        • Signed: this is implemented only by signed integers. It adds the absolute-value and sign-testing functions that unsigned integers don’t support.
        • Unsigned: this is implemented only by unsigned integers. It provides the {is,next}_power_of_two one-hot methods that only make sense on unsigned integers.
      • Floating: this is implemented by both floating-point types. Unlike the integral traits, many of its methods exist only when cfg(feature = "std") is active, as they require the platform’s libm mathematics library and are not provided by Rust’s core crate. It also provides all of the associated constants, including those that the standard library stores in eponymous modules rather than associating with the floating-point primitive types themselves.

Additionally, funty provides marker traits for selecting bit-width: for N in 8, 16, 32, 64, and 128, the IsN trait is implemented by types that are exactly that wide, AtLeastN is implemented by types that are that width or more, and AtMostN is implemented by types that are that width or less.

You can use these traits as generic constraints in code that needs to accept a range of different primitives. The integral traits provide Peano constants (zero and one), and can be constructed from literals for non-const work.

Pointer Unification

The funty::ptr module provides Pointer and NonNullPtr types which are replacements for raw pointers and core::ptr::NonNull, respectively. They work by lifting the *const T/*mut T distinction into the trait system, through the Permission trait and the Shared, Unique, and (Shared, Unique) types.

The Permission trait and its implementors implement a less-capable version of the stacked-borrows experimental model found in Miri. Pointer<T, P> implements the read-only APIs found on both *const and *mut pointers, while Pointer<T, Unique> alone implements the write APIs only present on *mut pointers. Additionally, type-level transitions allow safely casting Unique pointers down to read-only and back up to Unique, and unsafely casting directly to a permission that you specify.

The NonNullPtr behaves similarly to Pointer, except that it encloses a core::ptr::NonNull in order to regain the null-pointer niche optimization. Its API strives to match both the NonNull and Pointer APIs. As both raw pointers and NonNull still have large amounts of unstable API surface in the standard library, these types will continue to grow in response to both Ferrilab’s needs and the standard library’s evolution.

Permission Changes

funty uses the Permission trait to create a graph of safe transitions between raw pointers and Pointer, and between different type parameters attached to Pointer values.

Pointer values can be constructed from both raw-pointer types: *const T produces Pointer<T, Shared>, and *mut T produces Pointer<T, Unique>. Once constructed, all Pointer<T, P: Permission> values have access to the introspective and read-only memory APIs defined on the raw pointers. The memory-write APIs are only available on Pointer<T, Unique>.

Additionally, the method .cast_shared() moves Pointers from P to (Shared, P). The (Shared, P: Permission) tuple is itself an implementor of Permission, and can continue to be used as a read-only pointer. Pointer<T, (Shared, P)> also provides .cast_unshared(), which undoes .cast_shared() and transitions from (Shared, P) back to P.

All Pointers can produce *const raw pointers, but only the Unique permission can produce *mut raw pointers. If you need access to *mut raw pointers but are in generic code where you cannot satisfactorily prove to the compiler that you have a Unique, you have two options. The .unwind_to_unique() method recursively unwinds a (Shared, P) history stack until it reaches the base, then produces Some pointer if the original permission was Unique or None if it was Shared. The unsafe method .cast_mut() unconditionally produces a pointer with Unique permissions, but may violate Rust’s provenance rules and invoke undefined behavior.

bitvec

This is Ferrilab’s largest and most complex project, and likely the reason you’re reading this guide. It provides a system that allows Rust code to treat memory as if it were bit-addressed, rather than byte-addressed, and strives to be the most correct and capable such bit-addressing system available. This results in some unavoidable complexity and performance loss, and we are always working to improve the quality of generated code.

Introduction

bitvec was built out of myrrlyn’s experience and frustration with performing I/O buffer manipulation using C, C++, and Ruby. His work required programs capable of dynamically selecting an arbitrary region of a bit-stream (a task to which C’s structural bitfield declarations are unsuited), and it required those programs to be fast and portable to flight embedded systems (adjectives not commonly associated with Ruby engines).

Furthermore, his work involved message schemas that were permitted to select a bit ordering at the packet and field level. This is not a behavior that any existing bit-stream library or language feature provides. These experiences informed his goals and design choices from the very beginning.

bitvec matches, and exceeds, the functionality of every other bit-stream implementation we have found. It is also the only Rust crate that is a drop-in replacement for standard library types, and is able to do so while remaining well-formed and conformant to Rust’s rules about memory access. Thanks to excellent compiler engineering by the Rust and LLVM teams, it is able to do this while still producing reasonably good output code.

Goals for This Guide

We hope that this guide will explain bitvec’s design choices, philosophy, backing theory, and overall goals. It is not a detailed exploration of the crate API – this is hosted on docs.rs – but instead seeks to communicate how to think about bitvec so that you will know how to best use the APIs it offers.

The best way I (myrrlyn) know how to communicate this information is as a dialogue between me, the author, and you, the user. Since this is a book, not a live conversation, I actively encourage you to get in contact with me with any questions or feedback through the channels listed in Ferrilab’s CONTRIBUTING document, and throughout the guide I will periodically remind you that if a section is unclear, it is an error on my part, and I would appreciate an issue or other contact so that I can improve it.

Data Structures

bitvec’s primary exports are four data structures: BitSlice, BitArray, BitBox, and BitVec. These correspond to the [bool], [bool; N], Box<[bool]>, and Vec<bool> types in the Rust standard libraries. Unlike the Rust types, the bitvec types are not composable: they cannot be combined with any other containers, including the standard library’s smart pointers.

You **cannot** produce `Box<BitSlice>`, `Rc<BitSlice>`, or `Arc<BitSlice>`;
attempts to produce or use these may induce undefined behavior.

The bitvec types implement the APIs of their standard-library counterparts to the fullest extent possible. The only missing feature is currently IndexMut<usize>, which prevents collection[index] = bit; assignment. This means that, if you are looking for a compact usize => bool collection, substituting the type names in your codebase ought to be enough to make your project Just Work™️.

It is the fact that BitSlice acts exactly like [bool], and can only be used through &BitSlice and &mut BitSlice references, that makes bitvec unique. No other Rust library has this capability.

Before we explore the data structures in more detail, there are three caveats I must provide:

  1. bitvec uses an opaque, custom pointer representation for everything except BitArray. You may not inspect or modify this pointer. You may not use it as a type or value parameter in any other types or functions. You will break your program if you try.

    bitvec ensures that this pointer encoding does not fail the compiler and language requirements for reference correctness. The rules used to do so are internal to the crate, and will not be present outside it. bitvec pointers are perfectly safe to use, as long as you treat them as completely opaque and use only the interfaces provided.

    Standard-library smart pointers do not understand this encoding and would attempt to dereference the encoded pointer; this is why trying to use BitSlice with them is undefined.

  2. These structures all have two type parameters, <T: BitStore, O: BitOrder>. These parameters are described in the next chapter. They govern the in-memory representation of data storage, but are not relevant to the general use of the handle types.

  3. bitvec trades increased memory-space efficiency for decreased instruction-count and speed efficiency. The compiler may optimize some of the costs away, but bitvec structures are not free to use. The “zero-cost” of the bitvec abstraction is that you cannot hand-write a better bitset, not that it is equal in runtime performance to an ordinary bool sequence.

Now that the disclaimers are over, we can talk about how the types actually work.

Slices

  1. Getting a BitSlice
    1. Borrowing Constructors
    2. Macro Constructor
  2. What BitSlice Can Do
    1. … That [bool] Can
    2. … That [bool] Cannot
    3. Set Queries
    4. Boolean Arithmetic
    5. Writing To Memory
    6. Viewing the Underlying Memory
  3. Footnotes

The base type of the project is BitSlice. This is a region type, like [bool], and cannot be held directly. Instead, it is accessed by borrowed references (&BitSlice, &mut BitSlice) or owning handles (BitArray, BitBox, BitVec). The distinction between the handles and the region is the same as it is in ordinary Rust types.

The BitSlice region is able to begin and end at any bit in memory, and is not restricted to having one edge aligned to the edge of its initial element. This restriction, present in all of its competitors, is removed through the use of a special encoding in all pointers to the region, which stores the starting bit of the base element in part of the slice pointer that describes the real memory.

There are eight bits to a byte on all systems Rust targets, and therefore the index of a bit within a byte is itself three bits wide. These bits are taken from the length counter of a slice pointer, which means that BitSlice is able to address only ⅛th of the indices that [bool] can.

This is 64 [Mebibytes] on a 32-bit system, and 256 [Pebibytes] on a 64-bit
system. If you can even allocate that much real memory in one handle, then you
have very different operating conditions than I can help you with.

[Mebibytes]: https://en.wikipedia.org/wiki/Mebibyte
[Pebibytes]: https://en.wikipedia.org/wiki/Pebibyte

Getting a BitSlice

BitSlice is strictly a borrowed region. It can neither be created nor destroyed; rather, views to it are acquired from a memory buffer that some other binding owns.

The BitStore chapter covers this in more detail, but only slices of the unsigned integers u8, u16, u32, usize, and (on 64-bit targets) u64 can be used as the source memory for a BitSlice. (You can also use their Cell<> wrappers or atomic variants; this will be discussed later).

Borrowing Constructors

The simplest way to create a BitSlice reference is to borrow it from ordinary Rust data. The BitView trait, available in the prelude, implements methods on the supported unsigned integers, all arrays of them, and their slices.

use bitvec::prelude::*;

let byte = 0u8;
let bits = byte.view_bits::<LocalBits>();

let array = [0u16; 2];
let bits = array.view_bits::<Lsb0>();

let mut array = [0u32; 3];
let slice = &mut array[..];
let bits = slice.view_bits_mut::<Msb0>();

The .view_bits() and .view_bits_mut() methods take the other type parameter bitvec requires. This is described in the BitOrder chapter. Use Lsb0 until you have a specific need for a more precise parameter.

In addition, BitSlice offers constructor functions ::from_element(), ::from_slice(), and their _mut variants, which borrow elements and slices, respectively, and construct &/mut BitSlice references from them. The trait methods are generally easier, and certainly shorter to write, but they all do the same work.

Lastly, empty slices can be produced with the ::empty() or ::empty_mut() functions, since there is no &[] or &mut [] literal available.

Macro Constructor

In addition to these method constructors, you may also use the bits! constructor macro. This macro runs at compile-time to produce a buffer containing the correct data values, then borrows it as a BitSlice reference. It is a macro_rules! macro, not a procedural macro, and should not have a significant impact on your compilation times.

By default, the produced buffer is a temporary that the compiler will then extend to have the minimum lifetime of the produced reference handle. However, you can use the static keyword to cause the macro to produce a hidden and unnameable static BitArray backing buffer, which then provides the &'static BitSlice lifetime. Since this static buffer cannot be named, it is safe to use even when mutable, as the provided reference is the only handle to it.

The macro syntax extends that of vec!. The simplest invocations are sequences or repetitions of expressions, which can optionally be made mutable:

use bitvec::prelude::*;

let r = bits![0, 1, 0, 1];
let w = bits![mut 0, 1, 0, 1];

let r2 = bits![static 1; 4];
let w2 = unsafe { bits![static mut 1; 4] };

You are not required to use the literals `0` or `1`; you can use any expression
that is `const`-evaluable and can be placed into the expression `expr != 0`.
This means that you cannot use the names of runtime `let` bindings, but can use
the names of `const` bindings, or other literals. You probably do not want to do
this, but you *can*.

In addition, you can specify the bit-ordering and integer-storage type parameters for even more precise control over the memory layout. If you do not specify them, the macro uses the default parameters of usize storage and Lsb0 ordering.

use bitvec::prelude::*;

let in_bytes = bits![u8, LocalBits; 0, 1, 0, 1];
let in_shorts = bits![u16, Lsb0; 0, 1, 0, 1];
let in_ints = bits![mut u32, Msb0; 0; 4];

To summarize the macro rules:

  • If the first macro argument is mut, then the macro produces &mut BitSlice, otherwise it produces &BitSlice. You do not need to bind the name as mut unless you want to reässign it to a different slice.
  • You may then optionally provide the storage and ordering type parameters, followed by a semicolon. You must provide either both or neither.
  • The data input to the macro is one of the two vec! token lists:
    • One or more expressions that can be placed into $expr != 0, separated by commas. A trailing comma is permitted.
    • A single expression that can be placed into $expr != 0, followed by a semicolon and a repetition counter. The resulting BitSlice will be counter bits long, each set to the expression’s value.

Emulation tests indicate that `bitvec` correctly instructs the compiler to
produce suitable buffers even when compiling for a target with a different
byte-endianness than the host. However, I have not actually performed such
cross-compilation and testing with real hardware. It should be correct; please
file an issue if it is not.

What BitSlice Can Do

Now that you have acquired a BitSlice reference, either by borrowing memory from elsewhere in your program or by creating a temporary, it is time to do some actual work with it.

… That [bool] Can

Everything.¹ I am not going to rewrite the standard library’s slice documentation here.

… That [bool] Cannot

In addition to the standard library [bool] API, BitSlice offers some inherent methods tailored to its specialization.

Set Queries

The five query methods .any(), .all(), .not_any(), .not_all(), and .some() test how many bits in a region are set to 1. These methods have the following truth table:

Slice | any   | all   | not_any | not_all | some
------|-------|-------|---------|---------|------
00    | false | false | true    | true    | false
01    | true  | false | false   | true    | true
11    | true  | true  | false   | false   | false

any is the Boolean OR operator; all is the Boolean AND operator; and some is the Boolean XOR operator.

In addition, .count_ones() and .count_zeros() count how many bits of the slice are set to one or zero, rather than merely indicating whether any exist. These methods are slower than the Boolean queries, which are capable of short-circuiting once satisfied.

You can also use .iter_ones() and .iter_zeros() to walk each index of bits with the specified value. These are equivalent to running .filter() and .enumerate() calls on iterators of bool, but are specialized to use dedicated bit-counting instructions where processors provide them.

Boolean Arithmetic

bitvec data structures all implement the Boolean operators (&, |, ^, and !) against each other.

In version 0, they allowed any `impl Iterator<Item = bool>`. This has been
changed for performance reasons, since people never used the arbitrary iterator
support but did require improved behavior when operating on two bit-slices.

These still have performance degradations, as the conditions required to allow
specialized acceleration can be difficult to guarantee at either run- or
compile- time. We are working on it, but cannot make any promises.

use bitvec::prelude::*;

let mut or  =  bits![mut 0, 0, 1, 1];
        or |=  bits![    0, 1, 0, 1];
assert_eq!(or, bits![    0, 1, 1, 1]);

let mut and  =  bits![mut 0, 0, 1, 1];
        and &=  bits![    0, 1, 0, 1];
assert_eq!(and, bits![    0, 0, 0, 1]);

let mut xor  =  bits![mut 0, 0, 1, 1];
        xor ^=  bits![    0, 1, 0, 1];
assert_eq!(xor, bits![    0, 1, 1, 0]);

let mut not = bits![mut 0, 1];
        not = !not;
assert_eq!(not, bits![  1, 0]);

Writing To Memory

You can set all bits in a region to a new value by using the .fill() method, or you can set one bit in a region to a new value by using either the .set() or .get_mut() methods. .get_mut() produces a proxy type which acts roughly like an &mut bool reference slot.

use bitvec::prelude::*;

let bits = bits![mut 0; 4];
assert!(bits.not_any());

bits[0 .. 1].fill(true);
assert!(bits[0]);

bits.set(1, true);
assert!(bits[1]);

*bits.get_mut(2).unwrap() = true;
assert!(bits[2]);

let mut bit = bits.get_mut(3).unwrap();
assert!(!bit);
*bit = true;
assert!(bits[3]);

assert!(bits.all());

The proxy type produced by .get_mut() implements DerefMut<Target = bool>, so you can assign into it and read out of it. However, it does not commit the value assigned into it back to its source BitSlice until it Drops.

You can force the destruction of a named proxy reference by using its .commit() method, which takes self by value, destroying it and releasing the borrow.

The proxy type is *not* a reference, which means you need to bind it with
`let mut` in order to be able to write through it as if it were a reference.

Viewing the Underlying Memory

The memory underlying any bit-slice region is subject to some restrictions about aliasing that are documented more thoroughly in the domain module and the Memory Model chapter. In short, borrowed BitSlice regions cannot view their underlying memory directly without violating aliasing rules established by either the Rust language or by bitvec itself. Instead, the .domain() and .domain_mut() methods provide views that correctly handle aliasing and edge conditions, and mediate access to the underlying memory.

The owning handles (BitArray, BitVec, and BitBox) do not have this limitation, as they can guarantee unique access to a memory location without any possibility of aliasing. As such, these types all have .as_raw_slice() and .as_raw_mut_slice() methods that provide ordinary slice views to their storage region.

Footnotes

¹ Except write-assignment through indexing. I am not going to keep mentioning this exception.

Arrays

While BitSlice describes a region of borrowed data, BitArray provides a container that can hold and manage such a region.

It is most comparable to the C++ type std::bitset<N>. Unfortunately, the Rust support for type-level integers is still experimental, so it is unable to take the length of the BitSlice it contains as a type parameter. Instead, it must take the entire region it contains as a type parameter. The full type declaration is

use bitvec::prelude::*;
use core::marker::PhantomData;

pub struct BitArray<
  A: BitViewSized,
  O: BitOrder,
> {
  _ord: PhantomData<O>,
  data: A,
}

As described in the previous chapter, the BitViewSized trait is implemented on the unsigned integers, and on arrays of them.

Once type-level computation stabilizes, `BitArray` will change to have the type
parameters `<T: BitStore, O: BitOrder, const N: usize>`, matching the
`std::bitset<N>` length parameter. This will require a major-version increase.

This array dereferences to a BitSlice region over its entire length. It does not currently permit shortening its BitSlice from either end. If this is a behavior you want, please file an issue.

Declaring a BitArray Type

Until Rust allows type-level integer computation to affect the memory layout, BitArray types remain awkward to declare. You could declare types yourself by using the bitvec::mem::elts function:

use bitvec::{array::BitArray, mem, order::Lsb0};

type MyArray = BitArray<[u8; mem::elts::<u8>(50)], Lsb0>;

But for convenience, we provide a BitArr! type-declaration macro. It expands to exactly the expression above. It accepts the following syntaxes:

use bitvec::{BitArr, order::Lsb0};

// explicit ordering
type A = BitArr!(for 50, in u16, Lsb0);
// implicit ordering defaults to Lsb0
type B = BitArr!(for 50, in u32);
// implicit store defaults to usize
type C = BitArr!(for 50);

Creating a BitArray Value

The ::ZERO constant is a blank BitArray with its memory completely zeroed. The ::new() function wraps an existing element or array into a BitArray. In addition, the macro constructor bitarr! takes the exact same arguments as the bits! constructor, except that it returns an array directly rather than a reference to a buffer.

Furthermore, BitArray structures and references can be constructed from &BitSlice references using the TryFrom trait, just as arrays can be constructed in the standard library.

Using a BitArray

Once constructed, BitArray offers the .as_bitslice() and .as_mut_bitslice() explicit methods, as well as all the standard traits, to borrow its data as a BitSlice. The array has almost no functionality of its own, and serves primarily to own a region used as a BitSlice. Like standard-library arrays, it natively produces a by-value iterator.

Once you are done using BitSlice to manipulate the array, you can remove the array with .into_inner() and regain the A memory within.

That’s everything that the array does! Like regular arrays, it is useful primarily for its ability to move memory through a program, and has essentially no behavior in its own right. As a plain data structure, it is most useful for programs that do not have access to a dynamic allocator, and do not wish to use static buffers. However, if you do have access to an allocator, you will probably prefer to use BitVec instead.

BitArray is uniquely suited as a structural field for types which implement I/O protocols and have fixed-size buffers, such as the headers of internet transport packets, or CPU instruction set encodings. A full example can be found in the bitvec examples, but here is a short sample of how a TCP packet header might be represented:

use bitvec::prelude::*;

#[derive(Clone, Copy, Default)]
struct TcpHeader {
  data: BitArr!(for 160, in u8, Msb0),
}

impl TcpHeader {
  fn data_offset(&self) -> usize {
    self.data[96 .. 100].load::<u8>() as usize
  }

  fn set_data_offset(&mut self, value: u8) {
    if !(5 ..= 15).contains(&value) {
      panic!("invalid data offset value");
    }
    self.data[96 .. 100].store(value);
  }

  fn syn_flag(&self) -> bool {
    self.data[110]
  }

  fn sequence_number(&self) -> u32 {
    self.data[32 .. 64].load_be()
  }

  fn as_bytes(&self) -> &[u8] {
    self.data.as_raw_slice()
  }
}

This snippet shows how you can:

  • use BitArray as the storage inside a semantic new-type
  • select individual flag bits
  • use sub-byte regions for small integer storage
  • use multi-byte regions for large integer storage
  • access the raw storage for interaction with I/O systems

Dynamic Allocations

bitvec has two types that require compiling the crate with feature = "alloc" (and linking against the Rust distribution crate alloc): BitVec and BitBox. BitVec is a dynamically resizable vector, equivalent to the C++ type std::vector<bool> and the Rust type Vec<bool>. BitBox is a frozen vector, incapable of changing its length, equivalent to the Rust type Box<[bool]> or the C++ type std::unique_ptr<std::bitset<N>>.

Since BitBox is a vector that cannot .push() or .pop(), I will not discuss it in detail here. It is a heap-allocated BitSlice, and is otherwise uninteresting. It only exists because Box<BitSlice> is not expressible.

Getting a BitVec

BitVec implements the constructors shown in the standard library: ::new() creates a handle without any allocation, and BitSlice implements both the .to_bitvec() method and the ToOwned trait to copy a bit-slice into a new bit-vector. Additionally, the bitvec! macro accepts all the same arguments as bits! and produces an owned allocation instead of a stack temporary. You can also construct a BitVec by .collect()ing any iterator of bools.

use bitvec::prelude::*;

let a: BitVec = BitVec::new();
let b = bits![0, 1, 0, 1].to_bitvec();
let c = b.clone(); // BitVec is Clone; cloning a &BitSlice only copies the reference
let d = bits![u8, Msb0; 1; 4].to_owned();
let e = bitvec![0, 1, 1, 0];
let f = bitvec![u16, Msb0; 0; 4];
let g = [true, false, true, false]
  .iter() // &bool
  .copied() // bool
  .collect::<BitVec>();

Using a BitVec

Once constructed, BitVec implements the entire API that Vec does in the standard library. This remains uninteresting to write out.

Like BitSlice, BitVec and BitBox are implemented as stack handles that use the specially-encoded pointer to describe their region. This enables them to remain the same size as their standard-library counterparts, while making them completely opaque to inspection.

Because they are fully owned, BitVec and BitBox have some important behavioral differences from BitSlice. Primarily, because they do not have to worry about other handles viewing the memory under their control, they can modify the contents of their buffers outside the BitSlice that is considered live, and they do not exclude partial edge elements when viewing their buffer as raw memory. If you are using BitVec to construct an I/O buffer, these two facets can have surprising results if you are not careful to fully initialize a memory span.

Vectors, like slices, do not need to begin at the zeroth index of the base element. They can begin, and end, at any bit in any element. This will only happen when copying a vector from a source slice that was misaligned. The .force_align() method moves the vector’s live slice down so that it starts at the zero index. Once done, extending the live slice to reach the last index of an element ensures that viewing the buffer as raw memory will have no uninitialized bits.

Type Parameters

bitvec uses type parameters to permit precise user control of its behavior and in-memory representation. The Rust generic system permits bitvec to have a more powerful and capable behavior than any other bitstream library yet implemented in any language.

All bitvec types take two type parameters:

  1. T: the storage type. The slice types (BitSlice, BitVec, BitBox) select an unsigned integer, since as slices they store the length in their runtime state. BitArray uses either a bare integer or an integer array as its storage parameter, since it has no runtime state of its own and stores all of its information in the type system.
  2. O: the ordering of bits within a single T element. We provide two orderings, Msb0 and Lsb0.

The combination of these two type parameters governs how bitvec translates its abstract storage (usize -> bool) into real memory; if you do not care about real-memory representation, then the default type parameters <usize, Lsb0> will give you the best performance. If you do care about this, then the memory representation chapter goes into detail about all the combinations and can help you select which one best fits your needs.

The BitOrder trait is open for third-party implementation. It describes its requirements in great detail in its API documentation, so if you have a memory representation that is neither Lsb0 nor Msb0, you can implement the ordering yourself and bitvec will use it without complaint.


Rust syntax requires explicitly choosing type parameters when using generic expressions, such as BitVec::<Store, Order>::new(), and will not substitute in the default parameters when attempting to elide the parameters with BitVec::new(). However, Rust will use the default type parameters in type annotations: let bv: BitVec = BitVec::new(); uses the default type parameters in the : BitVec annotation, which then completes the type of the expression on the right side of the assignment =.

Bit Ordering

  1. Provided Orderings
    1. Lsb0
    2. Msb0
    3. LocalBits
    4. Default Ordering Parameter
  2. Implementing BitOrder
    1. Support Types

bitvec expects user code to count semantic indices, and hides the actual position of a bit within a memory element from users. This allows user code to treat indices as a uniform domain of integers in 0 ..= !0 >> 3, and not have to remember whether place - 1 means moving “forward” or “backward” in the buffer.

Internally, each *BitSlice pointer contains an element address and a bit index. The pointer uses its BitOrder type parameter to translate the bit index into a mask that drives actual memory access.

BitOrder is open to user implementation, and implementations of it are trusted to be sound in the bitvec memory model. For this reason, the trait is unsafe to implement. Most users will not implement it; almost all users want one of the two monotonic orderings provided for you. However, some specialized users may have an ordering of their own, and they are still able to encode that ordering away from their index logic.

Provided Orderings

bitvec provides two orderings: Lsb0 and Msb0. These each refer to which numeric bit in a register element is considered to be the zero-index.

You can think of these as corresponding to the little-endian and big-endian processor byte orderings if you like, as long as you remember that your choice of bit-ordering is not at all related to the byte-ordering of your target processor.

Lsb0

The Lsb0 type sets the zero index at the least significant bit of a register (numeric value 1) and each successive index selects the next more significant bit in the register, until the most significant bit is at the final index.

It is the expression mask = 1 << index;.

Msb0

The Msb0 type sets the zero index at the most significant bit of a register (numeric value 2^(n-1) for an n-bit register) and each successive index selects the next less significant bit in the register, until the least significant bit is at the final index.

It is the expression mask = (iN::MIN as uN) >> index;.
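Both expressions can be checked with ordinary integer arithmetic; here is a sketch over an 8-bit register:

```rust
//  Lsb0 mask: 1 << index.
fn lsb0_mask(index: u32) -> u8 {
  1u8 << index
}

//  Msb0 mask: (iN::MIN as uN) >> index.
fn msb0_mask(index: u32) -> u8 {
  (i8::MIN as u8) >> index
}

fn main() {
  assert_eq!(lsb0_mask(3), 0b0000_1000);
  assert_eq!(msb0_mask(3), 0b0001_0000);
}
```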

LocalBits

The type alias LocalBits refers to the ordering that your target’s C compiler will likely choose for its bitfield direction. This is Lsb0 on little-endian byte-ordered processors, and Msb0 on big-endian byte-ordered processors. Remember that the BitOrder implementations and processor byte orderings have no relation to each other! This is only a custom, not a requirement of the processor architecture.

Default Ordering Parameter

Lsb0 is the default type parameter used by the sequence types, as it produces selection masks using the starting value 1, which encodes to smaller instructions than the Msb0 starting value.

On AMD64, the pairs <u64, Lsb0> and <u64, Msb0> produce the following object code and disassembly:

ba 01 00 00 00  movl $1, %edx
48 d3 e2        shlq %cl, %rdx

48 ba 00 00 00 00 00 00 00 80  movabsq $-9223372036854775808, %rdx
48 d3 ea                       shrq    %cl, %rdx

The Msb0 load requires an additional four bytes of zeros in its immediate, and the 64-bit modifier prefix (0x48), in order to encode movabsq instead of movl.

Implementing BitOrder

As stated above, this is an unsafe trait because bitvec relies on its implementations to uphold certain mathematical invariants. Failure will result in memory unsafety and/or program crashes.

BitOrder has three functions: at, select, and mask. Implementors are required to provide at; select and mask have default implementations that rely on it. These functions are all generic over the BitMemory trait; this trait describes the register types (unsigned integers) that can be used by bitvec. It provides some useful associated constants, and is otherwise uninteresting.

  • ::at() receives the semantic index of a bit within a register type, and must produce the concrete position corresponding to the semantic index. The input and output are both integers in the domain [0, W) where W is the bit-width of the register type being indexed.

    at must implement a one-to-one (bijective) mapping from all inputs to all outputs in the [0, W) domain. This mapping must never observably change. These are strict requirements of the library, and failing to uphold either will break your program.

  • ::select() receives the semantic index of a bit within a register type, and must produce a value of that register type with exactly one bit set. The produced value is a mask that selects only the bit specified by the provided index, and will be used in Boolean arithmetic to manipulate memory.

    The default implementation is 1 << at(index). You are required to maintain this behavior in your override.

  • ::mask() receives an inclusive start index and an exclusive end index, and must produce a register value that selects every bit in the indexed range.

    The default implementation is (start .. end).map(select).sum(). You are required to maintain this behavior in your override.
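The default mask behavior can be sketched with plain integer arithmetic; for Lsb0 over u8, mask(2, 6) selects bits 2 through 5:

```rust
//  Equivalent of `(start .. end).map(select).sum()` for Lsb0 over u8.
fn lsb0_mask_range(start: u32, end: u32) -> u8 {
  (start .. end).map(|index| 1u8 << index).sum()
}

fn main() {
  assert_eq!(lsb0_mask_range(2, 6), 0b0011_1100);
}
```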

Support Types

The BitOrder trait APIs use supporting types to enforce requirements on the bare numbers being passed through it. These types are documented in the index module. They all provide a .value() method that removes the wrapper and yields the inner number, and a ::new() constructor that ensures that values to be wrapped uphold the type’s requirements.

  • at and select receive a BitIdx<M> argument. This is a wrapper over u8 that ensures that the contained value is in the domain 0 .. M::BITS. It serves to indicate that a number is a semantic counter, not an electrical position.
  • at returns a BitPos<M> value. This has the same behavior and restrictions as BitIdx<M>, and indicates that the number is an electrical position within a register.
  • select returns a BitSel<M> value. This wraps an M register value, and ensures that the contained number is a power of two – exactly one bit is set, and all others are zero. This type indicates that the mask is guaranteed to select exactly one bit in a register.
  • mask receives an inclusive BitIdx<M> and an exclusive BitEnd<M> argument, and returns a BitMask<M> value. BitEnd<M> ensures that the contained number is in the domain 0 ..= M::BITS, including the final count, and marks a one-past-the-end exclusive boundary. BitMask<M> marks that the contained number may select any number of bits in a register.

New implementors of BitOrder will use these types to satisfy behavior requirements individually.

In addition, implementors’ test suites should call the function order::verify_for_type::<O, M>() to check that an implementation O satisfies the behavior requirements for a particular register type M, or call order::verify::<O>() to check that an implementation satisfies the behavior requirements for all supported register types. These functions are conditional upon cfg(test), and accept a verbose: bool parameter that allows the test functions to print diagnostics to stdout during evaluation.

If the verification functions panic, your implementation is incorrect, and cannot be safely used in bitvec.

Bit Storage Within Registers

The BitStore trait governs the processor behaviors used to interact with the memory of a BitSlice buffer. These include both the width of the processor register used to contain a memory element, and the load/store instructions the processor uses to move data across the memory bus.

BitStore has no behavior of its own, and serves only to collect associated types and constants. It cannot be implemented outside bitvec, and is a closed portion of the API. You can freely use the trait as a bound in types that contain bitvec structures, but should not otherwise attempt to make use of it.

Implementations

BitStore is implemented on all unsigned integer types not larger than a target processor’s word size, all Cell<T> wrappers of them (as Cell is a compiler directive), and their AtomicT variants.

Not all processors have atomic instructions for all their scalar registers. bitvec uses the [radium] crate to manage its support for atomic types.

On architectures with missing atomics, bitvec’s default feature set will cause a compiler error when you attempt to instantiate a bitvec structure with the register that is missing an atomic variant. You can fix this by using a narrower register that does have atomic instructions, or by disabling default-features and not enabling the "atomic" feature.
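As a sketch, a Cargo manifest entry that opts out of atomics might look like this (feature names as published by bitvec 1.x; check the crate’s manifest for your version):

```toml
[dependencies.bitvec]
version = "1"
default-features = false
# Re-enable allocator support without the "atomic" feature.
features = ["alloc"]
```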

Associated Types

The Mem associated type names the scalar integer corresponding to the BitStore type. u8, Cell<u8>, and AtomicU8 all implement BitStore with their Mem type assigned as u8; the same is true for the wider registers.

This type is used to create selection masks in the processor and permit access to unaliased memory.

The Access associated type names the type used to implement memory access. The BitAccess trait is an internal bridge to Radium that allows a consistent memory API, regardless of instructions used. All reads from and writes to memory route through this association and trait.

Lastly, the Alias and Unalias associated types enable bitvec to gracefully and correctly handle events that cause multiple handles to alias the same memory address. This association is used in .split_at_mut() to select the alias-aware type used for all subsequent accesses.

The Mem and Alias types are exposed in public APIs according to local alias information. The Access type is never publicly exposed, and only used for code generation.

Practical Use

That’s enough theory; let’s talk about how to use the crate. This chapter is divided into two subsections, one for use cases that want an ordinary bool collection library with transparent memory compaction, and one for use cases that want a convenient way to precisely sculpt memory. Before getting into either, let’s quickly recap how the bitvec types interact with memory ownership.

Rustic Memory Management

The first and most important thing to remember is that, despite the extra complexity just discussed about memory parameters and aliasing behavior, bitvec is just Rust. It obeys all of the rules and patterns that the rest of Rust does.

BitSlices, like regular slices, are exclusively borrowed from owned memory higher up the call stack. Their references can be passed down the stack, and are subject to the same lifetime and mutation-exclusivity rules.

The owned memory that creates a BitSlice view can be an array, boxed slice, or vector of either ordinary integers, or their wrapper equivalents provided by bitvec:

use bitvec::prelude::*;

let array = [0u8; 8];
let boxed: Box<[u16]> = Box::new([0u16; 4]);
let vec = vec![0u32; 2];

let bits_a = bitarr![u8, Msb0; 0; 64];
let bits_b = bitbox![u16, Lsb0; 0; 64];
let bits_v = bitvec![u32, LocalBits; 0; 32];

Once memory is bound, it can be borrowed as a BitSlice by using the BitView trait (imported in the prelude), or by using the fact that all bitvec containers borrow themselves as BitSlices just like standard-library containers borrow themselves as slices:

use bitvec::prelude::*;
let array = [0u8; 8];
let boxed: Box<[u16]> = Box::new([0u16; 4]);
let vec = vec![0u32; 2];
let bits_a = bitarr![u8, Msb0; 0; 64];
let bits_b = bitbox![u16, Lsb0; 0; 64];
let bits_v = bitvec![u32, LocalBits; 0; 32];
let array_bits = array.view_bits::<Msb0>();
let boxed_bits = boxed.view_bits::<Lsb0>();
let vec_bits = vec.view_bits::<LocalBits>();

let bits_a_ref: &BitSlice<_, _> = &bits_a;
let bits_b_ref: &BitSlice<_, _> = &bits_b;
let bits_v_ref: &BitSlice<_, _> = &bits_v;

And, of course, mutability applies:

use bitvec::prelude::*;

let mut arr = bitarr![0; 10];
let arr_ref: &mut BitSlice = &mut arr;
arr_ref.set(1, true);
assert!(arr_ref[1]);

Just as with ordinary Rust code, BitSlice is the type you want to use when working with memory that you are not responsible for moving around or destroying. However, when you do need to move memory around, you need to switch to either a static binding or a container: BitArray is always available, and BitBox and BitVec require an allocator.

I am spending this much time discussing the Rust memory management system because this is a very common question I receive. No other bit-stream crate has reference types, and therefore users coming from, e.g., bit-vec see the same BitSlice name and attempt to use their habits from that crate.

bitvec is not like any other bitstream library, in Rust, C++, or another language. bitvec is like ordinary Rust. I promise.

Bit Collections

As discussed in the Type Parameters chapter, you should use usize as the BitStore parameter for optimal performance in the generated program.

Once you have created some memory that you can view as individual bits, it is time to actually use it. Here is the one-sentence summary of what bitvec can do:

Every stable API present in the standard library is replicated in bitvec, except for BitSlice<T, O> as IndexMut<usize>, because bitvec cannot produce &mut bool.

If you were using ordinary collections to manage sequences of bools, then every part of your code will continue to work on bitvec types except for the array assignment slice[index] = value;. Until and unless the IndexMut trait is reshaped, you will need to use one of these two alternatives: slice.set(index, value); or *slice.get_mut(index).unwrap() = value;

Subslicing works:

use bitvec::prelude::*;

let bits = bits![0, 0, 0, 0, 1, 1, 1, 1];
assert!(bits[.. 4].not_any());
assert!(bits[4 ..].all());

Incremental munching works:

use bitvec::prelude::*;

let mut bits = bits![0, 0, 1, 1, 1, 0, 0, 0];
//  ^^^ modify the slice handle, not the slice contents

while let Some((&false, rest)) = bits.split_first() {
  bits = rest;
}
assert_eq!(bits, bits![1, 1, 1, 0, 0, 0]);

while let Some((&false, rest)) = bits.split_last() {
  bits = rest;
}
assert_eq!(bits, bits![1; 3]);

Mutation works:

use bitvec::prelude::*;
use std::{iter, thread};

let bits: &'static mut BitSlice = bits![mut 0; 8];

{
  let (left, right) = bits.split_at_mut(4);

  //  Pretend that better work is happening here
  let a = thread::spawn(|| left |= iter::repeat(true));
  let b = thread::spawn(|| right ^= iter::repeat(true));

  a.join().unwrap();
  b.join().unwrap();
}

assert_eq!(bits, bits![1; 8]);

Everything you can do with a slice, an array, or a vector of bits, you can do with bitvec’s equivalent types. Except for IndexMut<usize>. The only change from the standard library types is that you are now guaranteed to use one bit of storage for each bit of information, rather than eight bits of storage per bit.

Author’s note: Other than bragging about `bitvec`’s API fidelity, I don’t think
this section is very useful or educational. If you want to read more about how
to use `bitvec` for `usize => bool` collections, please let me know and I will
expound!

Bitfields

bitvec’s more technically-interesting capability is that it provides load/store memory access behaviors that allow you to write values into, and read them back out of, any BitSlice in memory rather than being constrained to well-formed references to well-typed memory.

This is useful for the de/construction of packed memory buffers, such as transporting data through I/O protocols.

**AUTHOR’S NOTE**: If you are using `bitvec` to do **anything** related to the
underlying memory representation of a bit-buffer, you **must** read this
chapter, and **all** of the API docs of [`bitvec::field`] and its contents,
**in their entirety**.

I have written extensively, and yet still insufficiently, about the intricacies
involved in operating the `BitField` trait correctly. If you skim this
documentation, you *will* have unexpected behavior, you *will* get frustrated
with me for writing a bad library, you *will* file an issue about it, and I will
*probably* tell you that the behavior is correct and that I already addressed it
in the documentation.

It took me a long time to think about and a long time to write. It should take
you *also* a long time to read and a long time to think about.

All of this behavior is contained in the BitField trait. Let’s explore that:

//  bitvec::field

pub trait BitField {
  fn load<M>(&self) -> M;
  fn store<M>(&mut self, value: M);
}

impl<T> BitField for BitSlice<T, Lsb0> {
  fn load<M>(&self) -> M { /* snip */ }
  fn store<M>(&mut self, value: M) { /* snip */ }
}

impl<T> BitField for BitSlice<T, Msb0> {
  fn load<M>(&self) -> M { /* snip */ }
  fn store<M>(&mut self, value: M) { /* snip */ }
}

The actual trait is more complex than this, and will be visited later. The important part, right now, is that BitField allows you to load values out of BitSlices and store values into them. Furthermore, it is implemented specifically on BitSlices that use the bit orderings provided by bitvec, and is not generic over all orderings.

While bitvec could in theory provide a default implementation for all <O: BitOrder>, this would by necessity have the most pessimal possible performance, and the lack of specialization for overlapping trait implementations means that faster performance can never be written.

The downside of the two specific implementations is that Rust coherence rules forbid implementation of a bitvec trait, on a bitvec type, parameterized with a local, but non-bitvec, implementor of BitOrder. On the off chance that you find yourself writing a new BitOrder implementor, file an issue.

The M type parameter on the load and store methods is bounded by funty’s Integral trait. It can store any unsigned or signed integer at any partial width. This parameterization allows you to combine any integer type for transfer with any integer type for storage, rather than being restricted to only transferring T data into and out of a BitSlice<T, _>.

Unfortunately, adding a second integer type parameter is not the only complication to the BitField memory model. There is also a second dimension of segment ordering. bitvec tries to make explicitly clear that the Lsb0 and Msb0 types refer only to the ordering of bits within registers, and not to the ordering of bytes within registers. However, when the integer bit-sequence being loaded or stored does not fit within one register of the storage BitSlice, it must be split into multiple segments, and those segments must somehow be ordered in memory.

Segment Orderings

Author’s Note: **READ THIS**. I have received *several* issues about this exact
concept. *It is not obvious*.

There are two segment orderings: little-endian and big-endian. You may select the segment endianness you prefer by using the _le or _be suffix, respectively, on the .load() and .store() methods. The unsuffixed method is an alias for the endianness of your processor: _be on big-endian targets, and _le on little-endian. This is a convenience only. If you are writing I/O buffers, you should really use the explicitly-named methods.

Let us imagine a BitSlice<u8, Lsb0> used to store a u16 that is misaligned, and thus stored in three successive bytes. This algorithm is true for all circumstances where the stored region occupies more than one register of the backing slice, but smaller examples are simpler to draw.

This diagram uses 0 to refer to the least significant bit, and 7 to refer to the most significant bit. The first row shows bytes of memory, the second row shows the bit indices in memory used by .store_le(), and the third row shows the bit indices in memory used by .store_be().

[ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ]
  3 2 1 0             b a 9 8 7 6 5 4             f e d c
  f e d c             b a 9 8 7 6 5 4             3 2 1 0

.store_le() places the least significant segment in the low address, while .store_be() places the most significant segment in the low address. The ordering of bits within a segment is always preserved, no matter which ordering parameter is used by the BitSlice.

Here is the same example, but using the Msb0 bit ordering instead. Again, the second row uses .store_le(), and the third row uses .store_be().

[ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ]
          3 2 1 0     b a 9 8 7 6 5 4     f e d c
          f e d c     b a 9 8 7 6 5 4     3 2 1 0

The only change is in how the segments are placed into memory. The ordering of bits within a segment never changes, and is always the processor’s significance order as implemented in hardware.

How to Use BitField

You will probably find real use of the BitField trait more educational than the previous section. It has a very straightforward API that you can combine with println!-debugging or your favorite means of viewing memory in order to observe its actions.

Step one: create any BitSlice-capable buffer. This can be any of the Rust-native sequence types, or any of the bitvec types.

use bitvec::prelude::*;

let mut data = [0u8; 4];
let bits = data.view_bits_mut::<Msb0>();

Then, narrow the BitSlice to be the region you want to access as storage. It must be no wider than the integer type you are transferring: BitSlices whose length is outside the domain 1 ..= M::BITS will panic during .load() or .store(). The easiest way to narrow a BitSlice (or buffer type that dereferences to it) is by using range indexing, [start .. end].

use bitvec::prelude::*;
let bits = bits![mut u8, Msb0; 0; 32];
bits[10 ..][.. 13].store_be::<u16>(0x765);
assert_eq!(bits[10 .. 23].load_be::<u16>(), 0x765);

bits[10 .. 23].store_le::<u16>(0x432);
assert_eq!(bits[10 .. 23].load_le::<u16>(), 0x432);

That’s the entire API. .store() truncates the stored value to the width of the receiving BitSlice, and .load() zero-extends (for unsigned types) or sign-extends (for signed types) the loaded value to the width of the destination register type.

Storing signed integers can be surprisingly fraught: `bitvec` **will not**
attempt to detect and preserve the most-significant sign bit when truncating! If
you store the number `-12i8` (`0b1111_0100`) in a 4-bit slot, it will be stored
as `4i8` and reloaded as such! Similarly, storing `12i8` (`0b0000_1100`) in a
4-bit slot will produce a load of `-4i8` (`0b1111_1100`).

Signed integers do not behave like unsigned integers. You are wholly responsible
for ensuring that you remember that allowing negative numbers halves the
magnitude: 4 bits unsigned is `0 .. 16`, but 4 bits signed is `-8 .. 8`.

`bitvec` **only stores bit patterns**. Retaining numerical intelligibility is
**your** responsibility.

You can see an example that uses the BitField trait to implement an I/O protocol in the examples/ipv4.rs program in the repository. Use cargo run --example ipv4 to see it in action.

Memory Representation

As discussed in the Type Parameters chapter, bitvec allows users to select the specific ordering of bits within a memory element when constructing a type. This has consequences for how source code translates to an in-memory representation.

To review: bitvec provides two orderings of bits within a single memory element (Lsb0 and Msb0) and three or four types of memory elements (u8, u16, u32, and only on systems where it is 8-byte-aligned, u64). The usize type is also supported, but it is not portable, and behaves exactly as the named register of its width.

The `Cell` and atomic integer variants are not interesting here, as they only
affect how the memory bus operates, not the processor register.

Let us now examine how each possible combination of register width, bit-ordering, and processor byte endianness affects the placement of bits in memory.

Memory Layout

The BitOrder and BitStore traits combine with your target architecture’s byte-ordering of register elements to create a matrix of memory traversals. This matrix informs what is the appropriate choice of parameters to use for your program whenever you are using bitvec for precise memory control rather than solely for a compact usize-to-bool data collection.

The tables below list bytes of a memory address space with addresses increasing to the right, and the bits within those bytes with numeric significance decreasing to the right. This is the ordering used in most debug-printing of memory, so hopefully the table contents should match up with your prior experience viewing memory bytes.

The L and M indicate the Lsb0 and Msb0 ordering parameters, respectively, and xx indicates that the row matches all register types. Within each row, traversal begins at zero and follows the arrows to each successive step. Boundaries between registers are marked with a vertical bar; boundaries between bytes within the same register are marked with a space.

Little-Endian Byte-Ordered Machines

On little-endian machines, the least-significant byte of a register type is stored at the lowest memory address, and each byte-address higher is one step more numerically significant than the last.

byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777
bit  ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210
═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════
 Lxx ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E
─────╫─────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────
 M8  ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F
 M16 ║ 2 ---> 3 0 ---> 1│6 ---> 7 4 ---> 5│A ---> B 8 ---> 9│E ---> F C ---> D
 M32 ║ 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1│E ---> F C ---> D A ---> B 8 ---> 9
 M64 ║ E ---> F C ---> D A ---> B 8 ---> 9 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1

Big-Endian Byte-Ordered Machines

On big-endian machines, the most-significant byte of a register type is stored at the lowest memory address, and each byte-address higher is one step less numerically significant than the last.

byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777
bit  ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210
═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════
 L8  ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E
 L16 ║ 3 <--- 2 1 <--- 0│7 <--- 6 5 <--- 4│B <--- A 9 <--- 8│F <--- E D <--- C
 L32 ║ 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0│F <--- E D <--- C B <--- A 9 <--- 8
 L64 ║ F <--- E D <--- C B <--- A 9 <--- 8 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0
─────╫─────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────
 Mxx ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F

If you need to care about the memory representation, then you most likely want to use the <u8, Msb0> pair. This provides a consistent ordering on all machines, and the numeric value of the underlying memory will probably match your expectations about the semantic contents of a data structure.

This chapter, and the BitField trait, are the two most common sources of questions about how bitvec operates. Their intersection is even more complex, and the layout of numeric integers stored into a BitSlice is an extremely common point of confusion.

Read these chapters and the API documentation thoroughly, and experiment with placing data into memory and changing the type parameters to observe their effects on buffer representation.

bitvec Memory Model

bitvec addresses individual bits, while computer hardware addresses register elements. As a result, bitvec has a more complex memory model than the Rust language does. The library implementation strives to balance users’ expectations, the Rust language’s rules, and the performance of the produced artifact, finding the best outcome for all parties with as few compromises as possible. Unfortunately, this has the side effect of increasing the complexity of the codebase, both for maintainers and for readers.

This chapter explains the abstract concepts of the bitvec memory model and the means by which it is encoded in the Rust language without running afoul of Rust or LLVM rules that threaten undefined behavior.

The bitvec memory model is typed entirely within the store module’s BitStore trait definition and implementations. It utilizes the access module’s BitAccess trait to mediate memory access events through types that are known to be correct in the Rust and LLVM models, so that at any point in program execution, memory access will be consistent and sound.

In addition, the domain module’s types provide views which manipulate the store model to maximize performance. This chapter will discuss primarily store, and use domain only to provide examples of how store is used in practice to accomplish the library’s implementation.

Aliasing

To Rust and LLVM, “aliasing” occurs whenever there are two paths to a memory region, and at least one of them has write privileges. In Rust, this is represented by the &mut reference exclusion rule: it is always, always, always Undefined Behavior in the Rust memory model to have two references which can reach a memory element if at least one of them is marked &mut.

LLVM, which was created for, is written in, and culturally shaped by C++, takes a similar view with its noalias annotation, but struggles to enforce it as thoroughly as Rust does.

bitvec takes a similar view of the abstract meaning of the &mut reference type, but where the Rust memory model focuses on whole units T, and has no concept of subdivision from T into the bits that compose it, bitvec views each individual bit as a standalone, independent, atom of memory. It excludes the creation of two &mut BitSlice reference handles that are capable of viewing the same bit, but will happily produce two &mut BitSlice handles which are mutually-exclusive in bits, but reference the same location in memory.

Think of it like disjoint `&mut u64` references whose referent words are in the
same cacheline. The software model doesn’t care; making this work is the
hardware’s problem. Similarly, users of the `bitvec` library don’t need to care
about the fact that disjoint `&mut BitSlice` references might refer to the same
element.

Here we come to the first problem with the conflicting memory models: bitvec cannot ever create an &mut T through which it may write to memory, because it has no way of ensuring that no other &T or &mut T reference exists which is capable of viewing the memory region into which it writes.

Rust Shared Mutation

The problem isn’t just that the Rust standard library doesn’t offer any non-unsafe APIs to produce such references. The problem is that this is about the most illegal thing you can do in the eyes of the Rust compiler, and if it ever detects this transgression, it has full liberty to either reject, or worse, accept and miscompile, your program.

In Gankra’s immortal words on the topic of attempting to sneak past the compiler’s commandments,

  • Transmuting an & to &mut is always UB
  • No you can’t do it
  • No you’re not special

The solution is simple: Rust exposes a type, UnsafeCell, which is the axiomatic type for mutation through shared views. In the Rust memory model, any reference to a region of memory marked with UnsafeCell is permitted to write to it, and if any other references can view that region, then it is up to them to ensure consistent behavior.

This is, of course, not quite true in LLVM’s memory model, but we are not there yet.

Since an &mut BitSlice<T, _> handle cannot produce an &mut T reference to perform writes to memory, it must instead either use a *mut T bare pointer, which has absolutely no checks or optimizations whatsoëver, or use an &UnsafeCell<T> shared reference, which has all the usual guarantees present on all1 reference types.

All UnsafeCell does is instruct the Rust compiler to politely look the other way about your program’s memory accesses. It is somewhat like the C keyword volatile, in that the compiler no longer believes that reads are stateless or freely reörderable, but entirely unlike that keyword in that the compiler is under no obligation to preserve your reads from, or writes to, such regions.

Rust provides an additional type called Cell. This is a wrapper over UnsafeCell2 that defines a more useful API, including the only safe and guaranteed way to write into memory through a shared reference: Cell::set.

And voilà: &mut BitSlice<T, _> simply constructs &Cell<T> when writing, the Rust compiler does not see a violation of the &mut exclusion principle, and we are done.
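A minimal std-only demonstration of this move: `Cell::from_mut` reborrows a `&mut u8` as a `&Cell<u8>`, after which any number of copies of that shared reference may safely write to the byte.

```rust
use std::cell::Cell;

fn main() {
    let mut byte = 0u8;
    // Reborrow the &mut as a &Cell: writes now go through a shared
    // reference without violating the &mut exclusion rule.
    let cell: &Cell<u8> = Cell::from_mut(&mut byte);
    let (a, b) = (cell, cell); // two shared paths to the same location
    a.set(a.get() | 0b0000_1111);
    b.set(b.get() | 0b1111_0000);
    assert_eq!(byte, 0xFF);
}
```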

LLVM Shared Mutation

No we’re not. If it were that easy, there wouldn’t be a trait system or this document dedicated specifically to dealing with this problem.

Cell is not thread-safe, because Cell does not modify the instructions used to access memory. It produces ordinary load and store instructions, carefree and ignorant of the bane of everyone who wants consistency and the delight of everyone who wants performance: concurrency.

Just as it is undefined behavior in Rust to manifest two &mut references that can view the same location, it is equally undefined behavior in LLVM to manifest two pointers into memory that ever, at all, no matter what, perform any memory access to the same location, at the same time, on multiple threads of execution.

As with above:

  • Unsynchronized writes are always UB.
  • No you can’t do it.
  • No you’re not special.

LLVM has an even more insidious punishment for this transgression that Rust does not directly express: unsynchronized reads from a data race produce poison. Poison is a nifty concept, because it’s not illegal to obtain one. When LLVM gives you a poison, your program continues undergoing compilation as if nothing had happened. You can pass it around. You can write to it, and if you destroy it before reading, you’re fine.

As soon as you attempt to read the bit-wise value of poison, your program is undefined3.

So if bitvec wants to be threadsafe, which it does, and it wants to insist on its ability to safely alias the same memory location from multiple handles, which is non-negotiable, there’s only one avenue left to take.

Atomic Powered Microscopes

Rust doesn’t actually model atomics. It doesn’t have to do so: no harm can ever come from multiple handles reading out of the same immutable location, harm can only occur when writes are observable, and writes are not observable due to the &mut exclusion rule. Well, except for UnsafeCell, so everything that has an UnsafeCell in it gets marked as !Sync, & references can’t cross threads, and the whole problem is avoided.

This is, of course, not good enough. Concurrent, mutable, access to a location is an important property in a computer. LLVM provides atomic types, which Rust transparently exports as wrappers over UnsafeCell that have their Sync implementation restored. Handles to a region marked as atomic will use some form of hardware-provided exclusion in order to preserve the one-writer-XOR-any-readers system behavior, and all is well.

Hardware-level exclusion has the unfortunate penalty of being, to put it lightly, “slow”. So while it’s the safest choice to be correct, and was in fact the default universal choice for all memory access in bitvec for some time, its costs4 called for a more prudent behavior.

It’s time for a new trick, something that the Rust compiler does at the region level, and that bitvec now does at the element level:

Run Time Alias Analysis

BitSlice handles can only be constructed from ordinary Rust references to memory, which Rust guarantees start out unaliased. The potential for aliasing only occurs when a &mut BitSlice is split into multiple subslices using any of the functions that eventually call .split_at_unchecked_mut(). Since that is the root function that introduces alias conditions, it returns subslices whose memory type parameters are tainted with the ::Alias marker. It has the following type signature:

impl<T, O> BitSlice<T, O>
where O: BitOrder, T: BitStore {
  pub fn split_at_unchecked_mut(&mut self, at: usize) -> (
    &mut BitSlice<T::Alias, O>,
    &mut BitSlice<T::Alias, O>,
  );
}

The BitStore trait defines an ::Alias associated type, which ensures that all memory accesses through it have appropriate aliasing guards. For builds which do not use the atomic feature (or where the target does not have an atomic variant of the integer), this is Cell, and its protections are the loss of thread-safety. For builds that do permit atomics, the marker enforces that all reads and writes use atomic instructions.
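The shape of this machinery can be sketched as an associated type naming the alias-safe wrapper for the same bits. The `Store` trait below is a hypothetical, stripped-down stand-in; the real BitStore trait carries many more obligations:

```rust
use std::sync::atomic::AtomicU8;

// Hypothetical sketch of the ::Alias pattern, not bitvec's actual trait.
trait Store {
    // The alias-safe type used once a region is known to be shared.
    type Alias;
}

impl Store for u8 {
    // With the atomic feature this is the atomic variant; without it,
    // it would instead be Cell<u8>.
    type Alias = AtomicU8;
}

fn main() {
    // The wrapper is layout-identical to the bare integer, so retyping
    // a region costs nothing at runtime.
    assert_eq!(
        std::mem::size_of::<<u8 as Store>::Alias>(),
        std::mem::size_of::<u8>(),
    );
}
```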

The ::Alias marker is applied, at compile time, by operations that split &mut BitSlice references into multiple coëxisting subslices. This is a good first step to reducing unnecessary synchrony, but not good enough. Consider the following:

use bitvec::prelude::*;

let mut data = [0u8; 2];
let bits = data.view_bits_mut::<Lsb0>();
let (one, two) = bits.split_at_mut(8);

It so happens that this is equivalent to splitting data first, then viewing each subslice as bits, but this transformation can only be done if the partition point is known at compile-time to fall on an element boundary. There is no need for the subslices one and two to use alias-safe operations, as accesses to memory through them do not conflict with each other.
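The std analogue of the desired transformation, splitting at the element boundary first so that the two halves can never share a byte:

```rust
fn main() {
    let mut data = [0u8; 2];
    // Splitting on the element boundary yields two references that can
    // never touch the same byte, so no alias guard would be necessary.
    let (one, two) = data.split_at_mut(1);
    one[0] = 0xAB;
    two[0] = 0xCD;
    assert_eq!(data, [0xAB, 0xCD]);
}
```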

Bit Slice Domains

The in-memory domain of any bit slice can be generalized to one of two formats: either the slice touches zero edge-bits (0 or T::BITS - 1), or it touches at least one edge-bit in at least one element. Consider three bytes of memory (any element will do, but the extra width on this page is unnecessary), with some bitslice regions drawn within them:

│00000000┆11111111┆22222222│ Element
│76543210┆76543210┆76543210│ Bit
├────────┼────────┼────────┤
│        ┆        ┆        │ 1
│        ┆xxxxx   ┆        │ 2
│        ┆ xxxxx  ┆        │ 3
│        ┆   xxxxx┆        │ 4
│  xxxxxx┆xxxx    ┆        │ 5
│    xxxx┆xxxxxxxx┆        │ 6
│        ┆xxxxxxxx┆xxxx    │ 7
│      xx┆xxxxxxxx┆xx      │ 8
│xxxxxxxx┆xxxxxxxx┆xxxxxxxx│ 9

There are nine example slices here, but they can be reduced into six specific categories, and two general ones:

  1. empty: row 1
  2. minor (interior of one element, touching no edge index): row 3
  3. partially-spanning head, fully-spanning body: rows 4 and 6
  4. partially-spanning tail, fully-spanning body: rows 2 and 7
  5. major (partial head, partial tail, zero or more full body elements): rows 5 and 8
  6. spanning: row 9

The minor slice (row 3) is irreducible; the rest can all be divided into three subcomponents:

  • zero or one partially-occupied head element, where the slice touches the last index in it but not the first
  • zero or more fully-occupied middle elements, where the slice touches all indices in each
  • zero or one partially-occupied tail element, where the slice touches the first index in it but not the last
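This decomposition can be sketched with plain arithmetic. The function below is a simplified, hypothetical model (8-bit elements, booleans instead of references), not bitvec’s implementation:

```rust
/// Simplified domain model: does a partial head/tail exist, and how many
/// elements are fully spanned?
#[derive(Debug, PartialEq)]
enum Domain {
    /// Confined to the interior of one element, touching neither edge bit.
    Enclave,
    Region { head: bool, body_elts: usize, tail: bool },
}

/// Decompose the bit span [start, start + len) over 8-bit elements.
fn domain(start: usize, len: usize) -> Domain {
    if len == 0 {
        return Domain::Region { head: false, body_elts: 0, tail: false };
    }
    let end = start + len;
    let head = start % 8 != 0; // enters its first element mid-way
    let tail = end % 8 != 0; // leaves its last element early
    if head && tail && start / 8 == end / 8 {
        return Domain::Enclave; // e.g. row 3 in the diagram above
    }
    let first_full = start / 8 + head as usize;
    let body_elts = end / 8 - first_full;
    Domain::Region { head, body_elts, tail }
}

fn main() {
    // Row 9: three fully-spanned elements.
    assert_eq!(
        domain(0, 24),
        Domain::Region { head: false, body_elts: 3, tail: false },
    );
    // Row 8: partial head, one full element, partial tail.
    assert_eq!(
        domain(6, 12),
        Domain::Region { head: true, body_elts: 1, tail: true },
    );
    // Row 3: interior of a single element.
    assert_eq!(domain(9, 5), Domain::Enclave);
}
```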

We can break each row down into these components:

Row │ Head │ Body │ Tail
────┼──────┼──────┼─────
 1  │ None │  0   │ None
 2  │ None │  0   │ Some
 4  │ Some │  0   │ None
 5  │ Some │  0   │ Some
 6  │ Some │  1   │ None
 7  │ None │  1   │ Some
 8  │ Some │  1   │ Some
 9  │ None │  3   │ None

We can observe that where a slice fully-spans some elements, those elements cannot be mutated by any other reference. In the &BitSlice case, all &muts are forbidden by the compiler’s ordinary rules; in the &mut BitSlice case, bitvec’s obedience to those same rules forbids any other handle from observing the bits covered by the &mut BitSlice. As such, it is statically impossible for any alias to exist to the described memory in row 9, or for any alias to observe element 1 of rows 6 through 8.

bitvec happily permits element-aliasing &mut BitSlice references to observe the partially-filled elements in the outer columns, and middle column of rows 2 through 5, so writes to them must remain synchronized through either single-threaded Cell or concurrency-safe atomics. These domain components can be calculated from the three components of a slice pointer: the base address, the starting bit index, and the bit count.

This is expressed in the domain module’s two enums.

BitDomain

This enum splits any BitSlice region into its subcomponents, either immutably or mutably according to its M type parameter. It has the (rough) definition

pub enum BitDomain<'a, M, T, O>
where
  M: Mutability,
  T: BitStore,
  O: BitOrder,
{
  Enclave(Reference<'a, M, BitSlice<T, O>>),
  Region {
    head: Reference<'a, M, BitSlice<T, O>>,
    body: Reference<'a, M, BitSlice<T::Unalias, O>>,
    tail: Reference<'a, M, BitSlice<T, O>>,
  },
}

and, rather than granting direct memory access, merely removes any aliasing markers from as much memory as possible. The subslices that partially fill their base element do not need to add an additional aliasing marker, as the marker is only required when writes to the element may collide. If the slice is immutable, aliasing never causes conflicting writes, so synchrony is never required. If the slice is mutable, then the only way to get a partial edge slice is either to forget about some bits from the main slice, which is not an alias event, or to split the slice, which is; and splitting already applies the alias marker.

Domain

The bit domains are still bit slices, and do not offer a way to access the backing memory. For operations where raw memory access is required, this enum produces the same domain definitions, but typed for the bare elements rather than their bits.

It has the (rough) definition

pub enum Domain<'a, M, T>
where
  M: Mutability,
  T: BitStore,
{
  Enclave(PartialElement<T>),
  Region {
    head: Option<PartialElement<T>>,
    body: Reference<'a, M, [T::Unalias]>,
    tail: Option<PartialElement<T>>,
  },
}

(The PartialElement type is a guarded reference which prevents accessing bits that do not belong to the originating BitSlice.)

As with the bit domains, these domains will inherit any aliasing markers from their source bitslice. The ::Alias associated type enables the mutable domain to produce references that allow mutation without adding an unnecessary aliasing marker. Rust strongly forbids the production of &mut references to aliased memory elements, which is why the only &mut reference in these views is to memory that is fully known to be unaliased.

The Domain structure will produce bare atomic or Cell types in the alias condition. This is necessary in order to avoid production of &mut references which alias (as this is undefined in the Rust abstract machine, regardless of behavior), and safe because any other references to the same location will be similarly aliased and capable of handling external mutation.

LLVM Suboptimizations

LLVM considers a “racing read”, that is, any read from memory that could occur contemporaneously with an atomic write, to be undefined behavior. This is a reasonable view to take, but a pessimistic one. bitvec has information that cannot be expressed to LLVM about which bits of an element it will observe or modify, and bitvec is capable of guaranteeing that two distinct access points will not be able to interfere with each other electrically. LLVM does not know this, so it considers any write to memory to touch all bits of the touched element, and any read from memory to view all bits of the fetched element.

bitvec exclusively5 writes to contended memory with the Rust functions AtomicT::fetch_and and AtomicT::fetch_or, which are mapped to the LLVM instructions __atomic_fetch_and_N and __atomic_fetch_or_N. It uses bitmasks that bitvec can guarantee are non-intersecting, but this proof cannot be extended to even the Rust compiler, let alone LLVM. These bitmasks are applied to register values before using either of the fetch_op instructions, and after any reads that use AtomicT::load/__atomic_load_N.
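The access pattern described here can be sketched with std atomics. The `write_masked` helper below is hypothetical, showing only the fetch_and/fetch_or shape; bitvec’s real routines live behind its BitAccess trait:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical masked bit-write: set the bits under `mask` to `value`
/// using only atomic fetch_and / fetch_or, never a plain store.
fn write_masked(elem: &AtomicU8, mask: u8, value: u8) {
    // Clear the masked bits, then OR in the new (pre-masked) value.
    elem.fetch_and(!mask, Ordering::Relaxed);
    elem.fetch_or(value & mask, Ordering::Relaxed);
}

fn main() {
    let elem = AtomicU8::new(0b1010_1010);
    // Rewrite only the low nibble; the high nibble is untouched.
    write_masked(&elem, 0b0000_1111, 0b0000_0101);
    assert_eq!(elem.load(Ordering::Relaxed), 0b1010_0101);
}
```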

If the bitvec mask proofs were extendible to LLVM, and LLVM were to expand its tracking of which bits of a memory address became poisoned by a memory write, and which bits of a fetched memory value became un-poisoned by a masking operation, then the compiler would be more able to observe that bitvec memory accesses do not observably interfere with each other. This observation would then define the behavior in the compiler’s memory model of racing writes/reads, and permit an increased (possibly even complete) removal of synchrony guards.

I am not aware of any processor hardware which fails to guarantee that all bits
of memory are fully defined at the clock edges of all instructions that use the
location. To the full extent my knowledge, all memory banks in all relevant
processors have a stable bit-value at the start of a tick, when reads occur, and
at the end of a tick, when writes commit. At no point does changing the value of
one bit of a memory component affect the electrical value of other bits in the
component.

This is not necessarily true of other storage devices, such as SSDs, but
`bitvec` can only be used to access storage cells mapped in the RAM address
space, which tend to all have this stability property.

Summary

The formal definition of the bitvec memory model extends the Rust mutable-exclusion rules by refining memory regions to have bit-precision instead of element-precision. The Rust compiler is only weakly capable of tracking the region status of individual bits, and only in compiler-internal structures. LLVM has a more robust arbitrary-bit-tracking capability, but similarly limits its interface to external code.

Barring any errors in the bitvec implementation, the bitvec memory model is fully sound in its behavior with regard to single-observer unsynchrony. Synchronization is only added in order to correctly interface with rustc and LLVM without causing either of them to introduce undefined behavior due to a lack of information.

Footnotes

1

all references to T where T is either !Sized, or mem::size_of::<T>() is non-zero, that is. A fun quirk of Rust’s first-class concept of zero-width types is that the only illegal value for a &Zst reference is null. Since there is nothing to load or store through a &Zst reference, the compiler doesn’t care what the reference value is, as it will never be used to perform memory access.

3

This is not absolutely true. Like we saw with UnsafeCell, the only immutable rule of compiler developers is that whenever they make an immutable rule, they also provide a way to sidestep it. If you freeze a poison, you are now free to read its value and act on it. LLVM just doesn’t make any guarantees about what bit-pattern you’ll see.

4

I’d feel a great deal more comfortable if I had firm knowledge of what those costs actually were. An atomic write always issues a lock instruction modifier on x86, and I have heard vastly different things about what that actually means, from “it’s free if no other cache holds that address” up to “it poisons the whole cacheline”, and have not had much luck producing a benchmark that firmly demonstrates that unneeded atomic access is a strict performance cost.

5

In multithreading environments. Disabling atomics also disables bitvec’s support for multithreading, so the penalty for aliasing is reduced to an inability to remove redundant reads.

Runtime Performance

bitvec increases the instruction cost of each access to a bool in its data structures. This is an inevitable consequence of the fact that, even on architectures that have them, compilers typically do not emit object-code instructions that access individual bits within a memory location. Therefore, each access in bitvec has, in addition to a memory operation, one or two shift instructions and one or two AND/OR/NOT instructions.
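A sketch of that cost in plain Rust: one memory operation, one shift, and one mask per bit read (using an Lsb0-style ordering for illustration), versus a single byte load for indexing a [bool]:

```rust
/// Read one bit: a memory access, then a shift, then a mask.
fn get_bit(data: &[u8], index: usize) -> bool {
    let byte = data[index / 8]; // memory operation
    (byte >> (index % 8)) & 1 != 0 // shift + mask
}

fn main() {
    let data = [0b0000_0100u8];
    assert!(get_bit(&data, 2));
    assert!(!get_bit(&data, 3));
}
```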

This means that, inevitably, bitvec is slower in CPU time than [bool] is. Measurements indicate roughly a factor of ten, though also with about 10× more variance. However, this cost is only apparent and meaningful when walking the entirety of very large buffers, and tends to fade into noise on smaller buffers, or be obviated by compile-time fixed accesses. As always, try it on a representative workload.

Benchmarking

I have tried (with admittedly low priority) to have some benchmarks in the project to back up my claims that bitvec is fast. This has been difficult to maintain for a few reasons, but I have at least a few that have stayed present in order to demonstrate important claims, such as showing that specialization for matching types does provide massive performance benefits (it does).

In particular, LLVM is very good at propagating compile-time constants through the code, and bitvec strives to maintain an internal implementation that is easily accessible to optimizers. This means that basically any benchmark that takes input from a source file that the compiler can see gets artificially solved during codegen.

For instance, I don’t know how long it takes to construct a &BitSlice view over memory, because my benchmarks report 0ns: LLVM computes my pointer encoding at compile time, and a consequence of the way I designed my pointer encoding is that the only operation BitSlice::from_slice actually performs is .len << 3. When LLVM can see the original length, it just does this itself, and emits an immediate with the correct length instead of the constructor call.

Other constructor benchmarks are only showing me the time required to run memcpy, and arbitrary indexing just shows the time required to run three instructions, because LLVM solved the shift/mask arguments ahead of time.

The important takeäway here is that if your code is at all dependent on constants that the compiler can see, and is not exclusively performing indexing based on runtime inputs, then bitvec is going to be plenty fast.

Pitfalls

Everything stated above relies on having information available to the compiler. bitvec falls behind other bit-reading libraries when you start using access patterns only known at runtime, such as iteration.

Until Rust stabilizes the ptr_metadata feature (see #81513), bitvec will necessarily take a performance hit because it has to decode the &BitSlice pointer every time you access memory, and reëncode it every time you munch through the region.

The bitvec pointer encoding (described here) requires manipulating both words in the pointer with at least three instructions each.

Except when using `u8` storage, which discards *most* of the modifications to
the address half of the pointer. Try that out if you do a lot more munching
through a region than bulk work on its contents!

This would not be necessary if the pointer could use its own metadata type rather than usize. Until that stabilizes, the entire value proposition of the crate rests on the fact that &BitSlice has slice-pointer ABI. If you absolutely cannot afford to have decoding/reëncoding costs in your hot loop, you may have to try other libraries.

bitvec strives to use batch operations on entire integer registers when possible. However, doing so requires routing through the domain module, which has to perform similar decoding and processing of a &BitSlice pointer every time it is entered. This is another unavoidable cost. It is generally a tolerable overhead compared to walking through each bit individually, especially with wider storage integers, but it is one more thing that other bit-addressing libraries don’t do.

Other libraries also do not have alias safety or tolerance for starting a region away from the zero index of an integer. Power has a price. We do our best to cut down the library’s runtime cost as much as possible, but this computation simply has to be done somewhere, and it’s not built into silicon anymore. Sorry.

Specialization

bitvec strives to use its knowledge of the underlying memory representation wherever possible. This means that operations between BitSlices with the same type parameters can rely on an identical representation and use integer behavior rather than walking each bit individually.

Try to do this wherever possible, especially in performance-sensitive code. You typically should not be mixing bitvec structures with different type parameters anyway: use the representation that serves your needs, and keep using it in all buffers that interact with it.
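A std-only sketch of why matching representations matter: identical layouts permit whole-element comparison, while mismatched parameters force the bit-by-bit fallback:

```rust
/// With identical representations, comparison is one pass over whole bytes.
fn eq_elementwise(a: &[u8], b: &[u8]) -> bool {
    a == b
}

/// The fallback that mismatched type parameters force: visit every bit.
fn eq_bitwise(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len()
        && (0..a.len() * 8)
            .all(|i| (a[i / 8] >> (i % 8)) & 1 == (b[i / 8] >> (i % 8)) & 1)
}

fn main() {
    let x = [0xA5u8, 0x5A];
    let y = [0xA5u8, 0x5A];
    assert!(eq_elementwise(&x, &y));
    assert!(eq_bitwise(&x, &y));
}
```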

As an example, as of 1.0, walking two large BitSlices with incompatible type parameters takes 3µs (microseconds), while walking the same sizes with identical type parameters takes 100ns (nanoseconds). That is roughly a 30× performance difference, which is only half the speedup that I expected using usize on a 64-bit machine, but still quite stark.

Bit Slice Pointer Encoding

bitvec’s core value proposition rests on the fact that it is capable of defining an unsized slice type, and controlling references to it. The Rust language rests heavily on the two reference types & and &mut, and does not ordinarily allow these to be faked or created by anything other than the compiler.

Rust Reference Rules

It so happens that not only does Rust strongly guarantee the in-memory layout of a reference to a slice, it also provides a stable API for constructing values of &[T] type without using mem::transmute. Subject to certain value requirements imposed by types, slice references can be constructed through these functions and the compiler will accept them as valid.

These requirements traditionally make it difficult to encode non-address information into a bare reference, since the compiler firmly expects that a reference to a type is immediately dereferenceäble to a value of that type. But if your type happens to be zero-sized, then it can never exist in memory, no loads or stores to it can ever be produced, and the compiler no longer concerns itself with the actual bit-pattern value of references to it.

Which is why the definition of BitSlice is

//  src/slice.rs

#[repr(transparent)]
pub struct BitSlice<T, O>
where
  T: BitStore,
  O: BitOrder,
{
  _typ: PhantomData<T>,
  _ord: PhantomData<O>,
  _mem: [()],
}

BitSlice is [()] (a slice of the unit value) with some markers that only the type-checker can see. &BitSlice is thus &[()], and &[()] can have any values it wants (except, of course, null) – the unit value has no alignment requirements, can be placed anywhere in memory without worrying about whether there is a backing allocation, and can have as many instances of itself as desired.

Zero-sized types are an absurdly powerful concept when working with memory that the language expects to be able to manifest at any time.
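A demonstration of that freedom: a &[()] can be conjured from an arbitrary non-null address and any length, because no load or store through it can ever be emitted.

```rust
use std::slice;

fn main() {
    // Any non-null address will do: () has size 0 and alignment 1, so the
    // resulting slice is valid for exactly zero bytes of memory access.
    let fake_addr = 0x1000 as *const ();
    let s: &[()] = unsafe { slice::from_raw_parts(fake_addr, 12_345) };
    assert_eq!(s.len(), 12_345);
}
```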

Pointer Encoding

Slice references contain two pieces of information: the address of the base element, and the number of elements, starting at the base, contained in the slice region. Theoretically, bit-slice references have the same pair of information: the address of the first bit, and the number of bits in the region.

However, computers are byte-addressed, not bit-addressed, so we need to store three more bits (to select a bit in the base byte) somewhere in the reference. Since slice references are defined as { base: *T, elts: usize }, and there are no1 spare bits in *const _, the bits to store the base bit are taken out of the length counter.

Reference address values are also required to be integer multiples of the alignment of the referent type T. This alignment is, on all supported targets, the width in bytes of the referent type. As a result, there are as many low bits in the address of any T that are guaranteed to be the 0 value, as there are bits needed to select a byte within the element. The end result is that the length counter must always use three bits to store the starting bit, and the base address will be composed of an aligned T address and an index of the starting byte within it.
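The packing trick itself can be shown with std alone. Here the two always-zero low bits of an aligned *const u32 carry a byte index, and masking recovers both halves:

```rust
use std::mem::align_of;

fn main() {
    let value = 0u32;
    let addr = &value as *const u32 as usize;
    // Alignment guarantees the low bits of the address are zero.
    assert_eq!(addr % align_of::<u32>(), 0);

    let byte_index = 3usize; // must fit below the alignment
    let packed = addr | byte_index;

    let mask = align_of::<u32>() - 1; // 0b11 for u32
    assert_eq!(packed & !mask, addr); // recover the element address
    assert_eq!(packed & mask, byte_index); // recover the byte selector
}
```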

As Rust does not have bitfield syntax, a definition of the pointer structure in C++ looks something like this2:

template <typename T>
struct BitSpan {
  static_assert(
    std::is_unsigned<T>()
    && sizeof(T) <= sizeof(std::size_t)
    && sizeof(T) <= alignof(T)
  );

  // on little-endian systems, bitfields are
  // allocated from LSbit and move towards MSbit
  uintptr_t ptr_head : __builtin_ctzll(alignof(T));
  uintptr_t ptr_addr : (sizeof(uintptr_t) * 8)
                     - __builtin_ctzll(alignof(T));

  size_t len_head : 3;
  size_t len_bits : (sizeof(size_t) * 8) - 3;
};

In Rust, the structure is declared as

// src/pointer.rs

#[repr(C)]
pub struct BitSpan<T, O>
where
  T: BitStore,
  O: BitOrder,
{
  ptr: NonNull<u8>,
  len: usize,
  _ty: PhantomData<T>,
  _or: PhantomData<O>,
}

and the logical components must be accessed through get/set functions, rather than through compiler-generated field stubs.

By marking the pointer as NonNull, BitSpan declares that it will never be a null pointer and becomes subject to the same peephole optimization that allows mem::size_of::<Option<&T>>() == mem::size_of::<&T>(). By marking it as unconditionally a pointer to u8, we declare that all low bits of the address value are in use, and none can be used as slots for anything else (since our encoding is using them to select a byte within the T).

Significant Values

BitSpan<T, O> does not have any sentinel values of its own, but inherits from NonNull<T>. The completely zero value is not a valid member of the BitSpan type, but rather indicates Option::<BitSpan<_, _>>::None, and it uses the dangling NonNull pointer value to indicate an instantiated pointer object without an associated allocation.

Not all zero-length regions are dead: a cleared BitVec region has zero length but owns an allocation, so it cannot discard its address information.
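Both properties are visible from std alone: the None niche costs no space, and NonNull::dangling yields a well-aligned, non-null address without any backing allocation.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

fn main() {
    // The all-zero pattern encodes None, so the Option wrapper is free.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<NonNull<u8>>());

    // A dangling pointer: non-null and aligned, but backed by nothing,
    // and never dereferenced.
    let dangling = NonNull::<u64>::dangling();
    assert!(!dangling.as_ptr().is_null());
    assert_eq!(dangling.as_ptr() as usize % align_of::<u64>(), 0);
}
```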

Summary

Rust requires that slice references have a specific ABI, but makes no requirements about the encoding of values of those references for certain types. We can supply our own ABI-equivalent structure, define functions that use the structural encoding to compute the information needed to actually interact with memory, and convert our structures into Rust-accepted slices through the provided compiler API in core.

If and when the ptr_metadata feature stabilizes, bitvec will experiment with discarding this packed encoding in favor of a three-word pointer. If the unpacked pointer results in better performance by eliminating the need for the special encoding, bitvec will release a new minor version with the changed structure.

`bitvec`’s MSRV policy is that raising compiler requirements is a minor change,
not major, and the pointer ABI is **not public interface**! You are already
forbidden from moving bit-region pointers out of a program, so this change will
not affect your program’s behavior.

Footnotes

1

On AMD64, pointers are actually aggregates of MMU translation pages, and processors only decode the low 48 or 57 bits of them, leaving the high 16 or 7 bits available for other information not part of the memory addressing system. However, these processors also trap when attempting to dereference a pointer whose high [48:64] or [57:64] bits do not have the same bit value as bit [47] or [56], and that bit is typically used to differentiate unprivileged user memory from privileged kernel memory. Furthermore, this dead region does not exist on 32-bit architectures, x86 or otherwise, and since bitvec explicitly supports 32-bit systems, the use of dead bits only present on a subset of supported targets and subject to their own extra rules is not worthwhile.

2

Here is a full code listing which you can also view on Godbolt:

// compiles on x86-64 clang 15.0.0
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

static_assert(CHAR_BIT == 8, "this target is not supported");

template <typename T>
struct BitSpan {
  static_assert(
    std::is_unsigned<T>()
    && sizeof(T) <= sizeof(std::size_t)
    && sizeof(T) <= alignof(T),
    "this type is not supported as BitSpan storage"
  );

  uintptr_t ptr_head : __builtin_ctzll(alignof(T));
  uintptr_t ptr_addr : (sizeof(uintptr_t) * 8)
                    - __builtin_ctzll(alignof(T));

  size_t len_head : 3;
  size_t len_bits : (sizeof(size_t) * 8) - 3;
};

template <>
struct BitSpan<uint8_t> {
  // ptr_head is zero bits wide when targeting bytes
  uintptr_t ptr_addr;
  size_t len_head : 3;
  size_t len_bits : (sizeof(size_t) * 8) - 3;
};

static uint64_t data[4];

BitSpan<uint8_t> one() {
  return {
    .ptr_addr = (uintptr_t)&data[0],
    .len_head = 1,
    .len_bits = 6,
  };
}

BitSpan<uint16_t> two() {
  return {
    .ptr_head = 1,
    .ptr_addr = (uintptr_t)&data[1],
    .len_head = 1,
    .len_bits = 5,
  };
}
BitSpan<uint32_t> four() {
  return {
    .ptr_head = 2,
    .ptr_addr = (uintptr_t)&data[2],
    .len_head = 3,
    .len_bits = 10,
  };
}
BitSpan<uint64_t> eight() {
  return {
    .ptr_head = 4,
    .ptr_addr = (uintptr_t)&data[3],
    .len_head = 5,
    .len_bits = 25,
  };
}

Miscellaneous Implementation Details

bitvec has a number of internal implementation details that facilitate development but are not part of the public API. While not user-facing, these details are nevertheless worth documenting as explanations of how the library is built.

Integer Refinement

bitvec offloads abstraction over the fundamental integers to funty, described earlier in this guide.

funty only unifies the standard-library APIs of the integers into trait-based code; bitvec further extends this with useful constants in the BitRegister trait. This trait delimits the integers that correspond to machine registers actually available for use as storage, and provides minor conveniences for working with them.
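
As an illustration only, and not bitvec’s actual definition, such a register-delimiting trait might look like the following sketch, where the names Integral, BitRegisterLike, and its constants are hypothetical stand-ins:

```rust
/// Stand-in for a funty-style integer abstraction.
trait Integral: Copy {
    const ZERO: Self;
}

/// Stand-in for a register-delimiting trait with useful constants.
trait BitRegisterLike: Integral {
    /// The number of bits in the register.
    const BITS: u32;
    /// A value with every bit set.
    const ALL: Self;
}

macro_rules! impl_register {
    ($($t:ty),*) => {$(
        impl Integral for $t {
            const ZERO: Self = 0;
        }
        impl BitRegisterLike for $t {
            const BITS: u32 = <$t>::BITS;
            const ALL: Self = <$t>::MAX;
        }
    )*};
}

// Only unsigned integers that correspond to machine registers are
// implemented; u128 is deliberately left out, since it is not a single
// register on most targets.
impl_register!(u8, u16, u32, u64, usize);

fn main() {
    assert_eq!(<u8 as BitRegisterLike>::ALL, 0xFF);
    assert_eq!(<u16 as BitRegisterLike>::BITS, 16);
    println!("ok");
}
```

Delimiting the trait this way lets the rest of the library assume register-sized storage without repeating bounds on every integer type.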

Specialization Hacks

bitvec is built to be agnostic to the ordering of bits within an element. This is an important aspect of its design: even if no orderings other than the provided Lsb0 and Msb0 are ever used, the structures must remain generic so that those two orderings stay on equal footing. However, deferring register operations to type parameters outside the data structures’ control causes some unpleasant performance losses that would not occur in a hand-written equivalent.

Some operations, such as copying between or comparing slices, can be accelerated with partial-element access, but they require knowledge of the O ordering type to give register contents a semantic interpretation. Language-level specialization could allow writing override impl blocks, like this:

impl<T, O> BitSlice<T, O>
where
  T: BitStore,
  O: BitOrder,
{
  fn eq(&self, other: &Self) -> bool {
    todo!("baseline")
  }
}

impl<T> BitSlice<T, Lsb0>
where T: BitStore {
  fn eq(&self, other: &Self) -> bool {
    todo!("lsb0-accelerated version")
  }
}

impl<T> BitSlice<T, Msb0>
where T: BitStore {
  fn eq(&self, other: &Self) -> bool {
    todo!("msb0-accelerated version")
  }
}

While waiting on this feature, we can use the standard library’s stable TypeId API to simulate specialization. By comparing the TypeId of the O type argument against the TypeIds of bitvec’s provided orderings, a function can detect when it is in a monomorphization whose ordering argument is Lsb0 or Msb0, and specialize accordingly. The above block can be replaced with:

impl<T, O> BitSlice<T, O>
where
  T: BitStore,
  O: BitOrder,
{
  fn eq(&self, other: &Self) -> bool {
    if let (Some(this), Some(that)) = (
      self.coerce::<T, Lsb0>(),
      other.coerce::<T, Lsb0>(),
    ) {
      todo!("lsb0-accelerated version")
    }
    else if let (Some(this), Some(that)) = (
      self.coerce::<T, Msb0>(),
      other.coerce::<T, Msb0>(),
    ) {
      todo!("msb0-accelerated version")
    }
    else {
      todo!("baseline")
    }
  }
}

and, during monomorphization, only one branch of the if stack survives. The .coerce() method is defined in slice::specialization and yields a fully-typed value only within the monomorphization that matches it. It is not public API; your code using bitvec should prefer being fully typed rather than generic.
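
The mechanism can be demonstrated in isolation. In this hedged sketch, describe, Lsb0, Msb0, and Other are stand-in names rather than bitvec items; because TypeId comparisons between concrete types are resolved during monomorphization, the dead branches are eliminated from each instantiation:

```rust
use core::any::TypeId;

// Stand-ins for bit-ordering marker types.
struct Lsb0;
struct Msb0;
// A third ordering, to exercise the fallback path.
struct Other;

/// Select a code path by comparing the type argument's TypeId against
/// the known orderings. In each monomorphization, only one branch is live.
fn describe<O: 'static>() -> &'static str {
    if TypeId::of::<O>() == TypeId::of::<Lsb0>() {
        "lsb0-accelerated version"
    } else if TypeId::of::<O>() == TypeId::of::<Msb0>() {
        "msb0-accelerated version"
    } else {
        "baseline"
    }
}

fn main() {
    assert_eq!(describe::<Lsb0>(), "lsb0-accelerated version");
    assert_eq!(describe::<Msb0>(), "msb0-accelerated version");
    assert_eq!(describe::<Other>(), "baseline");
    println!("ok");
}
```

The `'static` bound is required because TypeId::of is only defined for types without borrowed lifetimes; bitvec’s ordering markers satisfy it trivially.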

Afterword

Thanks for reading! We hope this guide was useful to you.

Please feel both welcomed and encouraged to ask us any clarifying questions (contact information for myrrlyn is in the CONTRIBUTING document), to file an issue if any of it is confusing, or even to send a PR if you have improvements!

Ferrilab has been a work of intensely deep focus and research for over four years. It is a complex and difficult topic, and we are absolutely certain that we have information and context that we haven’t sufficiently serialized for everyone else to see.

Thank you for using our crates. We are delighted that other people have found our work both interesting and useful, and we’re glad that we’ve been able to help people like you write better programs.