Dedication
I began working on bitvec
shortly before I was told that my father had been
diagnosed with cancer for the third time. Developing the library gave me
something into which to sink my attention and keep my mind from dwelling on his
rapidly-progressing illness. I wrote the core pointer representation that
enables bitvec’s modern behavior at his side, in the last days that he remained
conscious.
I never had the chance to explain to my dad what I was building. By the time it
was developed enough to be worth explaining, he had lost too much of his brain
to understand me. More than anything I’ve done for my employers, bitvec
is the
work of which I’m most proud, and which I most wish he could have seen.
The Ferrilab project, and in particular bitvec, is dedicated to the memory of
my father, George M. Payne, who inspired me to build quality work, supported my
search for my talents, and taught me to labor for the benefit of others beyond
myself.
—Alexander Payne (myrrlyn)
Introduction to Ferrilab
Ferrilab is an umbrella project for crates that experiment with reshaping the
Rust data model of primitive types. It began with the bitvec crate; as bitvec
expanded, it eventually spun off its integer-generalization into funty, and
radium was created independently but swiftly brought into the fold.
Ferrilab is currently focused on just these three crates, but may expand in the future as we discover new ideas to try out.
Behind the Name
The primary maintainer, myrrlyn, is from the Great Lakes region of the Americas, and the crates here all reshape the fundamental types of the Rust language. We looked for names that had to do with early modern physics and settled on Enrico Fermi: he worked in atomic physics, he has an eponymous laboratory near Chicago, and “Fermilab” is a single edit step away from “Ferrilab”, which honors Ferris, Rust’s mascot.
Plus, bitvec
began while myrrlyn was working for the US Government in
New Mexico…
Radium
The radium
crate provides a unifying model for shared-mutability over the
primitives. It does not handle more complex shared-mutability topics such as
mutices or locks: if you need to manage large structured data, you will need to
look elsewhere.
Radium Trait
This trait allows your code to generically accept either an atomic type or a
Cell
and interact with it through a unified API. It is implemented by all of
the standard library atomics, Cell<{bool,{i,u}{8,16,32,64,128,size},*mut T}>,
and the type families that Radium provides (described below).
The primitive type that a Radium
implementor encloses is indicated by the
Radium::Item
associated type. You can use this as a constraint in your trait
bounds (i.e. <R: Radium<Item = i32>>) in order to gain direct access to the
primitive, or you can require that R::Item
implement other traits and interact
with it through them.
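For instance, a function bounded on Radium<Item = u32> can accept an atomic, a
Cell, or any of the type families described below. This is a minimal sketch;
the function name is illustrative, not part of radium:

use core::cell::Cell;
use core::sync::atomic::{AtomicU32, Ordering};
use radium::Radium;

// Accepts AtomicU32, Cell<u32>, RadiumU32, Atom<u32>, Isotope<u32>, and so on.
fn tick<R: Radium<Item = u32>>(counter: &R) -> u32 {
    counter.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    let atomic = AtomicU32::new(0);
    let cell = Cell::new(0u32);
    tick(&atomic);
    tick(&cell);
    assert_eq!(atomic.load(Ordering::Relaxed), 1);
    assert_eq!(cell.get(), 1);
}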
Type Aliases
Radium provides type aliases for each primitive which could be atomic in the
standard library. The Radium{Bool,{I,U}{8,16,32,64,128,size},Ptr}
symbols all
forward to their corresponding AtomicType
when that symbol exists, or to
Cell<Type>
when it does not.
Since these are type aliases rather than newtypes, you can globally replace the
AtomicT
symbols with their RadiumT
equivalent without any other changes to
your codebase.
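As a small sketch of that drop-in property:

use core::sync::atomic::Ordering;
// The Radium trait import covers targets where the alias falls back to Cell.
use radium::{types::RadiumU32, Radium};

fn main() {
    // Same API as AtomicU32, but this still compiles on targets that lack
    // AtomicU32 (there, the alias degrades to Cell<u32> and loses Sync).
    let counter = RadiumU32::new(0);
    counter.fetch_add(1, Ordering::Relaxed);
    assert_eq!(counter.load(Ordering::Relaxed), 1);
}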
Type Families
Radium provides three type families which accept fundamentals as type
parameters. These families have no inherent API, and only implement Radium,
Debug, Default, and From<T>. They may or may not have a Sync implementation,
depending on whether they have atomic behavior.
- The Atom<T> family corresponds to (and wraps) the standard library’s
  core::sync::atomic::* types. This family only accepts T parameters where an
  equivalent AtomicT symbol exists for the target; this means that on targets
  which do not have, for instance, AtomicU64, Atom<u64> will fail to compile.
  These are always Sync. Atom requires that type arguments implement
  radium::marker::Atomic.
- The Isotope<T> family functions similarly to Atom<T>, except that it wraps
  Radium’s RadiumT type aliases. As such, Isotope is portable across targets
  with different atomic support, and will never fail to compile; however, it
  will silently degrade from atomic to Cell behavior (including loss of Sync)
  when the requisite atomic types are missing. Isotope requires that type
  arguments implement radium::marker::Nuclear.
- The Radon<T> family is a wrapper over Cell<T>. Like Isotope, it requires that
  type arguments implement radium::marker::Nuclear. It is never Sync.
Examples
This contrived example is taken from radium/examples/schroedinger.rs. It shows
how the Radium trait can be used by a worker function to manipulate data, and
how the different types can be used to work in sequence or in parallel.
Note: Radium’s MSRV is 1.60, while the scoped-threads API used here stabilized in 1.63.
use radium::{Radium, types::{RadiumU64, Atom, Isotope, Radon}};
use std::{
cell::Cell,
sync::atomic::{AtomicU64, Ordering},
thread,
time::Duration,
};
fn do_work<R: Radium<Item = u64>>(this: &R, ident: u8) {
let on_entry = this.load(Ordering::SeqCst);
println!("{: >2} step 0 sees: {: >2}", ident, on_entry);
let before_add = this.fetch_add(10, Ordering::SeqCst);
println!("{: >2} step 1 sees: {: >2}", ident, before_add);
let after_add = this.load(Ordering::SeqCst);
println!("{: >2} step 2 sees: {: >2}", ident, after_add);
thread::sleep(Duration::from_millis(after_add));
let before_sub = this.fetch_sub(3, Ordering::SeqCst);
println!("{: >2} step 3 sees: {: >2}", ident, before_sub);
let on_exit = this.load(Ordering::SeqCst);
println!("{: >2} step 4 sees: {: >2}", ident, on_exit);
}
static ATOM: AtomicU64 = AtomicU64::new(0);
static RADIUM: RadiumU64 = RadiumU64::new(0);
fn main() {
let cell = Cell::new(0u64);
let atom = Atom::new(0u64);
let isotope = Isotope::new(0u64);
let radon = Radon::new(0u64);
println!("atoms");
thread::scope(|s| for ident in 0 .. 3 {
s.spawn(move || do_work(&ATOM, ident));
});
println!();
thread::scope(|s| for ident in 3 .. 6 {
let atom = &atom;
s.spawn(move || do_work(atom, ident));
});
println!();
println!("isotopes");
thread::scope(|s| for ident in 6 .. 9 {
s.spawn(move || do_work(&RADIUM, ident));
});
println!();
for ident in 9 .. 12 {
do_work(&isotope, ident);
}
println!();
println!("cells");
for ident in 12 .. 15 {
do_work(&cell, ident);
}
println!();
for ident in 15 .. 18 {
do_work(&radon, ident);
}
println!();
}
funty
The funty
crate (fundamental types) provides traits that unify
the Rust non-pointer primitives. It also unifies pointers and references by
lifting access permissions into the trait system.
Fundamental Unification
The Rust primitives implement the following trait hierarchy and replicate their standard-library inherent API and trait implementations.
- Fundamental: this is implemented by all primitives: bool, char, all integers,
  and both floats. It requires all traits that all primitives implement, and
  provides .as_other() methods that can replace as-casts.
- Numeric: this is implemented by all integers and both floats. It adds the
  arithmetic operator traits, and methods for converting between the number and
  its raw byte representation.
- Integral: this is implemented by all integers. It adds bit-wise operator
  traits, attempted conversions between the other integers, and bit-shifts. It
  also provides most of the integer inherent API, as most of these methods are
  sign-agnostic.
- Signed: this is implemented only by signed integers. It adds the
  absolute-value and sign-testing functions that unsigned integers don’t
  support.
- Unsigned: this is implemented only by unsigned integers. It provides the
  {is,next}_power_of_two one-hot methods that only make sense on unsigned
  integers.
- Floating: this is implemented by the floating-point numbers. Unlike the
  integral traits, it has a great many methods that only exist when
  cfg(feature = "std") is active, as they require the platform libm mathematics
  library and are not provided by Rust’s core crate. It also provides all of
  the associated constants, as well as the constants stored in the eponymous
  modules but not associated with the floating-point primitive types
  themselves.
Additionally, funty provides marker traits for selecting bit-width: for N in 8,
16, 32, 64, and 128, the IsN trait is implemented by types that are exactly
that wide, AtLeastN is implemented by types that are that width or more, and
AtMostN is implemented by types that are that width or less.
You can use these traits as generic constraints in code that needs to accept a
range of different primitives. The integral traits provide Peano constants (zero
and one), and can be constructed from literals for non-const
work.
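As a quick illustration of using these traits as constraints (the function here
is mine, not part of funty):

use funty::Integral;

/// Folds any integer slice down to its sum, starting from the trait-provided
/// zero constant.
fn sum<I: Integral>(values: &[I]) -> I {
    values.iter().copied().fold(I::ZERO, |acc, v| acc + v)
}

fn main() {
    assert_eq!(sum(&[1u8, 2, 3]), 6);
    assert_eq!(sum(&[-1i64, 2, 3]), 4);
}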
Pointer Unification
The funty::ptr module provides Pointer and NonNullPtr types which are
replacements for raw pointers and core::ptr::NonNull, respectively. They work
by lifting the *const T/*mut T distinction into the trait system, through the
Permission trait and the Shared, Unique, and (Shared, Unique) types.
The Permission
trait and its implementors implement a less-capable version of
the stacked-borrows experimental model found in Miri. Pointer<T, P>
implements
the read-only APIs found on both *const
and *mut
pointers, while
Pointer<T, Unique>
alone implements the write APIs only present on *mut
pointers. Additionally, type-level transitions allow safely casting Unique
pointers down to read-only and back up to Unique
, and unsafely casting
directly to a permission that you specify.
The NonNullPtr
behaves similarly to Pointer
, except that it encloses a
core::ptr::NonNull
in order to regain the null-pointer niche optimization. Its
API strives to match both the NonNull
and Pointer
APIs. As both raw pointers
and NonNull
still have large amounts of unstable API surface in the standard
library, these types will continue to grow in response to both Ferrilab’s needs
and the standard library’s evolution.
Permission Changes
funty
uses the Permission
trait to create a graph of safe transitions
between raw pointers and Pointer
, and between different type parameters
attached to Pointer
values.
Pointers can be constructed from both raw pointers: *const T produces
Pointer<T, Shared>, and *mut T produces Pointer<T, Unique>. Once
constructed, all Pointer<T, P: Permission>
values have access to the
introspective and read-only memory accesses defined on the raw pointers. The
memory-write APIs are only available on Pointer<T, Unique>
.
Additionally, the method .cast_shared() moves Pointers from P to (Shared, P).
The (Shared, P: Permission) tuple is itself an implementor of Permission, and
can continue to be used as a read-only pointer.
Pointer<T, (Shared, P)> also provides .cast_unshared(), which undoes
.cast_shared() and transitions from (Shared, P) back to P.
All Pointers can produce *const raw pointers, but only the Unique permission
can produce *mut raw pointers. If you need access to *mut raw pointers but are
in generic code where you cannot satisfactorily prove to the compiler that you
have a Unique, you have two options. The .unwind_to_unique() method recursively
unwinds a (Shared, P) history stack until it reaches the base, then produces
Some pointer if the original permission was Unique or None if it was Shared.
The unsafe method .cast_mut() unconditionally produces a pointer with Unique
permissions, but may violate Rust’s provenance rules and invoke undefined
behavior.
bitvec
This is Ferrilab’s largest and most complex project, and likely the reason you’re reading this guide. It provides a system that allows Rust code to treat memory as if it were bit-addressed, rather than byte-addressed, and strives to be the most correct and capable such bit-addressing system available. This results in some unavoidable complexity and performance loss, and we are always working to improve the quality of generated code.
Introduction
bitvec
was built out of myrrlyn’s experience and frustration with performing
I/O buffer manipulation using C, C++, and Ruby. His work required programs
capable of dynamically selecting an arbitrary region of a bit-stream (a task to
which C’s structural bitfield declarations are unsuited), and it required those
programs to be fast and portable to flight embedded systems (adjectives not
commonly associated with Ruby engines).
Furthermore, his work involved message schemas that were permitted to select a bit ordering at the packet and field level. This is not a behavior that any existing bit-stream library or language feature provides. These experiences informed his goals and design choices from the very beginning.
bitvec
matches, and exceeds, the functionality of every other bit-stream
implementation we have found. It is also the only Rust crate that is a drop-in
replacement for standard library types, and is able to do so while remaining
well-formed and conformant to Rust’s rules about memory access. Thanks to
excellent compiler engineering by the Rust and LLVM teams, it is able to do this
while still producing reasonably good output code.
Goals for This Guide
We hope that this guide will explain bitvec
’s design choices, philosophy,
backing theory, and overall goals. It is not a detailed exploration of the crate
API – this is hosted on docs.rs – but instead seeks to communicate how
to think about bitvec
so that you will know how to best use the APIs it
offers.
The best way I (myrrlyn) know how to communicate this information is as a dialogue between me, the author, and you, the user. Since this is a book, not a live conversation, I actively encourage you to get in contact with me with any questions or feedback through the channels listed in Ferrilab’s CONTRIBUTING document, and throughout the guide I will periodically remind you that if a section is unclear, it is an error on my part, and I would appreciate an issue or other contact so that I can improve it.
Data Structures
bitvec
’s primary exports are four data structures: BitSlice
, BitArray
,
BitBox
, and BitVec
. These correspond to the [bool]
,
[bool; N]
, Box<[bool]>
, and Vec<bool>
types in the
Rust standard libraries. Unlike the Rust types, the bitvec
types are not
composable, and cannot be mixed with any other types, including pointers in the
standard library.
You **cannot** produce `Box<BitSlice>`, `Rc<BitSlice>`, or `Arc<BitSlice>`;
attempts to produce or use these may induce undefined behavior.
The bitvec
types implement the APIs of their standard-library counterparts to
the fullest extent possible. The only missing feature is currently
IndexMut<usize>, preventing collection[index] = bit; assignment. This means
that, for users looking for compact usize => bool
collections, substituting
types in your project codebase ought to be enough to make your project Just
Work™️.
It is the fact that BitSlice
acts exactly like [bool]
, and can only be used
through &BitSlice
and &mut BitSlice
references, that makes bitvec
unique.
No other Rust library has this capability.
Before we explore the data structures in more detail, there are three caveats I must provide:
- bitvec uses an opaque, custom pointer representation for everything except
  BitArray. You may not inspect or modify this pointer. You may not use it as a
  type or value parameter in any other types or functions. You will break your
  program if you try. bitvec ensures that this pointer encoding does not fail
  the compiler and language requirements for reference correctness. The rules
  used to do so are internal to the crate, and will not be present outside it.
  bitvec pointers are perfectly safe to use, as long as you treat them as
  completely opaque and use only the interfaces provided. Standard-library
  smart pointers do not understand this encoding and would attempt to
  dereference the encoded pointer; this is why trying to use BitSlice with them
  is undefined.
- These structures all have two type parameters, <T: BitStore, O: BitOrder>.
  These parameters are described in the next chapter. They govern the in-memory
  representation of data storage, but are not relevant to the general use of
  the handle types.
- bitvec trades an increase in memory space efficiency for a decrease in
  instruction size and speed efficiency. The compiler may optimize some of the
  costs away, but bitvec structures are not free to use. The “zero-cost” of the
  bitvec abstraction is that you cannot write a better bitset, not that it is
  equal in runtime performance to an ordinary bool sequence.
Now that the disclaimers are over, we can talk about how the types actually work.
Slices
The base type of the project is BitSlice. This is a region type, like [bool],
and cannot be held directly. Instead, it is accessed by borrowed references
(&BitSlice, &mut BitSlice) or owning handles (BitArray, BitBox, BitVec). The
distinction between the handles and the region is the same as it is in ordinary
Rust types.
The BitSlice
region is able to begin and end at any bit in memory, and is not
restricted to having one edge aligned to the edge of its initial element. This
restriction, present in all of its competitors, is removed through the use of a
special encoding in all pointers to the region, which stores the starting bit of
the base element in part of the slice pointer that describes the real memory.
There are eight bits to a byte on all systems Rust targets, and therefore the
index of a bit within a byte is itself three bits wide. These bits are taken
from the length counter of a slice pointer, which means that BitSlice
is able
to address only ⅛th of the indices that [bool]
can.
This is 64 [Mebibytes] on a 32-bit system, and 256 [Pebibytes] on a 64-bit
system. If you can even allocate that much real memory in one handle, then you
have very different operating conditions than I can help you with.
[Mebibytes]: https://en.wikipedia.org/wiki/Mebibyte
[Pebibytes]: https://en.wikipedia.org/wiki/Pebibyte
Getting a BitSlice
BitSlice
is strictly a borrowed region. It can neither be created nor
destroyed; rather, views to it are acquired from a memory buffer that some other
binding owns.
The BitStore chapter covers this in more detail, but only slices of the
unsigned integers u8, u16, u32, usize, and (on 64-bit targets) u64 can be used
as the source memory for a BitSlice. (You can also use their Cell<> wrappers or
atomic variants; this will be discussed later.)
Borrowing Constructors
The simplest way to create a BitSlice
reference is to borrow it from ordinary
Rust data. The BitView
trait, available in the prelude, implements methods
on the supported unsigned integers, all arrays of them, and their slices.
use bitvec::prelude::*;
let byte = 0u8;
let bits = byte.view_bits::<LocalBits>();
let array = [0u16; 2];
let bits = array.view_bits::<Lsb0>();
let mut array = [0u32; 3];
let slice = &mut array[..];
let bits = slice.view_bits_mut::<Msb0>();
The .view_bits()
and .view_bits_mut()
methods take the other type parameter
bitvec
requires. This is described in the BitOrder
chapter. Use Lsb0
until you have a specific need for a more precise parameter.
In addition, BitSlice offers constructor functions ::from_element(),
::from_slice(), and their _mut variants, which borrow elements and slices,
respectively, and construct &/mut BitSlice references from them. The trait
methods are generally easier, and certainly shorter to write, but they all do
the same work.
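For instance, a short sketch of the function-style constructors:

use bitvec::prelude::*;

let byte = 0u8;
// The same view as `byte.view_bits::<Lsb0>()`, spelled as a constructor.
let bits: &BitSlice<u8, Lsb0> = BitSlice::from_element(&byte);
assert_eq!(bits.len(), 8);

let mut word = 0u16;
let bits = BitSlice::<_, Msb0>::from_element_mut(&mut word);
bits.set(0, true);
assert_eq!(word, 0x8000);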
Lastly, empty slices can be produced with the ::empty()
or ::empty_mut()
functions, since there is no &[]
or &mut []
literal available.
Macro Constructor
In addition to these method constructors, you may also use the bits!
constructor macro. This macro runs at compile-time to produce a buffer
containing the correct data values, then borrows it as a BitSlice
reference.
It is a macro_rules!
macro, not a procedural macro, and should not have a
significant impact on your compilation times.
By default, the produced buffer is a temporary that the compiler will then
extend to have the minimum lifetime of the produced reference handle. However,
you can use the static
keyword to cause the macro to produce a hidden and
unnameable static BitArray
backing buffer, which then provides the
&'static BitSlice
lifetime. Since this static
buffer cannot be named, it is
safe to use even when mutable, as the provided reference is the only handle to
it.
The macro syntax extends that of vec!
. The simplest invocations are sequences
or repetitions of expressions, which can optionally be made mutable:
use bitvec::prelude::*;
let r = bits![0, 1, 0, 1];
let w = bits![mut 0, 1, 0, 1];
let r2 = bits![static 1; 4];
let w2 = unsafe { bits![static mut 1; 4] };
You are not required to use the literals `0` or `1`; you can use any expression
that is `const`-evaluable and can be placed into the expression `expr != 0`.
This means that you cannot use the names of runtime `let` bindings, but can use
the names of `const` bindings, or other literals. You probably do not want to do
this, but you *can*.
In addition, you can specify the integer-storage and bit-ordering type
parameters for even more precise control over the memory layout. If you do not
specify them, the macro uses the default parameters of usize storage and Lsb0
ordering.
use bitvec::prelude::*;
let in_bytes = bits![u8, LocalBits; 0, 1, 0, 1];
let in_shorts = bits![u16, Lsb0; 0, 1, 0, 1];
let in_ints = bits![mut u32, Msb0; 0; 4];
To summarize the macro rules:
- If the first macro argument is mut, then the macro produces &mut BitSlice;
  otherwise it produces &BitSlice. You do not need to bind the name as mut
  unless you want to reässign it to a different slice.
- You may then optionally provide the storage and ordering type parameters,
  followed by a semicolon. You must provide either both or neither.
- The data input to the macro is one of the two vec! token lists:
  - One or more expressions that can be placed into $expr != 0, separated by
    commas. A trailing comma is permitted.
  - A single expression that can be placed into $expr != 0, followed by a
    semicolon and a repetition counter. The resulting BitSlice will be counter
    bits long, all set to expression.
Emulation tests indicate that `bitvec` correctly instructs the compiler to
produce suitable buffers even when compiling for a target with a different
byte-endianness than the host. However, I have not actually performed such
cross-compilation and testing with real hardware. It should be correct; please
file an issue if it is not.
What BitSlice Can Do
Now that you have acquired a BitSlice
reference, either by borrowing memory
from elsewhere in your program or by creating a temporary, it is time to do some
actual work with it.
… That [bool] Can
Everything.¹ I am not going to rewrite the standard library’s slice documentation here.
… That [bool] Cannot
In addition to the standard library [bool]
API, BitSlice
offers some
inherent methods tailored to its specialization.
Set Queries
The five query methods .any(), .all(), .not_any(), .not_all(), and .some() test
how many bits in a region are set to 1. These methods have the following truth
table:
| Slice | any   | all   | not_any | not_all | some  |
|-------|-------|-------|---------|---------|-------|
| 00    | false | false | true    | true    | false |
| 01    | true  | false | false   | true    | true  |
| 11    | true  | true  | false   | false   | false |
any
is the Boolean OR operator; all
is the Boolean AND operator, and some
is the Boolean XOR operator.
In addition, .count_ones()
and .count_zeros()
count how many bits of the
slice are set to one or zero, rather than merely indicating whether any exist.
These methods are slower than the Boolean queries, which are capable of
short-circuiting once satisfied.
You can also use .iter_ones()
and .iter_zeros()
to walk each index of bits
with the specified value. These are equivalent to running .filter()
and
.enumerate()
calls on iterators of bool
, but are specialized to use
dedicated bit-counting instructions where processors provide them.
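A quick sketch of these queries together:

use bitvec::prelude::*;

let bits = bits![0, 1, 0, 1, 1];
assert!(bits.any() && !bits.all());
assert!(bits.some()); // some, but not all, bits are set
assert_eq!(bits.count_ones(), 3);
assert_eq!(bits.iter_ones().collect::<Vec<usize>>(), vec![1, 3, 4]);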
Boolean Arithmetic
bitvec data structures all implement the Boolean operators (&, |, ^, and !)
against each other.
In version 0, they allowed any `impl Iterator<Item = bool>`. This has been
changed for performance reasons, since people never used the arbitrary iterator
support but did require improved behavior when operating on two bit-slices.
These still have performance degradations, as the conditions required to allow
specialized acceleration can be difficult to guarantee at either run- or
compile- time. We are working on it, but cannot make any promises.
use bitvec::prelude::*;
let mut or = bits![mut 0, 0, 1, 1];
or |= bits![ 0, 1, 0, 1];
assert_eq!(or, bits![ 0, 1, 1, 1]);
let mut and = bits![mut 0, 0, 1, 1];
and &= bits![ 0, 1, 0, 1];
assert_eq!(and, bits![ 0, 0, 0, 1]);
let mut xor = bits![mut 0, 0, 1, 1];
xor ^= bits![ 0, 1, 0, 1];
assert_eq!(xor, bits![ 0, 1, 1, 0]);
let mut not = bits![mut 0, 1];
not = !not;
assert_eq!(not, bits![ 1, 0]);
Writing To Memory
You can set all bits in a region to a new value by using the .fill()
method,
or you can set one bit in a region to a new value by using either the .set
or
.get_mut
methods. .get_mut
produces a proxy type which acts roughly like an
&mut bool
reference slot.
use bitvec::prelude::*;
let bits = bits![mut 0; 4];
assert!(bits.not_any());
bits[0 .. 1].fill(true);
assert!(bits[0]);
bits.set(1, true);
assert!(bits[1]);
*bits.get_mut(2).unwrap() = true;
assert!(bits[2]);
let mut bit = bits.get_mut(3).unwrap();
assert!(!bit);
*bit = true;
assert!(bits[3]);
assert!(bits.all());
The proxy type produced by .get_mut() implements DerefMut<Target = bool>, so
you can assign into it and read out of it. However, it does not commit the
value assigned into it back to its source BitSlice until it Drops.
You can force the destruction of a named proxy reference by using its
.commit()
method, which takes self
by value, destroying it and releasing the
borrow.
The proxy type is *not* a reference, which means you need to bind it with
`let mut` in order to be able to write through it as if it were a reference.
Viewing the Underlying Memory
The memory underlying any bit-slice region is subject to some restrictions about
aliasing that are documented more thoroughly in the domain
module and the
Memory Model chapter. In short, borrowed BitSlice
regions cannot view
their underlying memory directly without violating aliasing rules established by
either the Rust language or by bitvec
itself. Instead, the .domain()
and
.domain_mut()
methods provide views that correctly handle aliasing and edge
conditions, and mediate access to the underlying memory.
The owning handles (BitArray
, BitVec
, and BitBox
) do not have this
limitation, as they can guarantee unique access to a memory location without any
possibility of aliasing. As such, these types all have .as_raw_slice()
and
.as_raw_mut_slice()
methods that provide ordinary slice views to their storage
region.
Footnotes
¹ Except write-assignment through indexing. I am not going to keep mentioning this exception.
Arrays
While BitSlice
describes a region of borrowed data, BitArray
provides a
container that can hold and manage such a region.
It is most comparable to the C++ type std::bitset<N>
. Unfortunately, the
Rust support for type-level integers is still experimental, so it is unable to
take the length of the BitSlice
it contains as a type parameter. Instead, it
must take the entire region it contains as a type parameter. The full type
declaration is
use bitvec::prelude::*;
pub struct BitArray<
A: BitViewSized,
O: BitOrder,
> {
_ord: PhantomData<O>,
data: A,
}
As described in the previous chapter, the BitView
trait is implemented on
the unsigned integers, and on arrays of them.
Once type-level computation stabilizes, `BitArray` will change to have the type
parameters `<T: BitStore, O: BitOrder, const N: usize>`, matching the
`std::bitset<N>` length parameter. This will require a major-version increase.
This array dereferences to a BitSlice
region over its entire length. It does
not currently permit shortening its BitSlice
from either end. If this is a
behavior you want, please file an issue.
Declaring a BitArray Type
Until Rust allows type-level integer computation to affect the memory layout,
BitArray
types remain awkward to declare. You could declare types yourself by
using the bitvec::mem::elts
function:
use bitvec::{array::BitArray, mem, order::Lsb0};
type MyArray = BitArray<[u8; mem::elts::<u8>(50)], Lsb0>;
But for convenience, we provide a BitArr!
type-declaration macro. It expands
to exactly the expression above. It accepts the following syntaxes:
use bitvec::{BitArr, order::Lsb0};
// explicit ordering
type A = BitArr!(for 50, in u16, Lsb0);
// implicit ordering defaults to Lsb0
type B = BitArr!(for 50, in u32);
// implicit store defaults to usize
type C = BitArr!(for 50);
Creating a BitArray Value
The ::ZERO constant is a blank BitArray with its memory completely zeroed. The
::new() function wraps an existing element or array into a BitArray. In
addition, the macro constructor bitarr! takes the exact same arguments as the
bits! constructor, except that it returns an array directly rather than a
reference to a buffer.
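A brief sketch of these construction paths (the values are arbitrary):

use bitvec::prelude::*;

let a: BitArr!(for 16, in u8, Msb0) = BitArray::ZERO;
let b = BitArray::<[u8; 2], Msb0>::new([0x0F, 0xF0]);
let c = bitarr![u8, Msb0; 1; 16];
assert!(a.not_any());
assert_eq!(b.count_ones(), 8);
assert!(c.all());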
Furthermore, BitArray
structures and references can be constructed from
&BitSlice
references using the TryFrom
trait, just as arrays can be
constructed in the standard library.
Using a BitArray
Once constructed, BitArray offers the .as_bitslice() and .as_mut_bitslice()
explicit methods, as well as all the standard traits, to borrow its data as a
BitSlice. The array has almost no functionality of its own, and serves
primarily to own a region used as a BitSlice. Like standard-library arrays, it
natively produces a by-value iterator.
Once you are done using BitSlice
to manipulate the array, you can remove the
array with .into_inner()
and regain the A
memory within.
That’s everything that the array does! Like regular arrays, it is useful
primarily for its ability to move memory through a program, and has essentially
no behavior in its own right. As a plain data structure, it is most useful for
programs that do not have access to a dynamic allocator, and do not wish to use
static
buffers. However, if you do have access to an allocator, you will
probably prefer to use BitVec
instead.
BitArray
is uniquely suited as a structural field for types which implement
I/O protocols and have fixed-size buffers, such as the headers of internet
transport packets, or CPU instruction set encodings. A full example can be found
in the bitvec
examples, but here is a short sample of how a TCP packet
header might be represented:
use bitvec::prelude::*;
#[derive(Clone, Copy, Default)]
struct TcpHeader {
data: BitArr!(for 160, in u8, Msb0),
}
impl TcpHeader {
fn data_offset(&self) -> usize {
self.data[96 .. 100].load::<u8>() as usize
}
fn set_data_offset(&mut self, value: u8) {
if !(5 ..= 15).contains(&value) {
panic!("invalid data offset value");
}
self.data[96 .. 100].store(value);
}
fn syn_flag(&self) -> bool {
self.data[110]
}
fn sequence_number(&self) -> u32 {
self.data[32 .. 64].load_be()
}
fn as_bytes(&self) -> &[u8] {
self.data.as_raw_slice()
}
}
This snippet shows how you can:
- use BitArray as the storage inside a semantic new-type
- select individual flag bits
- use sub-byte regions for small integer storage
- use multi-byte regions for large integer storage
- access the raw storage for interaction with I/O systems
Dynamic Allocations
bitvec has two types that require compiling the crate with feature = "alloc"
(and linking against the Rust distribution crate alloc): BitVec and BitBox.
BitVec is a dynamically resizable vector, equivalent to the C++ type
std::vector<bool> and the Rust type Vec<bool>. BitBox is a frozen vector,
incapable of changing its length, equivalent to the Rust type Box<[bool]> or
the C++ type std::unique_ptr<std::bitset<N>>.
Since BitBox is a vector that cannot .push() or .pop(), I will not discuss it
in detail here. It is a heap-allocated BitSlice, and is otherwise
uninteresting. It only exists because Box<BitSlice> is not expressible.
Getting a BitVec
BitVec implements the constructors shown in the standard library: ::new()
creates a handle without any allocation, and BitSlice implements the
.to_bitvec() method and the ToOwned trait to copy a bit-slice into a new
bit-vector. Additionally, the bitvec! macro takes all the bits! arguments and
produces an owned allocation instead of a stack temporary. You can also
construct a BitVec by .collect()ing any iterator of bools.
use bitvec::prelude::*;
let a: BitVec = BitVec::new();
let b = bits![0, 1, 0, 1].to_bitvec();
let c = b.clone();
let d = bits![u8, Msb0; 1; 4].to_owned();
let e = bitvec![0, 1, 1, 0];
let f = bitvec![u16, Msb0; 0; 4];
let g = [true, false, true, false]
.iter() // &bool
.copied() // bool
.collect::<BitVec>();
Using a BitVec
Once constructed, BitVec
implements the entire API that Vec
does in the
standard library. This remains uninteresting to write out.
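Still, here is a small sketch of the vector API in action:

use bitvec::prelude::*;

let mut bv = bitvec![u8, Msb0; 0; 4];
bv.push(true);
bv.extend([true, false]);
assert_eq!(bv.len(), 7);
assert_eq!(bv.pop(), Some(false));
assert_eq!(bv.count_ones(), 2);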
Like BitSlice, BitVec and BitBox are implemented as stack handles that use the
specially-encoded pointer to describe their region. This enables them to remain
the same size as their standard-library counterparts, while making them
completely opaque to inspection.
Because they are fully owned, BitVec and BitBox have some important behavioral
differences from BitSlice. Primarily, because they do not have to worry about
other handles viewing the memory under their control, they can modify the
contents of their buffers outside the BitSlice that is considered live, and
they do not exclude partial edge elements when viewing their buffer as raw
memory. If you are using BitVec to construct an I/O buffer, these two facets
can have surprising results if you are not careful to fully initialize a memory
span.
Vectors, like slices, do not need to begin at the zeroth index of the base
element. They can begin, and end, at any bit in any element. This will only
happen when copying a vector from a source slice that was misaligned. The
.force_align()
method moves the vector’s live slice down to start at the zero
index. Once done, extending the live slice to reach the last index of an element
ensures that viewing the buffer as raw memory will have no uninitialized bits.
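A sketch of that workflow, assuming the copied vector really did inherit a
misaligned head from its source:

use bitvec::prelude::*;

let source = bits![u8, Msb0; 0, 1, 1, 0, 1];
// Copying from a misaligned source slice may keep the source’s head offset…
let mut bv = source[1 ..].to_bitvec();
// …so pull the live bits down to the element’s zero index, then pad out the
// element so that no uninitialized bits remain visible.
bv.force_align();
bv.resize(8, false);
assert_eq!(bv.as_raw_slice(), &[0b1101_0000u8]);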
Type Parameters
bitvec
uses type parameters to permit precise user control of its behavior and
in-memory representation. The Rust generic system permits bitvec
to have a
more powerful and capable behavior than any other bitstream library yet
implemented in any language.
All bitvec
types take two type parameters:
- T: the storage type. The slice types (BitSlice, BitVec, BitBox) select an
  unsigned integer, since as slices they store the length in their runtime
  state. BitArray uses either a bare integer or an integer array as its storage
  parameter, since it has no runtime state of its own and stores all of its
  information in the type system.
- O: the ordering of bits within a single T element. We provide two orderings,
  Msb0 and Lsb0.
The combination of these two type parameters governs how bitvec translates its
abstract storage (usize -> bool) into real memory; if you do not care about
real-memory representation, then the default type parameters <usize, Lsb0> will
give you the best performance. If you do care about this, then the memory
representation chapter goes into detail about all the combinations and can help
you select which one best fits your needs.
The BitOrder trait is open for third-party implementation. It describes its
requirements in great detail in its API documentation, so if you have a memory
representation that is neither Lsb0 nor Msb0, you can implement the ordering
yourself and bitvec will use it without complaint.
Rust syntax requires explicitly choosing type parameters when using generic
expressions, such as BitVec::<Store, Order>::new(), and will not substitute in
the default parameters when attempting to elide the parameters with
BitVec::new(). However, Rust will use the default type parameters in patterns:
let bv: BitVec = BitVec::new(); will use the default type parameters in the
: BitVec type annotation, which then completes the type of the expression on
the right side of the assignment.
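In code:

use bitvec::prelude::*;

// Expression position: the parameters must be written out (or be inferable).
let a = BitVec::<u8, Msb0>::new();
// Annotation (pattern) position: the defaults <usize, Lsb0> fill themselves in.
let b: BitVec = BitVec::new();
assert!(a.is_empty() && b.is_empty());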
Bit Ordering
bitvec expects user code to count semantic indices, and hides the actual
position of a bit within a memory element from users. This allows user code to
treat indices as a uniform domain of integers in 0 ..= !0 >> 3, and not have to
remember whether place - 1 means moving “forward” or “backward” in the buffer.
Internally, each *BitSlice
pointer contains an element address and a bit
index. The pointer uses its BitOrder
type parameter to translate the bit index
into a mask that drives actual memory access.
BitOrder
is open to user implementation, and implementations of it are trusted
to be sound in the bitvec
memory model. For this reason, the trait is unsafe
to implement. Most users will not implement it; almost all users want one of the
two monotonic orderings provided for you. However, some specialized users may
have an ordering of their own, and they are still able to encode that ordering
away from their index logic.
Provided Orderings
bitvec provides two orderings: Lsb0 and Msb0. These each refer to which numeric
bit in a register element is considered to be the zero index.
You can think of these as corresponding to the little-endian and big-endian processor byte orderings if you like, as long as you remember that your choice of bit-ordering is not at all related to the byte-ordering of your target processor.
Lsb0
The Lsb0
type sets the zero index at the least significant bit of a register
(numeric value 1) and each successive index selects the next more significant
bit in the register, until the most significant bit is at the final index.
It is the expression `mask = 1 << index;`.
Msb0
The Msb0
type sets the zero index at the most significant bit of a register (numeric
value 2^(n-1) for an n-bit register) and each successive index selects the
next less significant bit in the register, until the least significant bit is
at the final index.
It is the expression `mask = (iN::MIN as uN) >> index;`.
LocalBits
The type alias LocalBits
refers to the ordering that your target’s C compiler
will likely choose for its bitfield direction. This is Lsb0
on little-endian
byte-ordered processors, and Msb0
on big-endian byte-ordered processors.
Remember that the BitOrder
implementations and processor byte orderings have
no relation to each other! This is only a custom, not a requirement of the
processor architecture.
Default Ordering Parameter
Lsb0 is the default type parameter used by the sequence types, as it produces
selection masks using the starting value 1, which encodes to smaller
instructions than the Msb0 starting value.
On AMD64, the pairs <u64, Lsb0>
and <u64, Msb0>
produce the following object
code and disassembly:
ba 01 00 00 00 movl $1, %edx
48 d3 e2 shlq %cl, %rdx
48 ba 00 00 00 00 00 00 00 80 movabsq $-9223372036854775808, %rdx
48 d3 ea shrq %cl, %rdx
The Msb0 load requires an additional four bytes of zeros in its immediate, and
the 64-bit modifier prefix (0x48), in order to encode movabsq instead of movl.
Implementing BitOrder
As stated above, this is an unsafe trait
because bitvec
relies on its
implementations to uphold certain mathematical invariants. Failure will result
in memory unsafety and/or program crashes.
BitOrder has three functions: at, select, and mask. Implementors are required
to provide at; select and mask have default implementations that rely on it.
These functions are all generic over the BitMemory trait; this trait describes
the register types (unsigned integers) that can be used by bitvec. It provides
some useful associated constants, and is otherwise uninteresting.
- ::at() receives the semantic index of a bit within a register type, and must
  produce the concrete position corresponding to the semantic index. The input
  and output are both integers in the domain [0, W), where W is the bit-width
  of the register type being indexed. at must implement an exactly one-to-one
  mapping from all inputs to all outputs in the [0, W) domain, and this mapping
  must never observably change. These are strict requirements of the library,
  and failing to uphold either will break your program.
- ::select() receives the semantic index of a bit within a register type, and
  must produce a value of that register type with exactly one bit set. The
  produced value is a mask that selects only the bit specified by the provided
  index, and will be used in Boolean arithmetic to manipulate memory. The
  default implementation is 1 << at(index). You are required to maintain this
  behavior in your override.
- ::mask() receives an inclusive start index and an exclusive end index, and
  must produce a register value that selects every bit in the indexed range.
  The default implementation is (start .. end).map(select).sum(). You are
  required to maintain this behavior in your override.
Support Types
The BitOrder
trait APIs use supporting types to enforce requirements on the
bare numbers being passed through it. These types are documented in the
index
module. They all provide a .value()
method that removes the wrapper
and yields the inner number, and a ::new()
constructor that ensures that
values to be wrapped uphold the type’s requirements.
- at and select receive a BitIdx<M> argument. This is a wrapper over u8 that
  ensures that the contained value is in the domain 0 .. M::BITS. It serves to
  indicate that a number is a semantic counter, not an electrical position.
- at returns a BitPos<M> value. This has the same behavior and restrictions as
  BitIdx<M>, and indicates that the number is an electrical position within a
  register.
- select returns a BitSel<M> value. This wraps an M register value, and ensures
  that the contained number is a power of two – exactly one bit is set, and all
  others are zero. This type indicates that the mask is guaranteed to select
  exactly one bit in a register.
- mask receives an inclusive BitIdx<M> and an exclusive BitEnd<M> argument, and
  returns a BitMask<M> value. BitEnd<M> ensures that the contained number is in
  the domain 0 ..= M::BITS, including the final count, and marks a
  one-past-the-end exclusive boundary. BitMask<M> marks that the contained
  number may select any number of bits in a register.
New implementors of BitOrder
will use these types to satisfy behavior
requirements individually.
In addition, implementors’ test suites should call the function
order::verify_for_type::<O, M>()
to check that an implementation O
satisfies
the behavior requirements for a particular register type M
, or call
order::verify::<O>()
to check that an implementation satisfies the behavior
requirements for all supported register types. These functions are conditional
upon cfg(test)
, and accept a verbose: bool
parameter that allows the test
functions to print diagnostics to stdout
during evaluation.
If the verification functions panic, your implementation is incorrect, and
cannot be safely used in bitvec
.
Bit Storage Within Registers
The BitStore
trait governs the processor behaviors used to interact with the
memory of a BitSlice
buffer. These include both the width of the processor
register used to contain a memory element, and the load/store instructions the
processor uses to move data across the memory bus.
BitStore has no behavior of its own, and serves only to collect associated
types and constants. It cannot be implemented outside bitvec, and is a closed
portion of the API. You can freely use the trait as a bound in types that
contain bitvec structures, but should not otherwise attempt to make use of it.
Implementations
BitStore
is implemented on all unsigned integer types not larger than a target
processor’s word size, all Cell<T>
wrappers of them (as Cell
is a compiler
directive), and their AtomicT
variants.
Not all processors have atomic instructions for all their scalar registers.
bitvec uses the radium crate to manage its support for atomic types. On
architectures with missing atomics, bitvec’s default feature set will cause a
compiler error when you attempt to instantiate a bitvec structure with the
register that is missing an atomic variant. You can fix this by using a
narrower register that does have atomic instructions, or by disabling
default-features and not enabling the "atomic" feature.
Associated Types
The Mem associated type names the scalar integer corresponding to the BitStore
type. u8, Cell<u8>, and AtomicU8 all implement BitStore with their Mem type
assigned as u8; the same is true for the wider registers.
This type is used to create selection masks in the processor and permit access to unaliased memory.
The Access
associated type names the type used to implement memory access. The
BitAccess
trait is an internal bridge to Radium
that allows a consistent
memory API, regardless of instructions used. All reads from and writes to memory
route through this association and trait.
Lastly, the Alias
and Unalias
associated types enable bitvec
to gracefully
and correctly handle events that cause multiple handles to alias the same memory
address. This association is used in .split_at_mut()
to select the alias-aware
type used for all subsequent accesses.
The Mem
and Alias
types are exposed in public APIs according to local alias
information. The Access
type is never publicly exposed, and only used for code
generation.
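A short sketch of using the trait purely as a bound:

use bitvec::prelude::*;

// Generic over the storage register; the bit-slice API is identical for all.
fn count_ones<T: BitStore>(bits: &BitSlice<T, Lsb0>) -> usize {
    bits.count_ones()
}

let data = [0xA5u8, 0xFF];
assert_eq!(count_ones(data.view_bits::<Lsb0>()), 12);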
Practical Use
That’s enough theory; let’s talk about how to use the crate. This chapter is
divided into two subsections, one for use cases that want an ordinary bool
collection library with transparent memory compaction, and one for use cases
that want a convenient way to precisely sculpt memory. Before getting into
either, let’s quickly recap how the bitvec
types interact with memory
ownership.
Rustic Memory Management
The first and most important thing to remember is that, despite the extra
complexity just discussed about memory parameters and aliasing behavior,
bitvec
is just Rust. It obeys all of the rules and patterns that the rest of
Rust does.
BitSlices, like regular slices, are exclusively borrowed from owned memory
higher up the call stack. Their references can be passed down the stack, and are
subject to the same lifetime and mutation-exclusivity rules.
The owned memory that creates a BitSlice
view can be an array, boxed slice, or
vector of either ordinary integers, or their wrapper equivalents provided by
bitvec
:
use bitvec::prelude::*;
let array = [0u8; 8];
let boxed: Box<[u16]> = Box::new([0u16; 4]);
let vec = vec![0u32; 2];
let bits_a = bitarr![u8, Msb0; 0; 64];
let bits_b = bitbox![u16, Lsb0; 0; 64];
let bits_v = bitvec![u32, LocalBits; 0; 32];
Once memory is bound, it can be borrowed as a BitSlice
by using the BitView
trait (imported in the prelude), or by using the fact that all bitvec
containers borrow themselves as BitSlices
just like standard-library
containers borrow themselves as slices:
use bitvec::prelude::*;
let array = [0u8; 8];
let boxed: Box<[u16]> = Box::new([0u16; 4]);
let vec = vec![0u32; 2];
let bits_a = bitarr![u8, Msb0; 0; 64];
let bits_b = bitbox![u16, Lsb0; 0; 64];
let bits_v = bitvec![u32, LocalBits; 0; 32];
let array_bits = array.view_bits::<Msb0>();
let boxed_bits = boxed.view_bits::<Lsb0>();
let vec_bits = vec.view_bits::<LocalBits>();
let bits_a_ref: &BitSlice<_, _> = &bits_a;
let bits_b_ref: &BitSlice<_, _> = &bits_b;
let bits_v_ref: &BitSlice<_, _> = &bits_v;
And, of course, mutability applies:
let mut arr = bitarr![0; 10];
let arr_ref: &mut BitSlice = &mut arr;
arr_ref.set(1, true);
assert!(arr_ref[1]);
Just as with ordinary Rust code, BitSlice
is the type you want to use when
working with memory that you are not responsible for moving around or
destroying. However, when you do need to move memory around, you need to switch
to either a static
binding or a container: BitArray
is always available, and
BitBox
and BitVec
require an allocator.
I am spending this much time discussing the Rust memory management system
because this is a very common question I receive. No other bit-stream crate
has reference types, and therefore users coming from, e.g., bit-vec
see the
same BitSlice
name and attempt to use their habits from that crate.
bitvec
is not like any other bitstream library, in Rust, C++, or another
language. bitvec
is like ordinary Rust. I promise.
Bit Collections
As discussed in the Type Parameters chapter, you should use usize
as the
BitStore
parameter for optimal performance in the generated program.
Once you have created some memory that you can view as individual bits, it is
time to actually use it. Here is the one-sentence summary of what bitvec
can
do:
Every stable API present in the standard library is replicated in bitvec,
except for BitSlice<T, O> as IndexMut<usize>, because bitvec cannot produce
&mut bool.
If you were using ordinary collections to manage sequences of bools, then every
part of your code will continue to work on bitvec types except for the array
assignment slice[index] = value;. Until and unless the IndexMut trait is
reshaped, you will need to use one of these two alternatives:
slice.set(index, value); or *slice.get_mut(index).unwrap() = value;
Subslicing works:
use bitvec::prelude::*;
let bits = bits![0, 0, 0, 0, 1, 1, 1, 1];
assert!(bits[.. 4].not_any());
assert!(bits[4 ..].all());
Incremental munching works:
use bitvec::prelude::*;
let mut bits = bits![0, 0, 1, 1, 1, 0, 0, 0];
// ^^^ modify the slice handle, not the slice contents
while let Some((&false, rest)) = bits.split_first() {
bits = rest;
}
assert_eq!(bits, bits![1, 1, 1, 0, 0, 0]);
while let Some((&false, rest)) = bits.split_last() {
bits = rest;
}
assert_eq!(bits, bits![1; 3]);
Mutation works:
use bitvec::prelude::*;
use std::thread;
let bits = bits![mut 0; 8];
let (left, right) = bits.split_at_mut(4);
thread::scope(|s| {
    // Pretend that better work is happening here
    s.spawn(|| left.fill(true));
    s.spawn(|| right.fill(true));
});
assert_eq!(bits, bits![1; 8]);
Everything you can do with a slice, an array, or a vector of bits, you can do
with bitvec’s equivalent types. Except for IndexMut<usize>. The only change
from the standard library types is that you are now guaranteed to use one bit
of storage for each bit of information, rather than eight bits of storage per
bit.
Author’s note: Other than bragging about `bitvec`’s API fidelity, I don’t think
this section is very useful or educational. If you want to read more about how
to use `bitvec` for `usize => bool` collections, please let me know and I will
expound!
Bitfields
bitvec’s more technically-interesting capability is that it provides load/store
memory access behaviors that allow you to write values into, and read them back
out of, any BitSlice in memory rather than being constrained to well-formed
references to well-typed memory.
This is useful for the de/construction of packed memory buffers, such as transporting data through I/O protocols.
**AUTHOR’S NOTE**: If you are using `bitvec` to do **anything** related to the
underlying memory representation of a bit-buffer, you **must** read this
chapter, and **all** of the API docs of [`bitvec::field`] and its contents,
**in their entirety**.
I have written extensively, and yet still insufficiently, about the intricacies
involved in operating the `BitField` trait correctly. If you skim this
documentation, you *will* have unexpected behavior, you *will* get frustrated
with me for writing a bad library, you *will* file an issue about it, and I will
*probably* tell you that the behavior is correct and that I already addressed it
in the documentation.
It took me a long time to think about and a long time to write. It should take
you *also* a long time to read and a long time to think about.
All of this behavior is contained in the BitField
trait. Let’s explore that:
// bitvec::field
pub trait BitField {
fn load<M>(&self) -> M;
fn store<M>(&mut self, value: M);
}
impl<T> BitField for BitSlice<T, Lsb0> {
fn load<M>(&self) -> M { /* snip */ }
fn store<M>(&mut self, value: M) { /* snip */ }
}
impl<T> BitField for BitSlice<T, Msb0> {
fn load<M>(&self) -> M { /* snip */ }
fn store<M>(&mut self, value: M) { /* snip */ }
}
The actual trait is more complex than this, and will be visited later. The
important part, right now, is that BitField allows you to load values out of
BitSlices and store values into them. Furthermore, it is implemented
specifically on BitSlices that use the bit orderings provided by bitvec, and is
not generic over all orderings.
While bitvec
could in theory provide a default implementation for all
<O: BitOrder>
, this would by necessity have the most pessimal possible
performance, and the lack of specialization for overlapping trait
implementations means that faster performance can never be written.
The downside of the two specific implementations is that Rust coherence rules
forbid implementation of a bitvec trait, on a bitvec type, parameterized with a
local, but non-bitvec, implementor of BitOrder. On the off chance that you find
yourself writing a new BitOrder implementor, file an issue.
The M type parameter on the load and store methods is bounded by funty’s
Integral trait. It can store any unsigned or signed integer at any partial
width. This parameterization allows you to combine any integer type for
transfer with any integer type for storage, rather than being restricted to
only transferring T data into and out of a BitSlice<T, _>.
Unfortunately, adding a second integer type parameter is not the only
complication to the BitStore memory model. There is also a second dimension of
segment ordering. bitvec tries to make explicitly clear that the Lsb0 and Msb0
types refer only to the ordering of bits within registers, and not to the
ordering of bytes within registers. However, when the integer bit-sequence
being loaded or stored does not fit within one register of the storage
BitSlice, it must be split into multiple segments, and those segments must
somehow be ordered in memory.
Segment Orderings
Author’s Note: **READ THIS**. I have received *several* issues about this exact
concept. *It is not obvious*.
There are two segment orderings: little-endian and big-endian. You may select
the segment endianness you prefer by using the _le
or _be
suffix,
respectively, on the .load()
and .store()
methods. The unsuffixed method is
an alias for the endianness of your processor: _be
on big-endian targets, and
_le
on little-endian. This is a convenience only. If you are writing I/O
buffers, you should really use the explicitly-named methods.
Let us imagine a BitSlice<u8, Lsb0>
used to store a u16
that is misaligned,
and thus stored in three successive bytes. This algorithm is true for all
circumstances where the stored region occupies more than one register of the
backing slice, but smaller examples are simpler to draw.
This diagram uses 0
to refer to the least significant bit, and 7
to refer to
the most significant bit. The first row shows bytes of memory, the second row
shows the bit indices in memory used by .store_le()
, and the third row shows
the bit indices in memory used by .store_be()
.
[ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ]
[ 3 2 1 0         ] [ b a 9 8 7 6 5 4 ] [         f e d c ]
[ f e d c         ] [ b a 9 8 7 6 5 4 ] [         3 2 1 0 ]
.store_le()
places the least significant segment in the low address, while
.store_be()
places the most significant segment in the low address. The
ordering of bits within a segment is always preserved, no matter which
ordering parameter is used by the BitSlice
.
Here is the same example, but using the Msb0 bit ordering instead. Again, the
second row uses .store_le(), and the third row uses .store_be().
[ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ]
[         3 2 1 0 ] [ b a 9 8 7 6 5 4 ] [ f e d c         ]
[         f e d c ] [ b a 9 8 7 6 5 4 ] [ 3 2 1 0         ]
The only change is in how the segments are placed into memory. The ordering of bits within a segment never changes, and is always the processor’s significance order as implemented in hardware.
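To make the segment orderings concrete, here is a small sketch that stores the
same u16 through both suffixes and inspects the raw bytes. The expected values
are worked out by hand from the rules above, so treat them as illustration
rather than reference output:

use bitvec::prelude::*;

let mut lo = [0u8; 3];
// 16 bits spread over three bytes: 4 + 8 + 4.
lo.view_bits_mut::<Lsb0>()[4 .. 20].store_le::<u16>(0xABCD);
// Least significant segment (0xD) lands in the lowest-addressed byte.
assert_eq!(lo, [0xD0, 0xBC, 0x0A]);

let mut hi = [0u8; 3];
hi.view_bits_mut::<Lsb0>()[4 .. 20].store_be::<u16>(0xABCD);
// Most significant segment (0xA) lands in the lowest-addressed byte.
assert_eq!(hi, [0xA0, 0xBC, 0x0D]);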
How to Use BitField
You will probably find real use of the BitField trait more educational than the
previous section. It has a very straightforward API that you can combine with
println!-debugging or your favorite means of viewing memory in order to observe
its actions.
Step one: create any BitSlice
-capable buffer. This can be any of the
Rust-native sequence types, or any of the bitvec
types.
use bitvec::prelude::*;
let mut data = [0u8; 4];
let bits = data.view_bits_mut::<Msb0>();
Then, narrow the BitSlice to be the region you want to access as storage. It
must be no wider than the integer type you are transferring: BitSlices whose
length is outside the domain 1 ..= M::BITS will panic during .load() or
.store(). The easiest way to narrow a BitSlice (or a buffer type that
dereferences to it) is by using range indexing, [start .. end].
use bitvec::prelude::*;
let bits = bits![mut u8, Msb0; 0; 32];
bits[10 ..][.. 13].store_be::<u16>(0x765);
assert_eq!(bits[10 .. 23].load_be::<u16>(), 0x765);
bits[10 .. 23].store_le::<u16>(0x432);
assert_eq!(bits[10 .. 23].load_le::<u16>(), 0x432);
That’s the entire API. .store()
truncates the stored value to the width of the
receiving BitSlice
, and .load()
 zero-extends (for unsigned destination types) or sign-extends (for signed destination types) the loaded value to the width of the destination register type.
Storing signed integers can be surprisingly fraught: `bitvec` **will not**
attempt to detect and preserve the most-significant sign bit when truncating! If
you store the number `-12i8` (`0b1111_0100`) in a 4-bit slot, it will be stored
as `4i8` and reloaded as such! Similarly, storing `12i8` (`0b0000_1100`) in a
4-bit slot will produce a load of `-4i8` (`0b1111_1100`).
Signed integers do not behave like unsigned integers. You are wholly responsible
for ensuring that you remember that allowing negative numbers halves the
magnitude: 4 bits unsigned is `0 .. 16`, but 4 bits signed is `-8 .. 8`.
`bitvec` **only stores bit patterns**. Retaining numerical intelligibility is
**your** responsibility.
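Here is a sketch of that hazard; the buffer is my own, chosen to match the numbers quoted above.

use bitvec::prelude::*;

let bits = bits![mut u8, Lsb0; 0; 8];
// 12i8 is 0b0000_1100; only the low four bits (0b1100) survive the store.
bits[.. 4].store_le::<i8>(12);
// The surviving pattern has its top bit set, so the signed load
// sign-extends it to 0b1111_1100, which is -4.
assert_eq!(bits[.. 4].load_le::<i8>(), -4);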
You can see an example that uses the BitField
trait to implement an I/O
protocol in the examples/ipv4.rs
program in the repository. Use
cargo run --example ipv4
to see it in action.
Memory Representation
As discussed in the Type Parameters chapter, bitvec
allows users to select
the specific ordering of bits within a memory element when constructing a type.
This has consequences for how source code translates to an in-memory
representation.
To review: bitvec
provides two orderings of bits within a single memory
element (Lsb0
and Msb0
) and three or four types of memory elements (u8
,
u16
, u32
, and only on systems where it is 8-byte-aligned, u64
). The
usize
type is also supported, but it is not portable, and behaves exactly as
the named register of its width.
The `Cell` and atomic integer variants are not interesting here, as they only
affect how the memory bus operates, not the processor register.
Let us now examine how each possible combination of register width, bit-ordering, and processor byte endianness affects the placement of bits in memory.
Memory Layout
The BitOrder
and BitStore
traits combine with your target architecture’s
byte-ordering of register elements to create a matrix of memory traversals. This
matrix informs what is the appropriate choice of parameters to use for your
program whenever you are using bitvec
for precise memory control rather than
solely for a compact usize
-to-bool
data collection.
The tables below list bytes of a memory address space with addresses increasing to the right, and the bits within those bytes with numeric significance decreasing to the right. This is the ordering used in most debug-printing of memory, so hopefully the table contents should match up with your prior experience viewing memory bytes.
The L
and M
indicate the Lsb0
and Msb0
ordering parameters,
respectively, and xx
indicates that the row matches all register types.
Within each row, traversal begins at zero and follows the arrows to each
successive step. Boundaries between registers are marked with a vertical bar;
boundaries between bytes within the same register are marked with a space.
Little-Endian Byte-Ordered Machines
On little-endian machines, the least-significant byte of a register type is stored at the lowest memory address, and each byte-address higher is one step more numerically significant than the last.
byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777
bit ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210
═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════
Lxx ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E
─────╫─────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────
M8 ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F
M16 ║ 2 ---> 3 0 ---> 1│6 ---> 7 4 ---> 5│A ---> B 8 ---> 9│E ---> F C ---> D
M32 ║ 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1│E ---> F C ---> D A ---> B 8 ---> 9
M64 ║ E ---> F C ---> D A ---> B 8 ---> 9 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1
Big-Endian Byte-Ordered Machines
On big-endian machines, the most-significant byte of a register type is stored at the lowest memory address, and each byte-address higher is one step less numerically significant than the last.
byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777
bit ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210
═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════
L8 ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E
L16 ║ 3 <--- 2 1 <--- 0│7 <--- 6 5 <--- 4│B <--- A 9 <--- 8│F <--- E D <--- C
L32 ║ 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0│F <--- E D <--- C B <--- A 9 <--- 8
L64 ║ F <--- E D <--- C B <--- A 9 <--- 8 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0
─────╫─────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────
Mxx ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F
If you need to care about the memory representation, then you most likely want
to use the <u8, Msb0>
pair. This provides a consistent ordering on all
machines, and the numeric value of the underlying memory will probably match
your expectations about the semantic contents of a data structure.
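As a quick sanity check (the buffer size and values here are arbitrary), the first bits of a <u8, Msb0>
view are always the high bits of the first byte, on every target:

use bitvec::prelude::*;

let mut data = [0u8; 2];
// The first four bits of a <u8, Msb0> view are the high nibble of byte 0,
// regardless of the host's byte endianness.
data.view_bits_mut::<Msb0>()[.. 4].fill(true);
assert_eq!(data, [0xF0, 0x00]);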
This chapter, and the BitField
trait, are the two most common sources of
questions about how bitvec
operates. Their intersection is even more complex,
and the layout of numeric integers stored into a BitSlice
is an extremely
common point of confusion.
Read these chapters and the API documentation thoroughly, and experiment with placing data into memory and changing the type parameters to observe their effects on buffer representation.
bitvec
Memory Model
bitvec
addresses individual bits, while computer hardware addresses register
elements. As a result, bitvec
has a more complex memory model than the Rust
language does. The library implementation strives to satisfy users’
expectations, the Rust language’s rules, and performance in the produced
artifact to the best solution for all parties, with as few compromises as
possible. Unfortunately, this has the side effect of increasing the complexity
of the codebase, both for maintainers and for readers.
This chapter explains the abstract concepts of the bitvec
memory model and the
means by which it is encoded in the Rust language without running afoul of Rust
or LLVM rules that threaten undefined behavior.
The bitvec
memory model is typed entirely within the store
module’s
BitStore
trait definition and implementations. It utilizes the access
module’s BitAccess
trait to mediate memory access events through types that
are known to be correct in the Rust and LLVM models, so that at any point in
program execution, memory access will be consistent and sound.
In addition, the domain
module’s types provide views which manipulate the
store
model to maximize performance. This chapter will discuss primarily
store
, and use domain
only to provide examples of how store
is used in
practice to accomplish the library’s implementation.
Aliasing
To Rust and LLVM, “aliasing” occurs whenever there are two paths to a memory
region, and at least one of them has write privileges. In Rust, this is
represented by the &mut
reference exclusion rule: it is always, always,
always Undefined Behavior in the Rust memory model to have two
references which can reach a memory element if at least one of them is marked
&mut
.
LLVM, which was created for, is written in, and culturally shaped by C++, takes
a similar view with its noalias
annotation, but struggles to enforce it as
thoroughly as Rust does.
bitvec
takes a similar view of the abstract meaning of the &mut
reference
type, but where the Rust memory model focuses on whole units T
, and has no
concept of subdivision from T
into the bits that compose it, bitvec
views
each individual bit as a standalone, independent, atom of memory. It excludes
the creation of two &mut BitSlice
reference handles that are capable of
viewing the same bit, but will happily produce two &mut BitSlice
handles
which are mutually-exclusive in bits, but reference the same location in memory.
Think of it like disjoint `&mut u64` references whose referent words are in the
same cacheline. The software model doesn’t care; making this work is the
hardware’s problem. Similarly, users of the `bitvec` library don’t need to care
about the fact that disjoint `&mut BitSlice` references might refer to the same
element.
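A minimal sketch of that claim, using a single byte of my own choosing: both halves of a split view refer to the same byte, but to disjoint bits of it, so both handles may be used freely.

use bitvec::prelude::*;

let mut data = [0u8; 1];
let bits = data.view_bits_mut::<Lsb0>();
// Both halves live in the same byte; bitvec marks them as aliased and
// keeps their writes from disturbing one another.
let (low, high) = bits.split_at_mut(4);
low.fill(true);
high.fill(false);
assert_eq!(data, [0x0F]);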
Here we come to the first problem with the conflicting memory models: bitvec
cannot ever create an &mut T
through which it may write to memory, because it
has no way of ensuring that no other &T
or &mut T
reference exists which is
capable of viewing the memory region into which it writes.
Rust Shared Mutation
The problem isn’t just that the Rust standard library doesn’t offer any
non-unsafe
APIs to produce such references. The problem is that this is about
the most illegal thing you can do in the eyes of the Rust compiler, and if it
ever detects this transgression, it has full liberty to either reject, or worse,
accept and miscompile, your program.
In Gankra’s immortal words on the topic of attempting to sneak past the compiler’s commandments,
- Transmuting an
&
to&mut
is always UB- No you can’t do it
- No you’re not special
The solution is simple: Rust exposes a type, UnsafeCell
, which is the
axiomatic type for mutation through shared views. In the Rust memory model, any
reference to a region of memory marked with UnsafeCell
is permitted to write
to it, and if any other references can view that region, then it is up to them
to ensure consistent behavior.
This is, of course, not quite true in LLVM’s memory model, but we are not there yet.
Since an &mut BitSlice<T, _>
handle cannot produce an &mut T
reference to
perform writes to memory, it must instead either use a *mut T
bare pointer,
which has absolutely no checks or optimizations whatsoëver, or use an
&UnsafeCell<T>
shared reference, which has all the usual guarantees present on
all1 reference types.
All UnsafeCell
does is instruct the Rust compiler to politely look the other
way about your program’s memory accesses. It is somewhat like the C keyword
volatile
, in that the compiler no longer believes that reads are stateless, or
freely reörderable, but entirely unlike that keyword in that the compiler
doesn’t have any obligation to keep your reads from or writes to such regions.
Rust provides an additional type called Cell
. This is a wrapper over
UnsafeCell
2 that defines a more useful API, including the only safe and
guaranteed way to write into memory through a shared reference: Cell::set
.
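For reference, here is a sketch of that API in plain std, with no bitvec involved at all: two shared references to the same Cell
, either of which may write through it.

use core::cell::Cell;

let cell = Cell::new(0u8);
let (a, b) = (&cell, &cell);
// Writing through a shared reference is safe and well-defined for Cell.
a.set(5);
assert_eq!(b.get(), 5);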
And voilà: &mut BitSlice<T, _>
simply constructs &Cell<T>
when writing,
the Rust compiler does not see a violation of the &mut
exclusion principle,
and we are done.
LLVM Shared Mutation
No we’re not. If it was that easy, there wouldn’t be a trait system or this document dedicated specifically to dealing with this problem.
Cell
is not thread-safe, because Cell
does not modify the instructions used
to access memory. It produces ordinary load and store instructions, carefree and
ignorant of the bane of everyone who wants consistency and the delight of
everyone who wants performance: concurrency.
Just as it is undefined behavior in Rust to manifest two &mut
references that
can view the same location, it is equally undefined behavior in LLVM to manifest
two pointers into memory that ever, at all, no matter what, perform any
memory access to the same location, at the same time, on multiple threads of
execution.
As with above:
- Unsynchronized writes are always UB.
- No you can’t do it.
- No you’re not special.
LLVM has an even more insidious punishment for this transgression that Rust does
not directly express: unsynchronized reads from a data race produce poison
.
Poison is a nifty concept, because it’s not illegal to obtain one. When LLVM
gives you a poison
, your program continues undergoing compilation as if
nothing had happened. You can pass it around. You can write to it, and if you
destroy it before reading, you’re fine.
As soon as you attempt to read the bit-wise value of poison
, your program is
undefined3.
So if bitvec
wants to be threadsafe, which it does, and it wants to insist on
its ability to safely alias the same memory location from multiple handles,
which is non-negotiable, there’s only one avenue left to take.
Atomic Powered Microscopes
Rust doesn’t actually model atomics. It doesn’t have to do so: no harm can ever
come from multiple handles reading out of the same immutable location, harm can
only occur when writes are observable, and writes are not observable due to the
&mut
exclusion rule. Well, except for UnsafeCell
, so everything that has an
UnsafeCell
 in it gets marked as !Sync
, &
references can’t cross threads,
and the whole problem is avoided.
This is, of course, not good enough. Concurrent, mutable, access to a location
is an important property in a computer. LLVM provides atomic types, which Rust
transparently exports as wrappers over UnsafeCell
that have their Sync
implementation restored. Handles to a region marked as atomic will use some form
of hardware-provided exclusion in order to preserve the
one-writer-XOR-any-readers system behavior, and all is well.
Hardware-level exclusion has the unfortunate penalty of being, to put it
lightly, “slow”. So while it’s the safest choice to be correct, and was in fact
the default universal choice for all memory access in bitvec
for some time,
its costs4 called for a more prudent behavior.
It’s time for a new trick, something that the Rust compiler does at the region
level, and that bitvec
now does at the element level:
Run Time Alias Analysis
BitSlice
handles can only be constructed from ordinary Rust references to
memory, which Rust guarantees start out unaliased. The potential for aliasing
only occurs when a &mut BitSlice
is split into multiple subslices using any
of the functions that eventually call .split_at_unchecked_mut()
. Since that is
the root function that introduces alias conditions, it returns subslices whose
memory type parameters are tainted with the ::Alias
marker. It has the
following type signature:
impl<T, O> BitSlice<T, O>
where O: BitOrder, T: BitStore {
pub fn split_at_unchecked_mut(&mut self, at: usize) -> (
&mut BitSlice<T::Alias, O>,
&mut BitSlice<T::Alias, O>,
);
}
The BitStore
trait defines an ::Alias
associated type, which ensures that
all memory accesses through it have appropriate aliasing guards. For builds
which do not use the atomic
feature (or where the target does not have an
atomic variant of the integer), this is Cell
, and its protections are the loss
of thread-safety. For builds that do permit atomics, the marker enforces that
all reads and writes use atomic instructions.
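Here is a sketch of what that re-typing looks like in source. The explicit annotation is only for illustration (inference normally hides it), and the concrete type behind the ::Alias
projection depends on the atomic feature, as described above.

use bitvec::prelude::*;
use bitvec::store::BitStore;

let mut data = [0u8; 2];
let bits = data.view_bits_mut::<Lsb0>();
// The split re-types both halves at the aliased storage parameter.
let (left, right): (
    &mut BitSlice<<u8 as BitStore>::Alias, Lsb0>,
    &mut BitSlice<<u8 as BitStore>::Alias, Lsb0>,
) = bits.split_at_mut(8);
left.set(0, true);
right.set(0, true);
assert_eq!(data, [0x01, 0x01]);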
The ::Alias
marker is applied, at compile time, by operations that split
&mut BitSlice
references into multiple coëxisting subslices. This is a good
first step to reducing unnecessary synchrony, but not good enough. Consider the
following:
use bitvec::prelude::*;
let mut data = [0u8; 2];
let bits = data.view_bits_mut::<Lsb0>();
let (one, two) = bits.split_at_mut(8);
It so happens that this is equivalent to splitting data
first, then viewing
each subslice as bits, but this transformation can only be done if the partition
point is known at compile-time to fall on an element boundary. There is no need
for the subslices one
and two
to use alias-safe operations, as accesses to
memory through them do not conflict with each other.
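Here is a sketch of that manual transformation: split the element slice on the boundary first, then take the bit views, so that neither view ever needs an aliasing marker.

use bitvec::prelude::*;

let mut data = [0u8; 2];
// Split the elements first, then view each piece as bits. Each view covers
// whole elements, so both keep the plain `u8` storage type.
let (one, two) = data.split_at_mut(1);
let one_bits = one.view_bits_mut::<Lsb0>();
let two_bits = two.view_bits_mut::<Lsb0>();
one_bits.fill(true);
two_bits.fill(false);
assert_eq!(data, [0xFF, 0x00]);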
Bit Slice Domains
The in-memory domain of any bit slice can be generalized to one of two formats:
either the slice touches zero edge-bits (0
or T::BITS - 1
), or it touches at
least one edge-bit in at least one element. Consider three bytes of memory (any
element will do, but the extra width on this page is unnecessary), with some
bitslice regions drawn within them:
|00000000│11111111│22222222| Element
|76543210│76543210│76543210│ Bit
├────────┼────────┼────────┤
│ ┆ ┆ │ 1
│ ┆xxxxx ┆ │ 2
│ ┆ xxxxx ┆ │ 3
│ ┆ xxxxx┆ │ 4
│ xxxxxx┆xxxx ┆ │ 5
│ xxxx┆xxxxxxxx┆ │ 6
│ ┆xxxxxxxx┆xxxx │ 7
│ xx┆xxxxxxxx┆xx │ 8
│xxxxxxxx┆xxxxxxxx┆xxxxxxxx│ 9
There are nine example slices here, but they can be reduced into six specific categories, and two general ones:
- empty: row 1
- minor (interior of an element, no edge indices): row 3
- partially-spanning head, fully-spanning body: rows 4 and 6
- partially-spanning tail, fully-spanning body: rows 2 and 7
- major (partial head, partial tail, full body): rows 5 and 8
- spanning: row 9
The minor slice (row 3) is irreducible; the rest can all be divided into three subcomponents:
- zero or one partially-occupied head element, where the slice touches the last index in it but not the first
- zero or more fully-occupied middle elements, where the slice touches all indices in each
- zero or one partially-occupied tail element, where the slice touches the first index in it but not the last
We can break each row down into these components:
Row | Head | Body | Tail |
---|---|---|---|
1 | None | 0 | None |
2 | None | 0 | Some |
4 | Some | 0 | None |
5 | Some | 0 | Some |
6 | Some | 1 | None |
7 | None | 1 | Some |
8 | Some | 1 | Some |
9 | None | 3 | None |
We can observe that where a slice fully-spans some elements, those elements
cannot be mutated by any other reference. In the &BitSlice
case, all &mut
s
are forbidden by the compiler’s ordinary rules; in the &mut BitSlice
case,
bitvec
’s obedience to those same rules forbids any other handle from observing
the bits covered by the &mut BitSlice
. As such, it is statically impossible
for any alias to exist to the described memory in row 9, or for any alias to
observe element 1
of rows 6 through 8.
bitvec
happily permits element-aliasing &mut BitSlice
references to observe
the partially-filled elements in the outer columns, and middle column of rows 2
through 5, so writes to them must remain synchronized through either
single-threaded Cell
or concurrency-safe atomics. These domain components can
be calculated from the three components of a slice pointer: the base address,
the starting bit index, and the bit count.
This is expressed in the domain
module’s two enums.
BitDomain
This enum splits any BitSlice
region into its subcomponents, immutably or
mutably, respectively. It has the (rough) definition
pub enum BitDomain<'a, M, T, O>
where
M: Mutability,
T: BitStore,
O: BitOrder,
{
Enclave(Reference<'a, M, BitSlice<T, O>>),
Region {
head: Reference<'a, M, BitSlice<T, O>>,
body: Reference<'a, M, BitSlice<T::Unalias, O>>,
tail: Reference<'a, M, BitSlice<T, O>>,
},
}
and, rather than granting direct memory access, merely removes any aliasing markers from as much memory as possible. The subslices that partially fill their base element do not need to add an additional aliasing marker, as the marker is only required when writes to the element may collide. If the slice is immutable, aliasing never occurs, so synchrony is never required. If the slice is mutable, then the only way to get a partial edge slice is to either forget about some bits from the main slice, which is not an alias event, or to split the slice, which is, and splitting already marks the alias.
Domain
The bit domains are still bit slices, and do not offer a way to access the backing memory. For operations where raw memory access is required, this enum produces the same domain definitions, but typed for the bare elements rather than their bits.
It has the (rough) definition
pub enum Domain<'a, M, T>
where
M: Mutability,
T: BitStore,
{
Enclave(PartialElement<T>),
Region {
head: Option<PartialElement<T>>,
body: Reference<'a, M, [T::Unalias]>,
tail: Option<PartialElement<T>>,
},
}
(The PartialElement
type is a guarded reference which prevents accessing bits
that do not belong to the originating BitSlice
.)
As with the bit domains, these domains will inherit any aliasing markers from
their source bitslice. The ::Alias
associated type enables the mutable domain
to produce references that allow mutation without adding an unnecessary
aliasing marker. Rust strongly forbids the production of &mut
references to
aliased memory elements, which is why the only &mut
reference in these views
is to memory that is fully known to be unaliased.
The Domain
structure will produce bare atomic or Cell
types in the
alias condition. This is necessary in order to avoid production of &mut
references which alias (as this is undefined in the Rust abstract machine,
regardless of behavior), and safe because any other references to the same
location will be similarly aliased and capable of handling external mutation.
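Here is a sketch of the element-level view in use, with slice ranges chosen to mirror rows 3 and 7 of the diagram above (the real enum carries mutability and ordering parameters that the rough definition elides):

use bitvec::prelude::*;
use bitvec::domain::Domain;

let data = [0u8; 3];
let bits = data.view_bits::<Msb0>();

// Like row 3: entirely interior to one element, so no element is fully
// spanned and the whole slice is an Enclave.
assert!(matches!(bits[9 .. 14].domain(), Domain::Enclave(_)));

// Like row 7: one fully-spanned element plus a partial tail.
if let Domain::Region { head, body, tail } = bits[8 .. 20].domain() {
    assert!(head.is_none());
    assert_eq!(body, &[0u8][..]);
    assert!(tail.is_some());
} else {
    unreachable!("this slice fully spans its middle element");
}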
LLVM Suboptimizations
LLVM considers a “racing read”, that is, any read from memory that could occur
contemporaneously with an atomic write, to be undefined behavior. This is a
reasonable view to take, but a pessimistic one. bitvec
has information that
cannot be expressed to LLVM about which bits of an element it will observe
or modify, and bitvec
is capable of guaranteeing that two distinct access
points will not be able to interfere with each other electrically. LLVM does not
know this, so it considers any write to memory to touch all bits of the
touched element, and any read from memory to view all bits of the fetched
element.
bitvec
exclusively5 writes to contended memory with the Rust functions
AtomicT::fetch_and
and AtomicT::fetch_or
, which are mapped to the LLVM
instructions __atomic_fetch_and_N
and
__atomic_fetch_or_N
. It uses bitmasks that bitvec
can
guarantee are non-intersecting, but this proof cannot be extended to
even the Rust compiler, let alone LLVM. These bitmasks are applied to register
values before using either of the fetch_op
instructions, and after any reads
that use AtomicT::load
/__atomic_load_N
.
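The access pattern itself is ordinary atomic read-modify-write. Here is a sketch of the idea in plain std, outside of bitvec, with two writers using hand-picked disjoint masks on one element:

use std::sync::atomic::{AtomicU8, Ordering};
use std::thread;

let byte = AtomicU8::new(0);
thread::scope(|s| {
    // Each writer owns a disjoint mask of the same element; the atomic
    // fetch_or leaves the other writer's bits untouched.
    s.spawn(|| byte.fetch_or(0x0F, Ordering::Relaxed));
    s.spawn(|| byte.fetch_or(0xF0, Ordering::Relaxed));
});
assert_eq!(byte.load(Ordering::Relaxed), 0xFF);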
If the bitvec
mask proofs were extendible to LLVM, and LLVM were to expand its
tracking of which bits of a memory address became poison
ed by a memory write,
and which bits of a fetched memory value became un-poison
ed by a masking
operation, then the compiler would be more able to observe that bitvec
memory
accesses do not observably interfere with each other. This observation would
then define the behavior in the compiler’s memory model of racing writes/reads,
and permit an increased (possibly even complete) removal of synchrony guards.
I am not aware of any processor hardware which fails to guarantee that all bits
of memory are fully defined at the clock edges of all instructions that use the
location. To the full extent of my knowledge, all memory banks in all relevant
processors have a stable bit-value at the start of a tick, when reads occur, and
at the end of a tick, when writes commit. At no point does changing the value of
one bit of a memory component affect the electrical value of other bits in the
component.
This is not necessarily true of other storage devices, such as SSDs, but
`bitvec` can only be used to access storage cells mapped in the RAM address
space, which tend to all have this stability property.
Summary
The formal definition of the bitvec
memory model extends the Rust
mutable-exclusion rules by refining memory regions to have bit-precision instead
of element-precision. The Rust compiler is only weakly capable of tracking the
region status of individual bits, and only in compiler-internal structures. LLVM
has a more robust arbitrary-bit-tracking capability, but similarly limits its
interface to external code.
Barring any errors in the bitvec
implementation, the bitvec
memory model is
fully sound in its behavior with regard to single-observer unsynchrony.
Synchronization is only added in order to correctly interface with rustc
and
LLVM without causing either of them to introduce undefined behavior due to a
lack of information.
Footnotes
all references to T
where T
is either !Sized
, or
mem::size_of::<T>()
is non-zero, that is. A fun quirk of Rust’s first-class
concept of zero-width types is that the only illegal value for a &Zst
reference is null. Since there is nothing to load or store through a &Zst
reference, the compiler doesn’t care what the reference value is, as it will
never be used to perform memory access.
This is not absolutely true. Like we saw with UnsafeCell
, the only
immutable rule of compiler developers is that whenever they make an immutable
rule, they also provide a way to sidestep it. If you freeze
a poison
,
you are now free to read its value and act on it. LLVM just doesn’t make any
guarantees about what bit-pattern you’ll see.
I’d feel a great deal more comfortable if I had firm knowledge of what
those costs actually were. An atomic write always issues a lock
instruction modifier on x86, and I have heard vastly different things about
what that actually means, from “it’s free if no other cache holds that
address” up to “it poisons the whole cacheline”, and have not had much luck
producing a benchmark that firmly demonstrates that unneeded atomic access is
a strict performance cost.
In multithreading environments. Disabling atomics also disables bitvec
’s
support for multithreading, so the penalty for aliasing is reduced to an
inability to remove redundant reads.
Runtime Performance
bitvec
increases the instruction cost of each access to a bool
in its data
structures. This is an inevitable consequence of the fact that, even on
architectures that have them, compilers typically do not emit object code
instructions that access individual bits within a memory location. Therefore,
each access in bitvec
has, in addition to a memory operation, one or two
shift instructions and one or two AND
/OR
/NOT
instructions.
This means that, inevitably, bitvec
is slower in CPU time than [bool]
is.
Measurements indicate roughly a factor of ten, though also with about 10x more
variance. However, this cost is only apparent and meaningful when walking the
entirety of very large buffers, and tends to fade into noise on smaller buffers,
or be obviated by compile-time fixed accesses. As always, try it on a
representative workload.
Benchmarking
I have tried (with admittedly low priority) to have some benchmarks in the
project to back up my claims that bitvec
is fast. This has been difficult to
maintain for a few reasons, but I have at least a few that have stayed present
in order to demonstrate important claims, such as showing that specialization
for matching types does provide massive performance benefits (it does).
In particular, LLVM is very good at propagating compile-time constants through
the code, and bitvec
strives to maintain an internal implementation that is
easily accessible to optimizers. This means that basically any benchmark that
takes input from a source file that the compiler can see gets artificially
solved during codegen.
For instance, I don’t know how long it takes to construct a &BitSlice
view
over memory, because my benchmarks report 0ns: LLVM computes my pointer encoding
at compile time, and a consequence of the way I designed my pointer encoding is
that the only operation BitSlice::from_slice
actually performs is .len << 3
.
When LLVM can see the original length, it just does this itself, and emits an
immediate with the correct length instead of the constructor call.
Other constructor benchmarks are only showing me the time required to run
memcpy
, and arbitrary indexing just shows the time required to run three
instructions, because LLVM solved the shift/mask arguments ahead of time.
The important takeäway here is that if your code is at all dependent on
constants that the compiler can see, and is not exclusively performing indexing
based on runtime inputs, then bitvec
is going to be plenty fast.
Pitfalls
Everything stated above relies on having information available to the compiler.
bitvec
falls behind other bit-reading libraries when you start using access
patterns only known at runtime, such as iteration.
Until Rust stabilizes the ptr_metadata
feature (see #81513), bitvec
will
necessarily take a performance hit because it has to decode the &BitSlice
pointer every time you access memory, and reëncode it every time you munch
through the region.
The bitvec
pointer encoding (described here) requires manipulating
both words in the pointer with at least three instructions each.
Except when using `u8` storage, which discards *most* of the modifications to
the address half of the pointer. Try that out if you do a lot more munching
through a region than bulk work on its contents!
This would not be necessary if the pointer could use its own metadata type
rather than usize
. Until that stabilizes, the entire value proposition of the
crate rests on the fact that &BitSlice
has slice-pointer ABI. If you
absolutely cannot afford to have decoding/reëncoding costs in your hot loop, you
may have to try other libraries.
bitvec
strives to use batch operations on entire integer registers when
possible. However, doing so requires routing through the domain
module, which
has to perform similar decoding and processing of a &BitSlice
pointer every
time it is entered. This is another unavoidable cost. It is generally a
tolerable overhead compared to walking through each bit individually, especially
with wider storage integers, but it is one more thing that other bit-addressing
libraries don’t do.
Other libraries also do not have alias safety or tolerance for starting a region away from the zero-index in an integer. Power has a price. We do our best to cut down the library’s runtime cost as much as possible, but this computation simply has to be done somewhere, and it’s not built into silicon anymore. Sorry.
Specialization
bitvec
strives to use its knowledge of the underlying memory representation
wherever possible. This means that operations between BitSlice
s with the same
type parameters can rely on an identical representation and use integer behavior
rather than walking each bit individually.
Try to do this wherever possible, especially in performance-sensitive code. You
typically should not be mixing bitvec
structures with different type
parameters anyway: use the representation that serves your needs, and keep using
it in all buffers that interact with it.
As an example, as of 1.0, walking two large BitSlice
s with incompatible type
parameters takes 3µs (microseconds), while walking the same sizes with
identical type parameters takes 100ns (nanoseconds). It’s roughly a 32x
performance difference, which is only half the speedup that I expected using
usize
on a 64-bit machine, but still quite stark.
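For instance, a bulk copy between two buffers with matching parameters can move whole elements at a time. A sketch, assuming both sides use <u8, Msb0>
(the buffer length here is arbitrary):

use bitvec::prelude::*;

let src = bits![u8, Msb0; 1; 64];
let mut dst = bitvec![u8, Msb0; 0; 64];
// Matching <u8, Msb0> parameters on both sides lets the copy run on whole
// bytes rather than on individual bits.
dst.copy_from_bitslice(src);
assert!(dst.all());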
Bit Slice Pointer Encoding
bitvec
’s core value proposition rests on the fact that it is capable of
defining an unsized slice type, and controlling references to it. The Rust
language rests heavily on the two reference types &
and &mut
, and does not
ordinarily allow these to be faked or created by anything other than the
compiler.
Rust Reference Rules
It so happens that not only does Rust strongly guarantee the in-memory layout
of a reference to a slice, it also provides a stable API for
constructing values of &[T]
type without using mem::transmute
. Subject to
certain value requirements imposed by types, slice references can be constructed
through these functions and the compiler will accept them as valid.
These requirements traditionally make it difficult to encode non-address information into a bare reference, since the compiler has a very firm expectation that a reference to a type is immediately dereferenceäble to a value of that type, but if your type happens to be zero-sized, then it can never exist in memory, no loads or stores to it can ever be produced, and the compiler no longer concerns itself with the actual bit-pattern value of references to it.
Which is why the definition of BitSlice
is
// src/slice.rs
#[repr(transparent)]
pub struct BitSlice<T, O>
where
T: BitStore,
O: BitOrder,
{
_mem: [()],
_typ: PhantomData<T>,
_ord: PhantomData<O>,
}
BitSlice
is [()]
(a slice of the unit value) with some markers that only the
type-checker can see. &BitSlice
is thus &[()]
, and &[()]
can have any
values it wants (except, of course, null) – the unit value has no alignment
requirements, can be placed anywhere in memory without worrying about whether
there is a backing allocation, and can have as many instances of itself as
desired.
Zero-sized types are an absurdly powerful concept when working with memory that the language expects to be able to manifest at any time.
Pointer Encoding
Slice references contain two pieces of information: the address of the base element, and the number of elements, starting at the base, contained in the slice region. Theoretically, bit-slice references have the same pair of information: the address of the first bit, and the number of bits in the region.
However, computers are byte-addressed, not bit-addressed, so we need to store
three more bits (to select a bit in the base byte) somewhere in the reference.
Since slice references are defined as { base: *T, elts: usize }
, and there are
no1 spare bits in *const _
, the bits to store the base bit are taken out of
the length counter.
Reference address values are also required to be integer multiples of the
alignment of the referent type T
. This alignment is, on all supported targets,
the width in bytes of the referent type. As a result, there are as many low bits
in the address of any T
that are guaranteed to be the 0
value, as there
are bits needed to select a byte within the element. The end result is that the
length counter must always use three bits to store the starting bit, and the
base address will be composed of an aligned T
address and an index of the
starting byte within it.
As Rust does not have bitfield syntax, a definition of the pointer structure in C++ looks something like this2:
template <typename T>
struct BitSpan {
static_assert(
std::is_unsigned<T>()
&& sizeof(T) <= sizeof(std::size_t)
&& sizeof(T) <= alignof(T)
);
// on little-endian systems, bitfields are
// allocated from LSbit and move towards MSbit
uintptr_t ptr_head : __builtin_ctzll(alignof(T));
uintptr_t ptr_addr : (sizeof(uintptr_t) * 8)
- __builtin_ctzll(alignof(T));
size_t len_head : 3;
size_t len_bits : (sizeof(size_t) * 8) - 3;
};
In Rust, the structure is declared as
// src/pointer.rs
#[repr(C)]
pub struct BitSpan<T, O>
where
T: BitStore,
O: BitOrder,
{
ptr: NonNull<u8>,
len: usize,
_ty: PhantomData<T>,
_or: PhantomData<O>,
}
and the logical components must be accessed through get/set functions, rather than through compiler-generated field stubs.
By marking the pointer as NonNull
, BitSpan
declares that it will never be a
null pointer and becomes subject to the same peephole optimization that allows
mem::size_of::<Option<&T>>() == mem::size_of::<&T>()
. By marking it as
unconditionally a pointer to u8
, we declare that all low bits of the address
value are in use, and none can be used as slots for anything else (since our
encoding is using them to select a byte within the T
).
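Both of those properties are observable from safe code; a quick sketch:

use core::mem;
use bitvec::prelude::*;

// A bit-slice reference has exactly the ABI of an ordinary slice
// reference: two words.
assert_eq!(
    mem::size_of::<&BitSlice<u8, Lsb0>>(),
    mem::size_of::<&[u8]>(),
);
// And the null niche remains available to Option.
assert_eq!(
    mem::size_of::<Option<&BitSlice<u8, Lsb0>>>(),
    mem::size_of::<&BitSlice<u8, Lsb0>>(),
);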
Significant Values
BitSpan<T, O>
does not have any sentinel values of its own, but inherits from
NonNull<T>
. The completely zero value is not a valid member of the BitSpan
type, but rather indicates Option::<BitSpan<_, _>>::None
, and it uses the
dangling NonNull
pointer value to indicate an instantiated pointer object
without an associated allocation.
Not all zero-length regions are dead: a cleared BitVec
region has zero length
but owns an allocation, so it cannot discard its address information.
Summary
Rust requires that slice references have a specific ABI, but makes no
requirements about the encoding of values of those references for certain types.
We can supply our own ABI-equivalent structure, define functions that use the
structural encoding to compute the information needed to actually interact with
memory, and convert our structures into Rust-accepted slices through the
provided compiler API in core
.
If and when the ptr_metadata
feature stabilizes, bitvec
will experiment with
discarding this packed encoding in favor of a three-word pointer. If the
unpacked pointer results in better performance by eliminating the need for the
special encoding, bitvec
will release a new minor version with the changed
structure.
`bitvec`’s MSRV policy is that raising compiler requirements is a minor change,
not major, and the pointer ABI is **not public interface**! You are already
forbidden from moving bit-region pointers out of a program, so this change will
not affect your program’s behavior.
Footnotes
On AMD64, pointers are actually aggregates of MMU translation pages, and
processors only decode the low 48 or 57 bits of them, leaving the high 16 or 7
bits available for other information not part of the memory addressing system.
However, these processors also trap when attempting to dereference a pointer
whose high [48:64]
or [57:64]
bits do not have the same bit value as bit
[47]
or [56]
, and that bit is typically used to differentiate unprivileged
user memory from privileged kernel memory. Furthermore, this dead region does
not exist on 32-bit architectures, x86 or otherwise, and since bitvec
explicitly supports 32-bit systems, the use of dead bits only present on a
subset of supported targets and subject to their own extra rules is not
worthwhile.
Here is a full code listing which you can also view on Godbolt:
// compiles on x86-64 clang 15.0.0
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>
static_assert(CHAR_BIT == 8, "this target is not supported");
template <typename T>
struct BitSpan {
static_assert(
std::is_unsigned<T>()
&& sizeof(T) <= sizeof(std::size_t)
&& sizeof(T) <= alignof(T),
"this type is not supported as BitSpan storage"
);
uintptr_t ptr_head : __builtin_ctzll(alignof(T));
uintptr_t ptr_addr : (sizeof(uintptr_t) * 8)
- __builtin_ctzll(alignof(T));
size_t len_head : 3;
size_t len_bits : (sizeof(size_t) * 8) - 3;
};
template <>
struct BitSpan<uint8_t> {
// ptr_head is zero bits wide when targeting bytes
uintptr_t ptr_addr;
size_t len_head : 3;
size_t len_bits : (sizeof(size_t) * 8) - 3;
};
static uint64_t data[4];
BitSpan<uint8_t> one() {
return {
.ptr_addr = (uintptr_t)&data[0],
.len_head = 1,
.len_bits = 6,
};
}
BitSpan<uint16_t> two() {
return {
.ptr_head = 1,
.ptr_addr = (uintptr_t)&data[1],
.len_head = 1,
.len_bits = 5,
};
}
BitSpan<uint32_t> four() {
return {
.ptr_head = 2,
.ptr_addr = (uintptr_t)&data[2],
.len_head = 3,
.len_bits = 10,
};
}
BitSpan<uint64_t> eight() {
return {
.ptr_head = 4,
.ptr_addr = (uintptr_t)&data[3],
.len_head = 5,
.len_bits = 25,
};
}
Miscellaneous Implementation Details
bitvec
has a number of internal implementation details used to facilitate
development but are not part of the public API. While not user-facing, these
details are nevertheless important to document as explanations for how the
library is built.
Integer Refinement
bitvec
offloads abstraction over the fundamental integers to funty
,
described earlier in this guide.
funty
only unifies the standard-library APIs of the integers into trait-based
code; bitvec
further extends this with useful constants in the BitRegister
trait. This trait delimits the integers that correspond to machine registers
actually available for use as storage, and provides minor conveniences for
working with them.
Specialization Hacks
bitvec
is built to be agnostic to the ordering of bits within an element. This
is an important aspect of its design, and even if no other ordering than the
provided Lsb0
and Msb0
are used, must remain so that these two orderings
can be used on equal footing. However, the deferral of register operations to
type parameters outside of the data structures’ control results in some
unpleasant performance losses that would not occur in a hand-written equivalent.
Some operations, like copying between, or comparing, slices, can be accelerated
with partial-element access, but require knowledge of the O
ordering type to
provide a semantic interpretation of register contents. Language-level
specialization could allow writing override impl
blocks, like this:
impl<T, O> BitSlice<T, O>
where
T: BitStore,
O: BitOrder,
{
fn eq(&self, other: &Self) -> bool {
todo!("baseline")
}
}
impl<T> BitSlice<T, Lsb0>
where T: BitStore {
fn eq(&self, other: &Self) -> bool {
todo!("lsb0-accelerated version")
}
}
impl<T> BitSlice<T, Msb0>
where T: BitStore {
fn eq(&self, other: &Self) -> bool {
todo!("msb0-accelerated version")
}
}
While waiting on this feature, we can use the compiler’s stable TypeId
API to
simulate access to specialization. By comparing the TypeId
of the O
type
argument to the TypeId
s of bitvec
’s provided orderings, functions can detect
when they are in a monomorphization with an Lsb0
or Msb0
ordering argument
and specialize accordingly. The above block can be replaced with:
impl<T, O> BitSlice<T, O>
where
T: BitStore,
O: BitOrder,
{
fn eq(&self, other: &Self) -> bool {
if let (Some(this), Some(that)) = (
self.coerce::<T, Lsb0>(),
other.coerce::<T, Lsb0>(),
) {
todo!("lsb0-accelerated version")
}
else if let (Some(this), Some(that)) = (
self.coerce::<T, Msb0>(),
other.coerce::<T, Msb0>(),
) {
todo!("msb0-accelerated version")
}
else {
todo!("baseline")
}
}
}
and, during monomorphization, only one branch of the if
stack will be
preserved. The .coerce()
method is defined in slice::specialization
and
provides easy access to a fully-typed value only within the monomorphization
that matches it. It is not public API – your code using bitvec
should prefer
being fully typed rather than generic.
Afterword
Thanks for reading! We hope this guide was useful to you.
Please feel both welcomed and encouraged to ask us any clarifying questions
(contact information for myrrlyn is in the CONTRIBUTING
document), or to
file an issue if any of it is confusing or even send a PR if you have
improvements!
Ferrilab has been a work of intensely deep focus and research for over four years. It is a complex and difficult topic, and we are absolutely certain that we have information and context that we haven’t sufficiently serialized for everyone else to see.
Thank you for using our crates. We are delighted that other people have found our work both interesting and useful, and we’re glad that we’ve been able to help people like you write better programs.