Dedication
I began working on `bitvec` shortly before I was told that my father had been
diagnosed with cancer for the third time. Developing the library gave me
something into which to sink my attention and keep my mind from dwelling on his
rapidly-progressing illness. I wrote the core pointer representation that
enables `bitvec`’s modern behavior at his side, in the last days that he
remained conscious.
I never had the chance to explain to my dad what I was building. By the time it
was developed enough to be worth explaining, he had lost too much of his brain
to understand me. More than anything I’ve done for my employers, `bitvec` is the
work of which I’m most proud, and which I most wish he could have seen.

The `bitvec` project is dedicated to the memory of my father, George M. Payne,
who inspired me to build quality work, supported my search for my talents, and
taught me to labor for the benefit of others beyond myself.
Introduction
Programmers have sought to model memory as a one-dimensional stream of bits for
as long as hardware manufacturers have sought to chunk it into wider and wider
words. `bitvec` is far from the first library to provide this model, and it will
likely not be the last. It is, however, among the best (at least in my opinion).
`bitvec` was built out of my experience and frustration with performing I/O
buffer manipulation using C, C++, and Ruby. My work required programs capable of
dynamically selecting a region of a bitstream, a task to which C-style bitfield
declarations were unsuited, and it required those programs to be fast, which is
not an adjective one commonly associates with Ruby engines.
Furthermore, my work involved message schemata that were permitted to select a bit ordering at the packet and field level. This is not a behavior that any existing bitstream library or language feature provides. These experiences informed my goals and design choices from the very beginning.
`bitvec` matches, and exceeds, the functionality of every other bitstream
implementation I have found. It is also the only Rust crate whose types are
drop-in replacements for the standard-library types, and it manages this without
taking a performance loss. Thanks to excellent compiler engineering by the Rust
and LLVM teams, it often produces code that exactly matches, and sometimes
surpasses, the bit-shift/mask assembly logic you would write yourself.
Goals of This Book
I intend for this book to serve as an explanation of `bitvec`’s design choices
and philosophy. It is not a detailed exploration of the crate APIs – docs.rs
exists – but instead seeks to communicate how to think about `bitvec`, so that
you will know how to use the APIs it offers.
The best way I know to communicate this information is as a dialogue between me, the author, and you, the user. Since this is a book rather than a conversation, I actively encourage you to contact me with any questions through the channels listed in the repository’s CONTRIBUTING document. Throughout the book, I will periodically remind you that if a section is unclear, that is an error on my part, and I would appreciate an issue or other contact so that I can improve it.
Data Structures
`bitvec`’s primary exports are four data structures: `BitSlice`, `BitArray`,
`BitBox`, and `BitVec`. These correspond to the `[bool]`, `[bool; N]`,
`Box<[bool]>`, and `Vec<bool>` types in the Rust standard libraries. Unlike the
Rust types, the `bitvec` types are not composable, and cannot be mixed with any
other types, including the pointer types in the standard library.
The `bitvec` types implement the APIs of their standard-library counterparts to
the fullest extent possible. The only currently missing feature is
`IndexMut<usize>`, which prevents `collection[index] = bit;` assignment. This
means that, for users looking for compact `usize => bool` collections,
substituting the types in your project codebase ought to be enough to make your
project Just Work™️.
It is the fact that `BitSlice` acts exactly like `[bool]`, and can only be used
through `&BitSlice` and `&mut BitSlice` references, that makes `bitvec` unique.
No other Rust library has this capability.
Before we explore the data structures in more detail, there are three caveats I
must provide:

- `bitvec` uses an opaque, custom pointer representation for everything except
  `BitArray`. You may not inspect or modify this pointer, and you may not use it
  as a type or value parameter in any other types or functions. You will break
  your program if you try. `bitvec` ensures that this pointer encoding does not
  violate the compiler and language requirements for reference correctness. The
  rules used to do so are internal to the crate, and will not be present outside
  it. `bitvec` pointers are perfectly safe to use, as long as you treat them as
  completely opaque and use only the interfaces provided.
- These structures all have two type parameters, `<T: BitStore, O: BitOrder>`.
  These parameters are described in the next chapter. They govern the in-memory
  representation of data storage, but are not relevant to the general use of the
  handle types.
- `bitvec` trades increased memory-space efficiency for decreased
  instruction-size and speed efficiency. The compiler may optimize some of the
  costs away, but `bitvec` structures are not free to use. The “zero-cost” of
  the `bitvec` abstraction is that you cannot write a better bitset, not that it
  is equal in runtime performance to an ordinary `bool` sequence.
Now that the disclaimers are over, we can talk about how the types actually work.
Slices
The base type of the project is `BitSlice`. This is a region type, like
`[bool]`, and cannot be held directly. Instead, it is accessed through borrowed
references (`&BitSlice`, `&mut BitSlice`) or owning handles (`BitArray`,
`BitBox`, `BitVec`). The distinction between the handles and the region is the
same as it is for ordinary Rust types.
The `BitSlice` region is able to begin and end at any bit in memory, and is not
restricted to having one edge aligned to the edge of its initial element. This
restriction, present in all of its competitors, is removed through the use of a
special encoding in all pointers to the region, which stores the starting bit of
the base element in part of the slice pointer that describes the real memory.

There are eight bits to a byte on all systems Rust targets, and therefore the
index of a bit within a byte is itself three bits wide. These three bits are
taken from the length counter of a slice pointer, which means that `BitSlice`
is able to address only ⅛th of the indices that `[bool]` can.
This is 64 Mebibytes on a 32-bit system, and 256 Pebibytes on a 64-bit system. If you can even allocate that much real memory in one handle, then you have very different operating conditions than I can help you with.
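The arithmetic behind those numbers can be checked directly. Three bits of the pointer’s length counter are given up to the bit index, so the maximum length is `usize::MAX >> 3` bits. A worked sketch of that calculation, assuming a 64-bit target:

```rust
// Worked check of the addressing-capacity arithmetic on a 64-bit target.
fn main() {
    // Three bits of the length field are spent on the in-element bit index,
    // so a bit-slice length counter can reach usize::MAX >> 3 bits.
    let max_bits: u128 = (usize::MAX as u128) >> 3; // 2^61 - 1 bits
    let max_bytes = (max_bits + 1) / 8;             // 2^58 bytes of real memory
    let pebibytes = max_bytes >> 50;                // bytes / 2^50 = PiB
    assert_eq!(pebibytes, 256);
    println!("maximum addressable region: {} PiB", pebibytes);
}
```

The same arithmetic on a 32-bit target gives 2²⁹ bits, or 2²⁶ bytes: the 64 MiB figure quoted above.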
Getting a `BitSlice`
`BitSlice` is strictly a borrowed region. It can neither be created nor
destroyed; rather, views to it are acquired from a memory buffer that some other
binding owns.
The `BitStore` chapter covers this in more detail, but only slices of the
unsigned integers `u8`, `u16`, `u32`, `usize`, and (on 64-bit targets) `u64`
can be used as the source memory for a `BitSlice`. (You can also use their
`Cell<>` wrappers or atomic variants; this will be discussed later.)
Borrowing Constructors
The simplest way to create a `BitSlice` reference is to borrow it from ordinary
Rust data. The `BitView` trait, available in the prelude, implements methods
on the supported unsigned integers, all arrays of them, and their slices.
```rust
use bitvec::prelude::*;

let byte = 0u8;
let bits = byte.view_bits::<LocalBits>();

let array = [0u16; 2];
let bits = array.view_bits::<Lsb0>();

let mut array = [0u32; 3];
let slice = &mut array[..];
let bits = slice.view_bits_mut::<Msb0>();
```
The `.view_bits()` and `.view_bits_mut()` methods take the other type parameter
that `bitvec` requires. This is described in the `BitOrder` chapter. Use `Lsb0`
until you have a specific need for a more precise parameter.
In addition, `BitSlice` offers the constructor functions `::from_element()`,
`::from_slice()`, and their `_mut` variants, which borrow elements and slices,
respectively, and construct `&BitSlice`/`&mut BitSlice` references from them.
The trait methods are generally easier, and certainly shorter to write, but they
all do the same work.
Lastly, empty slices can be produced with the `::empty()` or `::empty_mut()`
functions, since there is no `&[]` or `&mut []` literal available.
Macro Constructor
In addition to these method constructors, you may also use the `bits!`
constructor macro. This macro runs at compile time to produce a buffer
containing the correct data values, then borrows it as a `BitSlice` reference.
It is a `macro_rules!` macro, not a procedural macro, and should not have a
significant impact on your compilation times.
By default, the produced buffer is a temporary that the compiler will then
extend to have the minimum lifetime of the produced reference handle. However,
you can use the `static` keyword to cause the macro to produce a hidden and
unnameable `static BitArray` backing buffer, which then provides the
`&'static BitSlice` lifetime. Since this `static` buffer cannot be named, it is
safe to use even when `mut`able, as the provided reference is the only handle to
it.
The macro syntax extends that of `vec!`. The simplest invocations are sequences
or repetitions of expressions, which can optionally be made `mut`able:
```rust
use bitvec::prelude::*;

let r = bits![0, 1, 0, 1];
let w = bits![mut 0, 1, 0, 1];
let r2 = bits![static 1; 4];
let w2 = bits![static mut 1; 4];
```
You are not required to use the literals `0` or `1`; you can use any expression
that is `const`-evaluable and can be placed into the expression `expr != 0`.
This means that you cannot use the names of runtime `let` bindings, but can use
the names of `const` bindings, or other literals. You probably do not want to do
this, but you can.
In addition, you can specify the bit-ordering parameter and the integer storage
parameter, for even more precise control over the memory layout. If you do not
specify them, the macro uses the default parameters of `usize` storage and
`Lsb0` ordering.
```rust
use bitvec::prelude::*;

let in_bytes = bits![u8, LocalBits; 0, 1, 0, 1];
let in_shorts = bits![u16, Lsb0; 0, 1, 0, 1];
let in_ints = bits![mut u32, Msb0; 0; 4];
```
To summarize the macro rules:
- If the first macro argument is `mut`, then the macro produces `&mut BitSlice`;
  otherwise, it produces `&BitSlice`. You do not need to bind the name as `mut`
  unless you want to reässign it to a different slice.
- You may then optionally provide the storage and ordering type parameters,
  followed by a semicolon. If you choose to add type parameters:
  - You must provide the bit-ordering parameter.
  - You may provide the storage parameter.
- The data input to the macro is one of the two `vec!` token lists:
  - One or more expressions that can be placed into `$expr != 0`, separated by
    commas. A trailing comma is permitted.
  - A single expression that can be placed into `$expr != 0`, followed by a
    semicolon and a repetition counter. The resulting `BitSlice` will be
    `counter` bits long, all set to `expression`.
Emulation tests indicate that `bitvec` correctly instructs the compiler to produce suitable buffers even when compiling for a target with a different byte-endianness than the host. However, I have not actually performed such cross-compilation and testing on real hardware. It should be correct; please file an issue if it is not.
What `BitSlice` Can Do
Now that you have acquired a `BitSlice` reference, either by borrowing memory
from elsewhere in your program or by creating a temporary, it is time to do some
actual work with it.
… That `[bool]` Can
Everything.[^1] I am not going to rewrite the standard library’s slice documentation here.
… That `[bool]` Cannot
In addition to the standard-library `[bool]` API, `BitSlice` offers some
inherent methods tailored to its specialization.
Set Queries
The five query methods `.any()`, `.all()`, `.not_any()`, `.not_all()`, and
`.some()` test how many bits in a region are set to `1`. As you can guess from
the names, these methods have the following truth table:
| Slice | `any` | `all` | `not_any` | `not_all` | `some` |
|-------|-------|-------|-----------|-----------|--------|
| `00`  | false | false | true      | true      | false  |
| `01`  | true  | false | false     | true      | true   |
| `11`  | true  | true  | false     | false     | false  |
`any` is the Boolean OR of all bits; `all` is the Boolean AND of all bits; and
`some` is true exactly when the slice contains both set and unset bits (for the
two-bit slices above, this is the Boolean XOR).
In addition, `.count_ones()` and `.count_zeros()` count how many bits of the
slice are set to one or zero, rather than merely indicating whether any exist.
These methods are slower than the Boolean queries, which are capable of
short-circuiting once satisfied. You can also use `.iter_ones()` and
`.iter_zeros()` to walk each index of bits with the specified value. These are
equivalent to running `.filter()` and `.enumerate()` calls on iterators of
`bool`, but are specialized to use dedicated bit-counting instructions where
processors provide them.
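These queries reduce to ordinary register arithmetic. The std-only sketch below mirrors their semantics over a single `u8`; the helper functions are local illustrations, not the `bitvec` API (in particular, this `all` only matches a fully-occupied register):

```rust
// Register-level equivalents of the BitSlice query methods, written
// against a single u8 rather than against the bitvec API itself.
fn any(byte: u8) -> bool { byte != 0 }                // Boolean OR of every bit
fn all(byte: u8) -> bool { byte == u8::MAX }          // Boolean AND of every bit
fn some(byte: u8) -> bool { any(byte) && !all(byte) } // set, but not entirely

fn main() {
    let byte = 0b0000_0101u8;
    assert!(any(byte));
    assert!(!all(byte));
    assert!(some(byte));

    // .count_ones()/.count_zeros() map to population-count instructions:
    assert_eq!(byte.count_ones(), 2);
    assert_eq!(byte.count_zeros(), 6);

    // .iter_ones() walks the indices of set bits; here with Lsb0-style indexing:
    let ones: Vec<u32> = (0u32 .. 8).filter(|&i| byte & (1 << i) != 0).collect();
    assert_eq!(ones, vec![0, 2]);
}
```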
Boolean Arithmetic
`bitvec` data structures all implement the Boolean operators (`&`, `|`, `^`, and
`!`) against each other.
In version 0, these operators accepted any `impl Iterator<Item = bool>`. This was changed for performance reasons: people never used the arbitrary-iterator support, but did require improved behavior when operating on two bit-slices.
```rust
use bitvec::prelude::*;

let or = bits![mut 0, 0, 1, 1];
*or |= bits![0, 1, 0, 1];
assert_eq!(or, bits![0, 1, 1, 1]);

let and = bits![mut 0, 0, 1, 1];
*and &= bits![0, 1, 0, 1];
assert_eq!(and, bits![0, 0, 0, 1]);

let xor = bits![mut 0, 0, 1, 1];
*xor ^= bits![0, 1, 0, 1];
assert_eq!(xor, bits![0, 1, 1, 0]);

let not = bits![mut 0, 1];
let not = !not;
assert_eq!(not, bits![1, 0]);
```
Writing To Memory
You can set all bits in a region to a new value by using the `.fill()` method,
or you can set one bit in a region to a new value by using either the `.set()`
or `.get_mut()` methods. `.get_mut()` produces a proxy type which acts roughly
like an `&mut bool` reference slot.
```rust
use bitvec::prelude::*;

let bits = bits![mut 0; 4];
assert!(bits.not_any());

bits[0 .. 1].set_all(true);
assert!(bits[0]);

bits.set(1, true);
assert!(bits[1]);

*bits.get_mut(2).unwrap() = true;
assert!(bits[2]);

let mut bit = bits.get_mut(3).unwrap();
assert!(!*bit);
*bit = true;
drop(bit); // release the proxy’s borrow before reading `bits` again
assert!(bits[3]);

assert!(bits.all());
```
The proxy type produced by `.get_mut()` implements `DerefMut<Target = bool>`, so
you can assign into it and read out of it. However, it does not commit the value
assigned into it back to its source `BitSlice` until it `Drop`s.
You can force the destruction of a named proxy reference by using its
`.commit()` method, which takes `self` by value, destroying it and releasing the
borrow.
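The write-on-drop proxy pattern itself is ordinary Rust, and can be sketched without `bitvec` at all: a small guard holds a mutable borrow of the storage element, buffers the new bit value, and writes it back when it is destroyed. Everything below (`BitProxy`, its `commit`) is a hypothetical illustration of the mechanism, not the crate’s actual type:

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for a mutable-bit proxy: it buffers a bool and
/// writes it into the borrowed byte only on Drop (or an explicit commit).
struct BitProxy<'a> {
    elem: &'a mut u8, // the storage element the bit lives in
    mask: u8,         // one-hot selector for that bit
    bit: bool,        // buffered value (a real proxy would read it on creation)
}

impl Deref for BitProxy<'_> {
    type Target = bool;
    fn deref(&self) -> &bool { &self.bit }
}

impl DerefMut for BitProxy<'_> {
    fn deref_mut(&mut self) -> &mut bool { &mut self.bit }
}

impl Drop for BitProxy<'_> {
    fn drop(&mut self) {
        // The buffered value reaches memory only here.
        if self.bit { *self.elem |= self.mask } else { *self.elem &= !self.mask }
    }
}

impl BitProxy<'_> {
    /// Consume the proxy, forcing the write-back immediately.
    fn commit(self) {} // dropping `self` here performs the store
}

fn main() {
    let mut byte = 0u8;
    {
        let mut bit = BitProxy { elem: &mut byte, mask: 1 << 3, bit: false };
        *bit = true;  // assignment goes into the buffer…
        bit.commit(); // …and reaches memory when the proxy dies
    }
    assert_eq!(byte, 0b0000_1000);
}
```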
Viewing the Underlying Memory
The memory underlying any bit-slice region is subject to some restrictions about
aliasing, documented more thoroughly in the `domain` module and the Memory Model
chapter. In short, borrowed `BitSlice` regions cannot view their underlying
memory directly without violating aliasing rules established by either the Rust
language or by `bitvec` itself. Instead, the `.domain()` and `.domain_mut()`
methods provide views that correctly handle aliasing and edge conditions, and
mediate access to the underlying memory.
The owning handles (`BitArray`, `BitVec`, and `BitBox`) do not have this
limitation, as they can guarantee unique access to a memory location without any
possibility of aliasing. As such, these types all have `.as_raw_slice()` and
`.as_raw_mut_slice()` methods that provide ordinary slice views of their storage
region.
Footnotes
[^1]: Except write-assignment through indexing. I am not going to keep mentioning this exception.
Arrays
While `BitSlice` describes a region of borrowed data, `BitArray` provides a
container that can hold and manage such a region.
It is most comparable to the C++ type `std::bitset<N>`. Unfortunately, Rust
support for type-level integers is still experimental, so `BitArray` is unable
to take the length of the `BitSlice` it contains as a type parameter. Instead,
it must take the entire region it contains as a type parameter. The full type
declaration is
```rust
use bitvec::prelude::*;
use core::marker::PhantomData;

pub struct BitArray<
    A: BitViewSized,
    O: BitOrder,
> {
    _ord: PhantomData<O>,
    data: A,
}
```
As described in the previous chapter, the `BitView` trait is implemented on the
unsigned integers, and on arrays of them. Currently, array support is limited to
arrays up to and including 32 elements long; as Rust type-level integers mature,
this will grow to include all arrays.
Once type-level computation stabilizes, `BitArray` will change to have the type parameters `<T: BitStore, O: BitOrder, const N: usize>`, matching the `std::bitset<N>` length parameter.
This array dereferences to a `BitSlice` region over its entire length. It does
not currently permit shortening its `BitSlice` from either end. If this is a
behavior you want, please file an issue.
Using a `BitArray`
The `::ZERO` constant is a blank `BitArray` with its memory completely zeroed.
The `::new()` function wraps an existing element or array into a `BitArray`. In
addition, the macro constructor `bitarr!` takes the exact same arguments as the
`bits!` constructor, except that it returns an array directly rather than a
reference to a buffer.
Furthermore, `BitArray` structures and references can be constructed from
`&BitSlice` references using the `TryFrom` trait, just as arrays can be
constructed in the standard library.
Once constructed, `BitArray` offers the `.as_bitslice()` and
`.as_mut_bitslice()` explicit methods, as well as all the standard traits, to
borrow its data as a `BitSlice`. The array has almost no functionality of its
own, and serves only to own a region used as a `BitSlice`.
Once you are done using `BitSlice` to manipulate the array, you can remove the
array with `.into_inner()` and regain the `A` memory within.
That’s everything that the array does! Like regular arrays, it is useful
primarily for its ability to move memory through a program, and has essentially
no behavior in its own right. It is useful for programs that do not have access
to a dynamic allocator and do not wish to use `static` buffers. However, if you
do have access to an allocator, you will probably prefer to use `BitVec`
instead.
Vectors
`bitvec` has two types that require compiling the crate with `feature = "alloc"`
(and linking against the Rust distribution crate `alloc`): `BitVec` and
`BitBox`. `BitVec` is a dynamically resizable vector, equivalent to the C++ type
`std::vector<bool>` and the Rust type `Vec<bool>`. `BitBox` is a frozen vector,
incapable of changing its length, equivalent to the Rust type `Box<[bool]>` or
the C++ type `std::unique_ptr<std::bitset<N>>`.
Since `BitBox` is a vector that cannot `.push()` or `.pop()`, I will not discuss
it in detail here. It is a heap-allocated `BitSlice`, and is otherwise
uninteresting.
Getting a `BitVec`
`BitVec` implements the constructors shown in the standard library: `::new()`
creates a handle without any allocation, and `BitSlice` provides the
`.to_bitvec()` method and the `ToOwned` trait to copy a bit-slice into a new
bit-vector. Additionally, the `bitvec!` macro takes all the `bits!` arguments
and produces an owned allocation instead of a stack temporary. You can also
construct a `BitVec` by `.collect()`ing any iterator of `bool`s.
```rust
use bitvec::prelude::*;

let a: BitVec = BitVec::new();
let b = bits![0, 1, 0, 1].to_bitvec();
let c = b.clone();
let d = bits![u8, Msb0; 1; 4].to_owned();
let e = bitvec![0, 1, 1, 0];
let f = bitvec![u16, Msb0; 0; 4];
let g = [true, false, true, false]
    .iter()   // &bool
    .copied() // bool
    .collect::<BitVec>();
```
Using a `BitVec`
Once constructed, `BitVec` implements the entire API that `Vec` does in the
standard library. This remains uninteresting to write out.
Like `BitSlice`, `BitVec` and `BitBox` are implemented as stack handles that use
the specially-encoded pointer to describe their region. This enables them to
remain the same size as their standard-library counterparts, while making them
completely opaque to inspection.
Because they are fully owned, `BitVec` and `BitBox` have some important
behavioral differences from `BitSlice`. Primarily, because they do not have to
worry about other handles viewing the memory under their control, they can
modify the contents of their buffers outside the `BitSlice` that is considered
live, and they do not exclude partial edge elements when viewing their buffer as
raw memory. If you are using `BitVec` to construct an I/O buffer, these two
facets can have surprising results if you are not careful to fully initialize
the memory span.
Vectors, like slices, do not need to begin at the zeroth index of the base
element. They can begin, and end, at any bit in any element. This will only
happen when copying a vector from a source slice that was misaligned. The
`.force_align()` method moves the vector’s live slice down to start at the zero
index. Once done, extending the live slice to reach the last index of an element
ensures that viewing the buffer as raw memory will have no uninitialized bits.
Type Parameters
`bitvec` uses type parameters to permit precise user control of its behavior and
in-memory representation. The Rust generic system permits `bitvec` to have a
more powerful and capable behavior than any other bitstream library yet
implemented in any language.

All `bitvec` types take two type parameters. The first denotes the storage type
being used: for everything but `BitArray`, this is an implementor of the
`BitStore` trait, and denotes the integer component of an underlying slice; for
`BitArray`, it is an implementor of `BitViewSized`, and is the entire storage
block. The second is an implementor of `BitOrder`, and informs how the structure
translates a semantic index into a memory access.

The combination of these two parameters governs how a `bitvec` type computes its
accesses to memory.
The next two chapters describe each trait and their implementors in more detail. You may be able to skip them with this sentence as a good-enough summary:
Use `<usize, Lsb0>` as the parameters if you are implementing a collection and
do not care about memory layout; if you are implementing an I/O protocol
specification, the specification document will tell you what ordering and unit
size it requires.
Rust syntax requires explicitly choosing type parameters when using generic
expressions, such as `BitVec::<Store, Order>::new()`, and will not substitute in
the default parameters when you attempt to elide them with `BitVec::new()`.
However, Rust will use the default type parameters in patterns:
`let bv: BitVec = BitVec::new();` will use the default type parameters in the
`: BitVec` type annotation, which then completes the type of the expression on
the right side of the `=` assignment.
Bit Ordering
`bitvec` expects user code to count semantic indices, and hides the actual
position of a bit within a memory element from users. This allows user code to
treat indices as a uniform domain of integers in `0 ..= !0 >> 3`, and not have
to remember whether `place - 1` means moving “forward” or “backward” in the
buffer.
Internally, each `*BitSlice` pointer contains an element address and a bit
index. The pointer uses its `BitOrder` type parameter to translate the bit index
into a one-hot selector mask that drives actual memory access.
`BitOrder` is open to user implementation, and implementations of it are trusted
to be sound in the `bitvec` memory model. For this reason, the trait is `unsafe`
to implement. Most users will not implement it; almost all users want one of the
two monotonic orderings provided for you. However, some specialized users may
have an ordering of their own, and they are still able to encode that ordering
away from their index logic.
Provided Orderings
`bitvec` provides two orderings: `Lsb0` and `Msb0`. These each refer to which
numeric bit in a register element is considered to be the zero index.
You can think of these as corresponding to the little-endian and big-endian processor byte orderings if you like, as long as you remember that your choice of bit-ordering is not at all related to the byte-ordering of your target processor.
Lsb0
The `Lsb0` type sets the zero index at the least significant bit of a register
(numeric value 2⁰ = 1), and each successive index selects the next more
significant bit in the register, until the most significant bit is at the final
index.

It is the expression `mask = 1 << index;`.
Msb0
The `Msb0` type sets the zero index at the most significant bit of a register
(numeric value 2ⁿ⁻¹ for an `n`-bit register), and each successive index selects
the next less significant bit in the register, until the least significant bit
is at the final index.

It is the expression `mask = (iN::MIN as uN) >> index;`.
LocalBits
The type alias `LocalBits` refers to the ordering that your target’s C compiler
will likely choose for its bitfield direction. This is `Lsb0` on little-endian
byte-ordered processors, and `Msb0` on big-endian byte-ordered processors.
Remember that the `BitOrder` implementations and processor byte orderings have
no relation to each other! This is only a custom, not a requirement of the
processor architecture.
Default Ordering Parameter
`Lsb0` is the default type parameter used by the sequence types, as it produces
selection masks using the starting value `1`, which encodes to smaller
instructions than the `Msb0` starting value.

On AMD64, the pairs `<u64, Lsb0>` and `<u64, Msb0>` produce the following object
code and disassembly:
```text
; Lsb0: movl + shlq
ba 01 00 00 00                 movl    $1, %edx
48 d3 e2                       shlq    %cl, %rdx

; Msb0: movabsq + shrq
48 ba 00 00 00 00 00 00 00 80  movabsq $-9223372036854775808, %rdx
48 d3 ea                       shrq    %cl, %rdx
```
The `Msb0` load requires an additional four bytes of zeros in its immediate, and
the 64-bit modifier prefix (`0x48`), in order to encode `movabsq` instead of
`movl`.
Implementing `BitOrder`
As stated above, this is an `unsafe trait` because `bitvec` relies on its
implementations to uphold certain mathematical invariants. Failure to do so will
result in memory unsafety and/or program crashes.
`BitOrder` has three functions: `at`, `select`, and `mask`. Implementors are
required to provide `at`; `select` and `mask` have default implementations that
rely on it. These functions are all generic over the `BitMemory` trait; this
trait describes the register types (unsigned integers) that can be used by
`bitvec`. It provides some useful associated constants, and is otherwise
uninteresting.
- `::at()` receives the semantic index of a bit within a register type, and must
  produce the concrete position corresponding to that semantic index. The input
  and output are both integers in the domain `[0, W)`, where `W` is the bit
  width of the register type being indexed. `at` must implement an exactly
  one-to-one mapping from all inputs to all outputs in the `[0, W)` domain, and
  this mapping must never observably change. These are strict requirements of
  the library, and failing to uphold either will break your program.
- `::select()` receives the semantic index of a bit within a register type, and
  must produce a value of that register type with exactly one bit set. The
  produced value is a mask that selects only the bit specified by the provided
  index, and will be used in Boolean arithmetic to manipulate memory. The
  default implementation is `1 << at(index)`. You are required to maintain this
  behavior in your override.
- `::mask()` receives an inclusive start index and an exclusive end index, and
  must produce a register value that selects every bit in the indexed range. The
  default implementation is `(start .. end).map(select).sum()`. You are required
  to maintain this behavior in your override.
Support Types
The `BitOrder` trait APIs use supporting types to enforce requirements on the
bare numbers being passed through it. These types are documented in the `index`
module. They all provide a `.value()` method that removes the wrapper and yields
the inner number, and a `::new()` constructor that ensures that values to be
wrapped uphold the type’s requirements.
- `at` and `select` receive a `BitIdx<M>` argument. This is a wrapper over `u8`
  that ensures that the contained value is in the domain `0 .. M::BITS`. It
  serves to indicate that a number is a semantic counter, not an electrical
  position.
- `at` returns a `BitPos<M>` value. This has the same behavior and restrictions
  as `BitIdx<M>`, and indicates that the number is an electrical position within
  a register.
- `select` returns a `BitSel<M>` value. This wraps an `M` register value, and
  ensures that the contained number is a power of two – exactly one bit is set,
  and all others are zero. This type indicates that the mask is guaranteed to
  select exactly one bit in a register.
- `mask` receives an inclusive `BitIdx<M>` and an exclusive `BitEnd<M>`
  argument, and returns a `BitMask<M>` value. `BitEnd<M>` ensures that the
  contained number is in the domain `0 ..= M::BITS`, including the final count,
  and marks a one-past-the-end exclusive boundary. `BitMask<M>` marks that the
  contained number may select any number of bits in a register.
New implementors of `BitOrder` will use these types to satisfy the behavior
requirements individually.
In addition, implementors’ test suites should call the function
`order::verify_for_type::<O, M>()` to check that an implementation `O` satisfies
the behavior requirements for a particular register type `M`, or call
`order::verify::<O>()` to check that an implementation satisfies the behavior
requirements for all supported register types. These functions are conditional
upon `cfg(test)`, and accept a `verbose: bool` parameter that allows the test
functions to print diagnostics to `stdout` during evaluation.

If the verification functions panic, your implementation is incorrect, and
cannot be safely used in `bitvec`.
BitStore
The `BitStore` trait governs the processor behaviors used to interact with the
memory of a `BitSlice` buffer. These include both the width of the processor
register used to contain a memory element, and the load/store instructions the
processor uses to move data across the memory bus.
`BitStore` has no behavior of its own, and serves only to collect associated
types and constants. It cannot be implemented outside `bitvec`, and is a closed
portion of the API. You can freely use the trait as a bound in types that
contain `bitvec` structures, but should not otherwise attempt to make use of it.
Implementations
`BitStore` is implemented on all unsigned integer types not larger than a target
processor’s word size, on all `Cell<T>` wrappers of them (as `Cell` is a
compiler directive), and on their `AtomicT` variants.
Not all processors have atomic instructions for all their scalar registers. The
compiler maintains a list of all atomics available on all supported targets, and
exposes this list as unstable attributes in the standard library, unavailable to
user code such as `bitvec`. `bitvec` uses the `radium` crate (which I also
maintain) to provide information about atomic support for a register width, and
`radium` maintains a best-effort guess at what atomics are available.
On architectures with missing atomics, `bitvec`’s default feature set will cause
a compiler error when you attempt to instantiate a `bitvec` structure with a
register that is missing an atomic variant. You can fix this by using a narrower
register that does have atomic instructions, or by disabling default features
and not enabling the `"atomic"` feature. Please file an issue with `radium`
listing your target and the failing type, so that we can improve our precision.
Associated Types
The `Mem` associated type names the scalar integer corresponding to the
`BitStore` type. `u8`, `Cell<u8>`, and `AtomicU8` all implement `BitStore` with
their `Mem` type assigned as `u8`; the same is true for the wider registers.
This type is used to create selection masks in the processor and permit access to unaliased memory.
The `Access` associated type names the type used to implement memory access. The
`BitAccess` trait is an internal bridge to `Radium` that allows a consistent
memory API, regardless of the instructions used. All reads from and writes to
memory route through this association and trait.
Lastly, the `Alias` associated type enables `bitvec` to gracefully and correctly
handle events that cause multiple handles to alias the same memory address. This
association is used in `.split_at_mut()` to select the alias-aware type used for
all subsequent accesses.
The Mem
and Alias
types are exposed in public APIs according to local alias
information. The Access
type is never publicly exposed, and only used for code
generation.
Practical Use
That’s enough theory; let’s talk about how to use the crate. This chapter is
divided into two subsections, one for use cases that want an ordinary bool
collection library with transparent memory compaction, and one for use cases
that want a convenient way to precisely sculpt memory. Before getting into
either, let’s quickly recap how the bitvec
types interact with memory
ownership.
Rustic Memory Management
The first and most important thing to remember is that, despite the extra
complexity just discussed about memory parameters and aliasing behavior,
bitvec
is just Rust. It obeys all of the rules and patterns that the rest of
Rust does.
BitSlice
s, like regular slices, are exclusively borrowed from owned memory
higher up the stack. Their references can be passed down the stack, and are
subject to the same lifetime and mutation-exclusivity rules.
The owned memory that creates a BitSlice
view can be an array, boxed slice, or
vector of either ordinary integers, or their wrapper equivalents provided by
bitvec
:
```rust
use bitvec::prelude::*;

let array = [0u8; 8];
let boxed: Box<[u16]> = Box::new([0u16; 4]);
let vec = vec![0u32; 2];

let bits_a = bitarr![u8, Msb0; 0; 64];
let bits_b = bitbox![u16, Lsb0; 0; 64];
let bits_v = bitvec![u32, LocalBits; 0; 32];
```
Once memory is bound, it can be borrowed as a BitSlice
by using the BitView
trait (imported in the prelude), or by using the fact that all bitvec
containers borrow themselves as BitSlices
just like standard-library
containers borrow themselves as slices:
```rust
use bitvec::prelude::*;

let array = [0u8; 8];
let boxed: Box<[u16]> = Box::new([0u16; 4]);
let vec = vec![0u32; 2];

let bits_a = bitarr![u8, Msb0; 0; 64];
let bits_b = bitbox![u16, Lsb0; 0; 64];
let bits_v = bitvec![u32, LocalBits; 0; 32];

let array_bits = array.view_bits::<Msb0>();
let boxed_bits = boxed.view_bits::<Lsb0>();
let vec_bits = vec.view_bits::<LocalBits>();

let bits_a_ref: &BitSlice<_, _> = &bits_a;
let bits_b_ref: &BitSlice<_, _> = &bits_b;
let bits_v_ref: &BitSlice<_, _> = &bits_v;
```
And, of course, mutability applies:
```rust
use bitvec::prelude::*;

let mut arr = bitarr![0; 10];
let arr_ref: &mut BitSlice = &mut arr;
arr_ref.set(1, true);
assert!(arr_ref[1]);
```
Just as with ordinary Rust code, BitSlice
is the type you want to use when
working with memory that you are not responsible for moving around or
destroying. However, when you do need to move memory around, you need to switch
to either a static
binding or a container: BitArray
is always available, and
BitBox
and BitVec
require an allocator.
I am spending this much time discussing the Rust memory management system
because this is a very common question I receive. No other bit-stream crate
has reference types, and therefore users coming from, e.g., bit-vec
see the
same BitSlice
name and attempt to use their habits from that crate.
bitvec
is not like any other bitstream library, in Rust, C++, or another
language. bitvec
is like ordinary Rust. I promise.
Bit Collections
As discussed in the Type Parameters chapter, you should use usize
as the
BitStore
parameter for optimal performance in the generated program.
Once you have created some memory that you can view as individual bits, it is
time to actually use it. Here is the one-sentence summary of what bitvec
can
do:
> Every stable API present in the standard library is replicated in `bitvec`,
> except for `BitSlice<T, O> as IndexMut<usize>`, because `bitvec` cannot
> produce `&mut bool`.
If you were using ordinary collections to manage sequences of `bool`s, then
every part of your code will continue to work on `bitvec` types except for
index-assignment: `slice[index] = value;`. Until and unless the `IndexMut`
trait is reshaped, you will need to use one of these two alternatives:

- `slice.set(index, value);`
- `*slice.get_mut(index).unwrap() = value;`
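The second alternative works because `.get_mut()` returns a proxy object rather than a true `&mut bool`. Here is a sketch of that proxy-reference pattern in plain Rust; the names are invented for illustration and this is not `bitvec`'s real `BitRef` implementation:

```rust
// Sketch: since no `&mut bool` can point into packed bits, a proxy
// carries the bit's value and commits it to the element when dropped.
struct BitProxy<'a> {
    elem: &'a mut u8,
    mask: u8,
    value: bool,
}

impl BitProxy<'_> {
    fn write(&mut self, value: bool) {
        self.value = value;
    }
}

impl Drop for BitProxy<'_> {
    fn drop(&mut self) {
        // commit the cached bit back into the backing element
        if self.value {
            *self.elem |= self.mask;
        } else {
            *self.elem &= !self.mask;
        }
    }
}

fn main() {
    let mut elem = 0u8;
    {
        let mut proxy = BitProxy { elem: &mut elem, mask: 0b0000_0010, value: false };
        proxy.write(true); // committed when `proxy` drops
    }
    assert_eq!(elem, 0b0000_0010);
}
```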
Subslicing works:
```rust
use bitvec::prelude::*;

let bits = bits![0, 0, 0, 0, 1, 1, 1, 1];
assert!(bits[.. 4].not_any());
assert!(bits[4 ..].all());
```
Incremental munching works:
```rust
use bitvec::prelude::*;

let mut bits = bits![0, 0, 1, 1, 1, 0, 0, 0];
//      ^^^ modify the slice handle, not the slice contents

while let Some((&false, rest)) = bits.split_first() {
    bits = rest;
}
assert_eq!(bits, bits![1, 1, 1, 0, 0, 0]);

while let Some((&false, rest)) = bits.split_last() {
    bits = rest;
}
assert_eq!(bits, bits![1; 3]);
```
Mutation works:
```rust
use bitvec::prelude::*;
use std::{iter, thread};

let bits: &'static mut BitSlice = bits![mut 0; 8];
{
    let (left, right) = bits.split_at_mut(4);
    // Pretend that better work is happening here
    let a = thread::spawn(|| left |= iter::repeat(true));
    let b = thread::spawn(|| right ^= iter::repeat(true));
    a.join().unwrap();
    b.join().unwrap();
}
assert_eq!(bits, bits![1; 8]);
```
Everything you can do with a slice, an array, or a vector of bits, you can do
with `bitvec`’s equivalent types, except for `IndexMut<usize>`. The only change
from the standard-library types is that you are now guaranteed to use one bit of
storage for each bit of information, rather than eight bits of storage per bit.

> Author’s note: Other than bragging about `bitvec`’s API fidelity, I don’t
> think this section is very useful or educational. If you want to read more
> about how to use `bitvec` for `usize => bool` collections, please let me know
> and I will expound!
Bitfields
bitvec
’s more technically-interesting capability is that it provides
load/store memory access behaviors that allow you to write values into, and read
them back out of, any BitSlice
in memory rather than being constrained to
well-formed references to well-typed memory.
This is useful for the de/construction of packed memory buffers, such as transporting data through I/O protocols.
> AUTHOR’S NOTE: If you are using `bitvec` to do anything related to the
> underlying memory representation of a bit-buffer, you must read this chapter,
> and all of the API docs of `bitvec::field` and its contents, in their
> entirety.
>
> I have written extensively, and yet still insufficiently, about the
> intricacies involved in operating the `BitField` trait correctly. If you skim
> this documentation, you will have unexpected behavior, you will get frustrated
> with me for writing a bad library, you will file an issue about it, and I will
> probably tell you that the behavior is correct and that I already addressed it
> in the documentation.
>
> It took me a long time to think about and a long time to write. It should
> take you a long time to read and think about, too.
All of this behavior is contained in the BitField
trait. Let’s explore that:
```rust
// bitvec::field

pub trait BitField {
    fn load<M>(&self) -> M;
    fn store<M>(&mut self, value: M);
}

impl<T> BitField for BitSlice<T, Lsb0> {
    fn load<M>(&self) -> M { /* snip */ }
    fn store<M>(&mut self, value: M) { /* snip */ }
}

impl<T> BitField for BitSlice<T, Msb0> {
    fn load<M>(&self) -> M { /* snip */ }
    fn store<M>(&mut self, value: M) { /* snip */ }
}
```
The actual trait is more complex than this, and will be visited later. The
important part, right now, is that BitField
allows you to load values out of
BitSlice
s and store values into them. Furthermore, it is implemented
specifically on BitSlices
that use the bit orderings provided by bitvec
,
and is not generic over all orderings.
While bitvec
could in theory provide a default implementation for all
<O: BitOrder>
, this would by necessity have the most pessimal possible
performance, and the lack of specialization for overlapping trait
implementations means that faster performance can never be written.
The downside of the two specific implementations is that Rust coherence rules
forbid implementation of a bitvec
trait, on a bitvec
type, parameterized
with a local, but non-bitvec
, ordering type. On the off chance that you find
yourself writing a new BitOrder
implementor, file an issue.
The `M` type parameter on the load and store methods is bounded by funty’s
`Integral` trait. It can store any unsigned or signed integer at any partial
width. This parameterization allows you to combine any integer type for transfer
with any integer type for storage, rather than being restricted to only
transferring T
data into and out of a BitSlice<T, _>
.
Unfortunately, adding a second integer type parameter is not the only
complication to the BitStore
memory model. There is also a second dimension of
segment ordering. bitvec
tries to make explicitly clear that the Lsb0
and
Msb0
types refer only to the ordering of bits within registers, and not to
the ordering of bytes within registers. However, when the value being
loaded or stored does not fit within one register of the storage `BitSlice`, it
must be split into multiple segments, and those segments must somehow be ordered
in memory.
Segment Orderings
> Author’s Note: READ THIS. I have received several issues about this exact concept. It is not obvious.
There are two segment orderings: little-endian and big-endian. You may select
the segment endianness you prefer by using the _le
or _be
suffix,
respectively, on the .load()
and .store()
methods. The unsuffixed method is
an alias for the endianness of your processor: _be
on big-endian targets, and
_le
on little-endian. This is a convenience only. If you are writing I/O
buffers, you should really use the explicitly-named methods.
Let us imagine a BitSlice<u8, Lsb0>
used to store a u16
that is misaligned,
and thus stored in three successive bytes. This algorithm is true for all
circumstances where the stored region occupies more than one register of the
backing slice, but smaller examples are simpler to draw.
This diagram uses 0
to refer to the least significant bit, and 7
to refer to
the most significant bit. The first row shows bytes of memory, the second row
shows the bit indices in memory used by .store_le()
, and the third row shows
the bit indices in memory used by .store_be()
.
[ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ]
3 2 1 0 b a 9 8 7 6 5 4 f e d c
f e d c b a 9 8 7 6 5 4 3 2 1 0
.store_le()
places the least significant segment in the low address, while
.store_be()
places the most significant segment in the low address. The
ordering of bits within a segment is always preserved, no matter which
ordering parameter is used by the BitSlice
.
Here is the same example, but using the Msb0
bit ordering instead. Again, the
second row uses .store_le()
, and the third row uses .store_be()
.
[ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ]
3 2 1 0 b a 9 8 7 6 5 4 f e d c
f e d c b a 9 8 7 6 5 4 3 2 1 0
The only change is in how the segments are placed into memory. The ordering of bits within a segment never changes, and is always the processor’s significance order as implemented in hardware.
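The placement rule can be sketched with plain integers. This sketch simplifies to whole-byte segments (real `bitvec` segments may be partial elements, and these helper names are invented here); the two orderings differ only in which segment lands at the lower address:

```rust
// Sketch: segment placement for a u16 split across two bytes.
fn store_le(buf: &mut [u8; 2], value: u16) {
    buf[0] = value as u8;        // least significant segment at low address
    buf[1] = (value >> 8) as u8;
}

fn store_be(buf: &mut [u8; 2], value: u16) {
    buf[0] = (value >> 8) as u8; // most significant segment at low address
    buf[1] = value as u8;
}

fn main() {
    let mut le = [0u8; 2];
    let mut be = [0u8; 2];
    store_le(&mut le, 0x1234);
    store_be(&mut be, 0x1234);
    assert_eq!(le, [0x34, 0x12]);
    assert_eq!(be, [0x12, 0x34]);
}
```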
How to Use BitField
You will probably find real use of the BitField
trait more educational than
the previous section. It has a very straightforward API, that you can combine
with println!
-debugging or your favorite means of viewing memory in order to
observe its actions.
Step one: create any BitSlice
-capable buffer. This can be any of the
Rust-native sequence types, or any of the bitvec
types.
```rust
use bitvec::prelude::*;

let mut data = [0u8; 4];
let bits = data.view_bits_mut::<Msb0>();
```
Then, narrow the BitSlice
to be the region you want to access as storage. It
must be no wider than the integer type you are transferring: BitSlice
s outside
the domain 1 ..= M::BITS
will panic during .load()
or .store()
. The
easiest way to narrow a BitSlice
(or buffer type that dereferences to it) is
by using range indexing, [start .. end]
.
```rust
use bitvec::prelude::*;

let bits = bits![mut u8, Msb0; 0; 32];

bits[10 ..][.. 13].store_be::<u16>(0x765);
assert_eq!(bits[10 .. 23].load_be::<u16>(), 0x765);

bits[10 .. 23].store_le::<u16>(0x432);
assert_eq!(bits[10 .. 23].load_le::<u16>(), 0x432);
```
That’s the entire API. .store()
truncates the stored value to the width of the
receiving BitSlice
, and .load()
zero-extends the loaded value to the width
of the destination register type.
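The truncate-and-extend semantics can be sketched in plain Rust for the unsigned case. This helper is an illustration, not `bitvec` API:

```rust
// Sketch: storing keeps only the low `width` bits of the value, which is
// what then comes back out on a zero-extending load.
fn store_into_width(value: u16, width: u32) -> u16 {
    value & ((1u32 << width) - 1) as u16
}

fn main() {
    // a 13-bit slice holds 0x765 losslessly...
    assert_eq!(store_into_width(0x0765, 13), 0x0765);
    // ...but a wider value is silently truncated on store
    assert_eq!(store_into_width(0xFFFF, 13), 0x1FFF);
}
```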
You can see an example that uses the BitField
trait to implement an I/O
protocol in the examples/ipv4.rs
program in the repository. Use
cargo run --example ipv4
to see it in action.
Runtime Performance
bitvec
increases the instruction cost of each access to a bool
in its data
structures. This is an inevitable consequence of the fact that, even on
architectures that have them, compilers typically do not emit object code
instructions that access individual bits within a memory location. Therefore,
each access in `bitvec` has, in addition to a memory operation, one or two
shift instructions and one or two `AND`/`OR`/`NOT` instructions.
This means that, inevitably, bitvec
is slower in CPU time than [bool]
is.
Measurements indicate roughly a factor of ten, though with about 10x more
variance. However, this cost is only apparent and meaningful when walking the
entirety of very large buffers, and tends to fade into noise on smaller buffers,
or be obviated by compile-time fixed accesses. As always, try it on a
representative workload.
Benchmarking
I have tried (with admittedly low priority) to have some benchmarks in the
project to back up my claims that bitvec
is fast. This has been difficult to
maintain for a few reasons, but I have at least a few that have stayed present
in order to demonstrate important claims, such as showing that specialization
for matching types does provide massive performance benefits (it does).
In particular, LLVM is very good at propagating compile-time constants through
the code, and bitvec
strives to maintain an internal implementation that is
easily accessible to optimizers. This means that basically any benchmark that
takes input from a source file that the compiler can see gets artificially
solved during codegen.
For instance, I don’t know how long it takes to construct a &BitSlice
view
over memory, because my benchmarks report 0ns: LLVM computes my pointer encoding
at compile time, and a consequence of the way I designed my pointer encoding is
that the only operation BitSlice::from_slice
actually performs is .len << 3
.
When LLVM can see the original length, it just does this itself, and emits an
immediate with the correct length instead of the constructor call.
Other constructor benchmarks are only showing me the time required to run
memcpy
, and arbitrary indexing just shows the time required to run three
instructions, because LLVM solved the shift/mask arguments ahead of time.
The important takeäway here is that if your code is at all dependent on
constants that the compiler can see, and is not exclusively performing indexing
based on runtime inputs, then bitvec
is going to be plenty fast.
Specialization
bitvec
strives to use its knowledge of the underlying memory representation
wherever possible. This means that operations between BitSlice
s with the same
type parameters can rely on an identical representation and use integer behavior
rather than walking each bit individually.
Try to do this wherever possible, especially in performance-sensitive code. You
typically should not be mixing bitvec
structures with different type
parameters anyway: use the representation that serves your needs, and keep using
it in all buffers that interact with it.
As an example, as of 1.0, walking two large BitSlice
s with incompatible type
parameters takes 3µs (microseconds), while walking the same sizes with
identical type parameters takes 100ns (nanoseconds). It’s roughly a 32x
performance difference, which is only half the speedup that I expected using
usize
on a 64-bit machine, but still quite stark.
Memory Representation
As discussed in the Type Parameters chapter, bitvec
allows users to select
the specific ordering of bits within a memory element when constructing a type.
This has consequences for how source code translates to an in-memory
representation.
To review: bitvec
provides two orderings of bits within a single memory
element (Lsb0
and Msb0
) and three or four types of memory elements (u8
,
u16
, u32
, and only on systems where it is 8-byte-aligned, u64
). The
usize
type is also supported, but it is not portable, and behaves exactly as
the named register of its width.
> The `Cell` and atomic integer variants are not interesting here, as they only affect how the memory bus operates, not the processor register.
Let us now examine how each possible combination of register width, bit-ordering, and processor byte endianness affects the placement of bits in memory.
Memory Layout
The BitOrder
and BitStore
traits combine with your target architecture’s
byte-ordering of register elements to create a matrix of memory traversals. This
matrix informs what is the appropriate choice of parameters to use for your
program whenever you are using bitvec
for precise memory control rather than
solely for a compact usize
-to-bool
data collection.
The tables below list bytes of a memory address space with addresses increasing to the right, and the bits within those bytes with numeric significance decreasing to the right. This is the ordering used in most debug-printing of memory, so hopefully the table contents should match up with your prior experience viewing memory bytes.
The L
and M
indicate the Lsb0
and Msb0
ordering parameters,
respectively, and xx
indicates that the row matches all register types.
Within each row, traversal begins at zero and follows the arrows to each
successive step. Boundaries between registers are marked with a vertical bar;
boundaries between bytes within the same register are marked with a space.
Little-Endian Byte-Ordered Machines
On little-endian machines, the least-significant byte of a register type is stored at the lowest memory address, and each byte higher is one step more numerically significant than the last.
byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777
bit ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210
═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════
Lxx ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E
─────╫─────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────
M8 ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F
M16 ║ 2 ---> 3 0 ---> 1│6 ---> 7 4 ---> 5│A ---> B 8 ---> 9│E ---> F C ---> D
M32 ║ 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1│E ---> F C ---> D A ---> B 8 ---> 9
M64 ║ E ---> F C ---> D A ---> B 8 ---> 9 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1
Big-Endian Byte-Ordered Machines
On big-endian machines, the most-significant byte of a register type is stored at the lowest memory address, and each byte higher is one step less numerically significant than the last.
byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777
bit ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210
═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════
L8 ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E
L16 ║ 3 <--- 2 1 <--- 0│7 <--- 6 5 <--- 4│B <--- A 9 <--- 8│F <--- E D <--- C
L32 ║ 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0│F <--- E D <--- C B <--- A 9 <--- 8
L64 ║ F <--- E D <--- C B <--- A 9 <--- 8 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0
─────╫─────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────
Mxx ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F
If you need to care about the memory representation, then you most likely want
to use the <u8, Msb0>
pair. This provides a consistent ordering on all
machines, and the numeric value of the underlying memory will probably match
your expectations about the semantic contents of a data structure.
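A sketch of why `<u8, Msb0>` is portable: the nth bit of the stream maps to the same byte address and mask on every machine, with no byte-endianness term in the computation. This helper is illustrative, not part of `bitvec`:

```rust
// Sketch: flat bit index -> (byte offset, selection mask) under <u8, Msb0>.
fn bit_addr(n: usize) -> (usize, u8) {
    (n / 8, 0x80 >> (n % 8))
}

fn main() {
    assert_eq!(bit_addr(0), (0, 0b1000_0000)); // first bit: high bit of byte 0
    assert_eq!(bit_addr(7), (0, 0b0000_0001));
    assert_eq!(bit_addr(8), (1, 0b1000_0000)); // ninth bit: high bit of byte 1
}
```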
This chapter, and the BitField
trait, are the two most common sources of
questions about how bitvec
operates. Their intersection is even more complex,
and the layout of numeric integers stored into a BitSlice
is an extremely
common point of confusion.
Read these chapters and the API documentation thoroughly, and experiment with placing data into memory and changing the type parameters to observe their effects on buffer representation.
bitvec
Memory Model
bitvec
addresses individual bits, while computer hardware addresses register
elements. As a result, bitvec
has a more complex memory model than the Rust
language does. The library implementation strives to satisfy users’
expectations, the Rust language’s rules, and performance in the produced
artifact to the best solution for all parties, with as few compromises as
possible. Unfortunately, this has the side effect of increasing the complexity
of the codebase, both for maintainers and for readers.
This chapter explains the abstract concepts of the bitvec
memory model and the
means by which it is encoded in the Rust language without running afoul of Rust
or LLVM rules that threaten undefined behavior.
The bitvec
memory model is typed entirely within the store
module’s
BitStore
trait definition and implementations. It utilizes the access
module’s BitAccess
trait to mediate memory access events through types that
are known to be correct in the Rust and LLVM models, so that at any point in
program execution, memory access will be consistent and sound.
In addition, the domain
module’s types provide views which manipulate the
store
model to maximize performance. This chapter will discuss primarily
store
, and use domain
only to provide examples of how store
is used in
practice to accomplish the library’s implementation.
Aliasing
To Rust and LLVM, “aliasing” occurs whenever there are two paths to a memory
region, and at least one of them has write privileges. In Rust, this is
represented by the &mut
reference exclusion rule: it is always, always,
always Undefined Behavior in the Rust memory model to have two
references which can reach a memory element if at least one of them is marked
&mut
.
LLVM, which was created for, is written in, and culturally shaped by C++, takes
a similar view with its noalias
annotation, but struggles to enforce it as
thoroughly as Rust does.
bitvec
takes a similar view of the abstract meaning of the &mut
reference
type, but where the Rust memory model focuses on whole units T
, and has no
concept of subdivision from T
into the bits that compose it, bitvec
views
each individual bit as a standalone, independent, atom of memory. It excludes
the creation of two &mut BitSlice
reference handles that are capable of
viewing the same bit, but will happily produce two &mut BitSlice
handles
which are mutually-exclusive in bits, but reference the same location in memory.
Here we come to the first problem with the conflicting memory models: bitvec
cannot ever create an &mut T
through which it may write to memory, because it
has no way of ensuring that no other &T
or &mut T
reference exists which is
capable of viewing the memory region into which it writes.
Rust Shared Mutation
The problem isn’t just that the Rust standard library doesn’t offer any
non-unsafe
APIs to produce such references. The problem is that this is about
the most illegal thing you can do in the eyes of the Rust compiler, and if it
ever detects this transgression, it has full liberty to either reject, or worse,
accept and miscompile, your program.
In Gankra’s immortal words on the topic of attempting to sneak past the compiler’s commandments,
> - Transmuting an `&` to `&mut` is always UB
> - No you can’t do it
> - No you’re not special
The solution is simple: Rust exposes a type, UnsafeCell
, which is the
axiomatic type for mutation through shared views. In the Rust memory model, any
reference to a region of memory marked with UnsafeCell
is permitted to write
to it, and if any other references can view that region, then it is up to them
to ensure consistent behavior.
This is, of course, not quite true in LLVM’s memory model, but we are not there yet.
Since an &mut BitSlice<T, _>
handle cannot produce an &mut T
reference to
perform writes to memory, it must instead either use a *mut T
bare pointer,
which has absolutely no checks or optimizations whatsoëver, or use an
&UnsafeCell<T>
shared reference, which has all the usual guarantees present on
all1 reference types.
All UnsafeCell
does is instruct the Rust compiler to politely look the other
way about your program’s memory accesses. It is somewhat like the C keyword
volatile
, in that the compiler no longer believes that reads are stateless, or
freely reörderable, but entirely unlike that keyword in that the compiler
doesn’t have any obligation to keep your reads from or writes to such regions.
Rust provides an additional type called Cell
. This is a wrapper over
UnsafeCell
2 that defines a more useful API, including the only safe and
guaranteed way to write into memory through a shared reference: Cell::set
.
And voilà: &mut BitSlice<T, _>
simply constructs &Cell<T>
when writing,
the Rust compiler does not see a violation of the &mut
exclusion principle,
and we are done.
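A minimal sketch of that pattern in plain Rust (not `bitvec` code, and `set_bit` is an invented helper): two shared references to one `Cell<u8>`, much like two bit-slice handles over one byte, can each write distinct bits with no `&mut` in sight:

```rust
use std::cell::Cell;

// Read-modify-write of a single bit through a shared reference.
fn set_bit(elem: &Cell<u8>, mask: u8, value: bool) {
    let old = elem.get();
    elem.set(if value { old | mask } else { old & !mask });
}

fn main() {
    let byte = Cell::new(0u8);
    // Two freely-aliasing shared references to the same element.
    let (low_half, high_half) = (&byte, &byte);
    set_bit(low_half, 0b0000_0001, true);
    set_bit(high_half, 0b1000_0000, true);
    assert_eq!(byte.get(), 0b1000_0001);
}
```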
LLVM Shared Mutation
No we’re not. If it was that easy, there wouldn’t be a trait system or this document dedicated specifically to dealing with this problem.
Cell
is not thread-safe, because Cell
does not modify the instructions used
to access memory. It produces ordinary load and store instructions, carefree and
ignorant of the bane of everyone who wants consistency and the delight of
everyone who wants performance: concurrency.
Just as it is undefined behavior in Rust to manifest two &mut
references that
can view the same location, it is equally undefined behavior in LLVM to manifest
two pointers into memory that ever, at all, no matter what, perform any
memory access to the same location, at the same time, on multiple executors.
As with above:
- Unsynchronized writes are always UB.
- No you can’t do it.
- No you’re not special.
LLVM has an even more insidious punishment for this transgression that Rust does
not directly express: unsynchronized reads from a data race produce poison
.
Poison is a nifty concept, because it’s not illegal to obtain one. When LLVM
gives you a poison
, your program continues undergoing compilation as if
nothing had happened. You can pass it around. You can write to it, and if you
destroy it before reading, you’re fine.
As soon as you attempt to read the bit-wise value of poison
, your program is
undefined3.
So if bitvec
wants to be threadsafe, which it does, and it wants to insist on
its ability to safely alias the same memory location from multiple handles,
which is non-negotiable, there’s only one avenue left to take.
Atomic Powered Microscopes
Rust doesn’t actually model atomics. It doesn’t have to do so: no harm can ever
come from multiple handles reading out of the same immutable location, harm can
only occur when writes are observable, and writes are not observable due to the
&mut
exclusion rule. Well, except for `UnsafeCell`: everything that has an
`UnsafeCell` in it gets marked as `!Sync`, `&` references to it can’t cross
threads, and the whole problem is avoided.
This is, of course, not good enough. Concurrent, mutable, access to a location
is an important property in a computer. LLVM provides atomic types, which Rust
transparently exports as wrappers over UnsafeCell
that have their Sync
implementation restored. Handles to a region marked as atomic will use some form
of hardware-provided exclusion in order to preserve the
one-writer-XOR-any-readers system behavior, and all is well.
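That behavior can be sketched with the standard library’s atomics directly (plain Rust, not `bitvec` internals): eight threads each set a different bit of one shared element, and the hardware exclusion keeps every update:

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let elem = Arc::new(AtomicU8::new(0));
    let handles: Vec<_> = (0..8)
        .map(|i| {
            let elem = Arc::clone(&elem);
            // Atomic read-modify-write: no unsynchronized data race.
            thread::spawn(move || {
                elem.fetch_or(1 << i, Ordering::SeqCst);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Every thread's bit survived the concurrent writes.
    assert_eq!(elem.load(Ordering::SeqCst), 0xFF);
}
```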
Hardware-level exclusion has the unfortunate penalty of being, to put it
lightly, “slow”. So while it’s the safest choice to be correct, and was in fact
the default universal choice for all memory access in bitvec
for some time,
its costs4 called for a more prudent behavior.
It’s time for a new trick, something that the Rust compiler does at the region
level, and that bitvec
now does at the element level:
Run Time Alias Analysis
BitSlice
handles can only be constructed from ordinary Rust references to
memory, which Rust guarantees start out unaliased. The potential for aliasing
only occurs when a &mut BitSlice
is split into multiple subslices using any
of the functions that eventually call .split_at_unchecked_mut()
. Since that is
the root function that introduces alias conditions, it returns subslices whose
memory type parameters are tainted with the ::Alias
marker. It has the
following type signature:
```rust
impl<T, O> BitSlice<T, O>
where
    O: BitOrder,
    T: BitStore,
{
    pub fn split_at_unchecked_mut(&mut self, at: usize) -> (
        &mut BitSlice<T::Alias, O>,
        &mut BitSlice<T::Alias, O>,
    );
}
```
The BitStore
trait defines an ::Alias
associated type, which ensures that
all memory accesses through it have appropriate aliasing guards. For builds
which do not use the atomic
feature (or where the target does not have an
atomic variant of the integer), this is Cell
, and its protections are the loss
of thread-safety. For builds that do permit atomics, the marker enforces that
all reads and writes use atomic instructions.
The ::Alias
marker is applied, at compile time, by operations that split
&mut BitSlice
references into multiple coëxisting subslices. This is a good
first step to reducing unnecessary synchrony, but not good enough. Consider the
following:
```rust
use bitvec::prelude::*;

let mut data = [0u8; 2];
let bits = data.view_bits_mut::<Lsb0>();
let (one, two) = bits.split_at_mut(8);
```
It so happens that this is equivalent to splitting data
first, then viewing
each subslice as bits, but this transformation can only be done if the partition
point is known at compile-time to fall on an element boundary. There is no need
for the subslices one
and two
to use alias-safe operations, as accesses to
memory through them do not conflict with each other.
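The element-boundary case can be sketched with ordinary byte slices: once the split point falls on an element boundary, the two subslices cover disjoint elements, and plain unsynchronized writes are already sound:

```rust
fn main() {
    let mut data = [0u8; 2];
    // Split at an element boundary: each handle owns whole, disjoint bytes,
    // so no alias guard (Cell or atomic) is needed.
    let (one, two) = data.split_at_mut(1);
    one[0] |= 0x0F;
    two[0] |= 0xF0;
    assert_eq!(data, [0x0F, 0xF0]);
}
```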
Bit Slice Domains
The in-memory domain of any bit slice can be generalized to one of two formats:
either the slice touches zero edge bits (`0` or `T::BITS - 1`), or it touches an
edge bit in at least one element. Consider three bytes of memory (any element
will do, but the extra width on this page is unnecessary), with some bit-slice
regions drawn within them:
|00000000│11111111│22222222| Element
|76543210│76543210│76543210│ Bit
├────────┼────────┼────────┤
┆ ┆ ┆ ┆ 1
┆ ╞═════╡ ┆ ┆ 2
┆ ┆ ╞════╡ ┆ ┆ 3
┆ ┆ ╞═════╡ ┆ 4
┆ ╞═════════╡ ┆ ┆ 5
┆ ╞════════════╡ ┆ 6
┆ ╞═════════════╡ ┆ 7
┆ ╞═════════════════╡ ┆ 8
╞══════════════════════════╡ 9
There are nine example slices here, but they can be reduced into six specific categories, and two general ones:
- empty: row 1
- partially-spanning tail, no body: row 2
- minor (interior of an element, no edge indices): row 3
- partially-spanning head, no body: rows 4
- major (partial head, partial tail, no body): row 5
- partially-spanning head, fully-spanning body: row 6
- partially-spanning tail, fully-spanning body: row 7
- major (partial head, partial tail, full body): row 8
- spanning: row 9
The minor slice (row 3) is irreducible; the rest can all be divided into three subcomponents:
- zero or one partially-occupied head element, where the slice touches the last index in it but not the first
- zero or more fully-occupied middle elements, where the slice touches all indices in each
- zero or one partially-occupied tail element, where the slice touches the first index in it but not the last
We can break each row down into these components:
| Row | Head | Body | Tail |
|:---:|:----:|:----:|:----:|
|  1  | None |  0   | None |
|  2  | None |  0   | Some |
|  4  | Some |  0   | None |
|  5  | Some |  0   | Some |
|  6  | Some |  1   | None |
|  7  | None |  1   | Some |
|  8  | Some |  1   | Some |
|  9  | None |  3   | None |
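The decomposition above can be sketched in plain Rust. This is an illustrative model over 8-bit elements with linear bit addressing, not `bitvec`’s actual `domain` code; the names are invented here:

```rust
use core::ops::Range;

#[derive(Debug, PartialEq)]
enum Domain {
    /// Interior of one element: touches neither edge index (row 3).
    Minor { elem: usize },
    /// Partial head/tail elements around a run of fully-spanned elements.
    Region {
        head: Option<usize>,
        body: Range<usize>,
        tail: Option<usize>,
    },
}

fn domain(start: usize, len: usize) -> Domain {
    if len == 0 {
        return Domain::Region { head: None, body: 0..0, tail: None };
    }
    let end = start + len; // one past the last live bit
    let first = start / 8;
    let last = (end - 1) / 8;
    // Confined to one element and touching neither edge: irreducible.
    if first == last && start % 8 != 0 && end % 8 != 0 {
        return Domain::Minor { elem: first };
    }
    let head = (start % 8 != 0).then_some(first); // enters mid-element
    let tail = (end % 8 != 0).then_some(last);    // exits mid-element
    let body = (first + head.is_some() as usize)..(last + tail.is_none() as usize);
    Domain::Region { head, body, tail }
}

fn main() {
    // row 9: full span of three bytes
    assert_eq!(domain(0, 24), Domain::Region { head: None, body: 0..3, tail: None });
    // row 5: major with no body
    assert_eq!(domain(6, 4), Domain::Region { head: Some(0), body: 1..1, tail: Some(1) });
    // row 3: minor
    assert_eq!(domain(3, 3), Domain::Minor { elem: 0 });
}
```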
We can observe that where a slice fully-spans some elements, those elements cannot be mutated by any other reference. In the `&BitSlice` case, all `&mut`s are forbidden by the compiler's ordinary rules; in the `&mut BitSlice` case, `bitvec`'s obedience to those same rules forbids any other handle from observing the bits covered by the `&mut BitSlice`. As such, it is statically impossible for any alias to exist to the described memory in row 9, or for any alias to observe element 1 of rows 6 through 8.
`bitvec` happily permits element-aliasing `&mut BitSlice` references to observe the partially-filled elements in the outer columns, and the middle column of rows 2 through 5, so writes to them must remain synchronized through either single-threaded `Cell` or concurrency-safe atomics. These domain components can be calculated from the three components of a slice pointer: the base address, the starting bit index, and the bit count.
This is expressed in the `domain` module's two enums.
BitDomain
This enum splits any `BitSlice` region into its subcomponents, either immutably or mutably. It has the (rough) definition
```rust
pub enum BitDomain<'a, M, T, O>
where
    M: Mutability,
    T: BitStore,
    O: BitOrder,
{
    Enclave(&'a /* mut */ BitSlice<T, O>),
    Region {
        head: &'a /* mut */ BitSlice<T, O>,
        body: &'a /* mut */ BitSlice<T::Unalias, O>,
        tail: &'a /* mut */ BitSlice<T, O>,
    },
}
```
and, rather than granting direct memory access, merely removes any aliasing markers from as much memory as possible. The subslices that partially fill their base element do not need an additional aliasing marker, as the marker is only required when writes to the element may collide. If the slice is immutable, aliased writes never occur, so synchrony is never required. If the slice is mutable, then the only way to get a partial edge slice is either to forget about some bits from the main slice, which is not an alias event, or to split the slice, which is, and splitting already marks the alias.
Domain
The bit domains are still bit slices, and do not offer a way to access the backing memory. For operations where raw memory access is required, this enum produces the same domain definitions, but typed for the bare elements rather than their bits.
It has the (rough) definition
```rust
pub enum Domain<'a, M, T>
where
    M: Mutability,
    T: BitStore,
{
    Enclave(PartialElement<&'a T>),
    Region {
        head: Option<PartialElement<&'a T>>,
        body: &'a /* mut */ [T::Unalias],
        tail: Option<PartialElement<&'a T>>,
    },
}
```
(The `PartialElement` type prevents accessing bits that do not belong to the originating `BitSlice`.)
As with the bit domains, these domains will inherit any aliasing markers from their source bit-slice. The `::Alias` associated type enables the mutable domain to produce references that allow mutation without adding an unnecessary aliasing marker. Rust strongly forbids the production of `&mut` references to aliased memory elements, which is why the only `&mut` reference in these views is to memory that is fully known to be unaliased.
The `Domain` structure will produce bare atomic or `Cell` types in the alias condition. This is necessary in order to avoid producing `&mut` references which alias (as this is undefined in the Rust abstract machine, regardless of behavior), and safe because any other references to the same location will be similarly aliased and capable of handling external mutation.
LLVM Suboptimizations
LLVM considers a "racing read", that is, any read from memory that could occur contemporaneously with an atomic write, to be undefined behavior. This is a reasonable view to take, but a pessimistic one. `bitvec` has information that cannot be expressed to LLVM about which bits of an element it will observe or modify, and `bitvec` is capable of guaranteeing that two distinct access points will not be able to interfere with each other electrically. LLVM does not know this, so it considers any write to memory to touch all bits of the touched element, and any read from memory to view all bits of the fetched element.
`bitvec` exclusively5 writes to contended memory with the Rust functions `AtomicT::fetch_and` and `AtomicT::fetch_or`, which are mapped to the LLVM instructions `__atomic_fetch_and_N` and `__atomic_fetch_or_N`. It uses bitmasks that `bitvec` can guarantee are non-intersecting, but this proof cannot be extended to even the Rust compiler, let alone LLVM. These bitmasks are applied to register values before using either of the `fetch_op` instructions, and after any reads that use `AtomicT::load`/`__atomic_load_N`.
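The masked-write pattern can be demonstrated with the standard library's atomics. This is an illustrative sketch, not `bitvec`'s code: the `set_bits`, `clear_bits`, and `racing_demo` names are inventions, but the `fetch_or`/`fetch_and` calls are the same primitives described above. Two handles share one byte yet own disjoint masks, so their racing writes cannot disturb each other's bits:

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;

// Set the bits selected by `mask`, leaving all other bits untouched.
fn set_bits(byte: &AtomicU8, mask: u8) {
    byte.fetch_or(mask, Ordering::SeqCst);
}

// Clear the bits selected by `mask`, leaving all other bits untouched.
fn clear_bits(byte: &AtomicU8, mask: u8) {
    byte.fetch_and(!mask, Ordering::SeqCst);
}

// Two threads race on the same byte but own disjoint masks; the atomic
// read-modify-write cycle guarantees neither write is lost.
fn racing_demo() -> u8 {
    let byte = Arc::new(AtomicU8::new(0));
    let handles: Vec<_> = [0b0000_1111u8, 0b1111_0000]
        .into_iter()
        .map(|mask| {
            let byte = Arc::clone(&byte);
            thread::spawn(move || set_bits(&byte, mask))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    byte.load(Ordering::SeqCst)
}
```

Without the atomic instructions, a plain read-modify-write of the whole byte could lose one thread's bits; the masks alone are not enough, which is exactly why the synchrony guard cannot currently be removed.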
If the `bitvec` mask proofs were extendible to LLVM, and LLVM were to expand its tracking of which bits of a memory address became `poison`ed by a memory write, and which bits of a fetched memory value became un-`poison`ed by a masking operation, then the compiler would be more able to observe that `bitvec` memory accesses do not observably interfere with each other. This observation would then define the behavior in the compiler's memory model of racing writes/reads, and permit an increased (possibly even complete) removal of synchrony guards.
I am not aware of any processor hardware which fails to guarantee that all bits of memory are fully defined at the clock edges of all instructions that use the location. To the full extent of my knowledge, all memory banks in all relevant processors have a stable bit-value at the start of a tick, when reads occur, and at the end of a tick, when writes commit. At no point does changing the value of one bit of a memory component affect the electrical value of other bits in the component.
This is not necessarily true of other storage devices, such as SSDs, but `bitvec` can only be used to access storage cells mapped into the RAM address space, which tend to all have this stability property.
Summary
The formal definition of the `bitvec` memory model extends the Rust mutable-exclusion rules by refining memory regions to have bit-precision instead of element-precision. The Rust compiler is only weakly capable of tracking the region status of individual bits, and only in compiler-internal structures. LLVM has a more robust arbitrary-bit-tracking capability, but similarly limits its interface to external code.
Barring any errors in the `bitvec` implementation, the `bitvec` memory model is fully sound in its behavior with regard to single-observer unsynchrony. Synchronization is only added in order to correctly interface with `rustc` and LLVM without causing either of them to introduce undefined behavior due to a lack of information.
Footnotes
All references to `T` where `T` is either `!Sized`, or where `mem::size_of::<T>()` is non-zero, that is. A fun quirk of Rust's first-class concept of zero-width types is that the only illegal value for a `&Zst` reference is null. Since there is nothing to load or store through a `&Zst` reference, the compiler doesn't care what the reference value is, as it will never be used to perform memory access.
This is not absolutely true. Like we saw with `UnsafeCell`, the only immutable rule of compiler developers is that whenever they make an immutable rule, they also provide a way to sidestep it. If you `freeze` a `poison`, you are now free to read its value and act on it. LLVM just doesn't make any guarantees about what you'll see.
I'd feel a great deal more comfortable if I had firm knowledge of what those costs actually were. An atomic write always issues a `lock` instruction modifier on x86, and I have heard vastly different things about what that actually means, from "it's free if no other cache holds that address" up to "it poisons the whole cacheline", and I have not had much luck producing a benchmark that firmly demonstrates that unneeded atomic access is a strict performance cost.
In multithreading environments. Disabling atomics also disables `bitvec`'s support for multithreading, so the penalty for aliasing is reduced to an inability to remove redundant reads.
Bit Slice Pointer Encoding
`bitvec`'s core value proposition rests on the fact that it is capable of defining an unsized slice type and controlling references to it. The Rust language rests heavily on the two reference types `&` and `&mut`, and does not ordinarily allow these to be faked or created by anything other than the compiler.
Rust Reference Rules
It so happens that not only does Rust strongly guarantee the in-memory layout of a reference to a slice, it also provides a stable API for constructing values of `&[T]` type without using `mem::transmute`. Subject to certain value requirements imposed by types, slice references can be constructed through these functions and the compiler will accept them as valid.
These requirements traditionally make it difficult to encode non-address information into a bare reference, since the compiler has a very firm expectation that a reference to a type is immediately dereferenceable to a value of that type. But if your type happens to be zero-sized, then it can never exist in memory, no loads or stores to it can ever be produced, and the compiler no longer concerns itself with the actual bit-pattern value of references to it.
This is why the definition of `BitSlice` is

```rust
// src/slice.rs
#[repr(transparent)]
pub struct BitSlice<T, O>
where
    T: BitStore,
    O: BitOrder,
{
    _mem: [()],
    _typ: PhantomData<T>,
    _ord: PhantomData<O>,
}
```
`BitSlice` is `[()]` with some markers that only the type-checker can see. `&BitSlice` is thus `&[()]`, and `&[()]` can have any values it wants (except, of course, null).
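This property is easy to demonstrate with the standard library alone. The sketch below (my illustration, not `bitvec` code; the `zst_slice` helper is invented) builds a `&[()]` from an arbitrary non-null address and an arbitrary length. Because `()` is zero-sized, `slice::from_raw_parts` never dereferences the pointer, so any non-null value is sound:

```rust
use core::slice;

// Build a `&[()]` from any non-null address and any length. The alignment
// of `()` is 1 and its size is 0, so no load or store is ever produced
// and the pointer value is never inspected by the hardware.
fn zst_slice<'a>(addr: usize, len: usize) -> &'a [()] {
    assert!(addr != 0, "null is the one forbidden reference value");
    let ptr = addr as *const ();
    // SAFETY: the pointer is non-null, trivially aligned, and the total
    // size `len * size_of::<()>()` is zero, so no memory is accessed.
    unsafe { slice::from_raw_parts(ptr, len) }
}
```

The address and length survive round-trips through the reference untouched, which is exactly the storage space the `&BitSlice` encoding exploits.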
Pointer Encoding
Slice references contain two pieces of information: the address of the base element, and the number of elements, starting at the base, contained in the slice region. Theoretically, bit-slice references have the same pair of information: the address of the first bit, and the number of bits in the region.
However, computers are byte-addressed, not bit-addressed, so we need to store three more bits (to select a bit in the base byte) somewhere in the reference. Since slice references are defined as `{ base: *T, elts: usize }`, and there are no1 spare bits in `*const _`, the bits to store the base bit are taken out of the length counter.
Reference address values are also required to be integer multiples of the alignment of the referent type `T`. This alignment is, on all supported targets, the width in bytes of the referent type. As a result, there are as many low bits in the address of any `T` that are guaranteed to be the `0` value as there are bits needed to select a byte within the element. The end result is that the length counter must always use three bits to store the starting bit, and the base address will be composed of an aligned `T` address and an index of the starting byte within it.
As Rust does not have bitfield syntax, a definition of the pointer structure in C++ looks like this:

```cpp
template <typename T>
struct BitSpan {
  static_assert(
    std::is_unsigned<T>::value
    && sizeof(T) <= sizeof(size_t)
    && sizeof(T) <= alignof(T)
  );
  uintptr_t ptr_head : __builtin_ctzll(alignof(T));
  uintptr_t ptr_addr : sizeof(uintptr_t) * CHAR_BIT
                       - __builtin_ctzll(alignof(T));
  size_t len_head : 3;
  size_t len_bits : sizeof(size_t) * CHAR_BIT - 3;
};
```
In Rust, the structure is declared as

```rust
// src/pointer.rs
#[repr(C)]
pub struct BitSpan<T, O>
where
    T: BitStore,
    O: BitOrder,
{
    ptr: NonNull<u8>,
    len: usize,
    _ty: PhantomData<T>,
    _or: PhantomData<O>,
}
```

and the logical components must be accessed through get/set functions, rather than through compiler-generated field stubs.
By marking the pointer as `NonNull`, `BitSpan` declares that it will never be a null pointer and becomes subject to the same peephole optimization that allows `mem::size_of::<Option<&T>>() == mem::size_of::<&T>()`. By marking it as unconditionally a pointer to `u8`, we declare that all low bits of the address value are in use, and none can be used as slots for anything else (since our encoding is using them to select a byte within the `T`).
Significant Values
The null value, `{ ptr: 0, len: 0 }`, is not valid in `BitSpan<T>`, but rather is used to mark `Option::<BitSpan<T>>::None`.
Empty Slices
All slices with a `bits` logical field of `0` are considered equally empty, and equal to each other. The `addr` and `head` logical fields can be anything. However, they are not required to be clamped to `1` and `0`, respectively, because they can contain important information about the region.
Specifically, the `BitVec` type is an owning type that manages a buffer assigned to it by the memory allocator. It is never allowed to forget the address of its buffer, even if the user has removed all live bits from the vector.
Summary
Rust requires that slice references have a specific ABI, but makes no requirements about the encoding of values of those references for certain types. We can supply our own ABI-equivalent structure, define functions that use the structural encoding to compute the information needed to actually interact with memory, and convert our structures into Rust-accepted slices through the provided compiler API in `core`.
Footnotes
On AMD64, pointers are actually aggregates of MMU translation pages, and processors only decode the low 48 or 57 bits of them, leaving the high 16 or 7 bits available for other information not part of the memory addressing system. However, these processors also trap when attempting to dereference a pointer whose high `[48:]` or `[57:]` bits do not have the same bit value as bit `[47]` or `[56]`, and that bit is typically used to differentiate unprivileged user memory from privileged kernel memory. Furthermore, this dead region does not exist on 32-bit architectures, x86 or otherwise, and since `bitvec` explicitly supports 32-bit systems, the use of dead bits only present on a subset of supported targets, and subject to their own extra rules, is not a workable strategy.
Miscellaneous Implementation Details
`bitvec` has a number of internal implementation details that facilitate development but are not part of the public API. While not user-facing, these details are nevertheless important to document as explanations of how the library is built.
Integer Refinement
`bitvec` offloads abstraction over the fundamental integers to the funty crate. It provides traits such as `Unsigned` that generalize over any unsigned integer and allow them to be used in generic code.
`funty` only unifies the standard-library APIs of the integers into trait-based code; `bitvec` further extends this with useful constants in the `BitRegister` trait. This trait delimits the integers that correspond to machine registers actually available for use as storage, and provides minor conveniences for working with them.
Specialization Hacks
`bitvec` is built to be agnostic to the ordering of bits within an element. This is an important aspect of its design, and even if no orderings other than the provided `Lsb0` and `Msb0` are used, it must remain so in order that these two orderings can be used on equal footing. However, the deferral of register operations to type parameters outside the data structures' control results in some unpleasant performance losses that would not occur in a hand-written equivalent.
Some operations, like copying between, or comparing, slices, can be accelerated with partial-element access, but require knowledge of the `O` ordering type to provide a semantic interpretation of register contents. Language-level specialization could allow writing override `impl` blocks, like this:
```rust
impl<T, O> BitSlice<T, O>
where
    T: BitStore,
    O: BitOrder,
{
    fn eq(&self, other: &Self) -> bool {
        todo!("baseline")
    }
}

impl<T> BitSlice<T, Lsb0>
where
    T: BitStore,
{
    fn eq(&self, other: &Self) -> bool {
        todo!("lsb0-accelerated version")
    }
}

impl<T> BitSlice<T, Msb0>
where
    T: BitStore,
{
    fn eq(&self, other: &Self) -> bool {
        todo!("msb0-accelerated version")
    }
}
```
While waiting on this feature, we can use the compiler's stable `TypeId` API to simulate access to specialization. By comparing the `TypeId` of the `O` type argument to the `TypeId`s of `bitvec`'s provided orderings, functions can detect when they are in a monomorphization with an `Lsb0` or `Msb0` ordering argument and specialize accordingly. The above block can be replaced with:
```rust
impl<T, O> BitSlice<T, O>
where
    T: BitStore,
    O: BitOrder,
{
    fn eq(&self, other: &Self) -> bool {
        if let (Some(this), Some(that)) = (
            self.coerce::<T, Lsb0>(),
            other.coerce::<T, Lsb0>(),
        ) {
            todo!("lsb0-accelerated version")
        } else if let (Some(this), Some(that)) = (
            self.coerce::<T, Msb0>(),
            other.coerce::<T, Msb0>(),
        ) {
            todo!("msb0-accelerated version")
        } else {
            todo!("baseline")
        }
    }
}
```
and, during monomorphization, only one branch of the `if` stack will be preserved. The `.coerce()` method is defined in `slice::specialization` and provides easy access to a fully-typed value only within the monomorphization that matches it.
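The core of the `TypeId` trick can be demonstrated in isolation. The sketch below is a stand-in, not `bitvec`'s code: the local `Lsb0`/`Msb0` marker types and the `ordering_name` function are inventions, but the comparison pattern is the same one described above, and monomorphization removes the untaken branches as dead code:

```rust
use core::any::TypeId;

// Local marker types standing in for bitvec's ordering types.
struct Lsb0;
struct Msb0;

/// Detect, inside a generic function, which ordering type parameter the
/// current monomorphization was built with, and branch accordingly.
fn ordering_name<O: 'static>() -> &'static str {
    if TypeId::of::<O>() == TypeId::of::<Lsb0>() {
        "lsb0-accelerated version"
    } else if TypeId::of::<O>() == TypeId::of::<Msb0>() {
        "msb0-accelerated version"
    } else {
        "baseline"
    }
}
```

Because `TypeId::of` is resolved per monomorphization, each instantiation collapses to exactly one branch at compile time, giving specialization-like dispatch with zero runtime cost beyond what the real feature would have.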
Afterword
Thanks for reading! I hope this book was useful to you.
Please feel both welcomed and encouraged to ask me any clarifying questions, to file an issue if any of it is confusing, or even to send a PR if you have improvements!
This has been a work of intensely deep focus and research for three and a half years. It is a complex and difficult topic, and I am absolutely certain that I have information and context that I haven’t sufficiently serialized for everyone else to see.
Thank you for using `bitvec`. I am delighted that other people have found my work both interesting and useful, and I'm glad that I've been able to help people like you write better programs.