
Bit Namespace

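The Bit namespace provides parallel bit manipulation operations: Bit.pdep (parallel bit deposit) and Bit.pext (parallel bit extract). On x86 CPUs with BMI2 they compile to single hardware instructions; on other targets a software fallback is used. Both functions auto-vectorize on arrays, spans, and tensors: a single Bit.pext() call on an array processes all elements in parallel via SIMD.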

No import needed. Bit.* is a builtin namespace available everywhere.

Scalar vs Vectorized Usage

vex
// Scalar: single u64
let extracted = Bit.pext(0b1010_0110, 0b1111_0000)  // 0b1010

// Array: all elements processed in parallel via SIMD
let bitmaps: [u64; 4] = [0b1010_0110, 0b1111_0000, 0b0011_1100, 0b1100_0011]
let masks: [u64; 4] = [0xFF00, 0xFF00, 0xFF00, 0xFF00]
let extracted = Bit.pext(bitmaps, masks)
// All 4 PEXT operations execute in a single SIMD instruction

// Span / Tensor: same auto-vectorization
let data: Span<u64> = getBitstream()
let packed = Bit.pext(data, 0xFF00)   // broadcasts mask, vectorizes
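
The element-wise and broadcast behavior shown above can be modeled in a few lines. The following Python sketch is only a reference model of the semantics, not how Vex implements them: `pext` and `pext_each` are hypothetical helper names, and the 0xF0 mask is chosen for illustration. Note that a 0xFF00 mask selects bits 8-15, which are all zero in the 8-bit sample values, so it would yield 0 for every element.

```python
MASK64 = (1 << 64) - 1  # model values as unsigned 64-bit words


def pext(source: int, mask: int) -> int:
    """Scalar reference model: gather the bits of source at mask's set bits."""
    source &= MASK64
    mask &= MASK64
    result = 0
    k = 0  # next free bit position in the packed result
    while mask:
        lowest = mask & -mask  # isolate the lowest set bit of mask
        if source & lowest:
            result |= 1 << k
        mask &= mask - 1  # clear that mask bit
        k += 1
    return result


def pext_each(values, masks):
    """Apply pext element-wise; a scalar mask is broadcast to every element."""
    if isinstance(masks, int):
        masks = [masks] * len(values)
    if len(values) != len(masks):
        raise ValueError("values and masks must have the same length")
    return [pext(v, m) for v, m in zip(values, masks)]


bitmaps = [0b1010_0110, 0b1111_0000, 0b0011_1100, 0b1100_0011]
print([bin(x) for x in pext_each(bitmaps, 0xF0)])
# ['0b1010', '0b1111', '0b11', '0b1100']
```

A model like this is handy as a test oracle when checking vectorized results against scalar ones.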

Quick Example

vex
// Extract specific bits from a value using a mask
let data: u64 = 0b1010_1100;
let mask: u64 = 0b1111_0000;

let extracted = Bit.pext(data, mask);  // 0b1010 (bits at mask positions, packed)
let deposited = Bit.pdep(0b1010, mask); // 0b1010_0000 (bits deposited at mask positions)

Parallel Bit Deposit

Scatter the low-order bits of source to the positions of the set bits in mask; all other result bits are zero. Accepts u64, [u64; N], Span<u64>, Tensor<u64>.

Function | Signature | Description
Bit.pdep(source, mask) | (T, T) → T where T = u64, [u64; N], Span<u64>, Tensor<u64> | Deposit bits at mask positions

How PDEP Works

source  = 0b_1011          (4 bits to deposit)
mask    = 0b_1010_0110     (positions to deposit into)
             ↓ ↓   ↓↓
result  = 0b_1000_0110     (bits scattered to mask positions)

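Source bits are consumed starting from bit 0 (the least significant bit). Each one is placed at the next set bit position in mask, moving from the lowest set mask bit upward (right to left as written above).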
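
The deposit rule can be written out as a short loop. This is a minimal Python sketch of the semantics, assuming 64-bit unsigned words; the function name `pdep` here is a stand-in for illustration and says nothing about how Vex's fallback is actually implemented.

```python
MASK64 = (1 << 64) - 1  # model values as unsigned 64-bit words


def pdep(source: int, mask: int) -> int:
    """Reference model: deposit low bits of source at mask's set bits."""
    source &= MASK64
    mask &= MASK64
    result = 0
    k = 0  # index of the next source bit to consume
    while mask:
        lowest = mask & -mask  # isolate the lowest set bit of mask
        if (source >> k) & 1:
            result |= lowest  # copy source bit k to that position
        mask &= mask - 1  # clear that mask bit
        k += 1
    return result


print(bin(pdep(0b1011, 0b1010_0110)))  # 0b10000110
```

The loop runs once per set bit in the mask, so its cost grows with the mask's population count.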

Platform Mapping

x86 | ARM
PDEP (BMI2; roughly 3-cycle latency at one instruction per cycle on Intel Haswell+ and AMD Zen 3+) | Software fallback (Apple Silicon has no SVE)

Parallel Bit Extract

Gather the bits of source at the positions of the set bits in mask and pack them contiguously into the low-order bits of the result. Same type support as pdep.

Function | Signature | Description
Bit.pext(source, mask) | (T, T) → T where T = u64, [u64; N], Span<u64>, Tensor<u64> | Extract bits from mask positions

How PEXT Works

source  = 0b_1000_0110
mask    = 0b_1010_0110     (positions to extract from)
             ↓ ↓   ↓↓
result  = 0b_1011          (extracted bits, packed together)

Each bit at a set position in mask is extracted and packed into the result, starting from bit 0.
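
The extract rule mirrors the deposit rule. The following Python sketch models it for 64-bit unsigned words; `pext` is an illustrative stand-in, not Vex's actual fallback code.

```python
MASK64 = (1 << 64) - 1  # model values as unsigned 64-bit words


def pext(source: int, mask: int) -> int:
    """Reference model: gather the bits of source at mask's set bits."""
    source &= MASK64
    mask &= MASK64
    result = 0
    k = 0  # next free bit position in the packed result
    while mask:
        lowest = mask & -mask  # isolate the lowest set bit of mask
        if source & lowest:
            result |= 1 << k  # pack the selected bit at position k
        mask &= mask - 1  # clear that mask bit
        k += 1
    return result


print(bin(pext(0b1000_0110, 0b1010_0110)))  # 0b1011
print(bin(pext(0b1010_0110, 0b1010_0110)))  # 0b1111
```

When the source equals the mask, every selected bit is 1, so the result is a run of ones as long as the mask's population count.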

Platform Mapping

x86 | ARM
PEXT (BMI2; roughly 3-cycle latency at one instruction per cycle on Intel Haswell+ and AMD Zen 3+) | Software fallback

Use Cases

Compression (Huffman Decoding)

PEXT and PDEP are useful in bitstream codecs, where variable-width fields must be pulled out of or inserted into a machine word:

vex
// Fast Huffman decode: extract variable-length code from bitstream
// Assumes codeLen < 64 so the shift stays within the u64 width.
fn decodeSymbol(bitstream: u64, codeLen: u64): u64 {
    let mask = (1 as u64 << codeLen) - 1;  // low codeLen bits set
    return Bit.pext(bitstream, mask);
}
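
One property of decodeSymbol is worth spelling out: its mask is a run of contiguous low bits, and for such a mask pext returns the same value as a plain bitwise AND. PEXT pays off when the wanted bits are scattered. The Python sketch below models both cases; `pext` is a hypothetical reference helper, not Vex's implementation.

```python
MASK64 = (1 << 64) - 1  # model values as unsigned 64-bit words


def pext(source: int, mask: int) -> int:
    """Reference model: gather the bits of source at mask's set bits."""
    source &= MASK64
    mask &= MASK64
    result = 0
    k = 0
    while mask:
        lowest = mask & -mask  # isolate the lowest set bit of mask
        if source & lowest:
            result |= 1 << k
        mask &= mask - 1  # clear that mask bit
        k += 1
    return result


bitstream = 0b1101_1001

# Contiguous low mask: pext and AND agree for every code length.
for code_len in range(0, 9):
    low_mask = (1 << code_len) - 1
    assert pext(bitstream, low_mask) == bitstream & low_mask

# Scattered mask (every other bit): AND alone leaves gaps, pext compacts them.
even_bits = 0b0101_0101
print(bin(bitstream & even_bits))       # 0b1010001
print(bin(pext(bitstream, even_bits)))  # 0b1101
```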

Bitmap Indexing

Fast rank/select on bitmaps (used in succinct data structures):

vex
// Count set bits before position (rank query)
// Assumes pos < 64 so the shift stays within the u64 width.
fn rank(bitmap: u64, pos: u64): u64 {
    let mask = (1 as u64 << pos) - 1;
    return Math.popcount(bitmap & mask) as u64;
}

// Select the position of the nth set bit (n is zero-based)
// Assumes bitmap has more than n set bits; otherwise b reaches 0
// before the loop finishes.
fn select(bitmap: u64, n: u64): u64 {
    let! b = bitmap;
    let! count: u64 = 0;
    while count < n {
        b = b & (b - 1);  // Clear lowest set bit
        count = count + 1;
    }
    return Math.ctz(b) as u64;
}
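
The select loop above runs n iterations. A well-known alternative uses PDEP directly: depositing the single bit `1 << n` through the bitmap lands it on the nth set bit, and a trailing-zero count then gives its position. The Python sketch below models that idea; `pdep` and `select` are illustrative helpers, and raising an error when the bitmap has too few set bits is a choice made for this sketch.

```python
MASK64 = (1 << 64) - 1  # model values as unsigned 64-bit words


def pdep(source: int, mask: int) -> int:
    """Reference model: deposit low bits of source at mask's set bits."""
    source &= MASK64
    mask &= MASK64
    result = 0
    k = 0
    while mask:
        lowest = mask & -mask  # isolate the lowest set bit of mask
        if (source >> k) & 1:
            result |= lowest
        mask &= mask - 1  # clear that mask bit
        k += 1
    return result


def select(bitmap: int, n: int) -> int:
    """Position of the n-th set bit of bitmap (n is zero-based)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    isolated = pdep(1 << n, bitmap)  # only the n-th set bit survives
    if isolated == 0:
        raise ValueError("bitmap has fewer than n + 1 set bits")
    return isolated.bit_length() - 1  # index of the single set bit


print([select(0b1010_0110, n) for n in range(4)])  # [1, 2, 5, 7]
```

With hardware PDEP this form costs two instructions regardless of n.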

Chess Engines (Bitboard Move Generation)

vex
// Extract all set bit positions from a bitboard
fn extractMoves(bitboard: u64): Vec<u64> {
    let! moves = Vec.new<u64>();
    let! bb = bitboard;
    while bb != 0 {
        let sq = Math.ctz(bb) as u64;  // Index of lowest set bit
        moves.push(sq);
        bb = bb & (bb - 1);            // Clear that bit
    }
    return moves;
}
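
The loop above uses only ctz. Where PEXT itself tends to appear in bitboard engines is in lookup-table indexing: the occupancy bits that can block a sliding piece are compacted into a dense index. The Python sketch below shows the idea for one deliberately tiny case, a rook on a1 moving east along the first rank. The square numbering (a1 = bit 0 through h1 = bit 7) and all names are assumptions made for this sketch.

```python
MASK64 = (1 << 64) - 1  # model values as unsigned 64-bit words


def pdep(source: int, mask: int) -> int:
    """Reference model: deposit low bits of source at mask's set bits."""
    source, mask, result, k = source & MASK64, mask & MASK64, 0, 0
    while mask:
        if (source >> k) & 1:
            result |= mask & -mask  # lowest set bit of mask
        mask &= mask - 1
        k += 1
    return result


def pext(source: int, mask: int) -> int:
    """Reference model: gather the bits of source at mask's set bits."""
    source, mask, result, k = source & MASK64, mask & MASK64, 0, 0
    while mask:
        if source & (mask & -mask):
            result |= 1 << k
        mask &= mask - 1
        k += 1
    return result


# Squares b1..g1 (bits 1-6). h1 is left out: a piece there never changes
# which squares the rook attacks.
RELEVANT = 0x7E


def slow_east_attacks(occupancy: int) -> int:
    """Walk east from a1 until the first occupied square (inclusive)."""
    attacks = 0
    for sq in range(1, 8):
        attacks |= 1 << sq
        if occupancy & (1 << sq):
            break
    return attacks


# One table entry per subset of the relevant squares (2**6 = 64 entries).
TABLE = [slow_east_attacks(pdep(i, RELEVANT)) for i in range(1 << 6)]


def east_attacks(occupancy: int) -> int:
    """Table lookup: pext turns the scattered blockers into a dense index."""
    return TABLE[pext(occupancy, RELEVANT)]


print(bin(east_attacks(1 << 3)))  # blocker on d1 -> 0b1110
```

PDEP enumerates every blocker subset when building the table, and PEXT maps a live board back to the matching entry.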

Data Packing

vex
// Pack 4 fields into a u64 using PDEP
fn packFields(a: u64, b: u64, c: u64, d: u64): u64 {
    let! result: u64 = 0;
    result = result | Bit.pdep(a, 0xFF);              // bits 0-7
    result = result | Bit.pdep(b, 0xFF00);             // bits 8-15
    result = result | Bit.pdep(c, 0xFF0000);           // bits 16-23
    result = result | Bit.pdep(d, 0xFF000000);         // bits 24-31
    return result;
}

// Unpack with PEXT
fn unpackA(packed: u64): u64 {
    return Bit.pext(packed, 0xFF);
}
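
A round trip makes the pack and unpack relationship concrete. The Python sketch below models packFields with reference versions of pdep and pext. The helper names and the `unpack_fields` function are hypothetical, added only for illustration. It also shows that a field wider than its 8-bit mask is truncated without any error.

```python
MASK64 = (1 << 64) - 1  # model values as unsigned 64-bit words
FIELD_MASKS = (0xFF, 0xFF00, 0xFF0000, 0xFF000000)  # bits 0-7, 8-15, 16-23, 24-31


def pdep(source: int, mask: int) -> int:
    """Reference model: deposit low bits of source at mask's set bits."""
    source, mask, result, k = source & MASK64, mask & MASK64, 0, 0
    while mask:
        if (source >> k) & 1:
            result |= mask & -mask  # lowest set bit of mask
        mask &= mask - 1
        k += 1
    return result


def pext(source: int, mask: int) -> int:
    """Reference model: gather the bits of source at mask's set bits."""
    source, mask, result, k = source & MASK64, mask & MASK64, 0, 0
    while mask:
        if source & (mask & -mask):
            result |= 1 << k
        mask &= mask - 1
        k += 1
    return result


def pack_fields(a: int, b: int, c: int, d: int) -> int:
    """Deposit each value into its own byte of the result."""
    packed = 0
    for value, mask in zip((a, b, c, d), FIELD_MASKS):
        packed |= pdep(value, mask)
    return packed


def unpack_fields(packed: int) -> tuple:
    """Extract all four byte fields again."""
    return tuple(pext(packed, mask) for mask in FIELD_MASKS)


packed = pack_fields(0x11, 0x22, 0x33, 0x44)
print(hex(packed))                            # 0x44332211
print([hex(f) for f in unpack_fields(packed)])  # ['0x11', '0x22', '0x33', '0x44']
print(hex(pack_fields(0x1FF, 0, 0, 0)))       # 0xff (bit 8 of the input is dropped)
```

For contiguous byte fields like these, shifts and ANDs give the same result; PDEP and PEXT become more attractive when a field's bits are not adjacent.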

Hardware Support

Function | x86 | ARM | Apple Silicon
pdep | BMI2 (Haswell+ 2013) | Software fallback | Software fallback
pext | BMI2 (Haswell+ 2013) | Software fallback | Software fallback

Note: On ARM/Apple Silicon, pdep and pext use efficient software fallback loops. On x86 with BMI2 support, they compile to single instructions.

AMD Zen 1/2 warning: AMD Zen 1 and Zen 2 implement PDEP/PEXT in microcode (~18 cycles). Fast native hardware support on AMD starts with Zen 3.

Released under the MIT License.