Bit Namespace

The Bit namespace provides parallel bit manipulation operations, hardware-accelerated on x86 CPUs with BMI2 and backed by a software fallback elsewhere. Functions auto-vectorize over arrays, spans, and tensors: a single Bit.pext() call on an array processes all elements in parallel via SIMD.

No import needed. Bit.* is a builtin namespace available everywhere.

Scalar vs Vectorized Usage

vex

// Scalar: single u64
let extracted = Bit.pext(0b1010_0110, 0b1111_0000)  // 0b1010

// Array: all elements processed in parallel via SIMD
let bitmaps: [u64; 4] = [0b1010_0110, 0b1111_0000, 0b0011_1100, 0b1100_0011]
let masks: [u64; 4] = [0xF0, 0xF0, 0xF0, 0xF0]
let extracted = Bit.pext(bitmaps, masks)  // [0b1010, 0b1111, 0b0011, 0b1100]
// All 4 PEXT operations come from one vectorized call

// Span / Tensor: same auto-vectorization
let data: Span<u64> = getBitstream()
let packed = Bit.pext(data, 0xFF00)   // broadcasts mask, vectorizes
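
The element-wise behavior described in the comments above can be modeled in plain Python. This is only a sketch of the semantics, assuming a scalar mask is broadcast to every element. `pext_ref` and `pext_all` are hypothetical helper names, not part of the Bit namespace, and the model says nothing about how the compiler actually vectorizes the call.

```python
MASK64 = (1 << 64) - 1  # keep values in u64 range


def pext_ref(source: int, mask: int) -> int:
    """Scalar software model of PEXT on 64-bit values."""
    source &= MASK64
    mask &= MASK64
    result = 0
    out_bit = 0  # next free bit position in the result
    while mask:
        lowest = mask & -mask        # isolate the lowest set bit of the mask
        if source & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1             # clear that mask bit
    return result


def pext_all(sources, masks):
    """Element-wise model: a scalar mask is broadcast to every element."""
    if isinstance(masks, int):
        masks = [masks] * len(sources)
    if len(masks) != len(sources):
        raise ValueError("sources and masks must have the same length")
    return [pext_ref(s, m) for s, m in zip(sources, masks)]


bitmaps = [0b1010_0110, 0b1111_0000, 0b0011_1100, 0b1100_0011]
high_nibbles = pext_all(bitmaps, 0xF0)  # scalar mask, broadcast to all four
```

Passing a list of four identical masks gives the same result as passing the scalar, which is the broadcast rule this model assumes.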

Quick Example

vex

// Extract specific bits from a value using a mask
let data: u64 = 0b1010_1100;
let mask: u64 = 0b1111_0000;

let extracted = Bit.pext(data, mask);  // 0b1010 (bits at mask positions, packed)
let deposited = Bit.pdep(0b1010, mask); // 0b1010_0000 (bits deposited at mask positions)

Parallel Bit Deposit

Scatter bits from source to positions indicated by the mask. Accepts u64, [u64; N], Span<u64>, Tensor<u64>.

Function	Signature	Description
`Bit.pdep(source, mask)`	`(T, T) → T` where `T = u64, [u64;N], Span<u64>, Tensor<u64>`	Deposit bits at mask positions

How PDEP Works

source  = 0b_1011          (4 bits to deposit)
mask    = 0b_1010_0110     (positions to deposit into)
             ↓ ↓   ↓↓
result  = 0b_1000_0110     (bits scattered to mask positions)

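Each bit from source, starting at bit 0, is placed at the next set bit position in mask, working from the least significant bit upward (right to left). Source bits beyond the number of set mask bits are ignored.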
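
The diagram above can be reproduced with a small software model. This Python sketch is not the Vex implementation. `pdep_ref` is a hypothetical name used only for illustration, and the loop walks the mask from its least significant set bit upward.

```python
MASK64 = (1 << 64) - 1  # keep values in u64 range


def pdep_ref(source: int, mask: int) -> int:
    """Scalar software model of PDEP on 64-bit values."""
    source &= MASK64
    mask &= MASK64
    result = 0
    src_bit = 0  # index of the next source bit to deposit
    while mask:
        lowest = mask & -mask          # lowest remaining set bit of the mask
        if (source >> src_bit) & 1:
            result |= lowest           # deposit a 1 at that mask position
        src_bit += 1
        mask &= mask - 1               # move on to the next mask position
    return result


diagram = pdep_ref(0b1011, 0b1010_0110)
print(f"{diagram:#010b}")
```

Only as many source bits are consumed as the mask has set bits, so any higher source bits have no effect on the result.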

Platform Mapping

x86	ARM
`PDEP` (BMI2; 3-cycle latency, one per cycle throughput on Intel and AMD Zen 3+)	Software fallback (Apple Silicon has no SVE)

Parallel Bit Extract

Gather bits from source at positions indicated by the mask, packed contiguously. Same type support as pdep.

Function	Signature	Description
`Bit.pext(source, mask)`	`(T, T) → T` where `T = u64, [u64;N], Span<u64>, Tensor<u64>`	Extract bits from mask positions

How PEXT Works

source  = 0b_1000_0110
mask    = 0b_1010_0110     (positions to extract from)
             ↓ ↓   ↓↓
result  = 0b_1011          (extracted bits, packed together)

Each bit at a set position in mask is extracted and packed into the result, starting from bit 0.
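
As a cross-check on the description above, here is a Python software model of the extract, together with a matching deposit to show that the two operations undo each other for the same mask. `pext_ref` and `pdep_ref` are hypothetical names, and this is a sketch of the semantics rather than the Vex implementation.

```python
MASK64 = (1 << 64) - 1  # keep values in u64 range


def pext_ref(source: int, mask: int) -> int:
    """Gather the source bits selected by mask into the low bits of the result."""
    source &= MASK64
    mask &= MASK64
    result, out_bit = 0, 0
    while mask:
        lowest = mask & -mask        # lowest remaining set bit of the mask
        if source & lowest:
            result |= 1 << out_bit   # pack it at the next free result bit
        out_bit += 1
        mask &= mask - 1
    return result


def pdep_ref(source: int, mask: int) -> int:
    """Scatter the low bits of source to the positions selected by mask."""
    source &= MASK64
    mask &= MASK64
    result, src_bit = 0, 0
    while mask:
        lowest = mask & -mask
        if (source >> src_bit) & 1:
            result |= lowest
        src_bit += 1
        mask &= mask - 1
    return result


mask = 0b1010_0110
value = 0b1101_0111
extracted = pext_ref(value, mask)
restored = pdep_ref(extracted, mask)   # equals value & mask
```

Depositing the extracted bits back through the same mask recovers exactly the masked part of the original value. Bits outside the mask are lost by design.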

Platform Mapping

x86	ARM
`PEXT` (BMI2; 3-cycle latency, one per cycle throughput on Intel and AMD Zen 3+)	Software fallback

Use Cases

Compression (Huffman Decoding)

PEXT and PDEP are useful in compression codecs for pulling variable-length codes out of a bit buffer:

vex

// Fast Huffman decode: extract the low codeLen bits of the bitstream as the next code
// (assumes codeLen < 64, otherwise the shift overflows)
fn decodeSymbol(bitstream: u64, codeLen: u64): u64 {
    let mask = (1 as u64 << codeLen) - 1;
    return Bit.pext(bitstream, mask);
}
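
Because the mask in decodeSymbol is a contiguous run of low bits, the extract is equivalent to a plain AND. PEXT only behaves differently when the mask has gaps. The Python sketch below checks both cases against a software model. `pext_ref` and `low_bits` are hypothetical names used for illustration.

```python
MASK64 = (1 << 64) - 1  # keep values in u64 range


def pext_ref(source: int, mask: int) -> int:
    """Scalar software model of PEXT on 64-bit values."""
    source &= MASK64
    mask &= MASK64
    result, out_bit = 0, 0
    while mask:
        lowest = mask & -mask
        if source & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1
    return result


def low_bits(bitstream: int, code_len: int) -> int:
    """Model of decodeSymbol: extract the low code_len bits (code_len < 64)."""
    mask = (1 << code_len) - 1
    return pext_ref(bitstream, mask)


stream = 0b1011_0101
contiguous = low_bits(stream, 5)            # same as stream & 0b1_1111
gapped = pext_ref(stream, 0b0101_0101)      # every other bit, packed together
```

With the gapped mask the selected bits are compacted into the low end of the result, which a single AND cannot do.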

Bitmap Indexing

Fast rank/select on bitmaps (used in succinct data structures):

vex

// Count set bits before position (rank query)
fn rank(bitmap: u64, pos: u64): u64 {
    let mask = (1 as u64 << pos) - 1;
    return Math.popcount(bitmap & mask) as u64;
}

// Select the position of the nth set bit (n counts from 0)
fn select(bitmap: u64, n: u64): u64 {
    let! b = bitmap;
    let! count: u64 = 0;
    while count < n {
        b = b & (b - 1);  // Clear lowest set bit
        count = count + 1;
    }
    return Math.ctz(b) as u64;
}
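
The select loop above can also be expressed with a deposit: placing a single 1 bit at source position n and using the bitmap as the mask lands that bit on the n-th set bit of the bitmap. The Python sketch below models this and compares it with the loop. `pdep_ref`, `ctz64`, `select_pdep`, and `select_loop` are hypothetical names, and returning 64 for an empty input is an assumption, since the behavior of Math.ctz on zero is not stated here.

```python
MASK64 = (1 << 64) - 1  # keep values in u64 range


def pdep_ref(source: int, mask: int) -> int:
    """Scalar software model of PDEP on 64-bit values."""
    source &= MASK64
    mask &= MASK64
    result, src_bit = 0, 0
    while mask:
        lowest = mask & -mask
        if (source >> src_bit) & 1:
            result |= lowest
        src_bit += 1
        mask &= mask - 1
    return result


def ctz64(x: int) -> int:
    """Count trailing zeros; returns 64 for zero (assumed convention)."""
    return (x & -x).bit_length() - 1 if x else 64


def select_pdep(bitmap: int, n: int) -> int:
    """Position of the n-th set bit (0-indexed) via a single deposit."""
    return ctz64(pdep_ref(1 << n, bitmap))


def select_loop(bitmap: int, n: int) -> int:
    """Same query, clearing the n lowest set bits first."""
    b = bitmap
    for _ in range(n):
        b &= b - 1
    return ctz64(b)


positions = [select_pdep(0b1010_0110, n) for n in range(4)]
```

On hardware with a fast deposit instruction this form replaces a loop of up to n iterations with a constant number of operations.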

Chess Engines (Bitboard Move Generation)

vex

// Extract all set bit positions from a bitboard
fn extractMoves(bitboard: u64): Vec<u64> {
    let! moves = Vec.new<u64>();
    let! bb = bitboard;
    while bb != 0 {
        let sq = Math.ctz(bb) as u64;  // Index of lowest set bit
        moves.push(sq);
        bb = bb & (bb - 1);            // Clear that bit
    }
    return moves;
}
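
Beyond iterating set bits, engines commonly use an extract to turn the blockers on a sliding piece's ray into a dense table index. The Python sketch below illustrates that idea on a single 8-square rank only. It is a simplified model under that assumption, not code from this namespace, and every name in it (`pext_ref`, `pdep_ref`, `rank_attacks_slow`, `build_table`, `rank_attacks`) is hypothetical.

```python
MASK64 = (1 << 64) - 1  # keep values in u64 range


def pext_ref(source: int, mask: int) -> int:
    source &= MASK64
    mask &= MASK64
    result, out_bit = 0, 0
    while mask:
        lowest = mask & -mask
        if source & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1
    return result


def pdep_ref(source: int, mask: int) -> int:
    source &= MASK64
    mask &= MASK64
    result, src_bit = 0, 0
    while mask:
        lowest = mask & -mask
        if (source >> src_bit) & 1:
            result |= lowest
        src_bit += 1
        mask &= mask - 1
    return result


def rank_attacks_slow(sq: int, occ: int) -> int:
    """Squares attacked along one rank from sq (0..7), stopping at blockers."""
    attacks = 0
    for step in (1, -1):
        target = sq + step
        while 0 <= target < 8:
            attacks |= 1 << target
            if occ & (1 << target):
                break                  # the blocker is attacked, the ray stops
            target += step
    return attacks


def build_table(sq: int):
    """Precompute attacks for every blocker subset, indexed by extract."""
    # Edge squares and sq itself never change the attack set, so skip them.
    mask = 0b0111_1110 & ~(1 << sq)
    size = 1 << bin(mask).count("1")
    # pdep_ref(i, mask) enumerates every subset of the relevant squares.
    table = [rank_attacks_slow(sq, pdep_ref(i, mask)) for i in range(size)]
    return mask, table


def rank_attacks(occ: int, mask: int, table) -> int:
    """Lookup: one extract plus one table read."""
    return table[pext_ref(occ, mask)]


mask3, table3 = build_table(3)
example = rank_attacks(0b0100_0010, mask3, table3)
```

The deposit enumerates blocker subsets when the table is built, and the extract maps a live occupancy back to the same index at lookup time.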

Data Packing

vex

// Pack 4 fields into a u64 using PDEP
fn packFields(a: u64, b: u64, c: u64, d: u64): u64 {
    let! result: u64 = 0;
    result = result | Bit.pdep(a, 0xFF);              // bits 0-7
    result = result | Bit.pdep(b, 0xFF00);             // bits 8-15
    result = result | Bit.pdep(c, 0xFF0000);           // bits 16-23
    result = result | Bit.pdep(d, 0xFF000000);         // bits 24-31
    return result;
}

// Unpack with PEXT
fn unpackA(packed: u64): u64 {
    return Bit.pext(packed, 0xFF);
}
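
The same masks can unpack every field, not just the first. The Python sketch below models the pack and unpack round trip with software versions of the two operations. `pdep_ref`, `pext_ref`, `pack_fields`, `unpack_fields`, and `FIELD_MASKS` are hypothetical names chosen to mirror the functions above.

```python
MASK64 = (1 << 64) - 1  # keep values in u64 range
FIELD_MASKS = (0xFF, 0xFF00, 0xFF0000, 0xFF000000)  # bits 0-7, 8-15, 16-23, 24-31


def pdep_ref(source: int, mask: int) -> int:
    source &= MASK64
    mask &= MASK64
    result, src_bit = 0, 0
    while mask:
        lowest = mask & -mask
        if (source >> src_bit) & 1:
            result |= lowest
        src_bit += 1
        mask &= mask - 1
    return result


def pext_ref(source: int, mask: int) -> int:
    source &= MASK64
    mask &= MASK64
    result, out_bit = 0, 0
    while mask:
        lowest = mask & -mask
        if source & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1
    return result


def pack_fields(a: int, b: int, c: int, d: int) -> int:
    """Deposit each value into its own 8-bit field."""
    result = 0
    for value, mask in zip((a, b, c, d), FIELD_MASKS):
        result |= pdep_ref(value, mask)
    return result


def unpack_fields(packed: int):
    """Extract all four fields with the same masks used for packing."""
    return tuple(pext_ref(packed, mask) for mask in FIELD_MASKS)


packed = pack_fields(0x12, 0x34, 0x56, 0x78)
fields = unpack_fields(packed)
```

A value wider than its field is silently truncated to the field width, because a deposit consumes only as many source bits as the mask has set bits.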

Hardware Support

Function	x86	ARM	Apple Silicon
`pdep`	BMI2 (Haswell+ 2013)	Software fallback	Software fallback
`pext`	BMI2 (Haswell+ 2013)	Software fallback	Software fallback

Note: On ARM/Apple Silicon, pdep and pext use efficient software fallback loops. On x86 with BMI2 support, they compile to single instructions.

AMD Zen 1/2 warning: AMD Zen 1 and Zen 2 implement PDEP/PEXT in microcode (~18 cycles). Fast native execution, comparable to Intel, starts with AMD Zen 3.

Related

Math.popcount — Count set bits
Math.clz / Math.ctz — Leading/trailing zeros
Math.bswap — Byte-swap (endian conversion)
Crypto Namespace — Cryptographic operations
SIMD — Array-level bit operations
