Tensor and Mask Types
Tensor and Mask are the types that connect user-facing graph code to the SIR and SIMD backends.
The important distinction is this:
- small static tensors line up closely with array and vector lowering
- dynamic tensors are owning fat-pointer values with their own ABI and drop behavior
- masks are the boolean side of that model and have both static and dynamic forms
Layout overview
Current compiler and SIR shape contracts are centered around these mappings:
Tensor<f32, 4> -> Ty::Array(f32, 4) -> <4 x float>
Tensor<f32, 32> -> Ty::Array(f32, 32) -> [32 x float]
Tensor<f32> -> Ty::DynTensor(f32) -> %VexDynTensor { ptr, i64 }
Mask<4> -> Ty::Mask(4) -> <4 x i1>
Mask -> Ty::DynMask -> %VexDynMask { ptr, i64 }That split matters because static and dynamic tensors do not behave like a single interchangeable runtime object.
Static Tensor<T, N>
Static tensors are compile-time sized.
In practice, the compiler lowers them through array machinery first:
- If the total size is at most 64 bytes (e.g.
Tensor<f32, 16>), they map directly to LLVM vector values (<16 x float>) in CPU registers. - If the total size is larger than 64 bytes, they remain as contiguous memory arrays but can still be vectorized via loop optimization metadata.
So Tensor<f32, 4> and Tensor<f32, 8> are good mental models for inline SIMD, while larger static tensors should be thought of as shaped arrays.
Example:
let a: Tensor<f32, 4> = [1.0, 2.0, 3.0, 4.0]
let b: Tensor<f32, 4> = [4.0, 3.0, 2.0, 1.0]
let c = a + bDynamic Tensor<T>
Dynamic tensors lower to an owning %VexDynTensor { ptr, i64 } layout.
The data buffer is heap-managed via the VUMI/VUMM allocator and participates in drop logic. Conversions into Tensor<T> from Span<T> involve an owned copy.
Example:
graph fn relu(x: Tensor<f32>): Tensor<f32> {
return x.select(x, 0.0)
}Tensor & DynTensor API Reference
Both static and dynamic tensors support the following methods and static builders resolved directly by the compiler:
Static Builders
Tensor.arange<T>(start: T, end: T, step: T) -> Tensor<T>Generates a linear sequence of values starting atstartup to (but not including)endwith step sizestep.Tensor.linspace<T>(start: T, end: T, count: usize) -> Tensor<T>Generatescountevenly spaced values over the specified interval[start, end].Tensor.fill<T>(val: T, shape: Vec<i32>) -> Tensor<T>Generates a new tensor with the specifiedshapewhere all elements are initialized toval.Tensor.zeros<T>(shape: Vec<i32>) -> Tensor<T>Generates a tensor filled with zeros (0or0.0).Tensor.ones<T>(shape: Vec<i32>) -> Tensor<T>Generates a tensor filled with ones (1or1.0).
Instance Methods
len() -> usizeReturns the total number of elements in the tensor.is_empty() -> boolReturnstrueif the tensor contains no elements.get(index: usize) -> TReturns the element at the specified index.clone() -> SelfPerforms a deep copy of the tensor's metadata and backing data buffer.ptr() -> *const ()Returns a raw pointer to the underlying contiguous data buffer.copyTo(offset: usize, src: &Tensor<T>) -> SelfCopies elements from thesrctensor intoselfstarting at the target elementoffset. Returnsselffor chaining.gather(indices: Tensor<i64>) -> Tensor<T>Indexed read that collects elements from the source tensor according to the specified index tensor.scatter(indices: Tensor<i64>, values: Tensor<T>) -> Tensor<T>Indexed write that updates the tensor at the specified indices with the corresponding values.scatter_add / scatter_max / scatter_min / scatter_mulAtomic/combining scatter operations that resolve conflicts by adding, taking the min/max, or multiplying values at duplicate indices.sort() -> Self/sortDesc() -> SelfSorts the tensor in ascending or descending order.reverse() -> SelfReverses the order of elements along the primary dimension.cumsum() -> Self/prefixSum() -> SelfComputes the cumulative sum (parallel prefix scan) of elements.cumprod() -> SelfComputes the cumulative product of elements.argsort() -> Tensor<i64>Computes the indices that would sort the tensor, returning a tensor of index integers.argmin() -> i64/argmax() -> i64Returns the flat index of the minimum or maximum value in the tensor.
Mask<N> and Mask
Masks represent boolean results from comparisons and masking operations.
Mask<N> maps directly to hardware mask registers (e.g. AVX-512 K-registers), while the dynamic Mask maps to a %VexDynMask { ptr, i64 } fat-pointer layout.
Mask API Reference
select<T>(true_val: Tensor<T>, false_val: Tensor<T>) -> Tensor<T>Returns a new tensor with elements chosen fromtrue_valwhere the mask istrue, andfalse_valwhere the mask isfalse.toBitmask() -> i64Packs the boolean lanes of the mask into the bits of a 64-bit integer.toBitmaskAt(offset: usize) -> i64Packs boolean lanes starting at the specified elementoffsetinto a 64-bit integer.firstSet() -> i32Returns the index of the firsttruelane in the mask. Returns-1if all lanes arefalse.countBits() -> i32Returns the total number oftruelanes (population count).any() -> boolReturnstrueif at least one lane in the mask istrue.all() -> boolReturnstrueif all lanes in the mask aretrue.clearBit(index: usize) -> i32Clears the mask bit atindexand returns the updated bitmask integer.
Example:
let mask = data > 0.0
if mask.any() {
let first = mask.firstSet()
$println("First positive value index: {}", first)
}Hardware Lowering Summary
The compiler routes target workloads based on shapes and constraints:
- Small static tensors/masks are lowered directly to hardware vector registers and operations.
- Dynamic tensors/masks use standard library runtime calls and GPU/Metal dispatch kernels for bulk parallel operations.