🧠 Memory Model Specification
Hybrid Memory Management for AI/ML Workloads
Overview
STARK employs a hybrid memory management system that combines ownership-based memory safety with garbage collection for high-level objects, optimized specifically for AI/ML workloads.
Memory Management Strategy
STARK uses a dual-tier memory management system:
- Stack-Allocated & Owned Memory - For performance-critical data (tensors, primitives)
- Garbage-Collected Memory - For high-level objects and complex data structures
Memory Management Examples
// Stack-allocated and owned (zero-cost)
let tensor = Tensor::<f32>::zeros([1024, 1024]); // Stack metadata, owned data
let array = [1, 2, 3, 4, 5]; // Stack-allocated array
// Garbage-collected (managed)
let model = torch::load_model("resnet50.pt"); // GC-managed complex object
let dataset = Dataset::from_csv("data.csv"); // GC-managed with lazy loading
let cache: Map<String, Tensor> = Map::new(); // GC-managed collections
Ownership and Borrowing System
Ownership Rules
STARK follows ownership principles similar to Rust but with relaxed rules for AI/ML ergonomics:
- Each value has exactly one owner
- When the owner goes out of scope, the value is dropped
- There can be multiple immutable borrows OR one mutable borrow
- A borrow cannot outlive the value it references
Ownership Examples
fn ownership_example() {
    let tensor1 = Tensor::rand([1000, 1000]);
    let tensor2 = tensor1; // Move (tensor1 is no longer valid)
    // print(tensor1.shape()); // ERROR: tensor1 moved
    print(tensor2.shape()); // OK: tensor2 owns the data

    let tensor3 = tensor2.clone(); // Explicit deep copy
    print(tensor2.shape()); // OK: tensor2 still owns its data
    print(tensor3.shape()); // OK: tensor3 owns separate data
}
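Ownership also transfers across function boundaries: passing a value moves it into the callee, while passing a reference only borrows it. A minimal sketch of the distinction (consume and inspect are illustrative helpers, not part of the spec):

// Ownership at function boundaries (illustrative sketch)
fn consume(t: Tensor) {
    print(t.shape()); // t is dropped when consume returns
}

fn inspect(t: &Tensor) {
    print(t.shape()); // Borrow only; the caller keeps ownership
}

fn function_boundaries() {
    let tensor = Tensor::rand([64, 64]);
    inspect(&tensor); // Borrow: tensor is still usable afterwards
    consume(tensor);  // Move: ownership transfers into consume
    // print(tensor.shape()); // ERROR: tensor moved into consume
}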
Borrowing and References
// Immutable borrowing
fn immutable_borrows() {
    let tensor = Tensor::ones([512, 512]);
    let ref1 = &tensor; // Immutable borrow
    let ref2 = &tensor; // Multiple immutable borrows OK

    print(f"Shape: {ref1.shape()}"); // OK
    print(f"Device: {ref2.device()}"); // OK
    // tensor.fill_(0.0); // ERROR: Cannot mutate while borrowed
}
// Mutable borrowing
fn mutable_borrows() {
    let mut tensor = Tensor::zeros([256, 256]);
    let ref_mut = &mut tensor; // Mutable borrow
    // let ref2 = &tensor; // ERROR: Cannot borrow while mutably borrowed

    ref_mut.fill_(1.0); // OK: Mutate through mutable reference
    // print(tensor.sum()); // ERROR: Cannot use tensor while borrowed
}
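A borrow ends when the reference goes out of scope, at which point the owner becomes usable again. A minimal sketch, assuming borrows end at the close of their lexical scope:

fn borrow_scopes() {
    let mut tensor = Tensor::zeros([128, 128]);
    {
        let ref_mut = &mut tensor; // Mutable borrow begins
        ref_mut.fill_(2.0);
    } // Mutable borrow ends with its scope
    print(tensor.sum()); // OK: no active borrows remain
}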
Garbage Collection System
Hybrid GC Design
STARK uses a generational, concurrent, low-latency garbage collector for managed objects:
GC-Managed Types
// GC-managed types (automatically allocated on GC heap)
struct Model {
    layers: Vec<Layer>,            // GC-managed vector
    optimizer: Box<dyn Optimizer>, // GC-managed box
    metadata: Map<String, String>  // GC-managed map
}
struct Dataset {
    data_source: Box<dyn DataSource>,              // GC-managed trait object
    transforms: Vec<Box<dyn Fn(Tensor) -> Tensor>>, // GC-managed vector of closures
    cache: LRUCache<String, Tensor>                // GC-managed cache
}
// Mixed ownership (owned data + GC references)
struct TrainingLoop {
    model: Gc<Model>,      // GC reference to model
    dataset: Gc<Dataset>,  // GC reference to dataset
    batch_size: i32,       // Stack-allocated primitive
    learning_rate: f32,    // Stack-allocated primitive
    current_batch: Tensor  // Owned tensor data
}
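How a GC reference is created is not defined above; the following sketch assumes a hypothetical Gc::new constructor that moves a value onto the GC heap:

// Hypothetical construction of a TrainingLoop (Gc::new is an assumed API)
fn build_training_loop(model: Model, dataset: Dataset) -> TrainingLoop {
    TrainingLoop {
        model: Gc::new(model),     // Move the model onto the GC heap
        dataset: Gc::new(dataset), // Move the dataset onto the GC heap
        batch_size: 32,
        learning_rate: 0.001,
        current_batch: Tensor::zeros([32, 3, 224, 224]) // Owned scratch buffer
    }
}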
GC Configuration
// GC modes for different workloads
enum GCMode {
    Throughput, // Optimize for throughput (larger heaps, less frequent GC)
    LowLatency, // Optimize for low latency (concurrent GC, small pauses)
    Memory,     // Optimize for memory usage (frequent GC, compact heaps)
    Training,   // Optimize for ML training (batch-aware collection)
    Inference   // Optimize for inference (predictable, low latency)
}
// Training-specific GC integration
fn training_loop_with_gc() {
    // Configure GC for training workload
    gc::configure(GCConfig {
        mode: GCMode::Training,
        max_heap_size: Some(8 * 1024 * 1024 * 1024), // 8GB
        young_gen_size: 512 * 1024 * 1024, // 512MB
        concurrent_threads: 4,
        pause_target: Duration::milliseconds(10),
        throughput_target: 0.95
    });

    for epoch in 0..100 {
        for batch in dataset.batches(batch_size) {
            // Training step with owned tensors (no GC pressure)
            let predictions = model.forward(&batch.inputs);
            let loss = loss_fn(&predictions, &batch.targets);

            // GC cleanup between batches
            if batch.id % 10 == 0 {
                gc::collect_young(); // Quick young generation cleanup
            }
        }

        // Full GC between epochs
        gc::collect();
    }
}
Memory Safety Guarantees
Compile-Time Safety Checks
The borrow checker prevents common memory errors:
- Use-after-free prevention - Values cannot be used after being moved or dropped
- Double-free prevention - Values can only be dropped once
- Dangling-pointer prevention - References cannot outlive the data they reference
- Data-race prevention - Mutable access is exclusive
- Iterator-invalidation prevention - Collections cannot be modified while being iterated (see the sketch after the examples below)
Memory Safety Examples
fn memory_safety_examples() {
    // 1. Use-after-free prevention
    let tensor = Tensor::rand([100, 100]);
    drop(tensor);
    // print(tensor.shape()); // ERROR: tensor used after drop

    // 2. Double-free prevention
    let tensor = Tensor::rand([100, 100]);
    drop(tensor);
    // drop(tensor); // ERROR: tensor already dropped

    // 3. Dangling-pointer prevention
    let reference: &Tensor;
    {
        let tensor = Tensor::rand([100, 100]);
        reference = &tensor;
    }
    // print(reference.shape()); // ERROR: tensor dropped, reference invalid

    // 4. Data-race prevention
    let mut tensor = Tensor::zeros([100, 100]);
    let ref1 = &tensor;
    // let ref2 = &mut tensor; // ERROR: cannot borrow mutably while borrowed
}
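The fifth guarantee, iterator-invalidation prevention, follows from the same borrowing rules: iterating a collection holds an immutable borrow for the duration of the loop, so any mutation is rejected until the loop ends. A sketch (the Vec iteration and push API shown are assumptions):

// 5. Iterator-invalidation prevention (Vec API is assumed for illustration)
fn iterator_invalidation() {
    let mut batches: Vec<Tensor> = Vec::new();
    batches.push(Tensor::zeros([32, 32]));

    for batch in &batches { // Immutable borrow held for the whole loop
        print(batch.shape());
        // batches.push(Tensor::zeros([32, 32])); // ERROR: cannot mutate while iterating
    }

    batches.push(Tensor::zeros([32, 32])); // OK: the loop's borrow has ended
}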
Performance Characteristics
Memory Allocation Performance
Allocation Type | Performance | Use Case | Example
---|---|---|---
Stack allocation | Fastest (nanoseconds) | Small, fixed-size data | [f32; 1024]
Owned heap allocation | Fast (microseconds) | Large tensors, predictable lifetimes | Tensor::zeros()
GC allocation | Moderate overhead | Complex objects, graphs | Model::new()
Arena allocation | Fastest for batches | Temporary batch operations | with_arena()
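Arena allocation appears only in the table above. The sketch below assumes a hypothetical with_arena API that hands a closure a bump allocator and frees everything in one step when the closure returns:

// Hypothetical arena usage (with_arena and alloc_tensor are assumed APIs)
fn arena_batch_step() {
    with_arena(|arena| {
        let scratch = arena.alloc_tensor([256, 256]); // Arena-backed temporary
        scratch.fill_(0.0);
        // ... batch computation using scratch ...
    }); // The whole arena is released here; no per-tensor frees
}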
GC Performance Tuning
// Configuration for inference workloads
gc::configure(GCConfig {
    mode: GCMode::LowLatency,
    pause_target: Duration::milliseconds(5), // Max 5ms pauses
    concurrent_threads: 2,                   // Concurrent collection
    throughput_target: 0.99                  // 99% app throughput
});
// Configuration for training workloads
gc::configure(GCConfig {
    mode: GCMode::Training,
    pause_target: Duration::milliseconds(20), // Can tolerate longer pauses
    concurrent_threads: 4,                    // More GC threads
    throughput_target: 0.95                   // 95% app throughput
});
Key Benefits
🚀 Performance
Zero-cost abstractions for tensor operations with predictable memory layout
🛡️ Safety
Compile-time memory safety without runtime overhead for critical paths
🧠 AI-Optimized
Hybrid system balances tensor performance with ML object graph management
⚙️ Configurable
Different GC modes for training, inference, and production workloads