Model Training Memory Simulator
2026-02-08
What This Visualizes
This simulator models a simplified training input pipeline as three stages:
- Data loading into a CPU-side prefetch queue (often pinned host memory).
- Host-to-device transfer into a GPU-side VRAM backlog queue.
- GPU compute consuming queued batches.
Each stage has a throughput, and each queue has a capacity. The key idea is that memory pressure comes from rate mismatches between stages, not from any single parameter in isolation.
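To make the rate-mismatch idea concrete, here is a minimal discrete-time sketch of the three stages in Python. It is not the simulator's actual code; all names, rates, and capacities below are illustrative assumptions.

```python
# A minimal discrete-time sketch of the three-stage pipeline described above.
# Every name and rate here is an illustrative assumption.
from collections import deque

def simulate(steps=100,
             loader_rate=3,      # batches the loader produces per tick (assumed)
             transfer_rate=2,    # batches moved host -> device per tick (assumed)
             compute_rate=2,     # batches the GPU consumes per tick (assumed)
             prefetch_cap=8,     # CPU-side prefetch queue capacity (assumed)
             backlog_cap=4):     # GPU-side VRAM backlog capacity (assumed)
    prefetch, backlog = deque(), deque()
    consumed = 0
    for _ in range(steps):
        # Stage 3: GPU compute drains the VRAM backlog first,
        # freeing space that this tick's transfer can refill.
        for _ in range(compute_rate):
            if backlog:
                backlog.popleft()
                consumed += 1
        # Stage 2: host-to-device transfer, limited by both the data
        # available in prefetch and the free space in the backlog.
        for _ in range(transfer_rate):
            if prefetch and len(backlog) < backlog_cap:
                backlog.append(prefetch.popleft())
        # Stage 1: the loader fills the prefetch queue up to capacity.
        for _ in range(loader_rate):
            if len(prefetch) < prefetch_cap:
                prefetch.append(object())
    return consumed, len(prefetch), len(backlog)

throughput, prefetch_fill, backlog_fill = simulate()
print(f"batches consumed: {throughput}, "
      f"prefetch fill: {prefetch_fill}, backlog fill: {backlog_fill}")
```

With these toy numbers the loader outruns the transfer stage, so the prefetch queue pins at capacity while throughput is set by the slower downstream stages: exactly the kind of imbalance the sliders expose.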
Tradeoffs It Tries To Show
- A deeper prefetch queue can improve GPU utilization, but it increases pinned host RAM usage.
- Faster loading helps only if transfer and compute can keep up.
- Faster transfer helps only if data is available and compute can drain VRAM.
- Larger VRAM backlog capacity can smooth bursts, but it can also increase VRAM residency.
- A bigger batch size raises the memory footprint everywhere at once (CPU queue, transfer payload, GPU queue); a footprint sketch follows this list.
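A back-of-the-envelope sketch of that last point: because every queued item is a whole batch, batch size scales the footprint of each stage linearly. The per-sample size and queue depths below are illustrative assumptions.

```python
# Rough footprint estimate showing how batch size multiplies through
# every stage. Shapes and depths are illustrative assumptions.
def queue_footprints(batch_size,
                     sample_bytes=3 * 224 * 224 * 4,  # one fp32 image (assumed)
                     prefetch_depth=8,
                     backlog_depth=4):
    batch_bytes = batch_size * sample_bytes
    pinned_ram = prefetch_depth * batch_bytes   # CPU-side prefetch queue
    vram_backlog = backlog_depth * batch_bytes  # GPU-side backlog queue
    return pinned_ram, vram_backlog

for bs in (32, 64, 128):
    ram, vram = queue_footprints(bs)
    print(f"batch={bs:4d}  pinned RAM={ram / 2**20:7.1f} MiB  "
          f"VRAM backlog={vram / 2**20:7.1f} MiB")
```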
Practical Reading Guide
- If the prefetch queue fills and pinned memory saturates, reduce prefetch depth, loader rate, or batch size.
- If the VRAM backlog queue fills and VRAM saturates, reduce backlog depth or batch size, or speed up compute.
- If transfer is starved, the loader is too slow for the downstream pipeline.
- Stable throughput comes from balancing all stages, not from maximizing any single slider; a diagnostic sketch follows this list.
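These rules can be condensed into a rough occupancy-based diagnostic. The 0.9 and 0.1 thresholds are arbitrary assumptions, and sustained readings matter more than any instantaneous value.

```python
# A rough rule-of-thumb matching the guide above: read sustained queue
# occupancy to guess the bottleneck. Thresholds are assumed, not tuned.
def diagnose(prefetch_fill, prefetch_cap, backlog_fill, backlog_cap):
    pf = prefetch_fill / prefetch_cap
    bf = backlog_fill / backlog_cap
    if bf > 0.9:
        return "compute-bound: GPU cannot drain the VRAM backlog; reduce backlog depth or batch size"
    if pf > 0.9:
        return "transfer-bound: prefetch is full; the host-to-device link is the limiter"
    if pf < 0.1:
        return "loader-bound: transfer is starved; speed up data loading"
    return "roughly balanced"

print(diagnose(prefetch_fill=8, prefetch_cap=8, backlog_fill=1, backlog_cap=4))
```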
This is a first-order mental model for input-pipeline pressure during training. In real systems, total VRAM also includes relatively stable components (weights, gradients, optimizer state) plus activation/workspace effects.
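For scale, here is a hedged first-order estimate of those stable components, assuming fp32 training with Adam (two optimizer moments per parameter) and ignoring activations and workspaces entirely.

```python
# First-order static VRAM budget: weights + gradients + Adam moments.
# Assumes fp32 everywhere; activations and workspaces are omitted.
def static_vram_gib(num_params, bytes_per_param=4):
    weights = num_params * bytes_per_param
    grads = num_params * bytes_per_param
    optimizer = 2 * num_params * bytes_per_param  # Adam: m and v moments
    return (weights + grads + optimizer) / 2**30

print(f"~{static_vram_gib(1_000_000_000):.1f} GiB static for a 1B-param model")
```

Under those assumptions a 1B-parameter model already commits roughly 15 GiB before a single input batch is queued, which is why the simulator treats the queues as pressure on top of a fixed baseline.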