Model Training Memory Simulator
2026-02-08
What This Visualizes
This simulator models a simplified training input pipeline as three stages:
- Data loading into a CPU-side prefetch queue (often pinned host memory).
- Host-to-device transfer into a GPU-side VRAM backlog queue.
- GPU compute consuming queued batches.
Each stage has a throughput, and each queue has a capacity. The key idea is that memory pressure comes from rate mismatches between stages, not from any single parameter in isolation.
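To make the rate-mismatch idea concrete, here is a minimal discrete-time sketch of the three stages in Python. It is not the simulator's actual code; all names, rates, and capacities below are illustrative assumptions.

```python
# A minimal discrete-time sketch of the three-stage pipeline described above.
# Every name and rate here is an illustrative assumption.
from collections import deque

def simulate(steps=100,
             loader_rate=3,      # batches the loader produces per tick (assumed)
             transfer_rate=2,    # batches moved host -> device per tick (assumed)
             compute_rate=2,     # batches the GPU consumes per tick (assumed)
             prefetch_cap=8,     # CPU-side prefetch queue capacity (assumed)
             backlog_cap=4):     # GPU-side VRAM backlog capacity (assumed)
    prefetch, backlog = deque(), deque()
    consumed = 0
    for _ in range(steps):
        # Stage 3: GPU compute drains the VRAM backlog first,
        # freeing space that this tick's transfer can refill.
        for _ in range(compute_rate):
            if backlog:
                backlog.popleft()
                consumed += 1
        # Stage 2: host-to-device transfer, limited by both the data
        # available in prefetch and the free space in the backlog.
        for _ in range(transfer_rate):
            if prefetch and len(backlog) < backlog_cap:
                backlog.append(prefetch.popleft())
        # Stage 1: the loader fills the prefetch queue up to capacity.
        for _ in range(loader_rate):
            if len(prefetch) < prefetch_cap:
                prefetch.append(object())
    return consumed, len(prefetch), len(backlog)

throughput, prefetch_fill, backlog_fill = simulate()
print(f"batches consumed: {throughput}, "
      f"prefetch fill: {prefetch_fill}, backlog fill: {backlog_fill}")
```

With these toy numbers the loader outruns the transfer stage, so the prefetch queue pins at capacity while throughput is set by the slower downstream stages: exactly the kind of imbalance the sliders expose.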
Tradeoffs It Tries To Show
- A deeper prefetch queue can improve GPU utilization, but it increases pinned host RAM usage.
- Faster loading helps only if transfer and compute can keep up.
- Faster transfer helps only if data is available and compute can drain VRAM.
- Larger VRAM backlog capacity can smooth bursts, but it can also increase VRAM residency.
- A bigger batch size raises the memory footprint everywhere at once (CPU queue, transfer payload, GPU queue); a footprint sketch follows this list.
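A back-of-the-envelope sketch of that last point: because every queued item is a whole batch, batch size scales the footprint of each stage linearly. The per-sample size and queue depths below are illustrative assumptions.

```python
# Rough footprint estimate showing how batch size multiplies through
# every stage. Shapes and depths are illustrative assumptions.
def queue_footprints(batch_size,
                     sample_bytes=3 * 224 * 224 * 4,  # one fp32 image (assumed)
                     prefetch_depth=8,
                     backlog_depth=4):
    batch_bytes = batch_size * sample_bytes
    pinned_ram = prefetch_depth * batch_bytes   # CPU-side prefetch queue
    vram_backlog = backlog_depth * batch_bytes  # GPU-side backlog queue
    return pinned_ram, vram_backlog

for bs in (32, 64, 128):
    ram, vram = queue_footprints(bs)
    print(f"batch={bs:4d}  pinned RAM={ram / 2**20:7.1f} MiB  "
          f"VRAM backlog={vram / 2**20:7.1f} MiB")
```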
Practical Reading Guide
- If the prefetch queue fills and pinned memory saturates, reduce prefetch depth, loader rate, or batch size.
- If the VRAM backlog queue fills and VRAM saturates, reduce backlog depth or batch size, or speed up compute.
- If transfer is starved, the loader is too slow for the downstream pipeline.
- Stable throughput comes from balancing all stages, not from maximizing any single slider; a diagnostic sketch follows this list.
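These rules can be condensed into a rough occupancy-based diagnostic. The 0.9 and 0.1 thresholds are arbitrary assumptions, and sustained readings matter more than any instantaneous value.

```python
# A rough rule-of-thumb matching the guide above: read sustained queue
# occupancy to guess the bottleneck. Thresholds are assumed, not tuned.
def diagnose(prefetch_fill, prefetch_cap, backlog_fill, backlog_cap):
    pf = prefetch_fill / prefetch_cap
    bf = backlog_fill / backlog_cap
    if bf > 0.9:
        return "compute-bound: GPU cannot drain the VRAM backlog; reduce backlog depth or batch size"
    if pf > 0.9:
        return "transfer-bound: prefetch is full; the host-to-device link is the limiter"
    if pf < 0.1:
        return "loader-bound: transfer is starved; speed up data loading"
    return "roughly balanced"

print(diagnose(prefetch_fill=8, prefetch_cap=8, backlog_fill=1, backlog_cap=4))
```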
This is a first-order mental model for input-pipeline pressure during training. In real systems, total VRAM also includes relatively stable components (weights, gradients, optimizer state) plus activation/workspace effects.
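For scale, here is a hedged first-order estimate of those stable components, assuming fp32 training with Adam (two optimizer moments per parameter) and ignoring activations and workspaces entirely.

```python
# First-order static VRAM budget: weights + gradients + Adam moments.
# Assumes fp32 everywhere; activations and workspaces are omitted.
def static_vram_gib(num_params, bytes_per_param=4):
    weights = num_params * bytes_per_param
    grads = num_params * bytes_per_param
    optimizer = 2 * num_params * bytes_per_param  # Adam: m and v moments
    return (weights + grads + optimizer) / 2**30

print(f"~{static_vram_gib(1_000_000_000):.1f} GiB static for a 1B-param model")
```

Under those assumptions a 1B-parameter model already commits roughly 15 GiB before a single input batch is queued, which is why the simulator treats the queues as pressure on top of a fixed baseline.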