Prefetch vs VRAM: Factory Simulator
Think of training like a conveyor-belt factory: the loader produces batches (MB/s) into pinned host RAM, the H→D transfer moves them (MB/s) into VRAM, and compute drains them (MB/s). Tune the rates and buffer capacities to see which stage blocks first and where OOM risk appears.
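The factory analogy can be sketched as a tiny discrete-time simulation. All rates, capacities, and numbers below are hypothetical, chosen only to illustrate how the slowest stage wins:

```python
def simulate(load_mbps, copy_mbps, compute_mbps,
             host_cap_mb, vram_cap_mb, seconds=10):
    """One-second ticks; rates in MB/s, capacities in MB (all hypothetical)."""
    host = vram = 0.0
    produced = transferred = consumed = 0.0
    for _ in range(seconds):
        # Loader fills pinned host RAM, blocking at capacity.
        fill = min(load_mbps, host_cap_mb - host)
        host += fill; produced += fill
        # H->D copy drains host RAM into VRAM, blocking at VRAM capacity.
        copy = min(copy_mbps, host, vram_cap_mb - vram)
        host -= copy; vram += copy; transferred += copy
        # Compute drains VRAM-resident bytes.
        done = min(compute_mbps, vram)
        vram -= done; consumed += done
    totals = {"loader": produced, "transfer": transferred, "compute": consumed}
    return min(totals, key=totals.get), host, vram

# Loader 400 MB/s, copy 800 MB/s, compute 250 MB/s: compute is the bottleneck,
# and the VRAM backlog grows by 150 MB per tick.
bottleneck, host, vram = simulate(400, 800, 250, host_cap_mb=2000, vram_cap_mb=8000)
```

Note the backlog grows at the *difference* between transfer and compute rates, which is exactly the failure mode the simulator visualizes.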
Pipeline State
1. DataLoader + Prefetch
Creates CPU-side batches. If the prefetch queue is full, the loader blocks. Prefetched batches mostly consume pinned host RAM.
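The "loader blocks when the queue is full" behavior is just a bounded queue; in PyTorch the analogous knobs are `DataLoader(num_workers=..., prefetch_factor=..., pin_memory=True)`. A minimal stand-in (the prefetch depth of 4 is a hypothetical choice):

```python
import queue

# Hypothetical prefetch depth of 4 batches; each slot pins host RAM.
prefetch = queue.Queue(maxsize=4)

loaded = blocked = 0
for batch in range(10):              # loader tries to stage 10 batches
    try:
        prefetch.put_nowait(batch)   # a real worker would block on put()
        loaded += 1
    except queue.Full:
        blocked += 1                 # queue full -> loader stalls, not OOM
```

With nothing consuming the queue, only 4 batches land and the other 6 attempts stall, which is why prefetch bounds host-RAM use rather than growing it without limit.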
2. H→D Transfer
Moves a batch from pinned host RAM to VRAM. A fast copy helps, but it cannot outrun the queue bounds or tensor-lifetime limits around it.
VRAM backlog: transferred batches waiting for compute.
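Why a faster copy "cannot beat queue bounds" falls out of a one-line model: the bytes moved per tick are capped by bandwidth, but also by what is staged upstream and by free VRAM downstream (function name and numbers are illustrative):

```python
def hd_copy_step(bw_mb, host_backlog_mb, vram_free_mb):
    """MB moved this tick: bandwidth-capped, but also bounded by what is
    staged in pinned RAM and by free VRAM (the downstream queue bound)."""
    return min(bw_mb, host_backlog_mb, vram_free_mb)

# Doubling copy bandwidth changes nothing when only 300 MB is staged:
assert hd_copy_step(800, 300, 8000) == hd_copy_step(1600, 300, 8000) == 300
# A nearly full VRAM backlog caps the copy regardless of bandwidth:
assert hd_copy_step(1600, 2000, 120) == 120
```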
3. GPU Compute
Consumes VRAM-resident bytes. Slow compute, or a large backlog capacity, holds tensors in VRAM longer and pushes usage toward OOM.
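When transfer outpaces compute, the backlog grows linearly and you can estimate the time to OOM directly. A sketch under that assumption (all figures hypothetical; a real run also carries weights, activations, and allocator overhead in VRAM):

```python
import math

def steps_to_oom(copy_mbps, compute_mbps, vram_cap_mb, vram_used_mb=0.0):
    """Ticks until the VRAM backlog overflows, or None if it never grows."""
    growth = copy_mbps - compute_mbps   # net backlog growth per tick
    if growth <= 0:
        return None                     # compute keeps up: no backlog-driven OOM
    return math.ceil((vram_cap_mb - vram_used_mb) / growth)

# Hypothetical numbers: 800 MB/s in, 250 MB/s consumed, 8 GB of headroom.
ticks = steps_to_oom(800, 250, 8000)
```

The fix is therefore either to speed up compute or to shrink the backlog capacity so the transfer stage blocks instead of overflowing VRAM.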
Current diagnosis
[Live panel: loader/transfer/compute packet counts; produced, transferred, and consumed throughput; leading bottleneck]