Some resources I found helpful
Understanding Latency Hiding on GPUs
A History of Nvidia Stream Multiprocessor
Modal GPU Glossary
Nvidia SM89 Instruction Set Architecture
In Defense of NIR
Terminology
| CUDA | PTX | NVIDIA hardware | OpenGL | Vulkan |
| --- | --- | --- | --- | --- |
| thread block grid | – | GPU | dispatch | global workgroup |
| thread block | CTA | SM (sometimes MP) | work group | (local) workgroup |
| – | warp | warp scheduler? | – | subgroup |
| – | thread | CUDA core | invocation/thread | invocation |
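
To make the nesting of these terms concrete, here is a minimal sketch that splits a flat 1D thread index into its block, warp, and lane. It is only an illustration: `locate`, `Position`, and `WARP_SIZE` are hypothetical names, and the warp size of 32 is an assumption that holds for NVIDIA hardware (Vulkan subgroup sizes differ between vendors).

```python
from dataclasses import dataclass

# Assumption: 32 threads per warp, as on NVIDIA GPUs.
# Vulkan subgroup sizes vary by vendor.
WARP_SIZE = 32


@dataclass(frozen=True)
class Position:
    block: int  # CUDA thread block / PTX CTA / Vulkan (local) workgroup
    warp: int   # PTX warp / Vulkan subgroup, counted within the block
    lane: int   # thread / invocation index within the warp


def locate(global_thread: int, block_size: int) -> Position:
    """Split a flat 1D index within a grid (dispatch) into block, warp, lane."""
    block, thread_in_block = divmod(global_thread, block_size)
    warp, lane = divmod(thread_in_block, WARP_SIZE)
    return Position(block, warp, lane)


print(locate(300, block_size=128))
# Position(block=2, warp=1, lane=12)
```

With 128 threads per block, thread 300 is thread 44 of block 2, which puts it in the second warp (index 1) at lane 12. The sketch only covers the 1D case; 2D and 3D blocks are first linearized before being split into warps.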