---
title: "Post from Oct 27, 2025"
date: 2025-10-27T10:14:42
slug: "1761560082"
tags:
- gpu
- ai
- sdkit
---

As a note to myself, here is one possible intuition for understanding the GPU memory hierarchy (and the performance penalty for moving data between its layers):

1. CPU (host) to GPU (device) is like travelling overnight between two cities. The CPU city is the "headquarters" and contains a mega-sized warehouse of parts (think football fields), also known as 'Host Memory'.
2. Each GPU is a different city, with its own warehouse outside the city, also known as 'Global Memory'. This warehouse stockpiles whatever the city needs from the headquarters city (CPU).
3. Each SM/Core/Tile is a factory located in a different area of the city. Each factory contains a small warehouse (a shed) for stockpiling whatever inventory it needs, also known as 'Shared Memory'.
4. Each warp is a bulk stamping machine inside the factory, producing 32 items in one shot. Next to each machine is a tray, also known as 'Registers', used for temporarily holding stuff during each stamping run.
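
To make the mapping concrete, here is a toy model in plain Python. It is not GPU code and runs entirely on the CPU. It computes a 3-point moving sum (a classic stencil) and counts visits to each tier:

- one bulk copy for the overnight trip;
- reads from the outer warehouse (`global`);
- reads from the shed (`shared`);
- a running sum `acc` standing in for the tray.

The names `copy_to_device`, `BLOCK`, and the trip counters are hypothetical, chosen only for illustration. In a real CUDA kernel, the shed would be a `__shared__` array and `acc` a per-thread register.

```python
from collections import Counter

BLOCK = 8  # hypothetical: threads per block ("workers per factory")


def copy_to_device(host_mem, trips):
    """The overnight trip: one bulk copy from host memory into global memory."""
    trips["host_to_device"] += 1
    return list(host_mem)


def stencil_naive(global_mem, trips):
    """3-point moving sum that drives to the outer warehouse for every input."""
    n = len(global_mem)
    out = [0] * n
    for i in range(n):
        acc = 0  # running sum kept in the tray (a register)
        for j in (i - 1, i, i + 1):
            if 0 <= j < n:
                acc += global_mem[j]
                trips["global"] += 1
        out[i] = acc
    return out


def stencil_shared(global_mem, trips):
    """Same result, but each block stocks its shed (shared memory) once first."""
    n = len(global_mem)
    out = [0] * n
    for start in range(0, n, BLOCK):
        end = min(start + BLOCK, n)
        lo, hi = max(start - 1, 0), min(end + 1, n)  # the block's chunk plus a 1-element halo
        shed = [global_mem[j] for j in range(lo, hi)]  # one truck run per block
        trips["global"] += hi - lo
        for i in range(start, end):
            acc = 0  # register
            for j in (i - 1, i, i + 1):
                if 0 <= j < n:
                    acc += shed[j - lo]  # short walk to the shed
                    trips["shared"] += 1
            out[i] = acc
    return out


naive, staged = Counter(), Counter()
device = copy_to_device(range(32), staged)
assert stencil_naive(device, naive) == stencil_shared(device, staged)
print(naive["global"], staged["global"], staged["shared"])  # 94 38 94
```

Here, stocking the shed cuts trips to the outer warehouse from 94 to 38, because neighbouring outputs reuse the same inputs. The shed only pays off when data is reused. A computation that touches each value once gains nothing from it.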

This analogy helps me grasp the scale of each layer, and the performance penalty of moving data between them.

For example, reading constantly from Global Memory is like driving between the factory and the warehouse outside the city every time (through city traffic). That is much slower than walking to the shed inside the factory (i.e. Shared Memory), and much, much slower than just reaching into the tray next to your stamping machine (i.e. Registers). And reading from Host Memory (CPU) is like taking an overnight trip to another city.
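
The cost of "driving to the warehouse each time" can be put in numbers with the textbook tiled matrix multiply. Again, this is a plain-Python toy, not a GPU kernel. The `tile` parameter is a hypothetical stand-in for the thread-block width.

For N×N matrices, the naive version makes 2N³ reads from global memory. Stocking T×T tiles of both inputs in the shed cuts that to 2N³/T, since each value brought back from the warehouse is then reused T times from the shed.

```python
from collections import Counter


def matmul_naive(A, B, trips):
    """Every multiply-add fetches both operands from the outer warehouse."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0  # register (the tray)
            for k in range(n):
                acc += A[i][k] * B[k][j]
                trips["global"] += 2
            C[i][j] = acc
    return C


def matmul_tiled(A, B, tile, trips):
    """Each block stocks tile x tile pieces of A and B in its shed, then reuses them."""
    n = len(A)
    assert n % tile == 0, "toy version assumes tile divides n"
    C = [[0] * n for _ in range(n)]
    for bi in range(0, n, tile):  # one block ("factory") per output tile
        for bj in range(0, n, tile):
            acc = [[0] * tile for _ in range(tile)]  # one register per thread
            for bk in range(0, n, tile):
                # One truck run per tile pair: copy from global memory into the shed.
                shed_a = [[A[bi + r][bk + c] for c in range(tile)] for r in range(tile)]
                shed_b = [[B[bk + r][bj + c] for c in range(tile)] for r in range(tile)]
                trips["global"] += 2 * tile * tile
                for r in range(tile):
                    for c in range(tile):
                        for k in range(tile):
                            acc[r][c] += shed_a[r][k] * shed_b[k][c]
                            trips["shared"] += 2
            for r in range(tile):
                for c in range(tile):
                    C[bi + r][bj + c] = acc[r][c]  # write each result back once
    return C


n, tile = 8, 4
A = [[i + j for j in range(n)] for i in range(n)]
B = [[(i * j) % 5 for j in range(n)] for i in range(n)]
naive, tiled = Counter(), Counter()
assert matmul_naive(A, B, naive) == matmul_tiled(A, B, tile, tiled)
print(naive["global"], tiled["global"])  # 1024 256
```

With N = 8 and T = 4, that is 1024 versus 256 warehouse trips. The 1024 shed visits remain, but they are the cheap kind. In a real kernel, T is capped by how much shared memory (shed space) each block can use.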

Notes:

1. Apple Silicon and mobile devices use "unified memory" (the CPU and GPU share the same physical memory), so there's no overnight trip between cities. You can think of Apple Silicon as two neighboring cities that almost overlap, like the twin cities found in some countries.
2. Mobile devices usually don't have a concept of shared memory, so their factories don't have warehouse sheds.
