01_fmha
02_layernorm2d
03_gemm
04_img2col
05_reduce
06_permute
09_topk_softmax
10_rmsnorm2d
11_add_rmsnorm2d_rdquant
12_smoothquant
13_moe_sorting
14_moe_smoothquant
15_fused_moe
16_batched_gemm
17_grouped_gemm
18_flatmm
19_gemm_multi_d
20_grouped_convolution
21_elementwise
22_gemm_multi_abd
35_batched_transpose
36_pooling
37_transpose
38_block_scale_gemm
39_copy
40_streamk_gemm
41_batched_contraction
CMakeLists.txt
README.md
remod.py

CK Tile Example Suite

This directory contains a suite of examples demonstrating the CK Tile programming model for high-performance GPU kernels. Each example illustrates a key deep learning or HPC operation, implemented using tile-based parallelism, modular pipelines, and data-movement policies.

What is CK Tile?

CK Tile is a composable GPU programming API that expresses kernels as a composition of "tiles": rectangular blocks of computation and data movement. Pipelines and policies orchestrate data movement (global memory <-> LDS <-> registers), computation, and synchronization, which keeps kernels both efficient and flexible.
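
To make the tile idea concrete, here is a minimal host-side sketch of a tiled matrix multiply in plain C++. It does not use the CK Tile API: the function name `tiled_gemm`, its template parameters, and the local arrays standing in for LDS and registers are all assumptions made for illustration. It only shows the staging pattern described above, where a tile is loaded into a small buffer, used for computation, and the accumulated result is written back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side illustration only; none of these names are CK Tile APIs.
// Computes C (m x n) = A (m x k) * B (k x n), all row-major.
template <std::size_t TileM, std::size_t TileN, std::size_t TileK>
std::vector<float> tiled_gemm(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t m, std::size_t n, std::size_t k)
{
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);

    for(std::size_t m0 = 0; m0 < m; m0 += TileM)
    {
        for(std::size_t n0 = 0; n0 < n; n0 += TileN)
        {
            // One output tile; on a GPU a workgroup would own this tile.
            const std::size_t tm = std::min(TileM, m - m0);
            const std::size_t tn = std::min(TileN, n - n0);
            float acc[TileM][TileN] = {}; // stand-in for register accumulators

            for(std::size_t k0 = 0; k0 < k; k0 += TileK)
            {
                const std::size_t tk = std::min(TileK, k - k0);

                // Stage the A and B tiles in small buffers (stand-in for LDS).
                float a_tile[TileM][TileK];
                float b_tile[TileK][TileN];
                for(std::size_t i = 0; i < tm; ++i)
                    for(std::size_t kk = 0; kk < tk; ++kk)
                        a_tile[i][kk] = a[(m0 + i) * k + (k0 + kk)];
                for(std::size_t kk = 0; kk < tk; ++kk)
                    for(std::size_t j = 0; j < tn; ++j)
                        b_tile[kk][j] = b[(k0 + kk) * n + (n0 + j)];

                // Compute on the staged tiles.
                for(std::size_t i = 0; i < tm; ++i)
                    for(std::size_t j = 0; j < tn; ++j)
                        for(std::size_t kk = 0; kk < tk; ++kk)
                            acc[i][j] += a_tile[i][kk] * b_tile[kk][j];
            }

            // Write the finished tile back to "global" memory.
            for(std::size_t i = 0; i < tm; ++i)
                for(std::size_t j = 0; j < tn; ++j)
                    c[(m0 + i) * n + (n0 + j)] = acc[i][j];
        }
    }
    return c;
}
```

Calling `tiled_gemm<2, 2, 2>(a, b, m, n, k)` and `tiled_gemm<1, 1, 1>(a, b, m, n, k)` on the same inputs should give the same result; only the blocking changes. A real kernel would additionally overlap the load and compute stages and synchronize between them.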

Example Index

Example	Operation	Description
01_fmha	Fused Multi-Head Attention	Tile-based FMHA with masking, quantization, and epilogue fusion
02_layernorm2d	LayerNorm2D	Blockwise layer normalization with fusion and quantization
03_gemm	GEMM	Matrix multiplication with tilewise parallelism
04_img2col	im2col	Image-to-column transformation for GEMM-based convolution
05_reduce	Reduction	Tilewise sum, max, mean reductions
06_permute	Permute	Generic tensor permutation (up to rank-8)
09_topk_softmax	TopK-Softmax	Rowwise softmax and top-k selection for MoE gating
10_rmsnorm2d	RMSNorm2D	Root mean square normalization for LLMs
11_add_rmsnorm2d_rdquant	Add + RMSNorm2D + RDQuant	Fused add, RMSNorm, and rowwise dynamic quantization
12_smoothquant	SmoothQuant	Per-channel scaling and quantization for int8 inference
13_moe_sorting	MoE Sorting	Token-to-expert rearrangement for MoE dispatch
14_moe_smoothquant	MoE-SmoothQuant	Expert-dependent quantization fused with top-k selection
15_fused_moe	Fused MoE	End-to-end fused MoE block: sorting, group-GEMM, activation, weighting
16_batched_gemm	Batched GEMM	Parallel computation of multiple GEMMs
17_grouped_gemm	Grouped GEMM	Multiple independent GEMMs with different shapes
18_flatmm	FLATMM	Flattened matrix multiplication for packed layouts
19_gemm_multi_d	Multi-D GEMM	GEMM with multiple side inputs (bias, residual, etc.)
35_batched_transpose	Batched Transpose	NCHW <-> NHWC and other layout conversions
39_copy	Copy	Minimal example for tile-based memory movement
37_transpose	Block Transpose	High-performance tiled transpose for large tensors
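
As an illustration of what one of the fused entries in the index computes, the following is a scalar reference for a single row of "add + RMSNorm + rowwise dynamic quantization". It is a sketch under stated assumptions, not the code of 11_add_rmsnorm2d_rdquant: the names `QuantizedRow` and `add_rmsnorm_rdquant_row` are hypothetical, and symmetric int8 quantization with a per-row scale of max|y| / 127 is assumed. The actual example may use different data types, rounding, or scale conventions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: int8 values plus the per-row scale that maps
// them back to floating point (y is approximately q * scale).
struct QuantizedRow
{
    std::vector<std::int8_t> q;
    float scale;
};

// Scalar reference for one row (illustrative, not a CK Tile API):
//   s = x + residual
//   y = s / sqrt(mean(s^2) + epsilon) * gamma
//   q = round(y / scale), with scale = max|y| / 127 (assumed convention)
QuantizedRow add_rmsnorm_rdquant_row(const std::vector<float>& x,
                                     const std::vector<float>& residual,
                                     const std::vector<float>& gamma,
                                     float epsilon = 1e-5f)
{
    assert(!x.empty() && x.size() == residual.size() && x.size() == gamma.size());
    const std::size_t n = x.size();

    // Fused add, accumulating the sum of squares in the same pass.
    std::vector<float> y(n);
    float sum_sq = 0.0f;
    for(std::size_t i = 0; i < n; ++i)
    {
        y[i] = x[i] + residual[i];
        sum_sq += y[i] * y[i];
    }

    // RMS normalization with a per-element weight, tracking the row maximum.
    const float inv_rms = 1.0f / std::sqrt(sum_sq / static_cast<float>(n) + epsilon);
    float max_abs = 0.0f;
    for(std::size_t i = 0; i < n; ++i)
    {
        y[i] = y[i] * inv_rms * gamma[i];
        max_abs = std::max(max_abs, std::fabs(y[i]));
    }

    // Rowwise dynamic quantization: the scale is derived from this row only.
    const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    QuantizedRow out{std::vector<std::int8_t>(n), scale};
    for(std::size_t i = 0; i < n; ++i)
        out.q[i] = static_cast<std::int8_t>(std::lround(y[i] / scale));
    return out;
}
```

A fused kernel performs these three steps in one pass over the row so the intermediate sum and the normalized values never travel back to global memory; the reference above keeps them in a temporary vector purely for readability.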

Technical Highlights

Tile Distribution: See include/ck_tile/tile_program/tile_distribution/ for mapping tiles to thread blocks.
Block Tile Pipelines: See include/ck_tile/tile_program/block_tile_pipeline/ for memory/computation pipelines.
Policies and Utilities: Many examples use custom policies for tile/block size and memory access.
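
The idea of mapping tiles to thread blocks can be sketched with simple index arithmetic. The helper names below (`ceil_div`, `tile_count`, `tile_for_block`, `TileCoord`) are hypothetical and do not reflect how CK Tile encodes tile distributions; the sketch only assumes a row-major assignment of one output tile per block, with partial tiles at the right and bottom edges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical description of the region of an m x n problem owned by one block.
struct TileCoord
{
    std::size_t row0; // first row covered by the tile
    std::size_t col0; // first column covered by the tile
    std::size_t rows; // rows actually covered (smaller at the bottom edge)
    std::size_t cols; // columns actually covered (smaller at the right edge)
};

constexpr std::size_t ceil_div(std::size_t a, std::size_t b) { return (a + b - 1) / b; }

// Number of blocks needed when each block owns one tile_m x tile_n tile.
constexpr std::size_t tile_count(std::size_t m, std::size_t n,
                                 std::size_t tile_m, std::size_t tile_n)
{
    return ceil_div(m, tile_m) * ceil_div(n, tile_n);
}

// Row-major mapping from a linear block id to the tile that block owns.
inline TileCoord tile_for_block(std::size_t block_id,
                                std::size_t m, std::size_t n,
                                std::size_t tile_m, std::size_t tile_n)
{
    assert(block_id < tile_count(m, n, tile_m, tile_n));
    const std::size_t tiles_n  = ceil_div(n, tile_n);
    const std::size_t tile_row = block_id / tiles_n;
    const std::size_t tile_col = block_id % tiles_n;

    TileCoord t;
    t.row0 = tile_row * tile_m;
    t.col0 = tile_col * tile_n;
    t.rows = std::min(tile_m, m - t.row0);
    t.cols = std::min(tile_n, n - t.col0);
    return t;
}
```

For a 100 x 70 problem with 32 x 32 tiles this gives 4 x 3 = 12 blocks, and the last block covers only a 4 x 6 corner. Handling such partial tiles is one of the things a tile/block size policy has to decide.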

How to Build & Run

mkdir build && cd build
sh ../script/cmake-ck-dev.sh ../ <arch>
make -j

Replace <arch> with your GPU architecture target, for example gfx90a or gfx942. Each example produces its own executable in build/bin/.

Learning and Extending

Start Simple: Try 03_gemm or 39_copy to learn tile basics.
Explore Fusion: See 11_add_rmsnorm2d_rdquant, 15_fused_moe, or 14_moe_smoothquant for advanced fusion.
Experiment: Modify tile sizes, layouts, or pipelines to explore performance and flexibility.
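
One low-risk way to experiment with tile sizes is to make the tile size a compile-time parameter and check that the result does not depend on it. The sketch below does this for a host-side tiled transpose. `tiled_transpose` is a hypothetical name, not taken from the 37_transpose example, and the code runs on the CPU purely to show the pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative host-side transpose of a rows x cols row-major matrix.
// Tile is the edge length of the square blocks processed together.
template <std::size_t Tile>
std::vector<float> tiled_transpose(const std::vector<float>& in,
                                   std::size_t rows, std::size_t cols)
{
    static_assert(Tile > 0, "tile size must be positive");
    assert(in.size() == rows * cols);
    std::vector<float> out(in.size());

    for(std::size_t r0 = 0; r0 < rows; r0 += Tile)
    {
        for(std::size_t c0 = 0; c0 < cols; c0 += Tile)
        {
            // Clamp the tile at the matrix edges.
            const std::size_t r_end = std::min(r0 + Tile, rows);
            const std::size_t c_end = std::min(c0 + Tile, cols);
            for(std::size_t r = r0; r < r_end; ++r)
                for(std::size_t c = c0; c < c_end; ++c)
                    out[c * rows + r] = in[r * cols + c]; // out is cols x rows
        }
    }
    return out;
}
```

Instantiating `tiled_transpose<1>`, `tiled_transpose<2>`, and `tiled_transpose<4>` on the same input should produce identical outputs. On a GPU the tile size changes memory-access patterns and occupancy rather than results, so a correctness check like this is a useful baseline before comparing timings.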

References

Back to Composable Kernel Examples
