diff --git a/Exercises/assignment3.md b/Exercises/assignment3.md
index de3973e..b70c044 100644
--- a/Exercises/assignment3.md
+++ b/Exercises/assignment3.md
@@ -568,7 +568,7 @@ void processRequests() {
     auto& hit_line = set.lines.at(hit_line_id);
 
     // ADD: Mark as used if it was prefetched
-    if (hit_line.was_prefetched && bank_req.is_prefetch) {
+    if (hit_line.was_prefetched && !bank_req.is_prefetch) {
         hit_line.was_used = true;
     }
 
diff --git a/README.md b/README.md
index 317ad10..0f10171 100644
--- a/README.md
+++ b/README.md
@@ -18,17 +18,86 @@ Chihyo (Mark) Ahn (Georgia Institute of Technology)
 
 Shinnung Jeong (Georgia Institute of Technology)
 
-## Tentative Tutorial Schedule
+## Tentative Tutorial and Workshop Schedule
 
 | Time | Contents | Presenter | slides |
 |-------------|---------------------------------------------|-------------------|--------|
-| 8:00-8:20 | Intro and GPU background | Hyesoon Kim | |
-| 8:20-9:20 | Vortex Microarchitecture and Software Stack | Blaise Tine | |
-| 9:20-9:40 | CuPBoP: Running OpenCL and CUDA on Vortex | Chihyo (Mark) Ahn | ||
-| 9:40-10:00 | Q&A Session | | |
-| 10:00-10:20 | Coffee Break | | |
-| 10:20-11:00 | Vortex Software | | |
-| 11:00-12:00 | Vortex Workshop | | |
+| 8:00-8:10 | Intro and GPU background | Hyesoon Kim | |
+| 8:10-9:10 | Vortex Microarchitecture and Software Stack | Blaise Tine | |
+| 9:10-9:25 | Vortex Compiler and running OpenCL | Shinnung Jeong | |
+| 9:25-9:40 | CuPBoP: Running CUDA on Vortex | Chihyo (Mark) Ahn | |
+| 9:40-10:00 | Vortex Tutorial Assignment | | |
+| 10:00-10:30 | Q&A and Coffee Break | | |
+| 10:30-11:40 | Vortex Workshop | | |
+| 11:40-12:00 | Review of Tutorial Assignments | | |
+
+
+# Vortex Workshop Info
+
+---
+
+## Portable Vortex HDL for FPGA and ASIC Technologies
+**Presenters:** Jamie Kelly (enVention, LLC) and Scott O’Malia (enVention, LLC)
+
+### Abstract
+In this work, we analyze the open-source Vortex GPGPU HDL source code for portability between FPGA and ASIC target technologies. Beyond coding HDL source for legal RTL synthesis, several architecture aspects should be considered to ease technology retargeting without significant HDL source changes. Clock and reset trees and fanout control can be planned at the HDL level. Required sync and async reset types can vary with target technology, warranting a generic, global method to automatically handle each case. Special handling of clock and reset domain crossings may be required. Well-planned design hierarchy can aid floorplanning for back-end tools. Technology-specific leaf cells, such as static RAMs and arithmetic multipliers, should be wrapped using a common interface and parameter set. RAM wrappers can contain special reset control state machines to directly initialize RAM contents for many ASIC technologies that do not support this function. HDL logic pipelining and technology timing closure rely heavily on the use of flip-flop cells for delay. FPGA and ASIC flip-flop area costs are quite different, especially when complex scan-style cells are needed for ASIC manufacturing testing. The ratio of combinatorial look-up tables to flip-flops is examined. The Vortex GPGPU HDL source is analyzed for each of these cited aspects, and the results and suggested improvements are presented in this paper.
+
+### Bios
+**Jamie Kelly**
+Jamie Kelly (MS EE ’97, MS Physics ’07) has worked in hardware, software, FPGA, and ASIC development for more than 25 years. He has expertise in telecommunications/networking, packet switching/queuing, Linux kernel/device drivers, and end-to-end FPGA/ASIC design. Jamie currently serves as the Director of Hardware Engineering at enVention, LLC in Huntsville, Alabama, USA.
+
+**Scott O’Malia**
+Scott O’Malia (BS MET ’09, BS EE ’13) is an Electrical Engineer at enVention, LLC with over 10 years of experience in FPGA verification, embedded systems, and safety-critical hardware/software design. His expertise includes HDL development and verification, applying DO-178/DO-254 rigor for flight-critical applications, and advancing vendor-independent FPGA verification solutions for long-term sustainment.
+
+---
+
+## A Configurable Mixed-Precision Fused Dot Product Unit for GPGPU Tensor Computation
+**Presenters:** Nikhil Rout (Vellore Institute of Technology) and Blaise Tine (UCLA)
+
+Nikhil Rout is a 4th-year undergraduate student in ECE at the Vellore Institute of Technology, Chennai. He has been a research intern with the Vortex GPGPU group since summer 2025, advised by Prof. Blaise Tine. His research interests lie in GPGPUs and DNN accelerators at the microarchitecture abstraction level.
+
+
+### Abstract
+There has been increasing interest in developing and accelerating mixed-precision Matrix-Multiply-Accumulate operations in GPGPUs for Deep Learning workloads. However, existing open-source RTL implementations of inner dot product units rely on discrete arithmetic units, leading to suboptimal throughput and poor resource utilization. To address these challenges, we propose a scalable mixed-precision dot product unit that integrates floating-point and integer arithmetic pipelines within a singular fused architecture, implemented as part of the open-source RISC-V based Vortex GPGPU’s Tensor Core Unit extension. Our design supports low-precision multiplication in FP16/BF16/FP8/BF8/INT8/UINT4 formats and higher-precision accumulation in FP32/INT32, with an extensible framework for adding and evaluating other custom representations in the future. Experimental results demonstrate 4-cycle operation latency at 362.2 MHz clock frequency on the AMD Xilinx Alveo U55C FPGA, delivering an ideal filled pipeline throughput of 11.948 GFlops in a 4-thread configuration.
+
+---
+
+## Virgo and Radiance: Enabling Scalable Matrix Units and an SoC-based GPU Platform with Vortex
+**Presenter:** Hansung Kim (UC Berkeley)
+
+### Abstract
+Modern GPUs integrate specialized matrix units like Tensor Cores to accelerate
+deep learning. However, their tight coupling with SIMT cores limits tensor
+operation size due to register file and bandwidth constraints, hindering both
+scalability and energy efficiency.
+
+To address this limitation, we present Virgo, a GPU microarchitecture that
+integrates matrix units at the SIMT cluster level. By physically disaggregating
+the matrix units from SIMT cores, Virgo supports larger tiles, lowers
+instruction overhead, and improves data reuse and energy efficiency. Leveraging
+the Vortex HW/SW stack, Virgo demonstrates full-system design and evaluation
+for fused kernels such as FlashAttention.
+
+Building on top of Virgo and Vortex, we introduce our recent work on Radiance,
+an ASIC SoC–based GPU platform within Chipyard. Radiance features the new
+Chisel-based Muon SIMT core which improves PPA via a redesigned issue pipeline,
+dynamic warp occupancy support, and an extended ISA that expands register
+capacity while reducing stack accesses. We discuss tentative plans for
+a silicon tape-out.
+
+### Bio
+**Hansung Kim**
+Hansung Kim is a Ph.D. candidate at UC Berkeley, advised by Prof. Sophia
+Shao. His research focuses on GPU microarchitecture and hardware/software
+co-design, with technical expertise in RTL implementation, GPU kernel
+development and SoC integration. He is currently on the job market for
+industry positions and welcomes opportunities to connect.
+
+
+---
+
+
+
 
 ## Tutorial Assignments
 
@@ -47,6 +116,11 @@ Provided are seven hands-on tutorial assignments covering various aspects of Vor
 
 ### Remote Access
 A terminal interface hosted by the [CRNCH Rogues Gallery](https://crnch-rg.cc.gatech.edu/) is provided. [Instructions can be found here](./REMOTE_ACCESS.md).
+
+### Apptainer
+See the [Apptainer instructions](./apptainer/README.md) for how to set up Apptainer and run Vortex simulations.
+
+
 
 ### Docker (Experimental)
 See the [Docker instructions](./docker/README.md) for how to set up a Docker image for Vortex.
@@ -56,7 +130,10 @@ If you would like to set up Vortex on your own system, [instructions can be foun
 
 ## Relevant Repos
 * [Vortex](https://github.com/vortexgpgpu/vortex)
+* 
 * [Vortex Toolchain](https://github.com/vortexgpgpu/vortex-toolchain-prebuilt)
+* [CuPBoP on Vortex](https://github.com/cupbop/CuPBoP_Vortex)
+
 
 ## Mailing list
 For tutorial info please join https://docs.google.com/forms/d/1r8E-Yo5NwA45Hi3-kEwte4AxK0mBsYDwgjM6Bul4so0/edit
diff --git a/apptainer/INSTALL.md b/apptainer/INSTALL.md
new file mode 100644
index 0000000..2827b4d
--- /dev/null
+++ b/apptainer/INSTALL.md
@@ -0,0 +1,114 @@
+Apptainer (formerly Singularity) is a container system optimized for HPC and secure scientific environments. Installation varies by OS family.
+
+
+## 🐧 1. Ubuntu / Debian
+#### ✅ Option A: Install via .deb package
+
+```
+sudo apt update
+sudo apt install -y build-essential libseccomp-dev pkg-config squashfs-tools cryptsetup wget
+
+# Download a release package (v1.2.2 shown here)
+wget https://github.com/apptainer/apptainer/releases/download/v1.2.2/apptainer_1.2.2_amd64.deb
+
+# Install
+sudo apt install ./apptainer_1.2.2_amd64.deb
+
+# Verify
+apptainer --version
+```
+
+#### ✅ Option B: Build from source (if the .deb is not available)
+```
+sudo apt update
+sudo apt install -y build-essential uuid-dev libseccomp-dev pkg-config squashfs-tools cryptsetup wget git golang-go
+
+cd /tmp
+wget https://github.com/apptainer/apptainer/releases/download/v1.2.2/apptainer-1.2.2.tar.gz
+tar -xzf apptainer-1.2.2.tar.gz
+cd apptainer-1.2.2
+./mconfig
+make -C builddir
+sudo make -C builddir install
+
+# Verify
+apptainer --version
+```
+
+
+## 🧱 2. RHEL / AlmaLinux / Rocky / CentOS
+#### ✅ Option A: Install via EPEL (Recommended)
+```
+sudo dnf install -y epel-release
+sudo dnf config-manager --set-enabled crb
+sudo dnf install -y apptainer
+```
+
+Works for RHEL 8/9, AlmaLinux, Rocky Linux, CentOS Stream, etc.
+
+#### ✅ Option B: Build from source
+```
+sudo dnf groupinstall -y "Development Tools"
+sudo dnf install -y golang libseccomp-devel squashfs-tools cryptsetup wget git pkg-config make
+
+cd /tmp
+wget https://github.com/apptainer/apptainer/releases/download/v1.2.2/apptainer-1.2.2.tar.gz
+tar -xzf apptainer-1.2.2.tar.gz
+cd apptainer-1.2.2
+./mconfig
+make -C builddir
+sudo make -C builddir install
+
+# Verify
+apptainer --version
+```
+
+
+## 🍎 3. macOS
+
+Apptainer doesn’t run natively on macOS; it is a Linux-only system (it needs Linux kernel namespaces).
+You can still run it inside a Linux virtual environment:
+
+#### ✅ Option A: Using Homebrew + Apptainer inside a Linux VM
+
+Install Homebrew, then use it to install a lightweight Linux VM manager such as Multipass:
+
+```
+brew install --cask multipass
+multipass launch --name ubuntu --cpus 4 --memory 4G --disk 20G
+multipass shell ubuntu
+```
+
+Inside the VM, follow the Ubuntu install steps above.
+
+#### ✅ Option B: Using Docker + Apptainer inside a container
+```
+docker run -it --privileged ghcr.io/apptainer/apptainer:latest bash
+
+# Verify
+apptainer --version
+```
+
+
+## 🪟 4. Windows 10/11
+
+Apptainer requires Linux namespaces, so it cannot run natively on Windows.
+
+#### ✅ Option A: Use WSL2 (Windows Subsystem for Linux)
+
+Enable WSL2 and install Ubuntu:
+```
+wsl --install -d Ubuntu
+```
+Inside the WSL Ubuntu terminal, follow the Ubuntu install steps above (either the .deb package or the build from source).
+
+
+#### ✅ Option B: Use a full Linux VM (VirtualBox, VMware, or WSL2 Ubuntu)
+
+If you need GPU or privileged access, use a full Linux VM with Apptainer installed inside.
+
+
+
+
+### Reference:
+https://apptainer.org/docs/admin/main/installation.html
\ No newline at end of file
diff --git a/apptainer/README.md b/apptainer/README.md
new file mode 100644
index 0000000..234b3a9
--- /dev/null
+++ b/apptainer/README.md
@@ -0,0 +1,77 @@
+# Apptainer Build Process
+
+Prerequisite: Install the `apptainer` package on your machine by following [INSTALL.md](./INSTALL.md)
+
+
+# Clone Vortex repo
+
+Create a `tools` directory for mounting the Vortex toolchains into the container
+```
+$ mkdir -p tools
+```
+
+```
+$ git clone --depth=1 --recursive https://github.com/vortexgpgpu/vortex.git
+```
+
+Go to the `apptainer` directory and build the Vortex container image
+
+```
+$ ls
+ tools vortex
+
+$ cd vortex/miscs/apptainer
+
+$ apptainer build --no-https vortex.sif vortex.def
+
+```
+
+To start the container:
+```
+apptainer shell --fakeroot --cleanenv --writable-tmpfs --bind ../../../vortex:/home/vortex --bind ../../../tools:/home/tools vortex.sif
+```
+
+
+# Vortex Simulation inside Apptainer
+
+Go to the bind-mounted Vortex repo:
+```
+Apptainer> cd /home/vortex
+Apptainer> ./ci/install_dependencies.sh
+Apptainer> mkdir build
+Apptainer> cd build
+Apptainer> ../configure --xlen=32 --tooldir=$HOME/tools
+
+
+# Skip the next 3 steps if the toolchains are already present in $HOME/tools (they are required the first time the environment is set up)
+Apptainer> sed -i 's/\btar /tar --no-same-owner /g' ci/toolchain_install.sh
+Apptainer> ./ci/toolchain_install.sh --all
+Apptainer> sed -i 's/\btar --no-same-owner /tar /g' ci/toolchain_install.sh
+
+Apptainer> ls $HOME/tools/
+libc32 libc64 libcrt32 libcrt64 llvm-vortex pocl riscv32-gnu-toolchain riscv64-gnu-toolchain sv2v verilator yosys
+
+Apptainer> source ./ci/toolchain_env.sh
+Apptainer> verilator --version
+```
+
+
+### Running SIMX, RTLSIM and XRTSIM
+```
+# Compile the Vortex codebase
+Apptainer> make -s
+
+# Run the programs by specifying the appropriate driver as shown below:
+
+# SIMX
+Apptainer> ./ci/blackbox.sh --cores=2 --app=demo --driver=simx
+
+# RTLSIM
+Apptainer> ./ci/blackbox.sh --cores=2 --app=demo --driver=rtlsim
+
+# XRTSIM
+Apptainer> ./ci/blackbox.sh --cores=2 --app=demo --driver=xrt
+
+
+Apptainer> make -C runtime/ clean
+```
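
Review note: the three `blackbox.sh` runs in `apptainer/README.md` differ only in the `--driver` value, so they could be driven by a small loop. The sketch below is a hypothetical helper and is not part of this patch or of the Vortex repo. The function names `blackbox_cmd` and `run_all_drivers` and the `RUN` variable are assumptions made for illustration. By default it only prints the commands. Setting `RUN=1` executes them, which assumes the current directory is the Vortex build directory inside the container.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Vortex repo): prints, and optionally
# runs, the blackbox.sh command for each driver named in the README.
set -eu

# Build the command line for one driver.
# $1 = driver, $2 = core count (default 2), $3 = app name (default demo)
blackbox_cmd() {
    printf './ci/blackbox.sh --cores=%s --app=%s --driver=%s\n' \
        "${2:-2}" "${3:-demo}" "$1"
}

# Print the command for each driver; execute it only when RUN=1 is set.
# With `set -e`, the loop stops at the first driver whose run fails.
run_all_drivers() {
    for driver in simx rtlsim xrt; do
        cmd=$(blackbox_cmd "$driver")
        echo "$cmd"
        if [ "${RUN:-0}" = "1" ]; then
            $cmd
        fi
    done
}

run_all_drivers
```

Keeping the dry-run mode as the default makes it possible to check the generated command lines before starting the longer RTL simulations.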