SD1.2.6.1-B CUDA C/C++ Programming Fundamentals

Maintainer: Markus Velten, ZIH Tools Team @ TU Dresden

Background

NVIDIA GPUs can be programmed using CUDA, an extension to the C/C++ programming languages. NVIDIA provides a complete CUDA software environment that includes compilers, profilers, libraries, and more. CUDA allows programmers to write highly parallel compute kernels for execution on NVIDIA GPUs. Writing CUDA kernels that utilize the GPU effectively requires an understanding of the GPU architecture and of how it maps to the CUDA language extension.
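A minimal sketch of such a kernel illustrates the mapping: the built-in variables `blockIdx`, `blockDim`, and `threadIdx` tie each software thread to one data element, and the launch configuration in `<<<...>>>` determines how threads are grouped into blocks on the hardware. All names here are illustrative, not part of any specific course material.

```cuda
#include <cstdio>

// Each thread computes one element; blockIdx/blockDim/threadIdx map the
// CUDA thread hierarchy onto the data.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard: grid may overshoot n
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified memory: accessible from both host and device
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough blocks of 256 threads to cover all n elements
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();  // wait for the kernel to finish

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Compile with `nvcc` and run on any CUDA-capable GPU; the index guard is what makes the code scale correctly to problem sizes that are not a multiple of the block size.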

Aim

Students should learn the fundamental properties of the NVIDIA GPU architecture and how they map to the CUDA language extension. They should understand the relevance of proper memory-transfer handling via unified memory and prefetching, and they should have a solid grasp of the thread hierarchy in order to write correct and scalable code. Error handling within CUDA should also be understood.
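The role of prefetching can be sketched as follows: with unified memory, pages migrate on demand on first access, which can stall a kernel; `cudaMemPrefetchAsync` moves the data ahead of time in both directions. This is a hedged illustration of the technique, not prescribed course code.

```cuda
#include <cstdio>

__global__ void scale(float *x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *x;
    cudaMallocManaged(&x, bytes);             // unified memory allocation
    for (int i = 0; i < n; ++i) x[i] = 1.0f;  // pages are now resident on the host

    int device;
    cudaGetDevice(&device);
    // Prefetch to the GPU before the launch to avoid on-demand page faults
    cudaMemPrefetchAsync(x, bytes, device);

    scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);

    // Prefetch back to the host before the CPU reads the result
    cudaMemPrefetchAsync(x, bytes, cudaCpuDeviceId);
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```

Note that prefetching is an optimization hint: the program is correct without it, but tools such as Nsight Systems will show the page-fault traffic it eliminates.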

Outcomes

* Understand the CUDA thread hierarchy

* Understand the nature of unified memory

* Be able to execute kernels concurrently via streams

* Be able to query the status of a GPU and potential CUDA errors that occur during program execution
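The last two outcomes can be combined in one short sketch: two kernels launched into separate streams may overlap on the device, and every runtime call and kernel launch is checked for errors. The `CUDA_CHECK` macro is an illustrative convention assumed here, not part of the CUDA API itself.

```cuda
#include <cstdio>

// Convenience macro to check the return code of CUDA runtime calls
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error %s:%d: %s\n",             \
                    __FILE__, __LINE__, cudaGetErrorString(err)); \
            return 1;                                             \
        }                                                         \
    } while (0)

__global__ void fill(float *x, float v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = v;
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    CUDA_CHECK(cudaMallocManaged(&a, n * sizeof(float)));
    CUDA_CHECK(cudaMallocManaged(&b, n * sizeof(float)));

    // Two streams: kernels in different streams may execute concurrently
    cudaStream_t s1, s2;
    CUDA_CHECK(cudaStreamCreate(&s1));
    CUDA_CHECK(cudaStreamCreate(&s2));

    fill<<<(n + 255) / 256, 256, 0, s1>>>(a, 1.0f, n);
    fill<<<(n + 255) / 256, 256, 0, s2>>>(b, 2.0f, n);

    // Kernel launches are asynchronous and return no status directly;
    // query launch errors with cudaGetLastError, execution errors via sync
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    printf("a[0] = %f, b[0] = %f\n", a[0], b[0]);
    CUDA_CHECK(cudaStreamDestroy(s1));
    CUDA_CHECK(cudaStreamDestroy(s2));
    cudaFree(a); cudaFree(b);
    return 0;
}
```

Checking both `cudaGetLastError` (launch errors) and the result of `cudaDeviceSynchronize` (errors that occur during kernel execution) is the distinction the error-handling outcome is about.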

Subskills

* PE2.2.2.6-B NVIDIA Nsight Systems