User Tools

Site Tools


skill-tree:sd:1:2:6:1:b

SD1.2.6.1-B CUDA C/C++ Programming Fundamentals

Maintainer: Markus Velten, ZIH Tools Team @ TU Dresden

Background

NVIDIA GPUs can be programmed by using CUDA as an extension to the C/C++ programming languages. NVIDIA provides an entire CUDA software environment which includes compilers, profilers, libraries and more. CUDA allows programmers to write highly parallel compute kernels for the execution on NVIDIA GPUs. In order to write CUDA kernels that can utilize the GPU, an understanding of their architecture and its mapping to the CUDA language extension is required.

Aim

Students should learn the fundamental properties of the NVIDIA GPU architecture and its mapping to the CUDA language extension. They should understand the relevance of proper memory transfer handling via unified memory and prefetching. The thread hierarchy should be well understood in order to write correct and scalable codes. Error-handling within CUDA should be understood.

Outcomes

* Understand the CUDA thread hierarchy:

  • Address individual threads within the kernel grid
  • Write scalable codes via grid-striding loops
  • Configure and modify kernel launch configurations

* Understand the nature of unified memory:

  • Understand the necessity for memory transfers
  • Understand the functional principles of unified memory
  • Understand the potential performance hazards & ways to mitigate those

* Be able to execute kernels concurrently via streams:

  • Use streams to fully utilize the GPU
  • Use streams to hide memory transfer times

* Be able to query the status of a GPU & potential CUDA errors that occur during program execution

Subskills

skill-tree/sd/1/2/6/1/b.txt · Last modified: 2022/10/06 15:40 by 127.0.0.1