skill-tree:k:1:1:b
Table of Contents
K1.1 System Architectures
An HPC cluster is built out of a few up to many servers that are connected by a high-performance communication network. The servers are called nodes. HPC clusters use batch systems for running compute jobs. Interactive access is usually limited to a few nodes. A typical cluster contains of these parts:
- various system-, hardware-, and I/O-architectures used for supercomputers, i.e. shared memory systems, distributed systems, and cluster systems
- physical hardware; chassis/rack, computer system units, interconnect, power., compute node architecture with memory, local disk, and sockets
- typical architecture of cluster systems consisting of nodes with different roles (e.g. so-called head, management, login, compute, interactive, visualization nodes, etc.)
Which are functioning as:
Login or gateway nodes where you are on after logging into the cluster and where you can work interactively:
- Compute nodes that are the workhorses of a cluster
- Admin or system nodes that work in the background necessary for the operation of the cluster, e.g. for running the batch service, or starting and shutting down compute nodes
- Disk nodes that provide global file systems, i.e. file systems that can be used on all other kinds of nodes
- Special nodes e.g. for data movement, visualization, or pre- and post-processing of large data sets
- Head nodes which can mean login or admin nodes used for management either by a user or an administrator
Learning objectives
- Comprehend there are nodes with several functions
- Comprehend that nodes are connected by a high-performance communication network
- Comprehend that global file systems are available on all nodes of the cluster and are convenient because their files can be accessed directly on all nodes
- Quantitatively, parallel file systems offer higher I/O performance than classic network file systems
- Qualitatively, they allow several processes to write into the same file
- Comprehend the two purposes of the communication network
- Enabling high-speed data communication for parallel applications running on multiple nodes
- Providing a high-speed connection to the disk systems in the cluster
- Understand Share memory systems
- Understand Distributed systems
Subskills
skill-tree/k/1/1/b.txt · Last modified: 2024/09/11 12:30 by 127.0.0.1