skill-tree:adm:1:4:1:3:b

Differences

This shows you the differences between two versions of the page.

skill-tree:adm:1:4:1:3:b [2022/10/10 14:40] agerbes
skill-tree:adm:1:4:1:3:b [Unknown date] (current) – removed - external edit (Unknown date) 127.0.0.1
Line 1:
-# ADM1.4.1.3-B Manage cluster nodes 
-#### Maintainer: Markus Hilger, Peter Grossöhme, HPC Engineers @ Megware 
- 
-# Background
-When operating an HPC cluster, administrators often need to run commands manually on several nodes or make temporary configuration changes for debugging or testing. When problems occur, logs and measurements must be collected. Regular software updates are also necessary.
- 
-# Aim
-Students should learn the basics of HPC cluster administration. They should know how to execute commands on multiple nodes, collect logs, gather debug information, and update operating system software components.
- 
-# Outcome
-  * Able to execute commands on multiple nodes in parallel using the parallel shell
-  * Able to update software packages on nodes
-  * Able to gather logs and debug information from nodes and node service processors
-  * Able to control nodes using the node service processor (power on/off, set boot target)
-  * Able to use the Serial-over-LAN connection for debugging purposes
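
The first outcome above, running one command on many nodes in parallel, can be illustrated with a minimal Python sketch. The page does not name the parallel shell used in the course, so this example uses plain `ssh` as a stand-in and assumes key-based login is already set up. The function names (`ssh_command`, `run_one`, `run_on_nodes`) and the node names in the usage note are hypothetical and chosen only for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_command(node, command):
    """Build the argument list for running `command` on `node` over SSH."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", node, command]


def run_one(node, command, build_argv=ssh_command, timeout=30):
    """Run the command for one node and return (node, exit_code, output)."""
    try:
        proc = subprocess.run(
            build_argv(node, command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return node, proc.returncode, proc.stdout + proc.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Report unreachable or hanging nodes instead of aborting the whole run.
        return node, None, str(exc)


def run_on_nodes(nodes, command, build_argv=ssh_command, max_workers=16):
    """Run `command` for every node in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda n: run_one(n, command, build_argv), nodes))
```

For example, `run_on_nodes(["node01", "node02"], "uptime")` would return one `(node, exit_code, output)` tuple per node, with `exit_code` set to `None` for nodes that could not be reached or that timed out. A dedicated parallel shell typically adds features such as node-range syntax and output aggregation, which this sketch does not attempt.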
- 
-# Subskills