skill-tree:k:2:1:b
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
skill-tree:k:2:1:b [2020/06/24 19:41] – [Outcomes] kai_h | skill-tree:k:2:1:b [2025/04/16 18:30] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | # K2.1-B Performance Frontiers | + | # K2.1 Performance Frontiers |
- | # Background | + | |
- | # Aim | + | |
- | * To provide knowledge about FLOPS (floating point operations per second) which is the key measurement unit for the performance of HPC systems, and its pitfalls | + | |
- | * To provide knowledge about Moore’s law and the Gustafson–Barsis' | + | |
- | * To provide knowledge about the definitions for key terms: speedup, efficiency, and scalability | + | |
- | * To provide knowledge about Amdahl’s law and its significance for performance frontiers in modern HPC | + | |
- | # Outcomes | + | * To provide knowledge about FLOPS (floating-point operations per second) which is the key measurement unit for the performance of HPC systems, and its pitfalls. |
- | * Comprehend that **FLOPS** (Floating Point Operations per Second) as a key measurement is used in two distict | + | * To provide knowledge about Moore’s law and the Gustafson–Barsis' |
- | * To measure the computing power of a computer | + | * To provide knowledge about the definitions for key terms: speedup, efficiency, and scalability. |
- | * To measure the performance of applications | + | * To provide knowledge about Amdahl’s law and its significance for performance frontiers in modern HPC. |
- | * Comprehend that there are Pitfalls when measuring FLOPScomputing power of a computer | + | |
- | * Many scientific applications are memory bound, not CPU bound so the memory buses will become the bottleneck of the application | + | ## Learning |
- | * FLOPS say nothing about the performance of the network or the file system | + | |
- | * FLOP utilization says nothing about the quality of code | + | * Comprehend that **FLOPS** (Floating Point Operations per Second) as a key measurement is used in two distinct |
- | * Comprehend that **Moore’s law** from 1965, revised in 1975, states (in simple terms) that the complexity of integrated circuits and thus the computing power of CPUs for HPC systems, respectively, | + | * To measure the computing power of a computer. |
- | * In the past that was true | + | * To measure the performance of applications. |
- | * However, for some time it has been observed that this increase in performance gain is no longer achieved through improvements of processor technology in a sequantial | + | * Comprehend that there are Pitfalls when measuring FLOPScomputing power of a computer: |
- | * CPU clock rates, for instance, have not been increased notably for several years | + | * Many scientific applications are memory-bound, not CPU bound so the memory buses will become the bottleneck of the application. |
- | * It is rather achieved by using many cores for processing a task in parallel | + | * FLOPS say nothing about the performance of the network or the file system. |
+ | * FLOP utilization says nothing about the quality of code. | ||
+ | * Comprehend that **Moore’s law** from 1965, revised in 1975, states (in simple terms) that the complexity of integrated circuits and thus the computing power of CPUs for HPC systems, respectively, | ||
+ | * In the past that was true. | ||
+ | * However, for some time it has been observed that this increase in performance gain is no longer achieved through improvements of processor technology in a sequential | ||
+ | * CPU clock rates, for instance, have not been increased notably for several years. | ||
+ | * It is rather achieved by using many cores for processing a task in parallel. | ||
* Therefore parallel computing and HPC systems will become increasingly relevant in the future. | * Therefore parallel computing and HPC systems will become increasingly relevant in the future. | ||
- | | + | * Comprehend that **Speedup** defines the relation between the sequential and parallel runtime of a program. |
- | * Comprehend that **Efficiency** defines the relation between the Speedup of a program and the number of processors used to achieve it | + | * Comprehend that **Efficiency** defines the relation between the Speedup of a program and the number of processors used to achieve it. |
- | * Comprehend that a good **Scalability** is achieved, when the efficiency remains high while the number of processors is being increased | + | * Comprehend that a good **Scalability** is achieved, when the efficiency remains high while the number of processors is being increased. |
- | * Comprehend that the parallelization of different problems can have different degrees of difficulty | + | * Comprehend that the parallelization of different problems can have different degrees of difficulty: |
- | * Some problems can be parallelized trivially, e.g. rendering (independent) computer animation images | + | * Some problems can be parallelized trivially, e.g. rendering (independent) computer animation images. |
- | * However, there are algorithms having a so-called sequential nature, e.g. some tree-search algorithms that have been notoriously difficult to parallelize | + | * However, there are algorithms having a so-called sequential nature, e.g. some tree-search algorithms that have been notoriously difficult to parallelize. |
- | * Typical problems in the field of scientific computing are somewhere in-between these extremes | + | * Typical problems in the field of scientific computing are somewhere in-between these extremes. |
- | * Comprehend that it is an important aspect to use the best known sequential algorithm for speedup comparisons in order to get fair speedup results | + | * Comprehend that it is an important aspect to use the best known sequential algorithm for speedup comparisons in order to get fair speedup results: |
- | * Conventional Speedup: use the same version of an algorithm (the same program) to measure runtimes T(sequential) and T(parallel) | + | * Conventional Speedup: use the same version of an algorithm (the same program) to measure runtimes T(sequential) and T(parallel). |
- | * Fair Speedup: use best known sequential algorithm to measure T(sequential) | + | * Fair Speedup: use the best known sequential algorithm to measure T(sequential). |
- | * Comprehend that **Amdahl' | + | * Comprehend that **Amdahl' |
- | * by its sequential, i.e. non-parallelizable part (e.g. for initialization and I/O operations) | + | * By its sequential, i.e. non-parallelizable part (e.g. for initialization and I/O operations). |
- | * or more generally, by synchronization (e.g. due to unbalanced load) and communication overheads (e.g. for data exchange) | + | * Or more generally, by the synchronization (e.g. due to unbalanced load) and communication overheads (e.g. for data exchange). |
- | * | + | * Comprehend the general Amdahl formula and that in practice it represents a simplified upper limit for the speedup: |
+ | * E.g., if the parallelizable part of a program is 99.9% the speedup is in theory limited to 1000. | ||
+ | * But in practice, even speedups above 100 might hardly be achieved in this case if it is additionally considered that overheads for communication and synchronization will also increase when the number of processes is increased. | ||
+ | * the roofline model, used to provide performance estimates for parallel programs based on multi-core or accelerator processor architectures, | ||
- | # Subskills | ||
skill-tree/k/2/1/b.1593020508.txt.gz · Last modified: 2020/06/24 19:41 by kai_h