This is an old revision of the document!

PE3-B Benchmarking

Background

Benchmarking is the activity of measuring performance reliably and assessing the results obtained.
A benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it.

Such a controlled experiment is called a benchmark, but the term is also used, as is apparent from the context, for the program or set of programs used for benchmarking.

For HPC users, measuring the performance behavior of the parallel programs they use is of primary importance for making optimal use of HPC hardware.
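Measuring performance reliably usually means timing the same workload several times and reporting a robust statistic rather than a single run. The following is a minimal sketch of that idea, not a prescribed method: the names `benchmark` and `workload` are hypothetical, and `workload` is only a stand-in for a real application kernel.

```python
import statistics
import time


def benchmark(func, repeats=5, warmup=1):
    """Time func() several times and return summary statistics in seconds."""
    for _ in range(warmup):
        func()  # warm-up runs are not timed (caches, lazy initialization)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "samples": samples,
    }


def workload():
    # Hypothetical stand-in for the real application kernel.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    result = benchmark(workload)
    print(f"min {result['min']:.4f} s, median {result['median']:.4f} s, "
          f"stdev {result['stdev']:.4f} s")
```

A large spread between the minimum and the median is a hint that the runs were disturbed (for example by other jobs or by changing CPU frequencies) and that the experiment was not well controlled.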

Aim

Provide a basic understanding of benchmarking or performance measurement, so one can quantify successes and failures and use that information to improve the application performance.
Prepare a method of comparing the performance of various subsystems across different chip/system architectures.
Assess speedups and efficiencies as the key measures for benchmarks of a parallel program.
Differentiate between strong and weak scaling.
Assess the performance impact of certain features of current CPU architectures (temperature and dynamic CPU frequencies).
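The page does not spell out the measures named above. The commonly used textbook definitions are sketched here as an assumption, with T(p) denoting the wall-clock time measured on p processing units (cores or nodes):

```latex
% Strong scaling: the total problem size is fixed while p grows
S(p) = \frac{T(1)}{T(p)}, \qquad
E(p) = \frac{S(p)}{p} = \frac{T(1)}{p \, T(p)}

% Weak scaling: T_w(p) is the time for a problem p times larger, run on p units
E_{\mathrm{weak}}(p) = \frac{T_w(1)}{T_w(p)}
```

Under these definitions, ideal strong scaling gives S(p) = p and E(p) = 1, while ideal weak scaling keeps the runtime constant as the problem size and the resources grow together.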

Outcomes

Consider not only computational speed, but also quality of service, total cost of ownership, and facilities burden (space, power, and cooling).
Differentiate types of benchmarks
- The Linpack benchmark is used, for example, to build the TOP 500 list of the currently fastest supercomputers, which is updated twice a year
- For HPC users, however, synthetic tests to benchmark HPC cluster hardware (like the Linpack benchmark) are of less importance, because the emphasis lies on the determination of speedups and efficiencies of the parallel program they want to use
Comprehend that benchmarking is essential in the HPC environment and can be applied to a variety of issues
- What is the scalability of my program?
- What is the maximum number of cluster nodes that can be used before the efficiency drops to unacceptable values?
- How does the same program perform in different cluster environments?
Comprehend that benchmarking is also a basis for dealing with questions emerging from tuning, e.g.
- What is the appropriate task size (big vs. small) that may have a positive performance impact on my program?
- Is the use of hyper-threading technology advantageous?
- What is the best mapping of processes to nodes, pinning of processes/threads to CPUs or cores, and setting of memory affinities to NUMA nodes in order to speed up a parallel program?
- What is the best compiler selection for my program (GCC, Intel, PGI, …), in combination with the most suitable MPI environment (Open MPI, Intel MPI, …)?
- What is the best compiler generation/version for my program?
- What are the best compiler options, for example the optimization level (-O2, -O3, …), for building the executable program?
- Is the use of PGO (Profile Guided Optimization) or other high-level optimization, e.g. using IPA/IPO (Inter-Procedural Analyzer/Inter-Procedural Optimizer), helpful?
- What is the performance behavior after a (parallel) algorithm has been improved, i.e. to what extent are speedup, efficiency, and scalability improved?
Assess speedups and efficiencies as the key measures for benchmarks of a parallel program
Benchmark the runtime behavior of parallel programs by performing controlled experiments with varying HPC resources (e.g. 1, 2, 4, 8, … cores on shared memory systems or 1, 2, 4, 8, … nodes on distributed systems)
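Once runtimes have been collected for a series of core or node counts, speedup and efficiency follow by simple arithmetic. The sketch below assumes the usual definitions (speedup as baseline runtime divided by measured runtime, efficiency as speedup divided by the increase in resources). The function names `scaling_table` and `max_efficient_cores`, the 0.7 efficiency threshold, and the runtimes in `example_runtimes` are all invented for illustration and are not measurements.

```python
def scaling_table(runtimes):
    """Turn {cores: runtime_seconds} into rows of
    (cores, runtime, speedup, efficiency), relative to the smallest core count."""
    base_cores = min(runtimes)
    base_time = runtimes[base_cores]
    rows = []
    for cores in sorted(runtimes):
        speedup = base_time / runtimes[cores]
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, runtimes[cores], speedup, efficiency))
    return rows


def max_efficient_cores(runtimes, threshold=0.7):
    """Largest core count whose efficiency is still at or above the threshold."""
    acceptable = [cores for cores, _, _, eff in scaling_table(runtimes)
                  if eff >= threshold]
    return max(acceptable)


# Made-up strong-scaling runtimes in seconds (illustrative only).
example_runtimes = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0, 16: 12.5}

if __name__ == "__main__":
    for cores, runtime, speedup, eff in scaling_table(example_runtimes):
        print(f"{cores:3d} cores  {runtime:7.1f} s  "
              f"speedup {speedup:5.2f}  efficiency {eff:4.2f}")
    print("largest acceptable core count:",
          max_efficient_cores(example_runtimes))
```

With the invented numbers above, efficiency falls to 0.5 at 16 cores, so 8 cores is the largest count that stays above the assumed threshold. What counts as an unacceptable efficiency is a policy decision of the user or the computing center, not something the calculation can decide.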

Subskills

K2.1 Performance Frontiers
