Tutorial 4: Measuring Performance in HPC Applications
1. Performance Metrics
| Metric | Meaning | Formula |
|---|---|---|
| Execution Time (T) | How long your code runs | Measured in seconds |
| Speedup (S) | How much faster parallel code runs | $S = T_1 / T_P$ |
| Efficiency (E) | Resource utilization | $E = S / P$ |
| Throughput | Work done per unit time | Ops/sec |
| Arithmetic Intensity (AI) | FLOPs per byte moved | $\mathrm{AI} = \mathrm{FLOPs} / \mathrm{Bytes}$ |
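
The formulas in the table can be sketched in a few lines of Python. This is a minimal illustration, not a measurement harness: the function names and the timings (120 s on one processor, 20 s on eight) are hypothetical values chosen for the example.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup S = T1 / TP, where T1 is the serial time and TP the time on P processors."""
    if t1 <= 0 or tp <= 0:
        raise ValueError("execution times must be positive")
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency E = S / P."""
    if p <= 0:
        raise ValueError("processor count must be positive")
    return speedup(t1, tp) / p


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Arithmetic intensity AI = FLOPs / Bytes."""
    if bytes_moved <= 0:
        raise ValueError("bytes moved must be positive")
    return flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    t1, tp, p = 120.0, 20.0, 8
    print(f"S = {speedup(t1, tp):.2f}")        # S = 6.00
    print(f"E = {efficiency(t1, tp, p):.2f}")  # E = 0.75
```

An efficiency of 0.75 here means that, on average, each of the eight processors does useful work for three quarters of the run.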
2. Strong Scaling vs. Weak Scaling
- Strong Scaling: fixed total problem size, increase processors. Goal: decrease runtime.
- Weak Scaling: increase the problem size in proportion to the number of processors, so the work per processor stays fixed. Goal: maintain runtime.
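
The two scaling modes are usually summarized with different efficiency figures. The sketch below uses the common definitions: strong-scaling efficiency T1 / (P × TP) and weak-scaling efficiency T1 / TP. The function names and the runtimes in both tables are hypothetical, made up for illustration.

```python
def strong_scaling_efficiency(t1: float, tp: float, p: int) -> float:
    """Fixed total problem size: ideal runtime on P processors is T1 / P."""
    return t1 / (p * tp)


def weak_scaling_efficiency(t1: float, tp: float) -> float:
    """Fixed work per processor: ideal runtime stays equal to T1."""
    return t1 / tp


if __name__ == "__main__":
    # Hypothetical runtimes in seconds, keyed by processor count.
    strong_runs = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
    weak_runs = {1: 100.0, 2: 102.0, 4: 105.0, 8: 110.0}

    t1 = strong_runs[1]
    for p, tp in strong_runs.items():
        print(f"strong P={p}: E = {strong_scaling_efficiency(t1, tp, p):.2f}")

    t1 = weak_runs[1]
    for p, tp in weak_runs.items():
        print(f"weak   P={p}: E = {weak_scaling_efficiency(t1, tp):.2f}")
```

In both cases a value near 1.0 indicates near-ideal scaling; a value that falls as P grows points to overhead such as communication or load imbalance.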
3. Profiling Tools
- perf: CPU sampling profiler.
- mpiP: Lightweight MPI performance profiler.
- HPCToolkit: Sampling-based profiler that collects and visualizes CPU, MPI, and OpenMP data.
- Intel VTune: Vectorization, cache, and memory analysis.
4. Roofline Model
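The roofline model relates a kernel's attainable performance to its arithmetic intensity (FLOPs per byte moved).
- Visualize whether a kernel is compute-bound or memory-bound.
- Arithmetic intensity = FLOPs / Bytes.
- Compare against hardware peak FLOP/s and memory bandwidth to identify limits: attainable performance is at most min(peak FLOP/s, arithmetic intensity × peak bandwidth).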
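
The roofline bound can be sketched as a small calculation. This is a minimal sketch of the basic single-roof model, with no cache-level ceilings. The function names are hypothetical, and the machine numbers (1000 GFLOP/s peak compute, 100 GB/s peak bandwidth) are assumed for illustration and do not describe any real hardware.

```python
def attainable_gflops(ai: float, peak_gflops: float, peak_bw_gbs: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity * peak bandwidth)."""
    return min(peak_gflops, ai * peak_bw_gbs)


def ridge_point(peak_gflops: float, peak_bw_gbs: float) -> float:
    """Arithmetic intensity (FLOPs/byte) at which the two limits meet."""
    return peak_gflops / peak_bw_gbs


def classify(ai: float, peak_gflops: float, peak_bw_gbs: float) -> str:
    """Label a kernel by which side of the ridge point its intensity falls on."""
    if ai < ridge_point(peak_gflops, peak_bw_gbs):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Assumed machine peaks, for illustration only.
    peak_gflops, peak_bw_gbs = 1000.0, 100.0

    # Two hypothetical kernels with low and high arithmetic intensity.
    for name, ai in [("low-intensity kernel", 0.25), ("high-intensity kernel", 20.0)]:
        bound = attainable_gflops(ai, peak_gflops, peak_bw_gbs)
        kind = classify(ai, peak_gflops, peak_bw_gbs)
        print(f"{name}: AI = {ai} FLOPs/byte, bound = {bound} GFLOP/s, {kind}")
```

With these assumed peaks the ridge point is 10 FLOPs/byte, so the kernel at 0.25 FLOPs/byte is limited by bandwidth to 25 GFLOP/s, while the kernel at 20 FLOPs/byte is limited by peak compute.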