From Code to Performance

Performance Analysis and Tools

Profiling tools

Debugging tools

Best practices for using tools

Performance measurement & models

Baseline first

  1. Get a reproducible baseline (same input, build flags, and process/thread counts; repeated runs).
  2. Time critical kernels (microbenchmarks) and full runs.
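
A minimal sketch of a kernel timing harness, assuming a POSIX system with clock_gettime. The kernel sum_kernel and the helper names are hypothetical stand-ins for the code under test. Taking the fastest of several repetitions is one common way to reduce noise; reporting the median is an equally reasonable choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds, read from a monotonic clock. */
double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical kernel standing in for the code under test: sums an array. */
double sum_kernel(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Run the kernel `reps` times and return the fastest time in seconds.
   Each result is added to *sink so the compiler cannot discard the work. */
double best_time(const double *x, size_t n, int reps, double *sink) {
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        double t0 = wall_seconds();
        *sink += sum_kernel(x, n);
        double t1 = wall_seconds();
        if (best < 0.0 || t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Record the result together with the compiler flags, input size, and process/thread counts so the baseline can be reproduced later.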

Strong and weak scaling

Strong Scaling Efficiency

Efficiency formula:
Estrong(P) = T(1) / (P × T(P))

Example: T(1) = 100 and T(8) = 20 on P = 8 processors (both times in the same unit).

Compute:
P × T(P) = 8 × 20 = 160
Efficiency = 100 / 160 = 0.625 = 62.5%
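
The calculation above can be sketched as two small helpers. The function names are illustrative, not part of any library.

```c
/* Speedup S(P) = T(1) / T(P). */
double speedup(double t1, double tp) {
    return t1 / tp;
}

/* Strong-scaling efficiency E(P) = T(1) / (P * T(P)). */
double strong_efficiency(double t1, int p, double tp) {
    return t1 / ((double)p * tp);
}
```

With the numbers from the example, strong_efficiency(100.0, 8, 20.0) evaluates to 0.625 and speedup(100.0, 20.0) to 5.0.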

Weak Scaling Efficiency

Weak scaling keeps workload per processor constant.

Efficiency formula:
Eweak(P) = T(1) / T(P)

Ideal case: runtime stays constant as processors increase.
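
As a sketch, weak-scaling efficiency can be computed for a whole series of runs at once. The function name and the array layout (t[0] holds T(1), later entries hold T(P) for growing P) are assumptions made for illustration.

```c
/* Weak-scaling efficiency E(P) = T(1) / T(P) for each measured run.
   t[0] is the single-process time; every run uses the same work per process.
   Writes n efficiencies into eff. */
void weak_efficiencies(const double *t, int n, double *eff) {
    for (int i = 0; i < n; ++i)
        eff[i] = t[0] / t[i];
}
```

A value of 1.0 means the runtime did not grow with P; values below 1.0 usually point to communication or synchronization overhead.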

Roofline Model

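The Roofline model bounds attainable performance (GFLOP/s) by arithmetic intensity (FLOPs per byte of memory traffic):
Attainable performance = min(peak FLOP rate, memory bandwidth × arithmetic intensity)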

Matrix Multiplication Example:

FLOPs ≈ 2N³
Memory ≈ 3N² elements = 24N² bytes (double precision, 8 bytes per element)
Arithmetic Intensity ≈ 2N³ / 24N² = N / 12 FLOP/byte

Larger N → higher intensity → better performance potential.
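
A small sketch of the Roofline bound for this example. The function names are illustrative, and any peak or bandwidth figures passed in are assumed machine parameters, not measurements. The intensity formula is the ideal case in which each matrix moves between memory and cache exactly once.

```c
/* Attainable performance in GFLOP/s under the Roofline model:
   min(peak compute rate, memory bandwidth * arithmetic intensity). */
double roofline_gflops(double peak_gflops, double bw_gbytes_per_s, double ai) {
    double mem_bound = bw_gbytes_per_s * ai;
    return mem_bound < peak_gflops ? mem_bound : peak_gflops;
}

/* Ideal arithmetic intensity (FLOP/byte) of an N x N double-precision
   matrix multiply: 2N^3 FLOPs over 3N^2 * 8 bytes = N / 12. */
double matmul_intensity(double n) {
    return (2.0 * n * n * n) / (3.0 * n * n * 8.0);
}
```

On a hypothetical machine with a 1000 GFLOP/s peak and 100 GB/s of bandwidth, N = 60 gives an intensity of 5 FLOP/byte and a memory-bound limit of 500 GFLOP/s, while N = 1200 gives 100 FLOP/byte and is limited by the compute peak.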

Compiler Flags & Build Options

Best Practices

Compile Examples

mpicc -O3 -march=native -o app app.c
mpicc -O3 -march=native -fopenmp -o hybrid_app hybrid.c

Runtime Environment

Profiling & Debugging Tools

Matrix Multiplication Optimization

Naive Version

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++) {
        double sum = 0.0;
        for (k = 0; k < N; k++)
            sum += A[i][k] * B[k][j];
        C[i][j] = sum;
    }

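Problem: poor cache usage. In row-major C, the inner loop reads B[k][j] with a stride of N elements, so consecutive iterations touch different cache lines and little of each line is reused before it is evicted.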

Tiled Version

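/* Assumes C is zero-initialized and min(a, b) is defined (e.g. as a macro). */
for (ii = 0; ii < N; ii += T)
    for (jj = 0; jj < N; jj += T)
        for (kk = 0; kk < N; kk += T)
            for (i = ii; i < min(ii + T, N); ++i)
                for (k = kk; k < min(kk + T, N); ++k)
                    for (j = jj; j < min(jj + T, N); ++j)
                        C[i][j] += A[i][k] * B[k][j];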
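
A self-contained sketch of the tiled loop nest, assuming row-major matrices stored in flat arrays. The tile size TILE = 32 and the names matmul_tiled and min_sz are illustrative choices, not values from the text; a good tile size depends on the cache sizes of the target machine.

```c
#include <stddef.h>

#define TILE 32  /* assumed tile size; tune for the target cache */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C = A * B for row-major n x n matrices, computed tile by tile. */
void matmul_tiled(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n * n; ++i)
        C[i] = 0.0;  /* the += updates below need a zeroed C */

    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE)
            for (size_t kk = 0; kk < n; kk += TILE)
                for (size_t i = ii; i < min_sz(ii + TILE, n); ++i)
                    for (size_t k = kk; k < min_sz(kk + TILE, n); ++k) {
                        const double a = A[i * n + k];  /* reused across j */
                        for (size_t j = jj; j < min_sz(jj + TILE, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The i-k-j order inside each tile keeps the innermost loop at unit stride in both B and C, which is also what makes it a good candidate for vectorization.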

OpenMP Parallelization

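/* Each (ii, jj) tile of C is written by exactly one thread, so no locking is needed.
   Declare kk, i, k, j inside the loops (or list them in private(...)) to avoid races. */
#pragma omp parallel for collapse(2)
for (ii = 0; ii < N; ii += T)
    for (jj = 0; jj < N; jj += T)
        /* kk, i, k, j loops as in the tiled version */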
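
A complete sketch of the parallel loop nest, assuming flat row-major arrays and an illustrative tile size of 32 (TILE, imin, and matmul_tiled_omp are hypothetical names). All loop variables are declared inside the loops, which makes them private to each thread. Built without -fopenmp, the pragma is ignored and the function runs serially with the same result.

```c
#include <stddef.h>

#define TILE 32  /* assumed tile size, not taken from the text */

static int imin(int a, int b) { return a < b ? a : b; }

/* C = A * B for row-major n x n matrices. The (ii, jj) tile loops are
   collapsed and shared among threads; each tile of C has a single writer. */
void matmul_tiled_omp(int n, const double *A, const double *B, double *C) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ii = 0; ii < n; ii += TILE)
        for (int jj = 0; jj < n; jj += TILE) {
            /* Zero this thread's tile of C before accumulating into it. */
            for (int i = ii; i < imin(ii + TILE, n); ++i)
                for (int j = jj; j < imin(jj + TILE, n); ++j)
                    C[(size_t)i * n + j] = 0.0;

            for (int kk = 0; kk < n; kk += TILE)
                for (int i = ii; i < imin(ii + TILE, n); ++i)
                    for (int k = kk; k < imin(kk + TILE, n); ++k) {
                        const double a = A[(size_t)i * n + k];
                        for (int j = jj; j < imin(jj + TILE, n); ++j)
                            C[(size_t)i * n + j] += a * B[(size_t)k * n + j];
                    }
        }
}
```

The kk loop stays inside the parallel region on purpose: parallelizing over kk would make several threads update the same tile of C and would need a reduction.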

Vectorization

#pragma omp simd
for (j = jj; j < min(jj + T, N); ++j)
    C[i][j] += A[i][k] * B[k][j];
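
A sketch of the innermost update as a standalone function, assuming the rows of B and C do not overlap in memory. The name row_update is hypothetical. The restrict qualifiers state the no-overlap assumption to the compiler, and the pragma takes effect only when OpenMP SIMD support is enabled at compile time (for example with -fopenmp); otherwise it is ignored and the loop is left to the auto-vectorizer.

```c
#include <stddef.h>

/* c[j] += a * b[j] for j in [0, len).
   `restrict` promises that b and c do not alias, so the compiler may
   process several j iterations per vector instruction. */
void row_update(size_t len, double a,
                const double *restrict b, double *restrict c) {
    #pragma omp simd
    for (size_t j = 0; j < len; ++j)
        c[j] += a * b[j];
}
```

In the tiled kernel this corresponds to one (i, k) step: a is A[i][k], b points at B[k][jj], and c points at C[i][jj].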

MPI Distribution

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

Distribute rows of C across ranks:

int rows_per_rank = N / size;          /* assumes N is divisible by size */
int start_row = rank * rows_per_rank;  /* first row of C owned by this rank */

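Use MPI_Gather to collect the row blocks on one rank (MPI_Gatherv if ranks hold different numbers of rows). MPI_Reduce applies instead when each rank holds a partial sum of the full C.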
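
When N is not a multiple of the number of ranks, the row split needs a remainder rule. The sketch below is plain C with no MPI calls; the helper names row_range and gatherv_layout are hypothetical, and giving the extra rows to the lowest ranks is one common convention among several.

```c
/* Block distribution of n rows over `size` ranks: the first (n % size)
   ranks each get one extra row. Writes the rank's first row and row count. */
void row_range(int n, int size, int rank, int *start, int *count) {
    int base = n / size;
    int extra = n % size;
    *count = base + (rank < extra ? 1 : 0);
    *start = rank * base + (rank < extra ? rank : extra);
}

/* Fill per-rank element counts and offsets for an n x n row-major matrix,
   in the form taken by the recvcounts and displs arguments of MPI_Gatherv. */
void gatherv_layout(int n, int size, int *recvcounts, int *displs) {
    for (int r = 0; r < size; ++r) {
        int start, count;
        row_range(n, size, r, &start, &count);
        recvcounts[r] = count * n;  /* elements sent by rank r */
        displs[r] = start * n;      /* offset of rank r's block in C */
    }
}
```

For N = 10 on 3 ranks this assigns 4, 3, and 3 rows starting at rows 0, 4, and 7. Each rank would then send count * N doubles, and the root would pass recvcounts and displs to MPI_Gatherv.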

Performance Metrics