Learning High-Performance Computing with an Online Toolkit

Exercise 9: Mini-Project - HPC Simulation Pipeline

Objective: In this exercise, you will implement data decomposition techniques in an MPI application. You will create a simulation pipeline that processes data in parallel using MPI, applying the concepts learned in previous exercises.

Scenario: You are tasked with developing a simulation pipeline for a scientific application that processes large datasets. The pipeline consists of several stages, including data generation, processing, and analysis, each of which can be parallelized with MPI to improve performance. Concretely, you will build a parallel Monte Carlo simulation that estimates the value of π.

Steps:

Repository Setup: Create a new Git repository for your project and set up the necessary directory structure.

Data Generation: Create a function that generates random points within a unit square. Each point will have coordinates (x, y) where x and y are random numbers between 0 and 1.
Processing: Implement a function that checks whether each generated point falls within a quarter circle inscribed within the unit square. This can be done by checking if x^2 + y^2 <= 1.
Parallelization with MPI: Use MPI to parallelize the data generation and processing steps. Each MPI process should generate a portion of the total number of points and count how many fall within the quarter circle.
Reduction: Use MPI_Reduce to collect the counts from all processes and calculate the total number of points that fall within the quarter circle.
Estimation of π: Calculate the estimate of π using the formula: π ≈ 4 * (number of points in quarter circle / total number of points).
Performance Analysis: Measure the runtime of your parallel implementation and compare it to a serial version of the same algorithm. Analyze how performance scales with increasing numbers of processes.

Documentation and Version Control: Document your code and commit your changes to the Git repository regularly. Use meaningful commit messages to track your progress.
Final Report: Write a report summarizing your implementation, performance results, and any challenges you faced during the project. Include graphs or tables to illustrate the performance scaling.
Hybrid Parallelization (MPI + OpenMP): Use MPI to distribute random point generation among ranks, and within each rank, parallelize the point-generation loop with OpenMP threads.

Performance Measurement:

Compare runtimes across different combinations of MPI ranks and OpenMP threads.

Documentation:

Ensure your code is well-documented with comments explaining key sections.
Update your README file with instructions on how to run your simulation.
Commit plots, reports, and a performance summary in Markdown format.

Deliverables:

Source code for the MPI application implementing the simulation pipeline.
Documentation explaining the design and implementation of your solution.
Performance analysis report comparing the parallel and serial versions.
Any additional materials (e.g., plots, graphs) generated during the project.

Learning Outcomes:

Apply data decomposition techniques in an MPI application.
Understand the challenges and benefits of parallelizing a simulation pipeline.
Gain experience with performance analysis and optimization in HPC applications.
Enhance documentation and version control skills using Git.
Add each exercise as a downloadable worksheet or interactive tutorial in your project repository or on your website.
Pair each one with a solution notebook (C code plus command walkthroughs) that demonstrates the implementation and results.