Introduction to Paraver and Dimemas methodology (MPI analysis)

This tutorial is structured as a set of rules that can be checked during the analysis process. The diagnosis obtained from one rule may open new rules to look at, as in a search tree. The goal of this tutorial is not to describe the full search tree but to focus on the first steps and show how to decide which branches are worth exploring. As a general characteristic of the analysis methodology with Paraver and Dimemas, the approach is based on looking at the temporal and spatial distribution of the performance data to understand the application behavior, detect its different phases and identify the behavioral structure (which may differ from the procedural structure). This detailed analysis makes it possible to extract a lot of information from the performance data collected during the run.

WARNING: The first section is based on a tracefile without samples. If the tracefile was obtained with samples, the configuration files that correlate hardware counters with code regions have to be selected from cfgs/sampling.

The first question to answer when analyzing a parallel code is "how efficiently does it run?". The efficiency of a parallel program can be defined based on two aspects: the parallelization efficiency and the efficiency obtained in the execution of the serial regions. These two metrics are the first checks in the proposed methodology.
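
As a rough illustration of the first metric, the sketch below computes a parallelization efficiency from the time each process spends computing outside MPI. The multiplicative split into a load balance factor and a communication factor is an assumption made here (a commonly used model, not something this tutorial defines), and the function name and the sample numbers are hypothetical.

```python
from statistics import mean


def parallel_efficiency(useful_times, runtime):
    """Return (load_balance, comm_efficiency, parallel_efficiency).

    useful_times: per-process time spent computing (outside MPI), in seconds.
    runtime: elapsed time of the analyzed region, in seconds.
    The factor model used here is an assumption, not taken from the tutorial.
    """
    if not useful_times or runtime <= 0:
        raise ValueError("need at least one process and a positive runtime")
    avg_useful = mean(useful_times)
    max_useful = max(useful_times)
    if max_useful <= 0:
        raise ValueError("useful times must be positive")
    # Load balance: how far the average process is from the slowest one.
    load_balance = avg_useful / max_useful
    # Communication efficiency: share of the runtime the slowest process computes.
    comm_efficiency = max_useful / runtime
    return load_balance, comm_efficiency, load_balance * comm_efficiency


# Hypothetical measurements for four MPI processes.
lb, comm, eff = parallel_efficiency([8.0, 6.0, 7.0, 7.0], runtime=10.0)
print(f"load balance={lb:.2f} communication={comm:.2f} parallel efficiency={eff:.2f}")
```

With such a split, a low first factor points towards imbalance and a low second factor towards communication, which is the kind of decision the first analysis steps aim at.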



With these 4 views we have obtained a first assessment of the parallel efficiency of the code, identified whether balance or communication is limiting the performance, analyzed the structure and distribution of the computation with respect to duration and instructions, and measured the performance of the sequential regions (IPC), correlating its impact on the region duration. Before a deeper analysis of the issues detected as performing poorly, it is a good idea to look a bit more at the behavioral structure.
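
To make the relation between IPC and region duration concrete, here is a minimal sketch. It assumes a fixed clock frequency, and the burst values, task names and the `ipc` helper are hypothetical and not part of Paraver.

```python
def ipc(instructions, cycles):
    """Instructions per cycle for one computation burst."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


CLOCK_HZ = 2.0e9  # hypothetical fixed clock frequency

# Hypothetical bursts: (instructions, cycles) for the same code region on two tasks.
bursts = {
    "task 1": (4.0e9, 2.0e9),
    "task 2": (4.0e9, 4.0e9),
}

for name, (instr, cyc) in bursts.items():
    duration_s = cyc / CLOCK_HZ  # at a fixed clock, duration follows the cycle count
    print(f"{name}: instructions={instr:.1e} IPC={ipc(instr, cyc):.2f} duration={duration_s:.2f}s")
```

Both tasks execute the same number of instructions, so the difference in duration comes entirely from the difference in IPC and not from a difference in the amount of work.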



The next steps are driven by the results of the previous basic analysis. Select the metrics you consider interesting to look at.



All the previous steps have been done using the Paraver analyzer and the modules implementing performance analytics (clustering, folding). The second part of this introductory analysis methodology uses the Dimemas simulator to evaluate different scenarios. The Dimemas predictions should be taken as trends of the application behavior under different conditions.
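
The idea of comparing scenarios can be sketched with a toy model. The code below is not Dimemas and does not use its interface. It assumes a simple linear cost for a point-to-point message (latency plus size divided by bandwidth), and the scenario names and parameter values are hypothetical.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Toy linear model of one message transfer (an assumption for illustration)."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + size_bytes / bandwidth_bytes_per_s


MESSAGE_BYTES = 1_000_000  # hypothetical message size

# Hypothetical scenarios: (latency in seconds, bandwidth in bytes per second).
scenarios = {
    "nominal": (5e-6, 1e9),
    "lower latency": (1e-6, 1e9),
    "ideal network": (0.0, float("inf")),
}

for name, (latency, bandwidth) in scenarios.items():
    t = transfer_time(MESSAGE_BYTES, latency, bandwidth)
    print(f"{name}: {t * 1e6:.1f} us per message")
```

For a message of this size the bandwidth term dominates, so lowering the latency alone barely changes the result. Reading the numbers this way, as a direction rather than an exact prediction, matches how the simulator results are meant to be used.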



The results of some analysis steps may raise new questions. If balancing a region does not have the desired impact on the communication phase that follows it, this may indicate that the problem of the communication phase is not the initial imbalance but the serializations within the communication phase. The proposed methodology has some fixed initial steps for the analysis, which are covered in this tutorial. After these steps the exploration depends on the previous results, and some examples have been provided. Keep an investigative attitude and enjoy the analysis!