Analysis with Paraver & Dimemas

Methodology

Objective


The objective of this session is to present some examples of analysis methodology supported by the configuration files.


In the future, this methodology should be automated by integrating Paraver and Dimemas with profile presentation tools and rule-based systems.


Distribution of configuration files


An analysis methodology is essentially a tree of measurements that leads to a good understanding of the behavior of our application or to the identification of its bottlenecks. A typical search iterates through successive sets of hypotheses and validations down the tree until we either understand the behavior or find no further information in the available data with which to validate a hypothesis.
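The idea of walking a tree of hypotheses can be illustrated with a minimal sketch. It is not part of Paraver or Dimemas: the class `Hypothesis`, the function `search`, the metric names and the thresholds (0.8 and 0.9) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]


@dataclass
class Hypothesis:
    """One node of the tree: a name plus a test against measured data."""
    name: str
    holds: Callable[[Metrics], bool]
    children: List["Hypothesis"] = field(default_factory=list)


def search(node: Hypothesis, metrics: Metrics,
           path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Descend only into validated hypotheses; return the deepest validated paths."""
    if not node.holds(metrics):
        return []
    path = path + (node.name,)
    deeper: List[Tuple[str, ...]] = []
    for child in node.children:
        deeper.extend(search(child, metrics, path))
    # If no child is validated, this node is as far as the data lets us go.
    return deeper or [path]


# Hypothetical tree and thresholds, for illustration only.
tree = Hypothesis(
    "low parallel efficiency",
    lambda m: m["parallel_efficiency"] < 0.8,
    [
        Hypothesis("load imbalance", lambda m: m["load_balance"] < 0.9),
        Hypothesis("communication", lambda m: m["comm_efficiency"] < 0.9),
    ],
)

measured = {"parallel_efficiency": 0.65, "load_balance": 0.8125, "comm_efficiency": 0.8}
findings = search(tree, measured)
for finding in findings:
    print(" -> ".join(finding))
```

In practice each test would correspond to loading a configuration file and reading a value from the resulting view or profile.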


The standard Paraver distribution contains a large set of directories and configuration files. At the outermost levels, directories group configurations by the type of information they look at. Internally we typically separate directories for configuration files that show a specific timeline view from directories for configuration files that compute a profile or histogram. The major directories are:


Methodology: basic profiles


In this section we will present some recommendations on how to proceed in an analysis.


The first question to address is what the parallel efficiency is and whether it is limited by load balance or by communication.
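One commonly used way to make this question quantitative is to factor parallel efficiency into a load balance term and a communication term. The sketch below assumes that decomposition and assumes the per-process useful computation times and the total runtime have already been read from a profile. The function name and the sample values are hypothetical.

```python
def efficiency_factors(useful_times, runtime):
    """Return (load_balance, comm_efficiency, parallel_efficiency).

    useful_times: time each process spends in useful computation.
    runtime: elapsed time of the whole run (same units).
    """
    avg_useful = sum(useful_times) / len(useful_times)
    max_useful = max(useful_times)
    load_balance = avg_useful / max_useful       # 1.0 means perfectly balanced
    comm_efficiency = max_useful / runtime       # 1.0 means no time lost outside computation
    parallel_efficiency = load_balance * comm_efficiency  # equals avg_useful / runtime
    return load_balance, comm_efficiency, parallel_efficiency


# Hypothetical values for a 4-process run, in seconds.
useful = [8.0, 6.0, 7.0, 5.0]
lb, comm, par = efficiency_factors(useful, runtime=10.0)
print(f"load balance {lb:.4f}, communication {comm:.4f}, parallel {par:.4f}")
```

Whichever factor is further from 1.0 indicates which branch of the analysis to follow next.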


If the application does show some load imbalance, you may wonder whether it is computational load imbalance or due to other factors.



An important metric to look at is the performance of the sequential computation phases, measured as instructions per cycle (IPC). If the parallel efficiency is good, this may be the limiting factor. If it is not good, imbalances in IPC may be the cause of the imbalance in execution time. Sometimes an imbalance in IPC compensates for a computational load imbalance.
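The interplay between computational load (instructions) and IPC can be checked with a small calculation. The sketch below uses made-up counter values for two processes and a hypothetical helper `imbalance` defined as maximum over average. It shows a case where the process with twice the instructions also runs at twice the IPC, so the time spent (cycles) ends up balanced.

```python
def imbalance(values):
    """Maximum over average: 1.0 means perfectly balanced."""
    return max(values) / (sum(values) / len(values))


# Hypothetical hardware counter totals per process.
instructions = [4.0e9, 2.0e9]
cycles = [4.0e9, 4.0e9]

# IPC per process = instructions / cycles.
ipc = [i / c for i, c in zip(instructions, cycles)]

instr_imbalance = imbalance(instructions)  # computational load imbalance
ipc_imbalance = imbalance(ipc)             # imbalance in IPC
time_imbalance = imbalance(cycles)         # cycles are proportional to time at equal clock rate

print(ipc, instr_imbalance, ipc_imbalance, time_imbalance)
```

The opposite case, equal instruction counts with unequal IPC, would show up as balanced instructions but imbalanced cycles.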


If the IPC is not good or well balanced, you may want to look at cache misses.


If the communication time seems to be a problem, it may actually be due not to communication itself but to local (or microscopic) load imbalances or serialization. Identifying this effect requires a Dimemas simulation: you will need to convert the trace to .dim format and simulate it with an ideal target architecture.
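Once the ideal-architecture simulation has produced a predicted runtime, it can be compared with the measured one. The sketch below assumes a commonly used split of communication efficiency into a serialization term and a transfer term. The function name and all numbers are hypothetical, and no Dimemas invocation is shown: the ideal runtime is simply taken as an input.

```python
def split_communication_efficiency(max_useful, runtime_real, runtime_ideal):
    """Return (serialization_efficiency, transfer_efficiency).

    max_useful: largest per-process useful computation time.
    runtime_real: measured runtime of the original trace.
    runtime_ideal: runtime predicted by simulating on an ideal network.
    """
    # Time still lost on an ideal network comes from dependences and local imbalance.
    serialization = max_useful / runtime_ideal
    # The rest of the loss is attributable to actual data transfer.
    transfer = runtime_ideal / runtime_real
    return serialization, transfer


# Hypothetical values, in seconds.
ser, tr = split_communication_efficiency(max_useful=8.0, runtime_real=10.0, runtime_ideal=9.5)
print(f"serialization {ser:.3f}, transfer {tr:.3f}, product {ser * tr:.3f}")
```

With these sample numbers most of the loss remains even on the ideal network, which would point to serialization or microscopic load imbalance rather than to the network.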



Methodology: detailed profiles



The above metrics are computed for the whole program. You may want to have the same metrics for the different computational phases. To do so, you will need to cluster the Paraver trace and load the clustered trace.


Methodology: histograms



If parallel efficiency is bad due to load imbalance, you may want to know how that load imbalance is distributed and where it shows up.

If parallel efficiency is good, you may want to look at how the IPC is distributed across the different computation phases.
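The kind of table such a histogram produces can be sketched as follows: one row per process, one column per range of the metric, and each cell counting the computation bursts that fall in that range. The helper `histogram`, the bin limits and the per-burst IPC values are hypothetical. In an actual analysis the values would come from the trace.

```python
def histogram(values, lo, hi, bins):
    """Count how many values fall into each of `bins` equal-width ranges in [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    return counts


# Hypothetical IPC of each computation burst, keyed by process.
burst_ipc = {
    0: [0.9, 1.1, 1.0, 1.2],
    1: [0.4, 0.5, 1.1, 0.45],
}

# Rows are processes; columns are IPC ranges [0, 0.5), [0.5, 1), [1, 1.5), [1.5, 2].
table = {proc: histogram(vals, lo=0.0, hi=2.0, bins=4) for proc, vals in burst_ipc.items()}
for proc, row in table.items():
    print(proc, row)
```

Rows whose counts sit in different columns indicate processes that behave differently, which is the pattern to look for when tracing where an imbalance originates.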