8. Examples
Here we present three different examples of generating a Paraver tracefile. The first example requires the package to be compiled with the DynInst libraries. The second example uses the LD_PRELOAD (or LDR_PRELOAD[64]) mechanism to interpose code in the application; this mechanism is available on the Linux and FreeBSD operating systems and only works when the application uses dynamic libraries. Finally, there is an example using the static library of the instrumentation package.
8.1. DynInst-based examples
DynInst is a third-party instrumentation library developed at UW Madison that can instrument in-memory binaries. It makes it possible to add instrumentation to an application without modifying its source code. DynInst is ported to different systems (Linux, FreeBSD) and to different architectures [1] (x86, x86/64, PPC32, PPC64), but the functionality is common to all of them.
8.1.1. Generating intermediate files for serial or OpenMP applications
 1 | #!/bin/sh
 2 |
 3 | export EXTRAE_HOME=WRITE-HERE-THE-PACKAGE-LOCATION
 4 | export LD_LIBRARY_PATH=${EXTRAE_HOME}/lib
 5 | source ${EXTRAE_HOME}/etc/extrae.sh
 6 |
 7 | ## Run the desired program
 8 | ${EXTRAE_HOME}/bin/extrae -config extrae.xml $*
A similar script can be found in ${EXTRAE_HOME}/share/example/SEQ; just tune the EXTRAE_HOME environment variable and make the script executable (using chmod u+x). Alternatively, you can pass the XML configuration file through the EXTRAE_CONFIG_FILE environment variable, if you prefer. Line 5 is responsible for loading all the environment variables needed by the DynInst launcher (called extrae), which is invoked in line 8.
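For instance, assuming the script above is saved as trace.sh and a serial binary my_serial_app is to be traced (both names are purely illustrative), the run would look like this:

  # Make the wrapper executable; extrae.xml must be reachable from the current directory
  chmod u+x trace.sh
  ./trace.sh ./my_serial_app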
In fact, there are two examples provided in ${EXTRAE_HOME}/share/example/SEQ: one for static (or manual) instrumentation and another for the DynInst-based instrumentation. When using the DynInst instrumentation, the user may add new routines to instrument through the existing function-list file that is already referenced by the extrae.xml configuration file. To specify the routines to instrument, add one line containing the name of each routine to be instrumented, as in the sketch below.
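For example, a function-list file with the following contents would ask DynInst to instrument two user routines (the routine names are purely illustrative):

  compute_forces
  update_positions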
Running OpenMP applications under DynInst is rather similar to running serial codes: just compile the application with the appropriate OpenMP flags and run it as before. You can find an example in ${EXTRAE_HOME}/share/example/OMP. A hedged sketch of the whole flow is shown below.
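Assuming GCC as the compiler and a launcher script like the one above (the compiler, flags and binary name are illustrative):

  # Compile with OpenMP support and run the binary through the DynInst launcher script
  gcc -fopenmp my_omp_app.c -o my_omp_app
  OMP_NUM_THREADS=4 ./trace.sh ./my_omp_app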
8.1.2. Generating intermediate files for MPI applications
MPI applications can also be instrumented using the DynInst instrumenter. The instrumentation is applied independently to each spawned MPI process, so in order to run the DynInst-based instrumentation package on an MPI application, you must make sure that your MPI launcher supports running shell scripts. The following scripts show how to run the DynInst instrumenter under the MOAB/Slurm queue system. The first script sets up the environment for the job, whereas the second is responsible for instrumenting every spawned task.
  1 | #!/bin/bash
  2 | # @ initialdir = .
  3 | # @ output = trace.out
  4 | # @ error = trace.err
  5 | # @ total_tasks = 4
  6 | # @ cpus_per_task = 1
  7 | # @ tasks_per_node = 4
  8 | # @ wall_clock_limit = 00:10:00
  9 | # @ tracing = 1
 10 |
 11 | srun ./run.sh ./mpi_ping
The most important part of the previous script is line 11, which is responsible for spawning the MPI tasks (using the srun command). The spawn method is told to execute ./run.sh ./mpi_ping, which means instrumenting the mpi_ping binary through the run.sh script. You must adapt this file to your queue system (if any) and to your MPI submission mechanism (i.e., change srun to mpirun, mpiexec, poe, etc.), as in the sketch below. Note that changing line 11 to read ./run.sh srun ./mpi_ping would result in instrumenting the srun application, not mpi_ping.
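As an illustration only (the exact launcher and its flags depend on your MPI installation), the same spawn line for a generic mpirun-based launcher could look like this; run.sh still wraps the application binary, which must remain the last element of the command:

  ${MPI_HOME}/bin/mpirun -np 4 ./run.sh ./mpi_ping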
  1 | #!/bin/bash
  2 |
  3 | export EXTRAE_HOME=@sub_PREFIXDIR@
  4 | source ${EXTRAE_HOME}/etc/extrae.sh
  5 |
  6 | # Only show output for task 0, other tasks send their output to /dev/null
  7 | if test "${SLURM_PROCID}" == "0" ; then
  8 |   ${EXTRAE_HOME}/bin/extrae -config ../extrae.xml $@ > job.out 2> job.err
  9 | else
 10 |   ${EXTRAE_HOME}/bin/extrae -config ../extrae.xml $@ > /dev/null 2> /dev/null
 11 | fi
This is the script responsible for instrumenting a single MPI task. In line 4 we set up the instrumentation environment by executing the commands from extrae.sh. Then we execute the binary passed to the run.sh script in lines 8 and 10. Both lines execute the same command, except that line 8 sends the output to two files (one for standard output and another for standard error) whereas line 10 discards all output by sending it to /dev/null.
Please note that this script is tailored to the MOAB/Slurm queue system. You may need to adapt it to other systems by using the appropriate environment variables. In particular, SLURM_PROCID identifies the MPI task id (i.e., the task rank) and should be replaced by the proper environment variable (MPI_RANK on ParaStation/Torque/MOAB systems or MXMPI_ID on systems with Myrinet MX devices, for example), as in the sketch below.
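As an illustrative sketch (not a tested configuration), an adaptation of run.sh for a system that exports MPI_RANK could look like this:

  #!/bin/bash
  # Hypothetical adaptation: MPI_RANK replaces SLURM_PROCID (assumes the launcher
  # exports MPI_RANK for every spawned task)
  export EXTRAE_HOME=WRITE-HERE-THE-PACKAGE-LOCATION
  source ${EXTRAE_HOME}/etc/extrae.sh

  if test "${MPI_RANK}" == "0" ; then
    ${EXTRAE_HOME}/bin/extrae -config ../extrae.xml $@ > job.out 2> job.err
  else
    ${EXTRAE_HOME}/bin/extrae -config ../extrae.xml $@ > /dev/null 2> /dev/null
  fi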
8.2. LD_PRELOAD-based examples
The LD_PRELOAD (or LDR_PRELOAD[64] on AIX) interposition mechanism only works for binaries that are linked against shared libraries. The interposition is done by the runtime loader, which substitutes the original symbols with those provided by the instrumentation package. This mechanism is known to work on the Linux, FreeBSD and AIX operating systems; although it may be available on other operating systems (possibly under different names [2]), those have not been tested.
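Since the mechanism requires dynamic linking, a quick way to verify that a binary is suitable is to inspect it with ldd on Linux (the binary name is illustrative):

  # A dynamically linked binary lists its shared library dependencies; a statically
  # linked one reports "not a dynamic executable" and cannot be traced this way
  ldd ./mpi_ping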
We show how this mechanism works on Linux (or similar environments) in section Linux and on AIX in section AIX.
8.2.1. Linux
The following script preloads the libmpitrace library to instrument the MPI calls of the application passed as an argument (tune EXTRAE_HOME according to your installation).
#!/bin/sh

export EXTRAE_HOME=<WRITE-HERE-THE-PACKAGE-LOCATION>
export EXTRAE_CONFIG_FILE=extrae.xml
export LD_PRELOAD=${EXTRAE_HOME}/lib/libmpitrace.so

## Run the desired program
$*
The previous script can be found in ${EXTRAE_HOME}/share/example/MPI/ld-preload in your tracing package directory. Copy the script to one of your directories, tune the EXTRAE_HOME environment variable and make the script executable (using chmod u+x). Also copy the XML configuration file extrae.xml from ${EXTRAE_HOME}/share/example/MPI to the current directory. This file configures the whole behavior of the instrumentation package (there is more information about the XML file in chapter Extrae XML configuration file). The last line in the script, $*, executes the arguments given to the script, so you can run the instrumentation by simply placing the script in front of your application in the execution command.
Regarding the execution, if you run MPI applications from the command-line, you can issue the typical mpirun command as:
${MPI_HOME}/bin/mpirun -np N ./trace.sh mpi-app
where ${MPI_HOME} is the directory of your MPI installation, N is the number of MPI tasks you want to run, and mpi-app is the binary of the MPI application you want to trace.
However, if you execute your MPI applications through a queue system, you may need to write a submission script. The following script is an example of a submission script for the MOAB/Slurm queuing system that uses the aforementioned trace.sh script for an execution of mpi-app on two processors.
#! /bin/bash
#@ job_name = trace_run
#@ output = trace_run%j.out
#@ error = trace_run%j.out
#@ initialdir = .
#@ class = bsc_cs
#@ total_tasks = 2
#@ wall_clock_limit = 00:30:00

srun ./trace.sh mpi_app
If your system uses LoadLeveler, your job script may look like:
  1 | #! /bin/bash
  2 | #@ job_type = parallel
  3 | #@ output = trace_run.output
  4 | #@ error = trace_run.error
  5 | #@ blocking = unlimited
  6 | #@ total_tasks = 2
  7 | #@ class = debug
  8 | #@ wall_clock_limit = 00:10:00
  9 | #@ restart = no
 10 | #@ group = bsc41
 11 | #@ queue
 12 |
 13 | export MLIST=/tmp/machine_list.${$}
 14 | /opt/ibmll/LoadL/full/bin/ll_get_machine_list > ${MLIST}
 15 | NP=`cat ${MLIST} | wc -l`
 16 |
 17 | ${MPI_HOME}/mpirun -np ${NP} -machinefile ${MLIST} ./trace.sh ./mpi-app
 18 |
 19 | rm ${MLIST}
Besides the job specification given in lines 1-11, there are commands of particular interest. Lines 13-15 are used to determine which and how many nodes are involved in the computation. This information is then given to the mpirun command to proceed with the execution. Once the execution has finished, the temporary file created on line 14 is removed on line 19.
8.2.2. CUDA
There are two ways to instrument CUDA applications, depending on how the package was configured. If the package was configured with --with-cuda, only interposition on binaries using shared libraries is available. If the package was configured with --with-cupti, any kind of binary can be instrumented, because the instrumentation relies on the CUPTI library to instrument the CUDA calls. Both configure options are sketched next; the example shown afterwards is intended for the former case.
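As a reference, and only as a sketch (the flag arguments are assumptions that depend on where CUDA and CUPTI are installed on your system; see chapter Configuration, build and installation for the authoritative options), the two configurations would be selected at build time roughly as follows:

  # Build-time configuration sketches; paths are placeholders, not verified values
  ./configure --with-cuda=/usr/local/cuda ...
  ./configure --with-cupti=/usr/local/cuda/extras/CUPTI ...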
 1 | #!/bin/bash
 2 |
 3 | export EXTRAE_HOME=/home/harald/extrae
 4 | export PAPI_HOME=/home/harald/aplic/papi/4.1.4
 5 |
 6 | EXTRAE_CONFIG_FILE=extrae.xml LD_LIBRARY_PATH=${EXTRAE_HOME}/lib:${PAPI_HOME}/lib:${LD_LIBRARY_PATH} ./hello
 7 | ${EXTRAE_HOME}/bin/mpi2prv -f TRACE.mpits -e ./hello
In this example, the hello application is compiled with the nvcc compiler and linked against the cudatrace library (-lcudatrace). The binary contains calls to Extrae_init and Extrae_fini and then executes a CUDA kernel. Line 6 runs the application itself; the Extrae configuration file and the location of the shared libraries are set on this line. Line 7 invokes the merge process to generate the final tracefile. A possible compile and link line for such a binary is sketched below.
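This is only an illustration of how the binary could have been built; the source file name is hypothetical and the required libraries may differ on your installation:

  # Compile the CUDA code with nvcc and link against the (shared) cudatrace library
  nvcc hello.cu -o hello -L${EXTRAE_HOME}/lib -lcudatrace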
8.2.3. AIX
AIX typically ships with POE and LoadLeveler as its MPI implementation and queue system, respectively. An example for a system with these software packages is given below. Please note that the example is intended for 64-bit applications; for 32-bit applications, LDR_PRELOAD64 needs to be replaced by LDR_PRELOAD.
  1 | #@ job_name = basic_test
  2 | #@ output = basic_stdout
  3 | #@ error = basic_stderr
  4 | #@ shell = /bin/bash
  5 | #@ job_type = parallel
  6 | #@ total_tasks = 8
  7 | #@ wall_clock_limit = 00:15:00
  8 | #@ queue
  9 |
 10 | export EXTRAE_HOME=<WRITE-HERE-THE-PACKAGE-LOCATION>
 11 | export EXTRAE_CONFIG_FILE=extrae.xml
 12 | export LDR_PRELOAD64=${EXTRAE_HOME}/lib/libmpitrace.so
 13 |
 14 | ./mpi-app
Lines 1-8 contain a basic LoadLeveler job definition. Line 10 sets the Extrae package directory in the EXTRAE_HOME environment variable. Line 11 sets the XML configuration file that will be used to set up the tracing, and line 12 sets LDR_PRELOAD64, which is responsible for the instrumentation through the shared library libmpitrace.so. Finally, line 14 executes the application binary. The job is then submitted to LoadLeveler as sketched below.
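Assuming the job description above is saved as trace_job.cmd (the file name is illustrative), it would typically be submitted with the llsubmit command:

  llsubmit trace_job.cmd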
8.3. Statically linked based examples
This is the basic instrumentation method, suited for those installations that neither support DynInst nor LD_PRELOAD, or that require adding manual calls to the Extrae API.
8.3.1. Linking the application
To get the instrumentation working on your code, you first have to link your application with the Extrae libraries. Examples are installed in your package distribution under $EXTRAE_HOME/share/examples; there you can find MPI, OpenMP, pthread and sequential examples, depending on the support selected at configure time.
Consider the example Makefile found in $EXTRAE_HOME/share/examples/MPI/static:
  1 | MPI_HOME     = /gpfs/apps/MPICH2/mx/1.0.7..2/64
  2 | EXTRAE_HOME  = /home/bsc41/bsc41273/foreign-pkgs/extrae-11oct-mpich2/64
  3 | PAPI_HOME    = /gpfs/apps/PAPI/3.6.2-970mp-patched/64
  4 | XML2_LDFLAGS = -L/usr/lib64
  5 | XML2_LIBS    = -lxml2
  6 |
  7 | F77    = $(MPI_HOME)/bin/mpif77
  8 | FFLAGS = -O2
  9 | FLIBS  = $(EXTRAE_HOME)/lib/libmpitracef.a \
 10 |          -L$(PAPI_HOME)/lib -lpapi -lperfctr \
 11 |          $(XML2_LDFLAGS) $(XML2_LIBS)
 12 |
 13 | all: mpi_ping
 14 |
 15 | mpi_ping: mpi_ping.f
 16 | 	$(F77) $(FFLAGS) mpi_ping.f $(FLIBS) -o mpi_ping
 17 |
 18 | clean:
 19 | 	rm -f mpi_ping *.o pingtmp? TRACE.*
Lines 2-5 define Makefile variables that set up the location of the different packages needed by the instrumentation. In particular, EXTRAE_HOME sets where the Extrae package directory is located. In order to link your application with Extrae you have to add its libraries in the link stage (see lines 9-11 and 16). Besides libmpitracef.a we also add the PAPI library (-lpapi) and its dependency (-lperfctr), which you may or may not need, the libxml2 parsing library (-lxml2), and finally the bfd and liberty libraries (-lbfd and -liberty) if the instrumentation package was compiled to support merging after tracing (see chapter Configuration, build and installation for further information). An equivalent stand-alone link command is sketched below.
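Outside a Makefile, the link stage from the example above translates roughly into the following command; the paths are placeholders taken from the Makefile and must be adapted to your installation:

  # Illustrative link command equivalent to the mpi_ping rule of the example Makefile
  ${MPI_HOME}/bin/mpif77 -O2 mpi_ping.f \
    ${EXTRAE_HOME}/lib/libmpitracef.a \
    -L${PAPI_HOME}/lib -lpapi -lperfctr \
    -L/usr/lib64 -lxml2 \
    -o mpi_ping
  # Add -lbfd -liberty at the end if the package was built with merge-after-trace support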
8.3.2. Generating the intermediate files
Executing an application with the statically linked version of the instrumentation package is very similar to the method shown in LD_PRELOAD-based examples. There is, however, one difference: do not set LD_PRELOAD in trace.sh.
#!/bin/sh

export EXTRAE_HOME=WRITE-HERE-THE-PACKAGE-LOCATION
export EXTRAE_CONFIG_FILE=extrae.xml
export LD_LIBRARY_PATH=${EXTRAE_HOME}/lib:\
/gpfs/apps/MPICH2/mx/1.0.7..2/64/lib:\
/gpfs/apps/PAPI/3.6.2-970mp-patched/64/lib

## Run the desired program
$*
See section LD_PRELOAD-based examples for how to run this script, either from the command line or through a queue system.
8.4. Generating the final tracefile
Independently of the tracing method chosen, it is necessary to translate the intermediate tracefiles into a Paraver tracefile. The Paraver tracefile can be generated automatically (if the tracing package and the XML configuration file were set up accordingly; see chapters Configuration, build and installation and Extrae XML configuration file) or manually. If the automatic merging process is used, it will employ all the resources allocated to the application to perform the merge once the application ends.
To manually generate the final Paraver tracefile, issue the following command:
${EXTRAE_HOME}/bin/mpi2prv -f TRACE.mpits -e mpi-app -o trace.prv
This command converts the intermediate files generated in the previous step into a single Paraver tracefile. The TRACE.mpits file is generated automatically by the instrumentation and contains references to all the intermediate files generated during the execution. The -e parameter receives the application binary (mpi-app) in order to translate addresses into source code references; to use this feature, the binary must have been compiled with debugging information, as sketched below. Finally, the -o flag tells the merger how the resulting Paraver tracefile will be named (trace.prv in this case).
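As an illustration (the compiler wrapper and source file are placeholders), debugging information is typically added at compile time with the -g flag:

  # Recompile with debugging information so mpi2prv can translate addresses
  # into file and line references
  mpicc -g -O2 mpi_app.c -o mpi-app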
Footnotes
[1] The IA-64 architecture support was dropped in DynInst 7.0.
[2] See http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html for further information.