4. Extrae XML configuration file
Extrae is configured through an XML file whose location is set through the EXTRAE_CONFIG_FILE environment variable. The included examples provide several XML files to serve as a basis for the end user. For instance, the MPI examples provide the following XML configuration files:
extrae.xml
Exemplifies all the options that can be set in the configuration file. All the sections and options are discussed below. It is also available in this document in appendix An example of Extrae XML configuration file.
extrae_explained.xml
The same as the above, with some comments on each section.
summarized_trace_basic.xml
A small example for gathering information of MPI and OpenMP with some performance counters and calling information at each MPI call.
detailed_trace_basic.xml
A small example for gathering summarized information of the MPI and OpenMP parallel paradigms.
extrae_bursts_1ms.xml
An XML configuration example to set up the bursts tracing mode. This XML file will only capture the regions between MPI calls that last more than the given threshold (1ms in this example).
Please note that most of the nodes present in the XML file have an enabled attribute that allows turning on and off some parts of the instrumentation mechanism. For example, <mpi enabled="yes"> means that MPI instrumentation is enabled and that all the contained XML subnodes, if any, are processed; whereas <mpi enabled="no"> means that MPI information is not gathered and the XML subnodes are not processed.
Each section points out which environment variables can be used if the tracing package lacks XML support. See appendix Environment variables for the entire list.
Sometimes the XML tags are used for time selection (duration, for instance). In such tags, the following suffixes can be used: n or ns for nanoseconds, u or us for microseconds, m or ms for milliseconds, s for seconds, M for minutes, H for hours and D for days.
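As an illustration of these suffixes, the fragment below gathers time-valued settings that appear in the examples of this chapter, annotated with the duration each one denotes. The nodes belong to different sections of the configuration file and are shown together only for comparison; this is a sketch, not a complete configuration.

```xml
<!-- Illustrative fragments only; each node lives in its own section of the file. -->

<!-- threshold of 500 microseconds ("u" suffix) -->
<burst enabled="yes" threshold="500u" mpi-statistics="yes" />

<!-- sampling period of 50 milliseconds, variability of 10 milliseconds ("m" suffix) -->
<sampling enabled="yes" type="default" period="50m" variability="10m" />

<!-- hold the counter set for at least 5 seconds ("s" suffix) -->
<set enabled="yes" domain="all" changeat-time="5s">
  PAPI_TOT_INS,PAPI_TOT_CYC
</set>

<!-- minimum instrumentation time of 10 minutes ("M" suffix) -->
<minimum-time enabled="yes">10M</minimum-time>
```

Note that the suffixes are case-sensitive: m stands for milliseconds, whereas M stands for minutes.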
4.1. XML Section: Trace configuration
The basic trace behavior is determined in the first part of the XML and contains all of the remaining options. It looks like:
<?xml version='1.0'?>
<trace enabled="yes"
home="@sed_MYPREFIXDIR@"
initial-mode="detail"
type="paraver"
>
< ... other XML nodes ... >
</trace>
The <?xml version='1.0'?> declaration is mandatory for all XML files. Do not modify it.
The available tunable options are attributes of the <trace> node:
enabled
Set to yes if you want to generate tracefiles.
home
Set to where the instrumentation package is installed. It usually points to the same location as the EXTRAE_HOME environment variable.
initial-mode
Available options:
- detail: Provides detailed information of the tracing.
- bursts: Provides summarized information of the tracing. This mode removes most of the information present in the detailed traces (like OpenMP and MPI calls among others) and only produces information for computation bursts.
type
Available options:
- paraver: The intermediate files are meant to generate Paraver tracefiles.
- dimemas: The intermediate files are meant to generate Dimemas tracefiles.
See also
EXTRAE_ON, EXTRAE_HOME, EXTRAE_INITIAL_MODE and EXTRAE_TRACE_TYPE environment variables in appendix Environment variables.
4.2. XML Section: MPI
The MPI configuration part is nested in the config file (see section XML Section: Trace configuration) and its nodes are the following:
<mpi enabled="yes">
<counters enabled="yes" />
<comm-calls enabled="yes" />
</mpi>
MPI calls can gather performance information at the beginning and end of each call. To activate this behavior, just set the enabled attribute of the nested <counters> node to yes. When <comm-calls> is set to no, calls to certain MPI_Comm_* routines (MPI_Comm_rank, MPI_Comm_size) are excluded from instrumentation to reduce tracing overhead.
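For instance, a configuration that keeps MPI instrumentation and counters but excludes those MPI_Comm_* calls could look like the following sketch, which only flips the comm-calls attribute of the example above:

```xml
<mpi enabled="yes">
  <!-- read performance counters at the beginning and end of each MPI call -->
  <counters enabled="yes" />
  <!-- exclude MPI_Comm_rank / MPI_Comm_size to reduce tracing overhead -->
  <comm-calls enabled="no" />
</mpi>
```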
See also
EXTRAE_DISABLE_MPI and EXTRAE_MPI_COUNTERS_ON environment variables in appendix Environment variables.
4.3. XML Section: pthread
The pthread configuration part is nested in the config file (see section XML Section: Trace configuration) and its nodes are the following:
<pthread enabled="yes">
<locks enabled="no" />
<counters enabled="yes" />
</pthread>
The tracing package can gather information of some pthread routines. In addition to that, the user can also enable gathering information of locks and gathering performance counters in all of these routines. This is achieved by modifying the enabled attribute of the <locks> and <counters> nodes, respectively.
See also
EXTRAE_DISABLE_PTHREAD, EXTRAE_PTHREAD_LOCKS and EXTRAE_PTHREAD_COUNTERS_ON environment variables in appendix Environment variables.
4.4. XML Section: OpenMP
The OpenMP configuration part is nested in the config file (see section XML Section: Trace configuration) and its nodes are the following:
<openmp enabled="yes" ompt="no">
<locks enabled="no" />
<task dependencies="yes" />
<taskloop enabled="yes" dependencies="yes"/>
<counters enabled="yes" />
</openmp>
The tracing package can gather information of some OpenMP runtimes and outlined routines. In addition to that, the user can also enable gathering information of locks and gathering performance counters in all of these routines. This is achieved by modifying the enabled attribute of the <locks> and <counters> nodes, respectively.
See also
EXTRAE_DISABLE_OMP, EXTRAE_OMP_LOCKS and EXTRAE_OMP_COUNTERS_ON environment variables in appendix Environment variables.
4.5. XML Section: Callers
<callers enabled="yes">
<mpi enabled="yes">1-3</mpi>
<sampling enabled="no">1-5</sampling>
<dynamic-memory enabled="no">1-5</dynamic-memory>
</callers>
Callers are the routine addresses present in the process stack at any given moment during the application run. Callers can be used to link the tracefile with the source code of the application.
The instrumentation library can collect a partial view of those addresses during the instrumentation. The collected addresses are translated by the merging process if the corresponding parameter is given and the application has been compiled and linked with debug information.
There are three points where the instrumentation can gather this information:
- Entry of MPI calls
- Sampling points (if sampling is available in the tracing package)
- Dynamic memory calls (malloc, free, realloc)
The user can choose which addresses to save in the trace (starting from 1, which is the closest point to the MPI call or sampling point) by specifying several stack levels, either separated by commas or given as a range using the hyphen symbol.
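The two notations for stack levels can be sketched as follows. The specific levels chosen for sampling are illustrative and not taken from the distributed examples:

```xml
<callers enabled="yes">
  <!-- levels 1, 2 and 3, written as a range -->
  <mpi enabled="yes">1-3</mpi>
  <!-- levels 1, 3 and 5, written as a comma-separated list (illustrative choice) -->
  <sampling enabled="yes">1,3,5</sampling>
  <dynamic-memory enabled="no">1-5</dynamic-memory>
</callers>
```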
See also
EXTRAE_MPI_CALLER environment variable in appendix Environment variables.
4.6. XML Section: User functions
<user-functions enabled="no"
list="/home/bsc41/bsc41273/user-functions.dat"
exclude-automatic-functions="no">
<counters enabled="yes" />
</user-functions>
The file given in the list attribute contains the functions to be instrumented by Extrae. There are different alternatives to instrument application functions, and some of them provide additional flexibility; as a result, the format of the list varies depending on the instrumentation mechanism used:
DynInst
Supports instrumentation of user functions, outer loops, loops and basic blocks. The list contains the names of the functions to be instrumented. After each function name, you can optionally select loops or basic blocks inside that function by appending suffixes after the + character. For instance:
- To instrument the entry and exit points of the foo function, just provide the function name (foo).
- To instrument the entry and exit points of foo plus the entry and exit points of its outer loop, suffix the function name with outerloops (i.e., foo+outerloops).
- To instrument the entry and exit points of foo plus the entry and exit points of its N-th loop, use the suffix loop_N, for instance foo+loop_3.
- To instrument the entry and exit points of foo plus the entry and exit points of its N-th basic block, use the suffix bb_N, for instance foo+bb_5. In this case, it is also possible to ask specifically for the entry or exit point of the basic block by additionally suffixing _s or _e, respectively.
Additionally, these options can be combined using commas, as in: foo+outerloops,loop_3,bb_3_e,bb_4_s,bb_5.
To discover the instrumentable loops and basic blocks of a certain function you can execute the command ${EXTRAE_HOME}/bin/extrae -config extrae.xml -decodeBB, where extrae.xml is an Extrae configuration file whose user functions list attribute names the functions you want information about.
GCC and ICC (through -finstrument-functions)
GNU and Intel compilers provide a compile and link flag named -finstrument-functions that instruments the routines of a source code file in a way that Extrae can use. To use this functionality, a file containing the names of the functions to be instrumented has to be provided. Compile the executable using the flag -rdynamic (or link it using -export-dynamic) in order to make the functions visible. For instance, to instrument the functions foo, bar and baz the user would create a file with:
foo
bar
baz
In specific cases (e.g., functions declared inside a Fortran CONTAINS construct) the user may also need to provide the function address as given by the nm command. For instance, to instrument the routine pi_kernel from the pi executable the user would run nm as follows:
$ nm -a pi | grep pi_kernel
00000000004005ed T pi_kernel
and then add <FUNCTION_NAME> # <HEX_ADDRESS> into the function list:
pi_kernel # 00000000004005ed
The exclude-automatic-functions attribute is used only by the DynInst instrumenter. By setting this attribute to yes, the instrumenter will avoid automatically instrumenting the routines that either call OpenMP outlined routines (i.e., routines with OpenMP pragmas) or call CUDA kernels.
Finally, in order to gather performance counters in these functions and also in those instrumented using the extrae_user_function API call, the counters node has to be enabled.
Warning
Note that you need to compile your application binary with debugging information (typically the -g compiler flag) in order to translate the captured addresses into valuable information such as function name, file name and line number.
See also
EXTRAE_FUNCTIONS environment variable in appendix Environment variables.
4.7. XML Section: Performance counters
The instrumentation library can be compiled with support for collecting performance metrics of different components available on the system. These components include:
- Processor performance counters. Such access is granted by PAPI [1] or PMAPI [2].
- Network performance counters. (Only available in systems with Myrinet GM/MX networks).
- Operating system accounting.
Here is an example of the counters section in the XML configuration file:
<counters enabled="yes">
<cpu enabled="yes" starting-set-distribution="1">
<set enabled="yes" domain="all" changeat-time="5s">
PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_L1_DCM
<sampling enabled="yes" period="100000000">PAPI_TOT_CYC</sampling>
</set>
<set enabled="yes" domain="user" changeat-globalops="5">
PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_FP_INS
</set>
</cpu>
<network enabled="yes" />
<resource-usage enabled="yes" />
</counters>
See also
EXTRAE_COUNTERS, EXTRAE_NETWORK_COUNTERS and EXTRAE_RUSAGE environment variables in appendix Environment variables.
4.7.1. Processor performance counters
Processor performance counters are configured in the <cpu> node. The user can configure many sets in the <cpu> node using the <set> node, but just one set will be used at any given time in a specific task. The <cpu> node supports the starting-set-distribution attribute with the following accepted values:
number (in range 1..N, where N is the number of configured sets)
All tasks will start using the set specified by number.
block
Each task will start using the given sets distributed in blocks (i.e., if two sets are defined and there are four running tasks: tasks 1 and 2 will use set 1, and tasks 3 and 4 will use set 2).
cyclic
Each task will start using the given sets distributed cyclically (i.e., if two sets are defined and there are four running tasks: tasks 1 and 3 will use set 1, and tasks 2 and 4 will use set 2).
thread-cyclic
Sets will be distributed cyclically between tasks and threads in a task.
random
Each task will start using a random set, and calls to either Extrae_next_hwc_set or Extrae_previous_hwc_set will also change to a random set.
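As a sketch of a non-numeric distribution, the fragment below reuses the two sets of the example at the beginning of this section (without the sampling node) and only changes starting-set-distribution to cyclic. With four running tasks, the description above implies that tasks 1 and 3 start on the first set and tasks 2 and 4 on the second.

```xml
<counters enabled="yes">
  <!-- cyclic: with four tasks, tasks 1 and 3 start on set 1, tasks 2 and 4 on set 2 -->
  <cpu enabled="yes" starting-set-distribution="cyclic">
    <set enabled="yes" domain="all" changeat-time="5s">
      PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_L1_DCM
    </set>
    <set enabled="yes" domain="user" changeat-globalops="5">
      PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_FP_INS
    </set>
  </cpu>
</counters>
```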
Each set contains a list of performance counters to be gathered at different instrumentation points (see sections XML Section: MPI, XML Section: OpenMP and XML Section: User functions). If the tracing library is compiled to support PAPI, performance counters must be given using the canonical name (like PAPI_TOT_CYC and PAPI_L1_DCM), or the PAPI code in hexadecimal format (like 8000003b and 80000000, respectively) [3]. If the tracing library is compiled to support PMAPI, only one group identifier can be given per set [4] and can be either the group name (like pm_basic and pm_hpmcount1) or the group number (like 6 and 22, respectively).
In the given example (which refers to PAPI support in the tracing library) two sets are defined. The first set will read PAPI_TOT_INS (total instructions), PAPI_TOT_CYC (total cycles) and PAPI_L1_DCM (level 1 data cache misses). The second set is configured to obtain PAPI_TOT_INS (total instructions), PAPI_TOT_CYC (total cycles) and PAPI_FP_INS (floating point instructions).
Additionally, if the underlying performance library supports sampling mechanisms, each set can be configured to gather information (see section XML Section: Callers) each time the specified counter reaches a specific value. The counter that is used for sampling must be present in the set. In the given example, the first set is enabled to gather sampling information every 100M cycles.
Furthermore, performance counters can be configured to report accounting on a different basis depending on the domain attribute specified on each set. Available options are:
kernel
Only counts events that occurred while the application was running in kernel mode.
user
Only counts events that occurred while the application was running in user-space mode.
all
Counts events independently of the application running mode.
In the given example, the first set is configured to count all the events that occurred, while the second one only counts those events that occurred while the application was running in user-space mode.
Finally, the instrumentation can change the active set manually or automatically. To change the active set manually, see the Extrae_previous_hwc_set and Extrae_next_hwc_set API calls in section Basic API. To change the active set automatically, two options are allowed: based on time and based on application code. The former mechanism requires adding the attribute changeat-time and specifying the minimum time to hold the set. The latter requires adding the attribute changeat-globalops with a value; the tracing library will automatically change the active set when the application has executed as many MPI global operations as selected in that attribute. In any case, if either attribute is set to zero, the set will not be changed automatically.
4.7.2. Network performance counters
Network performance counters are only available on systems with Myrinet GM/MX networks and they are fixed depending on the firmware used. Other systems, like BG/*, may provide some network performance counters, but they are accessed through the PAPI interface (see section XML Section: Performance counters and PAPI documentation).
If <network> is enabled, the network performance counters appear at the end of the application run, giving a summary for the whole run.
4.7.3. Operating system accounting
Operating system accounting is obtained through the getrusage(2) system call when <resource-usage> is enabled. Like the network performance counters, the results appear at the end of the application run, giving a summary for the whole run.
4.8. XML Section: Storage management
The instrumentation package can be instructed on what, where and how to produce the intermediate trace files. These are the available options:
<storage enabled="no">
<trace-prefix enabled="yes">TRACE</trace-prefix>
<size enabled="no">5</size>
<temporal-directory enabled="yes">/scratch</temporal-directory>
<final-directory enabled="yes">/gpfs/scratch/bsc41/bsc41273</final-directory>
</storage>
These options refer to:
trace-prefix
Sets the intermediate trace file prefix. Its default value is TRACE.
size
Lets the user restrict the maximum size (in megabytes) of each resulting intermediate trace file [5].
temporal-directory
Where the intermediate trace files will be stored during the execution of the application. By default they are stored in the current directory. If the directory does not exist, the instrumentation will try to create it.
final-directory
Where the intermediate trace files will be stored once the execution has finished. By default they are stored in the current directory. If the directory does not exist, the instrumentation will try to create it.
See also
EXTRAE_PROGRAM_NAME, EXTRAE_FILE_SIZE, EXTRAE_DIR, EXTRAE_FINAL_DIR and EXTRAE_GATHER_MPITS environment variables in appendix Environment variables.
4.9. XML Section: Buffer management
Modify the buffer management entry to tune the tracing buffer behavior.
<buffer enabled="yes">
<size enabled="yes">150000</size>
<circular enabled="no" />
</buffer>
By default (even if the enabled attribute is no) the tracing buffer is set to 500k events. If <size> is enabled, the tracing buffer will be set to the number of events indicated by this node. If the circular option is enabled, the buffer will be created as a circular buffer and will be dumped only once, with the last events generated by the tracing package.
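A circular buffer variant of the example above might look like the following sketch, which keeps the same size and only enables the circular node:

```xml
<buffer enabled="yes">
  <!-- room for 150000 events instead of the default 500k -->
  <size enabled="yes">150000</size>
  <!-- circular: the buffer is dumped only once, with the last events generated -->
  <circular enabled="yes" />
</buffer>
```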
See also
EXTRAE_BUFFER_SIZE environment variable in appendix Environment variables.
4.10. XML Section: Trace control
<trace-control enabled="yes">
<file enabled="no" frequency="5M">/gpfs/scratch/bsc41/bsc41273/control</file>
<global-ops enabled="no">10</global-ops>
<remote-control enabled="yes">
<mrnet enabled="yes" target="150" analysis="spectral" start-after="30">
<clustering max_tasks="26" max_points="8000"/>
<spectral min_seen="1" max_periods="0" num_iters="3" signals="DurBurst,InMPI"/>
</mrnet>
</remote-control>
</trace-control>
This section groups together a set of options to limit/reduce the final trace size. There are three mechanisms which are based on file existence, global operations executed and external remote control procedures.
Regarding the file mechanism, the application starts with the tracing disabled, and it is turned on when a control file is created. Use the frequency property to choose how often this check must be done. If not supplied, it will be checked every 100 global operations on MPI_COMM_WORLD.
If the global-ops tag is enabled, the instrumentation package begins disabled and starts the tracing when the given number of global operations on MPI_COMM_WORLD has been executed. The user can also specify multiple intervals in the form start-stop separated by commas.
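The interval form can be sketched as below. The node goes inside <trace-control>; the interval bounds are hypothetical values chosen for illustration, and the reading given in the comment (tracing is active between each start and stop count) is an interpretation of the start-stop notation rather than something the text states explicitly.

```xml
<!-- hypothetical intervals: global operations 10 to 20, and again 50 to 60 -->
<global-ops enabled="yes">10-20,50-60</global-ops>
```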
The remote-control tag section allows configuring external mechanisms that automatically control the tracing. Currently, there is only one option, which is built on top of MRNet and is based on clustering and spectral analysis to generate a small yet representative trace.
These are the options in the mrnet tag:
target
The approximate requested size for the final trace (in Mb).
analysis
Either clustering or spectral.
start-after
Number of seconds before the first analysis starts.
The clustering tag configures the clustering analysis parameters:
max_tasks
Maximum number of tasks to get samples from.
max_points
Maximum number of points to cluster.
The spectral tag section configures the spectral analysis parameters:
min_seen
Minimum number of times a given type of period has to be seen to trace a sample.
max_periods
Maximum number of representative periods to trace. 0 means unlimited.
num_iters
Number of iterations to trace for every representative period found.
signals
Performance signals used to analyze the application. If not specified, DurBurst is used by default.
See also
EXTRAE_CONTROL_FILE, EXTRAE_CONTROL_GLOPS and EXTRAE_CONTROL_TIME environment variables in appendix Environment variables.
4.11. XML Section: Bursts
<burst enabled="no" threshold="500u" mpi-statistics="yes" omp-statistics="yes" omp-summarization="no" />
If the user enables this option, the instrumentation library will just emit information of computation bursts (i.e., it does not trace MPI calls, the OpenMP runtime, and so on) when the current mode (set through initial-mode in section XML Section: Trace configuration) is bursts. The library will discard all those computation bursts that last less than the selected threshold.
In addition to that, when the tracing library is running in burst mode, it computes some statistics of MPI activity. Such statistics can be dumped in the tracefile by enabling mpi-statistics.
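Since the burst node only takes effect in bursts mode, both settings have to agree. A minimal sketch combining them is shown below, with all other sections omitted for brevity (whether such a reduced file is sufficient for a given installation is an assumption):

```xml
<?xml version='1.0'?>
<trace enabled="yes"
       home="@sed_MYPREFIXDIR@"
       initial-mode="bursts"
       type="paraver"
>
  <!-- discard computation bursts shorter than 500 microseconds and
       dump the MPI statistics into the tracefile -->
  <burst enabled="yes" threshold="500u" mpi-statistics="yes" omp-statistics="yes" omp-summarization="no" />
</trace>
```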
See also
EXTRAE_INITIAL_MODE, EXTRAE_BURST_THRESHOLD and EXTRAE_MPI_STATISTICS environment variables in appendix Environment variables.
4.12. XML Section: Others
<others enabled="yes">
<minimum-time enabled="no">10M</minimum-time>
<finalize-on-signal enabled="yes"
SIGUSR1="no" SIGUSR2="no" SIGINT="yes"
SIGQUIT="yes" SIGTERM="yes" SIGXCPU="yes"
SIGFPE="yes" SIGSEGV="yes" SIGABRT="yes"
/>
<flush-sampling-buffer-at-instrumentation-point enabled="yes" />
</others>
This section contains other configuration details that do not fit in the previous sections. At the moment, there are three options to be configured.
- The minimum-time option tells the instrumentation package the minimum instrumentation time. To enable it, set enabled to yes and set the minimum time within the minimum-time tag.
- The finalize-on-signal option instructs the instrumentation package to listen for different types of signals [6] and to dump the trace and finalize the execution whenever they occur. If a signal occurs but is not configured, the execution may finish without generating the tracefile. Caveat: some MPI implementations use SIGUSR1 and/or SIGUSR2, so if you want to capture those signals, first check that enabling them does not interfere with the application execution.
- The flush-sampling-buffer-at-instrumentation-point option lets the user decide whether the sampling buffer should be checked for flushing at instrumentation points. If this option is not enabled, the buffer will only be dumped once at the end of the application execution.
4.13. XML Section: Sampling
<sampling enabled="no" type="default" period="50m" variability="10m"/>
This section configures the time-based sampling capabilities. Every sample contains processor performance counters (if enabled in section Processor performance counters and either PAPI or PMAPI is specified at configure time) and callstack information (if enabled in section XML Section: Callers and the proper dependencies are set at configure time).
This section contains three attributes besides enabled. These are:
type
Determines which timer domain is used (see setitimer(2) or setitimer(3p) for further information on time domains). Available options are: real (which is also the default value), virtual and prof (which use the SIGALRM, SIGVTALRM and SIGPROF signals, respectively). The default timing accumulates real time, but only issues samples at the master thread. To let all the threads collect samples, the type must be virtual or prof.
period
Specifies the sampling periodicity. In the example above, samples are gathered every 50ms.
variability
Specifies the variability of the sampling periodicity. Such variability is calculated through the random() function and then added to the periodicity. In the given example, the variability is set to 10ms, thus the final sampling period ranges from 45 to 55ms.
See also
EXTRAE_SAMPLING_PERIOD, EXTRAE_SAMPLING_VARIABILITY, EXTRAE_SAMPLING_CLOCKTYPE and EXTRAE_SAMPLING_CALLER environment variables in appendix Environment variables.
4.14. XML Section: CUDA
<cuda enabled="yes" />
This section indicates whether the CUDA calls should be instrumented or not. If enabled is set to yes, CUDA calls will be instrumented; otherwise they will not be instrumented.
4.15. XML Section: OpenACC
<openacc enabled="yes" />
This section indicates whether OpenACC host activity should be instrumented. If enabled is set to yes, OpenACC activity made by the host will be instrumented; otherwise it will not. If the user wants to capture device activity, they must also enable CUDA instrumentation.
4.16. XML Section: OpenCL
<opencl enabled="yes" />
This section indicates whether the OpenCL calls should be instrumented or not. If enabled is set to yes, OpenCL calls will be instrumented; otherwise they will not be instrumented.
4.17. XML Section: Input/Output
<input-output enabled="no" internals="no"/>
This section indicates whether I/O calls (read and write) are meant to be instrumented. If enabled is set to yes, the aforementioned calls will be instrumented; otherwise they will not be instrumented. If internals is set to yes, I/O calls that occur inside other traced calls will also be captured.
4.18. XML Section: Dynamic memory
<dynamic-memory enabled="no">
<alloc enabled="yes" threshold="32768" />
<free enabled="yes" />
</dynamic-memory>
This section indicates whether dynamic memory calls (malloc, free, realloc) are meant to be instrumented. If enabled is set to yes, the aforementioned calls will be instrumented; otherwise they will not be instrumented.
This section allows deciding whether allocation and free-related memory calls shall be instrumented. Additionally, the configuration can restrict the instrumentation of allocation calls to those whose requested memory size surpasses a given threshold (32768 bytes, in the example).
4.19. XML Section: Memory references through Intel PEBS sampling
<pebs-sampling enabled="yes">
<loads enabled="yes" frequency="100" minimum-latency="10" />
<stores enabled="no" frequency="50">
<offcore-l3-misses enabled="no" /> <!-- Read together with stores samples. -->
</stores>
<load-l3-misses enabled="no" frequency="25" />
</pebs-sampling>
This section tells Extrae to use the PEBS feature of recent Intel processors [7] to sample memory references. These memory references capture the linear address referenced, the component of the memory hierarchy that resolved the reference and the number of cycles needed to resolve it.
In the example above, PEBS samples load instructions that require at least 10 cycles to be resolved at a frequency of 100 Hz. Alternatively, the ‘frequency’ setting can be replaced by ‘period’, in which case PEBS will monitor one load instruction every given number of loads. Please note that the ‘period’ setting is not available for Skylake and newer processors.
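A period-based variant of the loads node might be written as in the sketch below. The value 10000 is a hypothetical choice for illustration; the remaining nodes are copied from the example above.

```xml
<pebs-sampling enabled="yes">
  <!-- hypothetical value: monitor one load instruction every 10000 loads;
       the period setting is not available on Skylake and newer processors -->
  <loads enabled="yes" period="10000" minimum-latency="10" />
  <stores enabled="no" frequency="50">
    <offcore-l3-misses enabled="no" />
  </stores>
  <load-l3-misses enabled="no" frequency="25" />
</pebs-sampling>
```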
4.20. XML Section: Executing CPU identification
<cpu-events enabled="no" frequency="0" emit-always="no" poi="none|openmp" />
By default, the identifier of the core where every thread is executing is emitted at initialization points. This section enables extra measurements, configurable by time (‘frequency’) and points of interest (‘poi’). When ‘frequency’ is set, new measurements are emitted at instrumentation points if enough time has passed since the previous measurement. When ‘poi’ is set to ‘openmp’, new measurements are emitted at the entry and exit points of OpenMP outlined functions, tasks, and work dispatches. This option may introduce a high and variable overhead; please use it with caution.
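Assuming that poi="none|openmp" in the example above denotes the two accepted alternatives rather than a literal value, enabling the OpenMP points of interest could be sketched as follows, leaving the other attributes as in the example:

```xml
<!-- emit the executing core at the entry and exit points of OpenMP
     outlined functions, tasks and work dispatches -->
<cpu-events enabled="yes" frequency="0" emit-always="no" poi="openmp" />
```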
4.21. XML Section: Merge
<merge enabled="yes"
synchronization="default"
binary="mpi_ping"
tree-fan-out="16"
max-memory="512"
joint-states="yes"
keep-mpits="yes"
translate-addresses="yes"
sort-addresses="yes"
translate-data-addresses="yes"
overwrite="yes"
stop-at-percentage="50"
>
mpi_ping.prv
</merge>
If this section is enabled and the instrumentation package is configured to support this, the merge process will be automatically invoked after the application run. The merge process will use all the resources devoted to run the application.
In the given example, the leaf of this node will be used as the tracefile name (mpi_ping.prv). The currently available options for the merge process are given as attributes of the <merge> node and they are:
- synchronization: can be set to default, node, task or no. This determines how task clocks will be synchronized (default is node).
- binary: points to the binary that is being executed. It will be used to translate gathered addresses (MPI callers, sampling points and user functions) into source code references.
- tree-fan-out: only for MPI executions, sets the tree-based topology to run the merger in a parallel fashion.
- max-memory: limits the intermediate merging process to run up to the specified limit (in MBytes).
- joint-states: can be set to yes or no. Determines if the resulting Paraver tracefile will split or join equal consecutive states (default is yes).
- keep-mpits: whether to keep the intermediate tracefiles after performing the merge.
- translate-addresses: whether to identify the calling site of instrumented calls and samples by the specified levels of the callstack (enabled by default), or just by their instruction addresses.
- sort-addresses: whether to sort all addresses that refer to the source code (enabled by default).
- translate-data-addresses: whether to identify allocated objects by their full callpath (enabled by default), or just by the tuple <library_base_address, symbol_offset_within_library>.
- overwrite: set to yes if the new tracefile can overwrite an existing tracefile with the same name. If set to no, the tracefile will be given a new name using a consecutive id.
- stop-at-percentage: stops the generation of the tracefile at a given percentage. Accepts integer values from 1 to 99. All other values disable this option.
In Linux systems, the tracing package can take advantage of certain functionalities from the system and can guess the binary name, and from it the tracefile name. In such systems, you can use the following reduced XML section replacing the earlier section.
<merge enabled="yes"
synchronization="default"
tree-fan-out="16"
max-memory="512"
joint-states="yes"
keep-mpits="yes"
translate-addresses="yes"
sort-addresses="yes"
translate-data-addresses="yes"
overwrite="yes"
stop-at-percentage="50"
/>
See also
For further references, see chapter Merging process.
4.22. Using environment variables within the XML file
XML tags and attributes can refer to environment variables that are defined in the environment during the application run. To refer to an environment variable within the XML file, just enclose the name of the variable between dollar symbols ($), for example: $FOO$.
Note that the user has to provide either a specific value or a reference to an environment variable; expanding environment variables embedded in text, as a regular shell would do, is not supported (i.e., the instrumentation package will not expand the following text: bar$FOO$bar).
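The difference between a supported and an unsupported reference can be sketched with the storage section. FOO is the placeholder variable name used above and must be defined when the application runs:

```xml
<storage enabled="yes">
  <!-- Valid: the whole value is a reference to the variable FOO. -->
  <trace-prefix enabled="yes">$FOO$</trace-prefix>
  <!-- Not supported: mixing text and a reference, as in bar$FOO$bar. -->
</storage>
```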
Footnotes
[1] | More information available on their website http://icl.cs.utk.edu/papi. Extrae requires at least PAPI 3.x. |
[2] | PMAPI is only available for AIX operating system, and it is on the base operating system since AIX5.3. Extrae requires at least AIX 5.3. |
[3] | Some architectures do not allow grouping some performance counters in the same set. |
[4] | Each group contains several performance counters. |
[5] | This check is done each time the buffer is flushed, so the resulting size of the intermediate trace file depends also on the number of elements contained in the tracing buffer (see XML Section: Buffer management). |
[6] | See man signal(2) and man signal(7) for more details. |
[7] | Check for availability on your system by looking for pebs in /proc/cpuinfo. |