13. Overhead

Extrae includes a set of tests to evaluate the overhead that its different components impose on the application.

These tests are installed in $EXTRAE_HOME/share/tests/overhead and can be run by executing the run_overhead_tests.sh script within this directory.

Note that this script compiles the tests and executes the generated binaries on the same system, so it will require some tuning to run on a system that uses a batch-queuing system and/or needs cross-compiling.

Currently, the following tests evaluate the time needed to perform certain operations:

  • posix_clock grabs the current time using the POSIX clock. Even the simplest emitted event requires gathering a timestamp.
  • extrae_event emits one event (without performance counters) into the tracing buffer using the Extrae_event API call.
  • extrae_nevent4 emits four events (without performance counters) into the tracing buffer using the Extrae_nevent4 API call.
  • extrae_eventandcounters emits one event (and reads four performance counters) into the tracing buffer through the Extrae_eventandcounters API call.
  • papi_read1 captures the value of one performance counter through PAPI.
  • papi_read4 captures the value of four performance counters through PAPI.
  • extrae_user_function traverses the processor call-stack in search of the frame that points to the current routine (as the Extrae_user_function API call does).
  • extrae_get_caller1 traverses one level of the processor call-stack.
  • extrae_get_caller6 traverses six levels of the processor call-stack.
  • extrae_trace_callers collects three frames from the processor call-stack.
  • extrae_event_Java measures the time required to emit one event (without performance counters) from Java through the JNI connector.
  • extrae_nevent4_Java measures the time needed to emit four events (without performance counters) from Java through the JNI connector.

Figure 13.1 depicts the overhead of Extrae 4.1.2 on the following systems:

  • System based on Intel Xeon E5649 (Nehalem) processors. Extrae was compiled with support for libunwind 1.1 and PAPI 5.0.1.
  • System based on Intel Xeon E5-2670 (SandyBridge) processors. Extrae was compiled with support for libunwind 1.1, PAPI 5.4.1 and IBM’s Java7.
  • System based on Intel Xeon E5-2680 (Haswell) processors. Extrae was compiled with support for libunwind 1.1, PAPI 5.4.1 and OpenJDK’s Java 1.8.
  • System based on IBM Power8. Extrae was compiled with support for libunwind (downloaded from GIT) and PAPI 5.4.1.
  • System based on Cortex-A15 (Samsung Exynos 5). Extrae was compiled with support for libunwind (downloaded from GIT) and PAPI 5.4.1.

The reader may notice that the ARM processor requires more time to execute the tests than the rest, even for the simplest cases (posix_clock and extrae_event). The Power8-based system takes a similar amount of time to the Intel-based systems, except for the call-stack traversal. Among the Intel-based systems, the SandyBridge processor reduces the time significantly with respect to the Nehalem processor, but the Haswell processor does not show a great reduction with respect to SandyBridge.

_images/overheads.eps

Figure 13.1 Overhead results on a variety of systems for Extrae 4.1.2