MARTA: Multi-configuration Assembly pRofiler and Toolkit for performance Analysis

MARTA is a productivity-aware toolkit for profiling and performance characterization. It is meant for executing

The toolkit works in two stages: profiling and analysis. The first component compiles and executes the code and collects information from hardware counters. The second component post-processes that data offline, given a set of parameters to consider, applying data mining and ML techniques for classification in order to build knowledge, e.g., in the form of decision trees or by analyzing the influence of dimensions. For instance, given a piece of code or kernel such as:

for (int i = INIT_VAL; i < UPPER_BOUND; i += STEP) {
    y[i] += A[i] * x[i];
}

It could be interesting to analyze how the performance of the same code varies with INIT_VAL, UPPER_BOUND and STEP. Given only that small piece of code and those variables or parameters, MARTA extracts performance information in the form of a decision tree. Decision trees categorize the performance of the kernel (or another target column of the domain) according to the specified dimensions of interest.

MARTA is also a low-intrusion profiler, although it requires recompiling the target code. It is header-based: directives mark the start and end of the region of interest (RoI). It can perform different compilations and executions, for instance using different flags and/or compilers, and generates a readable table with performance metrics. This enables a fast comparison between compilers across a large set of parameter and flag combinations.

Dependencies

Python >=3.7.
Libraries specified in requirements.txt.
PAPI >=5.7.0.
Linux environment with root access. Kernel version >=3.14 is recommended, so that PAPI can use rdpmc to read hardware counters.

Profiler

The Profiler module parses the configuration files, compiles all the binary versions specified in them, and runs the generated binaries while collecting execution data. The strength of this module lies in its ability to generate as many different executable versions as necessary, as defined by the Cartesian product of the option sets in the configuration: compile-time options (e.g., whether to enable or disable particular optimizations), program inputs, or program features (e.g., -D flags enabling different code paths). The generation of different program versions, which is often a bottleneck in micro-architectural exploration, can be done in parallel.
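The Cartesian product described above can be sketched in a few lines of Python. This is only an illustration, not MARTA's implementation: the option names, the values, the source file name kernel.c and the helper functions are all hypothetical, and the commands are built but never executed.

```python
from itertools import product

# Hypothetical option sets; names and values are illustrative only.
options = {
    "compiler": ["gcc", "clang"],
    "opt_flag": ["-O2", "-O3"],
    "INIT_VAL": [0],
    "UPPER_BOUND": [1024, 4096],
    "STEP": [1, 2, 4],
}


def enumerate_versions(options):
    """Yield one dict per point of the Cartesian product of all option sets."""
    keys = list(options)
    for values in product(*(options[k] for k in keys)):
        yield dict(zip(keys, values))


def compile_command(version, source="kernel.c"):
    """Build (but do not run) a compiler command line for one version."""
    # Upper-case keys are treated as -D macros (an assumption of this sketch).
    macros = [f"-D{k}={v}" for k, v in version.items() if k.isupper()]
    binary = "kernel_" + "_".join(str(v).lstrip("-") for v in version.values())
    return [version["compiler"], version["opt_flag"], *macros, source, "-o", binary]


versions = list(enumerate_versions(options))
commands = [compile_command(v) for v in versions]
print(len(versions), "versions")
print(" ".join(commands[0]))
```

Because every version is independent of the others, a list like this could be handed to a process pool to compile in parallel.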

To maximize reliability, the Profiler integrates with several well-established software packages such as the PolyBench/C library, using their low-level configuration and measurement capabilities. The upper part of the MARTA architecture figure details the design of this module. The Profiler receives two inputs:

Configuration file: a structured YAML file containing all parameters related to compilation (e.g. -D flags, compilers and their flags, etc.), execution (e.g. threads to launch and their affinity, number of repetitions, maximum deviation in measurements, etc.), and data collection (e.g. output format, dimensions to include, static code analysis parameters, etc.). For convenience, some of these parameters can be overridden with CLI arguments.
Source code/application: typically a C/C++ program whose execution prints to standard output the values collected from hardware counters, as well as the execution time and the values reported by the Time Stamp Counter (TSC). The system helps produce this output format by including a set of functions and macros at runtime.
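The execution parameters mentioned above (number of repetitions, maximum deviation in measurements) suggest a repeat-until-stable measurement loop. The sketch below shows one way such a check could work. It is an assumption about the general technique, not MARTA's actual logic: the function name, its parameters and the use of the population standard deviation relative to the mean are all hypothetical choices.

```python
import statistics


def measure(run_once, repetitions=5, max_deviation=0.05, max_attempts=3):
    """Repeat a batch of measurements until its relative deviation is acceptable.

    Hypothetical helper: `run_once` returns one sample (e.g. a cycle count).
    Returns (mean, relative_deviation, accepted) for the last batch taken.
    """
    if repetitions < 1 or max_attempts < 1:
        raise ValueError("repetitions and max_attempts must be >= 1")
    for _ in range(max_attempts):
        samples = [run_once() for _ in range(repetitions)]
        mean = statistics.mean(samples)
        rel_dev = statistics.pstdev(samples) / mean if mean else 0.0
        if rel_dev <= max_deviation:
            return mean, rel_dev, True
    return mean, rel_dev, False


# Deterministic stand-in for a real run: the first batch is noisy, the second is stable.
fake_samples = iter([100, 150, 100, 101, 99, 100])
mean, rel_dev, accepted = measure(lambda: next(fake_samples), repetitions=3)
print(mean, round(rel_dev, 4), accepted)
```

In this example the first batch is rejected because of the 150 outlier, and the second batch is accepted.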

The output generated by all the executions in the experimental set is encoded into a CSV file, which is passed as input to the Analyzer module.

Analyzer

The Analyzer integrated in the tool is meant for processing raw data, typically the output of the Profiler, and mining knowledge from these data, primarily through the use of scikit-learn. It can also generate relational plots given a set of dimensions of interest.

Configuration file: a structured YAML file specifying data wrangling parameters (including filtering, normalization and categorization) as well as classification and plotting parameters. For classification customization, all parameters follow the same naming and API as scikit-learn.
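The data wrangling steps named above (filtering, normalization and categorization) can be illustrated with a small standard-library sketch. It is not the Analyzer's code: the column names (STEP, compiler, tsc), the min-max normalization and the equal-width binning are assumptions chosen for illustration. The resulting category column is the kind of target a decision tree classifier could then be trained on.

```python
def wrangle(rows, target="tsc", keep=lambda row: True, n_categories=3):
    """Filter rows, min-max normalize the target column, and bin it into categories."""
    kept = [dict(row) for row in rows if keep(row)]
    if not kept:
        return []
    values = [float(row[target]) for row in kept]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    for row, value in zip(kept, values):
        norm = (value - lo) / span
        row[target + "_norm"] = norm
        # Equal-width bins; the maximum value is clamped into the last bin.
        row[target + "_cat"] = min(int(norm * n_categories), n_categories - 1)
    return kept


# Hypothetical profiler output: one row per executed version.
rows = [
    {"STEP": 1, "compiler": "gcc", "tsc": 100},
    {"STEP": 2, "compiler": "gcc", "tsc": 150},
    {"STEP": 4, "compiler": "gcc", "tsc": 300},
    {"STEP": 1, "compiler": "clang", "tsc": 90},
]
result = wrangle(rows, keep=lambda row: row["compiler"] == "gcc")
for row in result:
    print(row["STEP"], row["tsc_norm"], row["tsc_cat"])
```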

Examples: case studies

The examples directory contains examples that help explain how the tool works.

Testing

This library uses pytest for unit and integration tests. All tests are located under the tests directory. For more information, refer to tests/README.md.

Contributing

See the CONTRIBUTING.md file.

License, copyright and authors

See LICENSE, COPYRIGHT and AUTHORS files, respectively, for further information.

Logo

Lili Kudrili/Shutterstock.com

Citation

Regular citation:

Horro, M., Pouchet, L.-N., Rodríguez, G. and Touriño, J. MARTA: Multi-configuration Assembly pRofiler and Toolkit for performance Analysis. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Singapore, pp. 10:1-11. 2022.

Bibtex example:

@inproceedings{horro,
  author={Horro, Marcos and Pouchet, Louis-Noël and Rodríguez, Gabriel and Touriño, Juan},
  booktitle={2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)},
  title={MARTA: Multi-configuration Assembly pRofiler and Toolkit for performance Analysis},
  year={2022},
  number={10},
  pages={1--11},
}