Profiler configuration
The profiler is configured through a YAML file. The following parameters are available in the profiler's kernel dictionary:
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Name of the kernel or program. | | |
| | Folder containing the sources. | | |
| | Commands to execute before compilation. Tuning CPUs, allocating huge pages, etc. | | |
| | Tasks to execute after the experiments. | | |
| | Cartesian product of the list of parameters. This includes the list Makefile options, | | |
| | Compiler configurations ( | | |
| | Execution parameters ( | | |
| | Output options, such as name and format ( | | |
finalize parameters:
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Clean temporary files. | | |
| | Clean generated assembly files. | | |
| | Clean binary files. | | |
| | Execute a command after the set of experiments has run. | | |
configuration parameters:
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Options to Makefile. | | [ |
| | `-D` flags. Each of them can be described as in table | | |
| | Expression for computing FLOPS count. This can be expressed dynamically using | | |
compiler parameters:
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Enable/disable compilation. Useful for pre-generated binaries. | | |
| | Number of processes to use for compilation. | | 1 |
| | Dictionary of compilers, each with a list of specific flags. | | |
| | Main source file to be compiled. | | |
| | If the kernel is not inlined, it needs to be compiled from a different source. | | |
| | “asm” or “C”. Determines the language for MARTA instrumentation insertion. | | “asm” |
| | syntax: ASM syntax, count_ins: count the number and type of ASM instructions in the region of interest, | | {} |
d_features parameters:
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Type of expression: static, dynamic. static for | | |
| | Value of the expression: “numeric”, “string”. | | “numeric” |
| | Expression generating the list of values, e.g. | | |
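
The way a d_features entry turns into a list of values can be illustrated with a small Python sketch. It is not MARTA's implementation: the helper name `expand_feature` is hypothetical, and it assumes that a static value is a literal list written as a string and that a dynamic value is a Python expression such as a `range(...)` call.

```python
import ast


def expand_feature(spec):
    """Return the list of values described by one d_features entry.

    Hypothetical helper for illustration only; not part of MARTA.
    """
    kind = spec.get("type", "static")
    raw = spec["value"]
    if kind == "static":
        # Assumption: a static value is a literal list written as a string.
        values = ast.literal_eval(raw)
    elif kind == "dynamic":
        # Assumption: a dynamic value is a Python expression yielding an iterable.
        # Only `range` is exposed to the expression; builtins are disabled.
        values = eval(raw, {"__builtins__": {}}, {"range": range})
    else:
        raise ValueError(f"unknown d_features type: {kind!r}")
    return list(values)


num_var = expand_feature({"type": "dynamic", "value": "range(0,5,2)"})
inputs = expand_feature(
    {"type": "static", "value": '["[PATH]/file0.txt","[PATH]/file1.txt"]'}
)
print(num_var)
print(inputs)
```

Restricting the names visible to `eval` keeps the sketch from running arbitrary code, but configuration files from untrusted sources should still not be evaluated this way.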
execution parameters:
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Enable execution. | | |
| | List of PAPI counters to read. | | |
| | Measure execution time with | | |
| | Measure TSC cycles using | | |
| | Number of repetitions for each configuration. | | 7 |
| | Threshold for outlier detection. | | 0.1 |
| | Compute average values after discarding outliers. | | |
| | Number of iterations of the loop containing the ROI, if specified. | | 1 |
| | Enable or disable turbo boost on Intel processors via MSR. | | |
| | Set maximum CPU frequency via MSR. | | |
| | Logical CPU ID for pinning single-thread measurements. | | 0 |
| | Cache flush enabled for architectures supporting | | |
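
To make the interplay between repetitions, the outlier threshold and averaging more concrete, here is a minimal Python sketch. The documentation does not state how MARTA applies the threshold, so the rule used here (discard samples whose relative deviation from the median exceeds the threshold) is an assumption, and `average_without_outliers` is a hypothetical name.

```python
from statistics import mean, median


def average_without_outliers(samples, threshold=0.1):
    """Average the samples after dropping outliers.

    Assumption (not taken from MARTA): a sample is an outlier when it
    deviates from the median by more than `threshold` times the median.
    Returns the average and the number of discarded samples.
    """
    center = median(samples)
    kept = [s for s in samples if abs(s - center) <= threshold * abs(center)]
    if not kept:
        # Nothing survived the filter; fall back to the raw samples.
        kept = list(samples)
    return mean(kept), len(samples) - len(kept)


# Seven made-up measurements, matching the default number of repetitions.
cycles = [100, 101, 99, 100, 150, 100, 102]
avg, discarded = average_without_outliers(cycles, threshold=0.1)
print(f"average={avg:.2f} discarded={discarded}")
```

The median is used as the reference point because, unlike the mean, it is not pulled towards the outliers it is meant to detect.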
output parameters:
| Parameter | Description | Type | Default |
|---|---|---|---|
| | Name of output file. | | |
| | Output columns. If “all”, then all dimensions used in the configuration: compiler, d_features, kernel_config, papi_counters, etc. | | |
| | Generate a log file with all information related to the experiment: host machine, elapsed time, standard output, standard error, etc. | | |
Example
Suppose we want to compile a test.c file with the Cartesian product of
`-DINPUT=["$PWD/file0.txt","$PWD/file1.txt"]` and `-DNUM_VAR=[0,2,4]`.
MARTA will then generate every combination of these parameters, i.e.,
`-DINPUT="$PWD/file0.txt"` with `-DNUM_VAR=0`, `-DINPUT="$PWD/file0.txt"`
with `-DNUM_VAR=2`, etc. A possible configuration file for the profiler could be:

    - kernel:
        name: "test"
        type: "regular"
        path: "src/"
        preamble:
          command: "sudo cpupower frequency-set -u 2.1GHz > /dev/null && sudo cpupower frequency-set -d 2.1GHz > /dev/null"
        compilation:
          enabled: True
          processes: 1
          language: "C"
          compiler_flags:
            { "gcc-10": [" -Ofast -march=native -mtune=native -mavx2 "] }
          main_src: "main.c"
          kernel_inlined: False
          asm_analysis:
            syntax: "att"
            count_ins: False
            static_analysis: False
            llvm_mca_bin: llvm-mca
          debug: False
        configuration:
          d_features:
            INPUT:
              type: "static"
              path: True
              value: '["[PATH]/file0.txt","[PATH]/file1.txt"]'
            NUM_VAR:
              type: "dynamic"
              value: 'range(0,5,2)' # yields 0, 2, 4
          flops: ""
        execution:
          enabled: True
          papi_counters: ["PAPI_L1_DCM", "PAPI_L2_TCM"]
          tsc: True
          time: True
          threshold_outliers: 5 # in percentage
          discard_outliers: True # remove outliers from average
          compute_avg: True # divide values by nsteps
          nexec: 20
          nsteps: 1
          cpu_affinity: 1
          check_dump: False
          prefix: ""
        output:
          name: "test_config_file"
          format: "csv"
          columns: "all"
          report: False
          verbose: True
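
The Cartesian product described in this example can be reproduced with Python's `itertools.product`. The sketch only illustrates which combinations are expected; `flag_combinations` is a hypothetical helper and does not reflect how MARTA builds its compilation commands internally.

```python
from itertools import product


def flag_combinations(features):
    """Build one string of -D flags per combination of feature values.

    Hypothetical helper for illustration only; not part of MARTA.
    """
    names = list(features)
    combos = []
    for values in product(*(features[name] for name in names)):
        flags = " ".join(f"-D{name}={value}" for name, value in zip(names, values))
        combos.append(flags)
    return combos


features = {
    "INPUT": ['"$PWD/file0.txt"', '"$PWD/file1.txt"'],
    "NUM_VAR": [0, 2, 4],
}
combos = flag_combinations(features)
for flags in combos:
    print(flags)
```

With two input files and three values for `NUM_VAR`, the loop prints six flag sets, one per compiled variant.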
For more complete examples, please refer to the repository.