Analyzer configuration
The analyzer follows the same configuration scheme. The parameters available for this component are:
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data in CSV format. | str | |
| `output_path` | Output path. | str | |
| `prepare_data` | Preprocessing configuration. | dict | |
| `plot` | Plotting parameters. | dict | |
| `classification` | Parameters for classification analyses, e.g., decision trees. | dict | |
| `feature_importance` | Parameters for feature importance analyses, e.g., random forests. | dict | |
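Each entry in the configuration file (such as `kernel` in the example of this section) maps a name to a block that uses these keys. As a minimal sketch, assuming the file has already been parsed into a Python dictionary (e.g., with a YAML loader), a misspelled top-level key can be caught by comparing it against the keys used in the example. The helper `unknown_keys` is hypothetical and not part of the analyzer.

```python
# Top-level keys of an analyzer entry, as used in the example configuration.
KNOWN_KEYS = {
    "input",
    "output_path",
    "prepare_data",
    "plot",
    "classification",
    "feature_importance",
}


def unknown_keys(entry: dict) -> list[str]:
    """Return the keys of an analyzer entry that are not documented (hypothetical helper)."""
    return sorted(key for key in entry if key not in KNOWN_KEYS)


if __name__ == "__main__":
    entry = {"input": "input.csv", "output_path": ".", "prepare-data": {}}
    print(unknown_keys(entry))  # ['prepare-data']
```

Reporting unknown keys, rather than requiring every key, avoids assuming which parameters are mandatory, since the defaults are not listed here.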
`prepare_data` parameters:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `cols` | Columns or dimensions to consider. | str (space-separated) | |
| `rows` | Column values used to filter rows, e.g., `{ "DIM_1": "VALUE" }`. | dict | |
| `target` | Dimension of interest, e.g., FLOPS. | str | |
| `norm` | Normalization of the values of the target dimension. | bool | |
| `categories` | Meta-information for the categories, e.g., `enabled` and `scaling_factor`. | dict | |
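The table does not spell out how these options are applied. The following is a minimal sketch, assuming that rows are plain dictionaries read from the CSV, that `rows` keeps only rows whose values match exactly, and that `norm` means min-max scaling of the target to [0, 1]; the analyzer may normalize differently. The function name `prepare` is hypothetical, and category creation is left out because its semantics are not described here.

```python
import csv
import io


def prepare(rows, cols, filters, target, norm):
    """Filter rows, keep the selected columns plus the target, and optionally normalize it.

    Assumption: `norm` is min-max scaling to [0, 1]; the analyzer's actual
    normalization is not specified here.
    """
    # Keep only rows whose values match every filter, e.g. {"DIM_1": "VALUE"}.
    kept = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    # Project onto the dimensions of interest and the (numeric) target dimension.
    out = [{c: r[c] for c in cols} | {target: float(r[target])} for r in kept]
    if norm and out:
        values = [r[target] for r in out]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
        for r in out:
            r[target] = (r[target] - lo) / span
    return out


if __name__ == "__main__":
    data = io.StringIO(
        "DIM_0,DIM_1,DIM_2,DIM_3,Cycles\n"
        "a,VALUE,x,1,100\n"
        "b,OTHER,y,2,150\n"
        "c,VALUE,z,3,300\n"
    )
    rows = list(csv.DictReader(data))
    # "cols: DIM_0 DIM_2 DIM_3" in the YAML is a space-separated string.
    print(prepare(rows, "DIM_0 DIM_2 DIM_3".split(), {"DIM_1": "VALUE"}, "Cycles", True))
```

With this data, only the rows `a` and `c` survive the filter, and their `Cycles` values (100 and 300) are scaled to 0.0 and 1.0.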
Parameters of each plot defined under `plot`:

| Parameter | Description | Type | Default |
|---|---|---|---|
| | Dimension to use for sorting values. | str | |
| `type` | Type of plot: `relplot`, `scatterplot`, `lineplot`, or `kdeplot`. | str | |
| `format` | Output format: `png`, `pdf`, `eps`, `ps`, or `svg`. | str | |
| `x_axis` | Dimension for the X axis. | str | |
| `y_axis` | Dimension for the Y axis. | str | |
| `hue` | Dimension used to group by color. | str | |
| | Dimension used to group by size. | str | |
| `log_scale` | Whether to apply a logarithmic scale. | bool | |
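Each plot entry can be checked against the documented values before rendering. A minimal sketch, where the helper `check_plot` and the `<name>.<format>` output file naming are hypothetical; it also treats a missing `type` or `format` as an error, although the analyzer may apply defaults.

```python
# Values documented for the `type` and `format` plot parameters.
PLOT_TYPES = {"relplot", "scatterplot", "lineplot", "kdeplot"}
OUTPUT_FORMATS = {"png", "pdf", "eps", "ps", "svg"}


def check_plot(name: str, entry: dict) -> str:
    """Validate one plot entry and return a (hypothetical) output file name."""
    plot_type = entry.get("type")
    if plot_type not in PLOT_TYPES:
        raise ValueError(f"{name}: unsupported plot type {plot_type!r}")
    fmt = entry.get("format")
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"{name}: unsupported output format {fmt!r}")
    return f"{name}.{fmt}"


if __name__ == "__main__":
    print(check_plot("density", {"type": "kdeplot", "format": "pdf", "x_axis": "tsc"}))
    # density.pdf
```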
Example
Imagine we have a dataset with five columns: `DIM_0`, `DIM_1`, `DIM_2`, `DIM_3`, and `Cycles`. We want to keep only the rows where `DIM_1 == "VALUE"` and measure the impact of the remaining dimensions on `Cycles`. Since the target dimension is not discrete, we also normalize its values and group them into categories. We define a KDE (density) plot and a scatterplot, the latter disabled in this configuration. Finally, the analyzer performs a classification with a decision tree of maximum depth 4 and a feature importance analysis with a random forest. A possible configuration file is:
```yaml
- kernel:
    input: input.csv
    output_path: .
    prepare_data:
      cols: DIM_0 DIM_2 DIM_3
      rows: { "DIM_1": "VALUE" }
      target: Cycles
      norm: True
      categories:
        enabled: True
        scaling_factor: "1"
    plot:
      density:
        enabled: True
        type: kdeplot
        format: pdf
        data: processed
        x_axis: tsc
        x_label: "Time Stamp Counter (TSC)"
        log_scale: False
        hue: tsc_cat
        cumulative: False
        hatches: True
        multiple: "stack"
        rugplot: True
        mark_centroids: True
      scatterplot:
        enabled: False
        type: scatterplot
        format: pdf
        data: processed
        x_axis: index
        y_axis: cycles
        palette: "crimson"
        hue: arch
    classification:
      type: decision_tree
      enabled: True
      config:
        max_depth: 4
        max_leaves: 10000
        proportion: True
        text_tree: False
        graph_tree: True
        orientation: "horizontal"
    feature_importance:
      type: random_forest
      enabled: True
      config:
        n_estimators: 50
        criterion: "gini"
        max_depth: 50
        n_jobs: -1
        random_state: 0
```
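In this example the scatterplot is defined but has `enabled: False`, so presumably only the density plot, the decision tree, and the random forest are produced. A minimal sketch of that selection, assuming the YAML has been loaded into nested Python dictionaries and that a missing `enabled` flag counts as disabled; `enabled_components` is a hypothetical helper, not part of the analyzer.

```python
def enabled_components(entry: dict) -> list[str]:
    """List the plots and analyses of one analyzer entry whose `enabled` flag is true.

    Assumption: a missing `enabled` flag is treated as disabled.
    """
    names = []
    # Each plot under `plot` carries its own `enabled` flag.
    for plot_name, plot_cfg in entry.get("plot", {}).items():
        if plot_cfg.get("enabled", False):
            names.append(f"plot:{plot_name}")
    # Analyses are top-level sections with an `enabled` flag and a `type`.
    for section in ("classification", "feature_importance"):
        cfg = entry.get(section)
        if cfg and cfg.get("enabled", False):
            names.append(f"{section}:{cfg.get('type')}")
    return names


if __name__ == "__main__":
    kernel = {
        "plot": {
            "density": {"enabled": True, "type": "kdeplot"},
            "scatterplot": {"enabled": False, "type": "scatterplot"},
        },
        "classification": {"type": "decision_tree", "enabled": True},
        "feature_importance": {"type": "random_forest", "enabled": True},
    }
    print(enabled_components(kernel))
    # ['plot:density', 'classification:decision_tree', 'feature_importance:random_forest']
```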
Refer to the `examples/` directory for more elaborate examples.