Analyzer configuration

The analyzer follows the same configuration scheme. The following parameters are available for this component:

Parameter	Description	Type
`input`	Input data in CSV format.	`str`
`output_path`	Output path.	`str`
`prepare_data`	Preprocessing configuration.	`dict`
`plot`	Plotting parameters.	`dict`
`classification`	Parameters for classification analyses, e.g., decision trees.	`dict`
`feat_importance`	Parameters for feature importance analyses, e.g., random forests.	`dict`

`prepare_data` parameters:

Parameter	Description	Type
`cols`	Columns or dimensions to consider.	`list`
`rows`	Values of rows to filter.	`dict`
`target`	Dimension of interest, e.g. FLOPS.	`str`
`norm`	Normalization of values for the target dimension: `minmax` or `zscore`.	`str`
`categories`	Dictionary with meta-information for the categories: `num` (number of categories to generate statically), `grid_search` (use KDE and grid-search the bandwidth and kernel parameters), `mode` (`normal` uses Silverman's rule for the KDE bandwidth; `multimodal` uses Sheather-Jones).	`dict`
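
As an illustration of the two `norm` options, the following is a minimal sketch of min-max and z-score normalization using only the Python standard library. The function name `normalize`, the sample values, and the handling of constant columns are assumptions made for this example; the analyzer's actual implementation may differ.

```python
import statistics


def normalize(values, method):
    """Normalize a list of numbers with 'minmax' or 'zscore' (illustrative helper)."""
    if method == "minmax":
        lo, hi = min(values), max(values)
        span = hi - lo
        # Assumption: a constant column has no spread, so map everything to 0.0.
        if span == 0:
            return [0.0 for _ in values]
        # Rescale to the [0, 1] interval.
        return [(v - lo) / span for v in values]
    if method == "zscore":
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # population standard deviation
        if std == 0:
            return [0.0 for _ in values]
        # Center on the mean and scale to unit standard deviation.
        return [(v - mean) / std for v in values]
    raise ValueError(f"unknown normalization method: {method!r}")


cycles = [100.0, 150.0, 200.0]  # hypothetical values of the target dimension
minmax_cycles = normalize(cycles, "minmax")
zscore_cycles = normalize(cycles, "zscore")
```

With `minmax`, the smallest value maps to 0 and the largest to 1; with `zscore`, the result has zero mean and unit standard deviation.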

`plot` parameters:

Parameter	Description	Type
`sort`	Dimension to use for sorting values.	`str`
`type`	Type of plot: relplot, scatterplot, lineplot or kdeplot.	`str`
`format`	Output format: png, pdf, eps, ps or svg.	`str`
`x_axis`	Dimension for the X axis.	`str`
`y_axis`	Dimension for the Y axis.	`str`
`hue`	Dimension to group by color.	`str`
`size`	Dimension to group by size.	`str`
`log_scale`	Apply logarithmic scale.	`bool`
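
The accepted values listed above can be checked before running the analyzer. The sketch below is one possible way to do so; `check_plot` is a hypothetical helper, not part of the tool, and treating `type` and `format` as mandatory is an assumption of this example.

```python
# Allowed values copied from the table above.
PLOT_TYPES = {"relplot", "scatterplot", "lineplot", "kdeplot"}
PLOT_FORMATS = {"png", "pdf", "eps", "ps", "svg"}


def check_plot(name, cfg):
    """Return a list of problems found in one plot entry (empty if none)."""
    problems = []
    # Assumption: both `type` and `format` must be present and valid.
    if cfg.get("type") not in PLOT_TYPES:
        problems.append(f"{name}: unsupported type {cfg.get('type')!r}")
    if cfg.get("format") not in PLOT_FORMATS:
        problems.append(f"{name}: unsupported format {cfg.get('format')!r}")
    # `log_scale` is documented as a bool; a missing key is treated as False.
    if not isinstance(cfg.get("log_scale", False), bool):
        problems.append(f"{name}: log_scale must be a bool")
    return problems


good = {"type": "kdeplot", "format": "pdf", "x_axis": "tsc", "log_scale": False}
bad = {"type": "barplot", "format": "jpg", "log_scale": "yes"}
good_problems = check_plot("density", good)
bad_problems = check_plot("broken", bad)
```

Here `good_problems` is empty, while `bad_problems` contains one message per offending key.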

Example

Suppose we have a dataset with five columns: DIM_0, DIM_1, DIM_2, DIM_3 and Cycles. We want to keep only the rows where DIM_1 == "VALUE" and measure the impact of the remaining dimensions on Cycles. Because the target dimension is not discrete, we also normalize its values and group them into categories. We plot a KDE (density) plot and a scatterplot, and the system performs a classification using a decision tree of depth 4. A possible configuration file is:

- kernel:
    input: input.csv
    output_path: .
    prepare_data:
        cols: DIM_0 DIM_2 DIM_3
        rows: { "DIM_1": "VALUE" }
        target: Cycles
        norm: True
        categories:
            enabled: True
            scaling_factor: "1"
    plot:
        density:
            enabled: True
            type: kdeplot
            format: pdf
            data: processed
            x_axis: tsc
            x_label: "Time Stamp Counter (TSC)"
            log_scale: False
            hue: tsc_cat
            cumulative: False
            hatches: True
            multiple: "stack"
            rugplot: True
            mark_centroids: True
        scatterplot:
            enabled: False
            type: scatterplot
            format: pdf
            data: processed
            x_axis: index
            y_axis: cycles
            palette: "crimson"
            hue: arch
    classification:
        type: decision_tree
        enabled: True
        config:
            max_depth: 4
            max_leaves: 10000
            proportion: True
            text_tree: False
            graph_tree: True
            orientation: "horizontal"
    feature_importance:
        type: random_forest
        enabled: True
        config:
            n_estimators: 50
            criterion: "gini"
            max_depth: 50
            n_jobs: -1
            random_state: 0
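
To make the `prepare_data` step of this example more concrete, the sketch below applies the row filter and column selection to a small in-memory CSV using only the standard library. The CSV contents and the `prepare` function are hypothetical and only illustrate the intended semantics of `cols`, `rows` and `target`; they do not reproduce the analyzer's code.

```python
import csv
import io

# Hypothetical CSV contents with the five columns from the example.
RAW = """DIM_0,DIM_1,DIM_2,DIM_3,Cycles
1,VALUE,4,8,120
2,OTHER,4,8,300
3,VALUE,2,16,95
"""

cols = ["DIM_0", "DIM_2", "DIM_3"]  # prepare_data.cols
rows = {"DIM_1": "VALUE"}           # prepare_data.rows
target = "Cycles"                   # prepare_data.target


def prepare(reader, cols, rows, target):
    """Keep rows matching every filter, then keep only cols plus target."""
    selected = []
    for record in reader:
        # A row is kept only if it matches all entries of the `rows` filter.
        if all(record[dim] == value for dim, value in rows.items()):
            selected.append({key: record[key] for key in cols + [target]})
    return selected


prepared = prepare(csv.DictReader(io.StringIO(RAW)), cols, rows, target)
```

Only the two rows with `DIM_1` equal to `VALUE` survive, and `DIM_1` itself is dropped because it is not listed in `cols`.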

Refer to the `examples/` directory for more elaborate examples.