Experiments
The experiments in the Optimal Decision Diagrams for Classification paper are based
on the output of the run_experiments.py script.
Running
To run the experiments for a specific data set and seed, execute the following commands from the project root, within the virtual environment created during installation:
$ mkdir results
$ python run_experiments.py [name] [seed]
where [name] is the data set name and [seed] is an integer.
The resulting CSV files will be available in the results directory.
All available datasets can be found in the datasets/processed directory. Refer to the Data pipeline section
for more information.
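To repeat the command above for several data sets and seeds, a small driver script can build and run the commands in a loop. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes it is started from the project root inside the same virtual environment.

```python
import os
import subprocess
import sys


def build_commands(names, seeds, script="run_experiments.py"):
    """Return one argv list per (name, seed) pair, mirroring the command above."""
    return [
        # sys.executable is the interpreter of the active virtual environment.
        [sys.executable, script, name, str(seed)]
        for name in names
        for seed in seeds
    ]


def run_all(commands, results_dir="results"):
    """Create the results directory if needed, then run each command in turn."""
    os.makedirs(results_dir, exist_ok=True)
    for command in commands:
        # check=True stops the batch at the first run that exits with an error.
        subprocess.run(command, check=True)
```

For example, `run_all(build_commands(["dataset_a"], range(5)))` would run seeds 0 to 4 for a data set called `dataset_a`, which is a placeholder name; substitute a name from the datasets/processed directory.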
Results
The experiment results are written to the file [dataset]_[seed]_ddTrainingSolutions.csv.
Columns
| Name | Meaning |
|---|---|
| seed | Seed integer used for this run |
| dataset | Dataset name |
| split | Split type: univariate or multivariate |
| symBreak | Whether symmetry breaking constraints are active |
| forceTree | Whether topology is forced to conform to a tree |
| numSamples | Number of samples in the dataset |
| numFeatures | Number of features in the dataset |
| numClasses | Number of classes in the dataset |
| topology | Topology skeleton used for this run |
| alpha | Regularization parameter used for this run |
| optimal | Whether the MIP solution is optimal |
| accuracyStep1 | Accuracy achieved in the first step |
| accuracyStep2 | Accuracy achieved in the second step |
| objVal | MIP objective function value |
| gap | MIP gap |
| accValid | Accuracy achieved on the validation set |
| accTest | Accuracy achieved on the test set |
| durationStep1 | Duration of the first step |
| durationStep2 | Duration of the second step |
| bestSolutionTime | Time when the best solution was obtained |
| internalNodes1 | Number of internal nodes used in the first step solution |
| internalNodes2 | Number of internal nodes used in the second step solution |
| leafNodes | Number of leaf nodes in the final solution |
| upperBound1 | MIP upper bound found in the first step |
| upperBound2 | MIP upper bound found in the second step |
| lowerBound | MIP lower bound found in the third step |
| fragmentationPerNode | Percentage of training samples that reach each node |
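
Once a run has finished, the results file can be inspected with a few lines of Python. The sketch below is illustrative only and uses hypothetical helper names (`results_path`, `load_rows`, `mean_test_accuracy`). It assumes the CSV is comma-delimited, has a header row whose names match the columns listed above, and stores accTest as a plain number; check these assumptions against an actual output file.

```python
import csv
from collections import defaultdict
from pathlib import Path
from statistics import mean


def results_path(dataset, seed, results_dir="results"):
    """Build the file name pattern [dataset]_[seed]_ddTrainingSolutions.csv."""
    return Path(results_dir) / f"{dataset}_{seed}_ddTrainingSolutions.csv"


def load_rows(path):
    """Read one results file into a list of dicts keyed by column name."""
    # Assumption: the first line of the file is a header row.
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def mean_test_accuracy(rows, group_by=("topology", "alpha")):
    """Average accTest over rows that share the same group_by column values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in group_by)
        # Assumption: accTest is stored as a number that float() can parse.
        groups[key].append(float(row["accTest"]))
    return {key: mean(values) for key, values in groups.items()}
```

Calling `mean_test_accuracy(load_rows(results_path(name, seed)))` returns a dictionary that maps each (topology, alpha) pair to its average test accuracy. Grouping by other columns, such as split or forceTree, only requires a different `group_by` tuple.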