Architecture
Dataset and Topology
The basic setup for building decision diagrams is to first instantiate the Dataset and Topology classes.
They serve as auxiliary structures for both the constructive heuristic algorithm and the MILP optimizer.
The Dataset class is mainly responsible for splitting data into train, validation and test sets, normalizing input, and creating
data structures for feature data (usually denoted as the matrix \(\mathbf{X}\)) and target data (usually denoted as the vector \(\mathbf{y}\)).
It expects as input the path to a dataset file in CSV format. A seed can be given for the randomization of the train/validation/test split.
The Topology class is responsible for storing all fixed structural information about the diagram’s topology, which includes, for instance,
the lists of the diagram’s layers, internal and terminal nodes, and possible arcs departing each internal node. As input, it expects a skeleton,
a list of numbers representing the width (number of nodes) of each layer.
from dataset import Dataset
from topology import Topology
dataset = "path/to/dataset.csv"
skeleton = [1,2,4,4,4]
data = Dataset(dataset, train_validation_test_split=[0.5, 0.25, 0.25], seed=1)
topology = Topology(skeleton, data)
# data structure examples
data.train_X # ndarray of training samples
data.train_Y # training targets
data.classes # list of classes
topology.internal_nodes # list of internal node indexes
topology.nodes_per_layer[2] # list of nodes in the third layer
Heuristic
The next step for building the decision diagram would be to find an initial solution using the constructive heuristic algorithm.
The Heuristic class expects as input the Dataset and Topology instances created earlier. You may also provide
an alpha hyperparameter, for controlling model complexity.
from heuristic import Heuristic
alpha = 0.1
heuristic = Heuristic(data, topology, alpha=alpha)
# training and test accuracy
heuristic.solution.training_accuracy()
heuristic.solution.test_accuracy()
Solution
As can be seen from the last example, the heuristic returns a solution attribute. This attribute is an instance of the Solution
class. It is responsible for storing final or intermediate structural data about a decision diagram, such as split decisions for internal nodes (in
the form of a hyperplane and intercept), used nodes, and the definition of the arcs departing each used internal node. This information defines a built
decision diagram.
Along with structural information, the Solution provides methods for important metrics, such as training, validation and test accuracy, as well as objective value.
Optimizer
The Optimizer class implements the MILP formulation. It expects as input the Data and Topology instances.
It also accepts an alpha hyperparameter and an initial solution. Here, we provide the heuristic solution found earlier:
optimized = Optimizer(data, topology, alpha=alpha, initial_solution=heuristic.solution)
# training and test accuracy
optimized.solution.training_accuracy()
optimized.solution.test_accuracy()
Configuration
Further configuration for the Optimizer can be found in the config module, including the solver’s tolerance parameters, time limit and
verbosity.
Visualizer
Finally, the Visualizer class can be used to graphically represent a decision diagram. It expects the Data and Solution
instances, and returns a Graphviz directd graph
(Digraph).
solution = optimized.solution # or heuristic.solution
viz = Visualizer(data, solution)
viz.view()
An example output for the tic-tac-toe data set: