Decision tree
An implementation of an ID3-style decision tree for classification.
Available Predicates
initModel/9
Construct a decision tree on the given data and labels, assuming that all dimensions are numeric.
Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
%% part of the predicate definition
initModel( +pointer(float_array), +integer, +integer,
+pointer(float_array), +integer,
+integer, +integer, +float32, +integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
dataset | +matrix | Training dataset. | - |
labels | +vector | Training labels. | - |
numClasses | +integer | Number of classes in the dataset. | - |
minimumLeafSize | +integer | Minimum number of points in each leaf node. | 20 |
minimumGainSplit | +float | Minimum gain for node splitting. | 1e-7 |
maximumDepth | +integer | Maximum depth of the tree (0 means no limit). | 0 |
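To make the overfitting/underfitting trade-off concrete: a candidate split is only kept if both children contain at least minimumLeafSize points and the split improves the information gain by at least minimumGainSplit. The following is an illustrative Python sketch of that gating logic under the usual ID3-style entropy/gain definitions; it is not the library's actual implementation, and the function names are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain_of_split(labels, left, right):
    """Information gain of splitting `labels` into `left` and `right`."""
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

def split_allowed(left, right, gain,
                  minimum_leaf_size=20, minimum_gain_split=1e-7):
    """Keep a split only if both children are large enough
    and the gain is sufficient; otherwise the node becomes a leaf."""
    return (len(left) >= minimum_leaf_size
            and len(right) >= minimum_leaf_size
            and gain >= minimum_gain_split)
```

Raising minimumLeafSize or minimumGainSplit rejects more candidate splits, producing a smaller (possibly underfit) tree; lowering them allows deeper, more specialized (possibly overfit) trees.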
classifyPoint/5
Classify the given point and also return estimates of the probability for each class in the given vector.
%% part of the predicate definition
classifyPoint( +pointer(float_array), +integer,
-integer,
-pointer(float_array), -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
point | +vector | Point to classify. | - |
prediction | -integer | This will be set to the predicted class of the point. | - |
probabilities | -vector | This will be filled with class probabilities for the point. | - |
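Conceptually, classifying a point means walking from the root to a leaf by comparing one feature against a threshold at each internal node; the leaf stores the class probabilities, and the prediction is the most probable class. A minimal Python sketch of that traversal, using a hypothetical dict-based tree structure (not the binding's internal representation):

```python
def classify_point(node, point):
    """Walk a decision tree and return (predicted_class, probabilities).

    Assumed (hypothetical) node layout:
      internal: {'feature': i, 'threshold': t, 'left': ..., 'right': ...}
      leaf:     {'probabilities': [p0, p1, ...]}
    """
    while 'probabilities' not in node:
        if point[node['feature']] <= node['threshold']:
            node = node['left']
        else:
            node = node['right']
    probs = node['probabilities']
    prediction = max(range(len(probs)), key=probs.__getitem__)
    return prediction, probs
```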
classifyMatrix/7
Classify the given points and also return estimates of the probabilities for each class in the given matrix.
%% part of the predicate definition
classifyMatrix( +pointer(float_array), +integer, +integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
data | +matrix | Set of points to classify. | - |
predictions | -vector | This will be filled with predictions for each point. | - |
probabilities | -matrix | This will be filled with class probabilities for each point. | - |
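The matrix variant is the per-point classification applied to every point in the dataset, collecting one prediction and one row of class probabilities per point. A short illustrative Python sketch (hypothetical names, not the binding's implementation), written against any per-point classifier:

```python
def classify_matrix(point_classifier, points):
    """Classify each point with `point_classifier`, which returns
    (predicted_class, probabilities) for a single point.

    Returns a vector of predictions and a matrix of probabilities,
    one row per point."""
    predictions, probabilities = [], []
    for p in points:
        label, probs = point_classifier(p)
        predictions.append(label)
        probabilities.append(probs)
    return predictions, probabilities
```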
train/10
Train the decision tree on the given data, assuming that all dimensions are numeric.
This will overwrite the given model. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
%% part of the predicate definition
train( +pointer(float_array), +integer, +integer,
+pointer(float_array), +integer,
+integer, +integer, +float32, +integer,
[-float32]).
Parameters
Name | Type | Description | Default |
---|---|---|---|
dataset | +matrix | Training dataset. | - |
labels | +vector | Training labels. | - |
numClasses | +integer | Number of classes in the dataset. | - |
minimumLeafSize | +integer | Minimum number of points in each leaf node. | 20 |
minimumGainSplit | +float | Minimum gain for node splitting. | 1e-7 |
maximumDepth | +integer | Maximum depth of the tree (0 means no limit). | 0 |
entropy | -float | The final entropy of the decision tree. | - |
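The entropy output quantifies how impure the trained tree's leaves remain: 0 means every leaf is pure, while larger values mean leaves still mix classes. The exact quantity the library reports is not specified here; one plausible reading, sketched below in Python as an assumption (hypothetical function names), is the leaf entropy weighted by how many training points reach each leaf:

```python
import math

def node_entropy(labels):
    """Shannon entropy (bits) of the labels reaching one node;
    0 means the node is pure."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def weighted_leaf_entropy(leaves):
    """Average leaf entropy, weighted by the number of points per leaf.
    `leaves` is a list of label lists, one per leaf."""
    total = sum(len(leaf) for leaf in leaves)
    return sum((len(leaf) / total) * node_entropy(leaf) for leaf in leaves)
```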
Connected Links/Resources
For a more detailed explanation, see the Python documentation, which usually gives a good description of how the methods work and what the parameters do.