Decision tree
An implementation of an ID3-style decision tree for classification.
Available Predicates
initModel/9
Construct a decision tree on the given data and labels, assuming that all dimensions are numeric.
Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
%% part of the predicate definition
initModel( +pointer(float_array), +integer, +integer,
+pointer(float_array), +integer,
+integer, +integer, +float32, +integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
dataset | +matrix | Training dataset. | - |
labels | +vector | Training labels. | - |
numClasses | +integer | Number of classes in the dataset. | - |
minimumLeafSize | +integer | Minimum number of points in each leaf node. | 20 |
minimumGainSplit | +float | Minimum gain for node splitting. | 1e-7 |
maximumDepth | +integer | Maximum depth of the tree (0 means no limit). | 0 |
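To make the overfitting/underfitting trade-off concrete: a candidate split is only kept if both children contain at least minimumLeafSize points and the split improves the information gain by at least minimumGainSplit. The following is an illustrative Python sketch of that gating logic under the usual ID3-style entropy/gain definitions; it is not the library's actual implementation, and the function names are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain_of_split(labels, left, right):
    """Information gain of splitting `labels` into `left` and `right`."""
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

def split_allowed(left, right, gain,
                  minimum_leaf_size=20, minimum_gain_split=1e-7):
    """Keep a split only if both children are large enough
    and the gain is sufficient; otherwise the node becomes a leaf."""
    return (len(left) >= minimum_leaf_size
            and len(right) >= minimum_leaf_size
            and gain >= minimum_gain_split)
```

Raising minimumLeafSize or minimumGainSplit rejects more candidate splits, producing a smaller (possibly underfit) tree; lowering them allows deeper, more specialized (possibly overfit) trees.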
classifyPoint/5
Classify the given point and also return estimates of the probability for each class in the given vector.
%% part of the predicate definition
classifyPoint( +pointer(float_array), +integer,
-integer,
-pointer(float_array), -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
point | +vector | Point to classify. | - |
prediction | -integer | This will be set to the predicted class of the point. | - |
probabilities | -vector | This will be filled with class probabilities for the point. | - |
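Conceptually, classifying a point means walking from the root to a leaf by comparing one feature against a threshold at each internal node; the leaf stores the class probabilities, and the prediction is the most probable class. A minimal Python sketch of that traversal, using a hypothetical dict-based tree structure (not the binding's internal representation):

```python
def classify_point(node, point):
    """Walk a decision tree and return (predicted_class, probabilities).

    Assumed (hypothetical) node layout:
      internal: {'feature': i, 'threshold': t, 'left': ..., 'right': ...}
      leaf:     {'probabilities': [p0, p1, ...]}
    """
    while 'probabilities' not in node:
        if point[node['feature']] <= node['threshold']:
            node = node['left']
        else:
            node = node['right']
    probs = node['probabilities']
    prediction = max(range(len(probs)), key=probs.__getitem__)
    return prediction, probs
```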
classifyMatrix/7
Classify the given points and also return estimates of the probabilities for each class in the given matrix.
%% part of the predicate definition
classifyMatrix( +pointer(float_array), +integer, +integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
data | +matrix | Set of points to classify. | - |
predictions | -vector | This will be filled with predictions for each point. | - |
probabilities | -matrix | This will be filled with class probabilities for each point. | - |
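The matrix variant is the per-point classification applied to every point in the dataset, collecting one prediction and one row of class probabilities per point. A short illustrative Python sketch (hypothetical names, not the binding's implementation), written against any per-point classifier:

```python
def classify_matrix(point_classifier, points):
    """Classify each point with `point_classifier`, which returns
    (predicted_class, probabilities) for a single point.

    Returns a vector of predictions and a matrix of probabilities,
    one row per point."""
    predictions, probabilities = [], []
    for p in points:
        label, probs = point_classifier(p)
        predictions.append(label)
        probabilities.append(probs)
    return predictions, probabilities
```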
train/10
Train the decision tree on the given data, assuming that all dimensions are numeric.
This will overwrite the given model. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
%% part of the predicate definition
train( +pointer(float_array), +integer, +integer,
+pointer(float_array), +integer,
+integer, +integer, +float32, +integer,
[-float32]).
Parameters
Name | Type | Description | Default |
---|---|---|---|
dataset | +matrix | Training dataset. | - |
labels | +vector | Training labels. | - |
numClasses | +integer | Number of classes in the dataset. | - |
minimumLeafSize | +integer | Minimum number of points in each leaf node. | 20 |
minimumGainSplit | +float | Minimum gain for node splitting. | 1e-7 |
maximumDepth | +integer | Maximum depth of the tree (0 means no limit). | 0 |
entropy | -float | The final entropy of the decision tree. | - |
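The entropy output quantifies how impure the trained tree's leaves remain: 0 means every leaf is pure, while larger values mean leaves still mix classes. The exact quantity the library reports is not specified here; one plausible reading, sketched below in Python as an assumption (hypothetical function names), is the leaf entropy weighted by how many training points reach each leaf:

```python
import math

def node_entropy(labels):
    """Shannon entropy (bits) of the labels reaching one node;
    0 means the node is pure."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def weighted_leaf_entropy(leaves):
    """Average leaf entropy, weighted by the number of points per leaf.
    `leaves` is a list of label lists, one per leaf."""
    total = sum(len(leaf) for leaf in leaves)
    return sum((len(leaf) / total) * node_entropy(leaf) for leaf in leaves)
```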
Connected Links/Resources
For a more detailed explanation, see the Python documentation, which usually gives a good description of how the methods work and what the parameters do.