Hoeffding trees
An implementation of Hoeffding trees, a form of streaming decision tree for classification. Given labeled data, a Hoeffding tree can be used for predicting the classifications of new points.
:- use_module('path/to/.../src/methods/hoeffding_tree/hoeffding_tree.pl').
%% usage example
TrainData = [5.1,3.5,1.4, 4.9,3.0,1.4, 4.7,3.2,1.3, 4.6,3.1,1.5],
TestData = [3,2,0, 5,1,4, 0,0,4, 3,3,5, 0,5,5, 2,5,5],
hoeffding_tree_initAndBuildModel(gini_hoeffding, TrainData, 3, [0,1,0,1], 2, 0, 0.95, 5000, 100, 100, 10, 100),
hoeffding_tree_classify(TestData, 3, PredicList, ProbsList).
Available Predicates
hoeffding_tree_initAndBuildModel/12
Construct the Hoeffding tree with the given parameters and given training data.
The tree may be trained either in batch mode (which looks at all points before splitting, and propagates these points to the created children for further training), or in streaming mode, where each point is only considered once. (In general, batch mode will give better-performing trees, but will have higher memory and runtime costs for the same dataset.)
%% part of the predicate definition
hoeffding_tree_initAndBuildModel(TreeType, DataList, DataRows, LabelsList, NumClasses, BatchTraining, SuccessProbability, MaxSamples, CheckInterval, MinSamples, Bins, ObservationsBeforeBinning) :-
NumClasses >= 0,
SuccessProbability >= 0,
SuccessProbability =< 1,
MaxSamples >= 0,
CheckInterval > 0,
MinSamples >= 0,
Bins >= 0,
ObservationsBeforeBinning >= 0,
convert_list_to_float_array(DataList, DataRows, array(Xsize, Xrownum, X)),
convert_list_to_float_array(LabelsList, array(Ysize, Y)),
initAndBuildModelI(TreeType, X, Xsize, Xrownum, Y, Ysize, NumClasses, BatchTraining, SuccessProbability, MaxSamples, CheckInterval, MinSamples, Bins, ObservationsBeforeBinning).
foreign(initAndBuildModel, c, initAndBuildModelI( +string,
+pointer(float_array), +integer, +integer,
+pointer(float_array), +integer,
+integer, +integer, +float32, +integer, +integer, +integer, +integer, +integer)).
Parameters
Name | Type | Description | Default |
---|---|---|---|
treeType | +string | "gini-hoeffding", "gini-binary", "info-hoeffding", "info-binary" | gini-binary |
dataset | +matrix | Dataset to train on. | - |
labels | +vector | Labels of each point in the dataset. | - |
numClasses | +integer | Number of classes in the dataset. | 0 |
batchTraining | +integer(bool) | If true, samples will be considered in batch instead of as a stream. This generally results in better trees but at the cost of memory usage and runtime. | (0)false |
successProbability | +float | Probability of success required in Hoeffding bounds before a split can happen. | 0.95 |
maxSamples | +integer | Maximum number of samples before a split is forced (0 never forces a split); ignored in batch training mode. | 0 or 5000 |
checkInterval | +integer | Number of samples required before each split; ignored in batch training mode. | 100 |
minSamples | +integer | If the node has seen this many points or fewer, no split will be allowed. | 100 |
bins | +integer | If the "domingos"/"hoeffding" split strategy is used, this specifies the number of bins for each numeric split. | 10 |
observationsBeforeBinning | +integer | If the "domingos"/"hoeffding" split strategy is used, this specifies the number of samples observed before binning is performed. | 100 |
hoeffding_tree_classify/4
Classify the given points, using this node and the entire (sub)tree beneath it.
The predicted labels for each point are returned, as well as an estimate of the probability that the prediction is correct for each point. This estimate is simply the MajorityProbability for the leaf that each point bins to.
%% predicate definition
hoeffding_tree_classify(TestList, TestRows, PredicList, ProbsList) :-
convert_list_to_float_array(TestList, TestRows, array(Xsize, Xrownum, X)),
classifyI(X, Xsize, Xrownum, Y, Ysize, Z, Zsize),
convert_float_array_to_list(Y, Ysize, PredicList),
convert_float_array_to_list(Z, Zsize, ProbsList).
%% foreign c++ predicate definition
foreign(classify, c, classifyI( +pointer(float_array), +integer, +integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer)).
Parameters
Name | Type | Description | Default |
---|---|---|---|
data | +matrix | Points to classify. | - |
predictions | -vector | Predicted labels for each point. | - |
probabilities | -vector | Probability estimates for each predicted label. | - |
hoeffding_tree_train/4
Train in streaming mode on the given dataset.
This takes one pass. Be sure that hoeffding_tree_initAndBuildModel/12 has been called first!
%% predicate definition
hoeffding_tree_train(DataList, DataRows, LabelsList, BatchTraining) :-
convert_list_to_float_array(DataList, DataRows, array(Xsize, Xrownum, X)),
convert_list_to_float_array(LabelsList, array(Ysize, Y)),
trainI(X, Xsize, Xrownum, Y, Ysize, BatchTraining).
%% foreign c++ predicate definition
foreign(train, c, trainI( +pointer(float_array), +integer, +integer,
+pointer(float_array), +integer,
+integer)).
Parameters
Name | Type | Description | Default |
---|---|---|---|
data | +matrix | Dataset to train on. | - |
labels | +vector | Labels for training set. | - |
batchTraining | +integer(bool) | Whether or not to train in batch. | (0)false |
Connected Links/Resources
If you want a more detailed explanation, then go to the python documentation. There is most of the time a good explanation on how the methods work and what the parameters do.
added some of the links from the python documentation