|
|
# Hoeffding trees
|
|
|
|
|
|
An implementation of Hoeffding trees, a form of streaming decision tree for classification. Given labeled data, a Hoeffding tree can be used for predicting the classifications of new points.
|
|
|
|
|
|
# Available Predicates
|
|
|
|
|
|
* [initAndBuildModel/14](https://gitlab.cs.uni-duesseldorf.de/stups/abschlussarbeiten/prolog-mlpack-libary/-/wikis/PrologMethods/Classification/hoeffding_tree#initandbuildmodel14)
|
|
|
* [classify/7](https://gitlab.cs.uni-duesseldorf.de/stups/abschlussarbeiten/prolog-mlpack-libary/-/wikis/PrologMethods/Classification/hoeffding_tree#classify7)
|
|
|
* [train/6](https://gitlab.cs.uni-duesseldorf.de/stups/abschlussarbeiten/prolog-mlpack-libary/-/wikis/PrologMethods/Classification/hoeffding_tree#train6)
|
|
|
|
|
|
---
|
|
|
|
|
|
[links/resources](https://gitlab.cs.uni-duesseldorf.de/stups/abschlussarbeiten/prolog-mlpack-libary/-/wikis/PrologMethods/Classification/hoeffding_tree#connected-linksresources)
|
|
|
|
|
|
## **_initAndBuildModel/14_**
|
|
|
|
|
|
Construct the Hoeffding tree with the given parameters and given training data.
|
|
|
|
|
|
The tree may be trained either in batch mode (which looks at all points before splitting, and propagates these points to the created children for further training), or in streaming mode, where each point is only considered once. (In general, batch mode will give better-performing trees, but will have higher memory and runtime costs for the same dataset.)
|
|
|
|
|
|
```prolog
|
|
|
%% part of the predicate definition
|
|
|
initAndBuildModel( +string,
|
|
|
+pointer(float_array), +integer, +integer,
|
|
|
+pointer(float_array), +integer,
|
|
|
+integer,
|
|
|
+integer,
|
|
|
+float32, +integer, +integer, +integer, +integer, +integer).
|
|
|
```
|
|
|
|
|
|
### Parameters
|
|
|
| Name | Type | Description | Default |
|
|
|
|------|------|-------------|---------|
|
|
|
| treeType | +string | "gini-hoeffding", "gini-binary", "info-hoeffding", "info-binary" | gini-binary |
|
|
|
| dataset | +matrix | Dataset to train on. | - |
|
|
|
| labels | +vector | Labels of each point in the dataset. | - |
|
|
|
| numClasses | +integer | Number of classes in the dataset. | 0 |
|
|
|
| batchTraining | +integer(bool) | If true, samples will be considered in batch instead of as a stream. This generally results in better trees but at the cost of memory usage and runtime. | (0)false |
|
|
|
| successProbability | +float | Probability of success required in Hoeffding bounds before a split can happen. | 0.95 |
|
|
|
| maxSamples | +integer | Maximum number of samples before a split is forced (0 never forces a split); ignored in batch training mode. | 0 or 5000 |
|
|
|
| checkInterval | +integer | Number of samples required before each split; ignored in batch training mode. | 100 |
|
|
|
| minSamples | +integer | If the node has seen this many points or fewer, no split will be allowed. | 100 |
|
|
|
| bins | +integer | If the "domingos"/"hoeffding" split strategy is used, this specifies the number of bins for each numeric split. | 10 |
|
|
|
| observationsBeforeBinning | +integer | If the "domingos"/"hoeffding" split strategy is used, this specifies the number of samples observed before binning is performed. | 100 |
|
|
|
|
|
|
---
|
|
|
|
|
|
## **_classify/7_**
|
|
|
|
|
|
Classify the given points, using this node and the entire (sub)tree beneath it.
|
|
|
|
|
|
The predicted labels for each point are returned, as well as an estimate of the probability that the prediction is correct for each point. This estimate is simply the **MajorityProbability** for the leaf that each point bins to.
|
|
|
|
|
|
```prolog
|
|
|
%% part of the predicate definition
|
|
|
classify( +pointer(float_array), +integer, +integer,
|
|
|
-pointer(float_array), -integer,
|
|
|
-pointer(float_array), -integer).
|
|
|
```
|
|
|
|
|
|
### Parameters
|
|
|
| Name | Type | Description | Default |
|
|
|
|------|------|-------------|---------|
|
|
|
| data | +matrix | Points to classify. | - |
|
|
|
| predictions | -vector | Predicted labels for each point. | - |
|
|
|
| probabilities | -vector | Probability estimates for each predicted label. | - |
|
|
|
|
|
|
---
|
|
|
|
|
|
## **_train/6_**
|
|
|
|
|
|
Train in streaming mode on the given dataset.
|
|
|
|
|
|
This takes one pass. Be sure that initAndBuildModel/14 has been called first!
|
|
|
|
|
|
```prolog
|
|
|
%% part of the predicate definition
|
|
|
train( +pointer(float_array), +integer, +integer,
|
|
|
+pointer(float_array), +integer,
|
|
|
+integer).
|
|
|
```
|
|
|
|
|
|
### Parameters
|
|
|
| Name | Type | Description | Default |
|
|
|
|------|------|-------------|---------|
|
|
|
| data | +matrix | Dataset to train on. | - |
|
|
|
| labels | +vector | Labels for training set. | - |
|
|
|
| batchTraining | +integer(bool) | Whether or not to train in batch. | (0)false |
|
|
|
|
|
|
---
|
|
|
|
|
|
# Connected Links/Resources
|
|
|
|
|
|
If you want a more detailed explanation, then go to the python documentation. There is most of the time a good explanation on how the methods work and what the parameters do.
|
|
|
|
|
|
* [MLpack::hoeffding_tree_C++\_documentation](https://www.mlpack.org/doc/stable/doxygen/classmlpack_1_1tree_1_1HoeffdingTree.html)
|
|
|
* [MLpack::hoeffding_tree_Python_documentation](https://www.mlpack.org/doc/stable/python_documentation.html#hoeffding_tree)
|
|
|
|
|
|
added some of the links from the python documentation
|
|
|
|
|
|
* [decision_tree](https://gitlab.cs.uni-duesseldorf.de/stups/abschlussarbeiten/prolog-mlpack-libary/-/wikis/PrologMethods/decision_tree#decision-tree)
|
|
|
* random_forest
|
|
|
* [Mining High-Speed Data Streams (pdf)](http://dm.cs.washington.edu/papers/vfdt-kdd00.pdf) |
|
|
\ No newline at end of file |