# Decision tree

An implementation of an ID3-style decision tree for classification.
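An ID3-style tree is grown greedily: at each node, the split that most reduces the entropy of the labels (the information gain) is chosen. A minimal sketch of that criterion, purely for illustration (this is plain Python, not the library's implementation):

```python
# Illustrative sketch of the ID3 split criterion: entropy and information gain.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction achieved by partitioning `labels` into `groups`."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

labels = [0, 0, 1, 1]
# A perfect split separates the classes and recovers the full entropy (1 bit).
print(information_gain(labels, [[0, 0], [1, 1]]))  # 1.0
```

The tree recursively picks the feature/threshold pair maximizing this gain until a stopping condition (leaf size, gain, or depth) is hit.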
# Available Predicates
* [initModel/9](https://gitlab.cs.uni-duesseldorf.de/stups/abschlussarbeiten/prolog-mlpack-libary/-/wikis/PrologMethods/Classification/decision_tree#initmodel9)
* [classifyPoint/5](https://gitlab.cs.uni-duesseldorf.de/stups/abschlussarbeiten/prolog-mlpack-libary/-/wikis/PrologMethods/Classification/decision_tree#classifypoint5)
* [classifyMatrix/7](https://gitlab.cs.uni-duesseldorf.de/stups/abschlussarbeiten/prolog-mlpack-libary/-/wikis/PrologMethods/Classification/decision_tree#classifymatrix7)
* [train/10](https://gitlab.cs.uni-duesseldorf.de/stups/abschlussarbeiten/prolog-mlpack-libary/-/wikis/PrologMethods/Classification/decision_tree#train10)

---

[links/resources](https://gitlab.cs.uni-duesseldorf.de/stups/abschlussarbeiten/prolog-mlpack-libary/-/wikis/PrologMethods/Classification/decision_tree#connected-linksresources)

## **_initModel/9_**

Construct the decision tree on the given data and labels, assuming that all of the data is numeric.

Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
```prolog
%% part of the predicate definition
initModel( +pointer(float_array), +integer, +integer,
           +pointer(float_array), +integer,
           +integer, +integer, +float32, +integer).
```

### Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| dataset | +matrix | Training dataset. | - |
| labels | +vector | Training labels. | - |
| numClasses | +integer | Number of classes in the dataset. | - |
| minimumLeafSize | +integer | Minimum number of points in each leaf node. | 20 |
| minimumGainSplit | +float | Minimum gain for node splitting. | 1e-7 |
| maximumDepth | +integer | Maximum depth of the tree (0 means no limit). | 0 |

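The gating role of minimumLeafSize and minimumGainSplit during tree construction can be sketched as follows (hypothetical helper in plain Python, not the library's code):

```python
# Sketch of how minimumLeafSize and minimumGainSplit gate a candidate split.
def split_allowed(left_size, right_size, gain,
                  minimum_leaf_size=20, minimum_gain_split=1e-7):
    """A split is accepted only if both children are large enough and the
    split improves label purity by at least the minimum gain."""
    if left_size < minimum_leaf_size or right_size < minimum_leaf_size:
        return False
    return gain >= minimum_gain_split

print(split_allowed(25, 30, 0.12))  # True: large children, real gain
print(split_allowed(5, 50, 0.12))   # False: left child too small
print(split_allowed(25, 30, 0.0))   # False: no gain, would only fit noise
```

Lowering either threshold lets the tree keep splitting on ever-smaller, noisier subsets (overfitting); raising them stops splitting early (underfitting).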
---

## **_classifyPoint/5_**

Classify the given point and also return estimates of the probability for each class in the given vector.
```prolog
%% part of the predicate definition
classifyPoint( +pointer(float_array), +integer,
               -integer,
               -pointer(float_array), -integer).
```

### Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| point | +vector | Point to classify. | - |
| prediction | -integer | This will be set to the predicted class of the point. | - |
| probabilities | -vector | This will be filled with class probabilities for the point. | - |

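Conceptually, the point is routed down the tree to a leaf, and that leaf's class distribution yields both outputs: the probabilities vector is the leaf's distribution, and the prediction is its most probable class. A plain-Python sketch (the node layout here is an assumption for illustration, not the library's internal representation):

```python
# Sketch of what classifyPoint computes: walk the tree to a leaf and return
# the majority class plus that leaf's class-probability vector.
def classify_point(node, point):
    # Assumed layout: internal nodes are dicts with feature/threshold/children;
    # leaves are probability vectors over the classes.
    while isinstance(node, dict):
        branch = "left" if point[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    probabilities = node
    prediction = max(range(len(probabilities)), key=probabilities.__getitem__)
    return prediction, probabilities

# A stump that splits on feature 0 at threshold 0.5.
tree = {"feature": 0, "threshold": 0.5,
        "left": [0.9, 0.1],    # leaf: mostly class 0
        "right": [0.2, 0.8]}   # leaf: mostly class 1

print(classify_point(tree, [0.3]))  # (0, [0.9, 0.1])
print(classify_point(tree, [0.7]))  # (1, [0.2, 0.8])
```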
---

## **_classifyMatrix/7_**

Classify the given points and also return estimates of the probabilities for each class in the given matrix.
```prolog
%% part of the predicate definition
classifyMatrix( +pointer(float_array), +integer, +integer,
                -pointer(float_array), -integer,
                -pointer(float_array), -integer).
```

### Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | +matrix | Set of points to classify. | - |
| predictions | -vector | This will be filled with predictions for each point. | - |
| probabilities | -matrix | This will be filled with class probabilities for each point. | - |

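This is the batch form of classifyPoint/5: one predicted label per point, plus one probability row per point collected into a matrix. The shape of the result can be sketched like this (illustrative Python; the real predicate works on C float arrays):

```python
# Sketch of the batch output shape of classifyMatrix: a prediction vector and
# a per-point probability matrix, built by applying a single-point classifier.
def classify_matrix(predict_one, points):
    """Apply a single-point classifier to every point in `points`."""
    results = [predict_one(p) for p in points]
    predictions = [label for label, _ in results]
    probabilities = [probs for _, probs in results]
    return predictions, probabilities

# Toy single-point classifier standing in for a trained tree.
def stump(point):
    return (0, [0.9, 0.1]) if point[0] <= 0.5 else (1, [0.2, 0.8])

preds, probs = classify_matrix(stump, [[0.3], [0.7]])
print(preds)  # [0, 1]
print(probs)  # [[0.9, 0.1], [0.2, 0.8]]
```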
---

## **_train/10_**

Train the decision tree on the given data, assuming that all dimensions are numeric.

This will overwrite the given model. Setting minimumLeafSize and minimumGainSplit too small may cause the tree to overfit, but setting them too large may cause it to underfit.
```prolog
%% part of the predicate definition
train( +pointer(float_array), +integer, +integer,
       +pointer(float_array), +integer,
       +integer, +integer, +float32, +integer,
       [-float32]).
```

### Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| dataset | +matrix | Training dataset. | - |
| labels | +vector | Training labels. | - |
| numClasses | +integer | Number of classes in the dataset. | - |
| minimumLeafSize | +integer | Minimum number of points in each leaf node. | 20 |
| minimumGainSplit | +float | Minimum gain for node splitting. | 1e-7 |
| maximumDepth | +integer | Maximum depth of the tree (0 means no limit). | 0 |
| entropy | -float | The final entropy of the decision tree. | - |

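One natural reading of the returned entropy is the remaining impurity of the trained tree, e.g. the average leaf entropy weighted by leaf size; the exact definition used by the library may differ, so treat this as an illustrative sketch only:

```python
# Sketch of a leaf-size-weighted tree entropy (an assumed definition, not
# necessarily the exact quantity train/10 reports).
from math import log2

def leaf_entropy(counts):
    """Shannon entropy of one leaf's class counts, in bits."""
    n = sum(counts)
    return sum(-(c / n) * log2(c / n) for c in counts if c)

def weighted_tree_entropy(leaves):
    """`leaves` is a list of per-leaf class-count lists."""
    total = sum(sum(counts) for counts in leaves)
    return sum(sum(counts) / total * leaf_entropy(counts) for counts in leaves)

# Two pure leaves -> zero entropy; a mixed leaf raises it.
print(weighted_tree_entropy([[10, 0], [0, 10]]))  # 0.0
print(weighted_tree_entropy([[5, 5], [0, 10]]))   # 0.5
```

A value near zero means the leaves are nearly pure on the training data; check it against held-out accuracy, since very pure leaves can also signal overfitting.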
---

# Connected Links/Resources

For a more detailed explanation, see the Python documentation, which usually explains well how the methods work and what the parameters do.
* [MLpack::decision_tree_C++\_documentation](https://www.mlpack.org/doc/stable/doxygen/classmlpack_1_1tree_1_1DecisionTree.html)
* [MLpack::decision_tree_Python_documentation](https://www.mlpack.org/doc/stable/python_documentation.html#decision_tree)

Some additional links from the Python documentation:
* Random forest
* [Decision trees on Wikipedia](https://en.wikipedia.org/wiki/Decision_tree_learning)
* [Induction of Decision Trees (pdf)](https://link.springer.com/content/pdf/10.1007/BF00116251.pdf)