|
|
|
# K-Means Clustering
|
|
|
|
|
|
|
|
An implementation of several strategies for efficient k-means clustering. Given a dataset and a value of k, this computes and returns a k-means clustering on that data.
|
|
|
|
|
|
|
|
# Available Predicates
|
|
|
|
|
|
|
|
* [naiveKMeans/12](/PrologMethods/Clustering/kmeans#naivekmeans12)
|
|
|
|
* [dualTreeKMeans/12](/PrologMethods/Clustering/kmeans#dualtreekmeans12)
|
|
|
|
* [elkanKMeans/12](/PrologMethods/Clustering/kmeans#elkankmeans12)
|
|
|
|
* [hamerlyKMeans/12](/PrologMethods/Clustering/kmeans#hamerlykmeans12)
|
|
|
|
* [pellegMooreKMeans/12](/PrologMethods/Clustering/kmeans#pellegmoorekmeans12)
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
[links/resources](/PrologMethods/Clustering/kmeans#connected-linksresources)
|
|
|
|
|
|
|
|
## **_naiveKMeans/12_**
|
|
|
|
|
|
|
|
Runs kmeans with naive as the algorithm for the Lloyd iteration.
|
|
|
|
|
|
|
|
```prolog
|
|
|
|
%% part of the predicate definition
|
|
|
|
naiveKMeans( +integer, +integer, +integer,
|
|
|
|
+pointer(float_array), +integer, +integer,
|
|
|
|
+integer,
|
|
|
|
-pointer(float_array), -integer,
|
|
|
|
-pointer(float_array), -integer, -integer).
|
|
|
|
```
|
|
|
|
|
|
|
|
### Parameters
|
|
|
|
| Name | Type | Description | Default |
|
|
|
|
|------|------|-------------|---------|
|
|
|
|
| maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
|
|
|
|
| initialPartition | +string | Sets the initialPartitionPolicy : "SampleInitialzation", "RandomPartition" | SampleInitialzation |
|
|
|
|
| emptyCluster | +string | Sets the emptyClusterPolicy: "MaxVarianceNewCluster", "KillEmptyCluster", "AllowEmptyCluster" | AllowEmptyCluster |
|
|
|
|
| data | +matrix | Input dataset to perform clustering on. | - |
|
|
|
|
| clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
|
|
|
|
| assignments | -vector | Vector to store cluster assignments in. | - |
|
|
|
|
| centroids | -matrix | Matrix in which centroids are stored. | - |
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
## **_dualTreeKMeans/12_**
|
|
|
|
|
|
|
|
Runs kmeans with dualtree as the algorithm for the Lloyd iteration.
|
|
|
|
|
|
|
|
```prolog
|
|
|
|
%% part of the predicate definition
|
|
|
|
dualTreeKMeans( +integer, +integer, +integer,
|
|
|
|
+pointer(float_array), +integer, +integer,
|
|
|
|
+integer,
|
|
|
|
-pointer(float_array), -integer,
|
|
|
|
-pointer(float_array), -integer, -integer).
|
|
|
|
```
|
|
|
|
|
|
|
|
### Parameters
|
|
|
|
| Name | Type | Description | Default |
|
|
|
|
|------|------|-------------|---------|
|
|
|
|
| maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
|
|
|
|
| initialPartition | +string | Sets the initialPartitionPolicy : "SampleInitialzation", "RandomPartition" | SampleInitialzation |
|
|
|
|
| emptyCluster | +string | Sets the emptyClusterPolicy: "MaxVarianceNewCluster", "KillEmptyCluster", "AllowEmptyCluster" | AllowEmptyCluster |
|
|
|
|
| data | +matrix | Input dataset to perform clustering on. | - |
|
|
|
|
| clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
|
|
|
|
| assignments | -vector | Vector to store cluster assignments in. | - |
|
|
|
|
| centroids | -matrix | Matrix in which centroids are stored. | - |
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
## **_elkanKMeans/12_**
|
|
|
|
|
|
|
|
Runs kmeans with elkan as the algorithm for the Lloyd iteration.
|
|
|
|
|
|
|
|
```prolog
|
|
|
|
%% part of the predicate definition
|
|
|
|
elkanKMeans( +integer, +integer, +integer,
|
|
|
|
+pointer(float_array), +integer, +integer,
|
|
|
|
+integer,
|
|
|
|
-pointer(float_array), -integer,
|
|
|
|
-pointer(float_array), -integer, -integer).
|
|
|
|
```
|
|
|
|
|
|
|
|
### Parameters
|
|
|
|
| Name | Type | Description | Default |
|
|
|
|
|------|------|-------------|---------|
|
|
|
|
| maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
|
|
|
|
| initialPartition | +string | Sets the initialPartitionPolicy : "SampleInitialzation", "RandomPartition" | SampleInitialzation |
|
|
|
|
| emptyCluster | +string | Sets the emptyClusterPolicy: "MaxVarianceNewCluster", "KillEmptyCluster", "AllowEmptyCluster" | AllowEmptyCluster |
|
|
|
|
| data | +matrix | Input dataset to perform clustering on. | - |
|
|
|
|
| clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
|
|
|
|
| assignments | -vector | Vector to store cluster assignments in. | - |
|
|
|
|
| centroids | -matrix | Matrix in which centroids are stored. | - |
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
## **_hamerlyKMeans/12_**
|
|
|
|
|
|
|
|
Runs kmeans with hamerly as the algorithm for the Lloyd iteration.
|
|
|
|
|
|
|
|
```prolog
|
|
|
|
%% part of the predicate definition
|
|
|
|
hamerlyKMeans( +integer, +integer, +integer,
|
|
|
|
+pointer(float_array), +integer, +integer,
|
|
|
|
+integer,
|
|
|
|
-pointer(float_array), -integer,
|
|
|
|
-pointer(float_array), -integer, -integer).
|
|
|
|
```
|
|
|
|
|
|
|
|
### Parameters
|
|
|
|
| Name | Type | Description | Default |
|
|
|
|
|------|------|-------------|---------|
|
|
|
|
| maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
|
|
|
|
| initialPartition | +string | Sets the initialPartitionPolicy : "SampleInitialzation", "RandomPartition" | SampleInitialzation |
|
|
|
|
| emptyCluster | +string | Sets the emptyClusterPolicy: "MaxVarianceNewCluster", "KillEmptyCluster", "AllowEmptyCluster" | AllowEmptyCluster |
|
|
|
|
| data | +matrix | Input dataset to perform clustering on. | - |
|
|
|
|
| clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
|
|
|
|
| assignments | -vector | Vector to store cluster assignments in. | - |
|
|
|
|
| centroids | -matrix | Matrix in which centroids are stored. | - |
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
## **_pellegMooreKMeans/12\]_**
|
|
|
|
|
|
|
|
Runs kmeans with pelleg morre as the algorithm for the Lloyd iteration.
|
|
|
|
|
|
|
|
```prolog
|
|
|
|
%% part of the predicate definition
|
|
|
|
pellegMooreKMeans( +integer, +integer, +integer,
|
|
|
|
+pointer(float_array), +integer, +integer,
|
|
|
|
+integer,
|
|
|
|
-pointer(float_array), -integer,
|
|
|
|
-pointer(float_array), -integer, -integer).
|
|
|
|
```
|
|
|
|
|
|
|
|
### Parameters
|
|
|
|
| Name | Type | Description | Default |
|
|
|
|
|------|------|-------------|---------|
|
|
|
|
| maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
|
|
|
|
| initialPartition | +string | Sets the initialPartitionPolicy : "SampleInitialzation", "RandomPartition" | SampleInitialzation |
|
|
|
|
| emptyCluster | +string | Sets the emptyClusterPolicy: "MaxVarianceNewCluster", "KillEmptyCluster", "AllowEmptyCluster" | AllowEmptyCluster |
|
|
|
|
| data | +matrix | Input dataset to perform clustering on. | - |
|
|
|
|
| clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
|
|
|
|
| assignments | -vector | Vector to store cluster assignments in. | - |
|
|
|
|
| centroids | -matrix | Matrix in which centroids are stored. | - |
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
# Connected Links/Resources
|
|
|
|
|
|
|
|
If you want a more detailed explanation, then go to the python documentation. There is most of the time a good explanation on how the methods work and what the parameters do.
|
|
|
|
|
|
|
|
* [MLpack::kmeans_C++\_documentation](https://www.mlpack.org/doc/stable/doxygen/classmlpack_1_1kmeans_1_1KMeans.html)
|
|
|
|
* [MLpack::kmeans_Python_documentation](https://www.mlpack.org/doc/stable/python_documentation.html#kmeans)
|
|
|
|
|
|
|
|
added some of the links from the python documentation
|
|
|
|
|
|
|
|
* dbscan()
|
|
|
|
* [Using the triangle inequality to accelerate k-means (pdf)](http://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf)
|
|
|
|
* [Making k-means even faster (pdf)](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.586.2554&rep=rep1&type=pdf)
|
|
|
|
* [Accelerating exact k-means algorithms with geometric reasoning (pdf)](http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/usr0/ftp/2000/CMU-CS-00-105.pdf)
|
|
|
|
* [A dual-tree algorithm for fast k-means clustering with large k (pdf)](http://www.ratml.org/pub/pdf/2017dual.pdf) |
|
|
|
\ No newline at end of file |