K-Means Clustering
An implementation of several strategies for efficient k-means clustering. Given a dataset and a value of k, this computes and returns a k-means clustering on that data.
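For reference, k-means is usually stated as minimising the within-cluster sum of squared distances. The symbols below ($C_j$ for the j-th cluster, $\mu_j$ for its centroid, $x_i$ for a data point) are notation introduced here for illustration and are not names used by the predicates.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Lloyd-style iterations alternate between assigning each point to its nearest centroid and recomputing each $\mu_j$, which in general reaches a local rather than a global minimum of this objective.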
Available Predicates
naiveKMeans/9
Runs k-means using the naive algorithm for the Lloyd iteration.
%% part of the predicate definition
naiveKMeans( +integer,
+string,
+string,
+pointer(float_array), +integer, +integer,
+integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer, -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
initialPartition | +string | Sets the initialPartitionPolicy: "sampleInitialization", "randomPartition" | sampleInitialization |
emptyCluster | +string | Sets the emptyClusterPolicy: "maxVarianceNewCluster", "killEmptyCluster", "allowEmptyCluster" | allowEmptyCluster |
data | +matrix | Input dataset to perform clustering on. | - |
clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
assignments | -vector | Vector to store cluster assignments in. | - |
centroids | -matrix | Matrix in which centroids are stored. | - |
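To make the term "Lloyd iteration" concrete, here is a minimal sketch of the naive variant in plain SWI-Prolog. It is not the library's implementation: `lloyd/5` and all helper predicates are hypothetical names, the data is passed as ordinary lists of floats rather than `float_array` pointers, and an empty cluster simply keeps its previous centroid, which is a simplification and not a statement about how the `emptyCluster` policies behave.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).
:- use_module(library(pairs)).

% sq_dist(+P, +Q, -D): squared Euclidean distance between points P and Q.
sq_dist([], [], 0.0).
sq_dist([X|Xs], [Y|Ys], D) :-
    sq_dist(Xs, Ys, D0),
    D is D0 + (X - Y) * (X - Y).

% nearest(+Centroids, +Point, -Index): 0-based index of the closest centroid.
nearest(Centroids, Point, Index) :-
    findall(D-I, (nth0(I, Centroids, C), sq_dist(Point, C, D)), Pairs),
    keysort(Pairs, [_-Index|_]).

head_tail([H|T], H, T).

% columns(+Rows, -Cols): transpose a non-empty list of equal-length rows.
columns([[]|_], []) :- !.
columns(Rows, [Col|Cols]) :-
    maplist(head_tail, Rows, Col, Rest),
    columns(Rest, Cols).

column_mean(N, Column, Mean) :-
    sum_list(Column, Sum),
    Mean is Sum / N.

% mean_point(+Points, -Mean): component-wise mean of a non-empty point list.
mean_point(Points, Mean) :-
    length(Points, N),
    N > 0,
    columns(Points, Cols),
    maplist(column_mean(N), Cols, Mean).

% new_centroid(+Tagged, +I, +Old, -New): mean of the points tagged with I.
new_centroid(Tagged, I, Old, New) :-
    findall(P, member(I-P, Tagged), Members),
    (   Members == []
    ->  New = Old              % simplification: empty cluster keeps its centroid
    ;   mean_point(Members, New)
    ).

% update(+Points, +Assignments, +OldCentroids, -NewCentroids): update step.
update(Points, Assignments, Old, New) :-
    pairs_keys_values(Tagged, Assignments, Points),
    length(Old, K),
    Last is K - 1,
    numlist(0, Last, Indices),
    maplist(new_centroid(Tagged), Indices, Old, New).

% lloyd(+MaxIter, +Points, +Centroids0, -Assignments, -Centroids):
% repeat assignment and update until the centroids stop changing
% or MaxIter iterations have been performed.
lloyd(MaxIter, Points, Centroids0, Assignments, Centroids) :-
    maplist(nearest(Centroids0), Points, Assignments0),
    update(Points, Assignments0, Centroids0, Centroids1),
    (   ( MaxIter =< 1 ; Centroids1 == Centroids0 )
    ->  Centroids = Centroids1,
        maplist(nearest(Centroids), Points, Assignments)
    ;   MaxIter1 is MaxIter - 1,
        lloyd(MaxIter1, Points, Centroids1, Assignments, Centroids)
    ).

% ?- lloyd(1000, [[1.0,1.0],[2.0,2.0],[8.0,8.0],[9.0,9.0]],
%          [[0.0,0.0],[10.0,10.0]], A, C).
% A = [0, 0, 1, 1],
% C = [[1.5, 1.5], [8.5, 8.5]].
```

Every iteration of this sketch compares each point against every centroid. The other predicates in this document run the same kind of iteration but use different algorithms for that step.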
dualTreeKMeans/9
Runs k-means using the dual-tree algorithm for the Lloyd iteration.
%% part of the predicate definition
dualTreeKMeans( +integer,
+string,
+string,
+pointer(float_array), +integer, +integer,
+integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer, -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
initialPartition | +string | Sets the initialPartitionPolicy: "sampleInitialization", "randomPartition" | sampleInitialization |
emptyCluster | +string | Sets the emptyClusterPolicy: "maxVarianceNewCluster", "killEmptyCluster", "allowEmptyCluster" | allowEmptyCluster |
data | +matrix | Input dataset to perform clustering on. | - |
clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
assignments | -vector | Vector to store cluster assignments in. | - |
centroids | -matrix | Matrix in which centroids are stored. | - |
elkanKMeans/9
Runs k-means using Elkan's algorithm for the Lloyd iteration.
%% part of the predicate definition
elkanKMeans( +integer,
+string,
+string,
+pointer(float_array), +integer, +integer,
+integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer, -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
initialPartition | +string | Sets the initialPartitionPolicy: "sampleInitialization", "randomPartition" | sampleInitialization |
emptyCluster | +string | Sets the emptyClusterPolicy: "maxVarianceNewCluster", "killEmptyCluster", "allowEmptyCluster" | allowEmptyCluster |
data | +matrix | Input dataset to perform clustering on. | - |
clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
assignments | -vector | Vector to store cluster assignments in. | - |
centroids | -matrix | Matrix in which centroids are stored. | - |
hamerlyKMeans/9
Runs k-means using Hamerly's algorithm for the Lloyd iteration.
%% part of the predicate definition
hamerlyKMeans( +integer,
+string,
+string,
+pointer(float_array), +integer, +integer,
+integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer, -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
initialPartition | +string | Sets the initialPartitionPolicy: "sampleInitialization", "randomPartition" | sampleInitialization |
emptyCluster | +string | Sets the emptyClusterPolicy: "maxVarianceNewCluster", "killEmptyCluster", "allowEmptyCluster" | allowEmptyCluster |
data | +matrix | Input dataset to perform clustering on. | - |
clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
assignments | -vector | Vector to store cluster assignments in. | - |
centroids | -matrix | Matrix in which centroids are stored. | - |
pellegMooreKMeans/9
Runs k-means using the Pelleg-Moore algorithm for the Lloyd iteration.
%% part of the predicate definition
pellegMooreKMeans( +integer,
+string,
+string,
+pointer(float_array), +integer, +integer,
+integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer, -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
initialPartition | +string | Sets the initialPartitionPolicy: "sampleInitialization", "randomPartition" | sampleInitialization |
emptyCluster | +string | Sets the emptyClusterPolicy: "maxVarianceNewCluster", "killEmptyCluster", "allowEmptyCluster" | allowEmptyCluster |
data | +matrix | Input dataset to perform clustering on. | - |
clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
assignments | -vector | Vector to store cluster assignments in. | - |
centroids | -matrix | Matrix in which centroids are stored. | - |
Connected Links/Resources
For a more detailed explanation, see the Python documentation, which usually gives a good account of how the methods work and what the parameters do.
Some of the links from the Python documentation have been added here.