K-Means Clustering
An implementation of several strategies for efficient k-means clustering. Given a dataset and a value of k, this computes and returns a k-means clustering on that data.
Available Predicates
naiveKMeans/12
Runs k-means using the naive algorithm for the Lloyd iteration.
%% part of the predicate definition
naiveKMeans( +integer, +integer, +integer,
+pointer(float_array), +integer, +integer,
+integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer, -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
initialPartition | +string | Sets the initialPartitionPolicy: "SampleInitialzation", "RandomPartition" | SampleInitialzation |
emptyCluster | +string | Sets the emptyClusterPolicy: "MaxVarianceNewCluster", "KillEmptyCluster", "AllowEmptyCluster" | AllowEmptyCluster |
data | +matrix | Input dataset to perform clustering on. | - |
clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
assignments | -vector | Vector to store cluster assignments in. | - |
centroids | -matrix | Matrix in which centroids are stored. | - |
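
For intuition about what the naive strategy computes, one plain Lloyd iteration can be sketched in pure Prolog. This is a minimal illustration, not the code behind naiveKMeans/12: it assumes SWI-Prolog, represents points as lists of numbers rather than float_array pointers, and every predicate name in it (sq_dist/3, nearest/3, mean_point/2, lloyd_step/4, lloyd/5) is hypothetical. It also assumes at least one centroid, and it leaves a centroid with no assigned points where it is, which is only one simple way to tolerate empty clusters.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% Squared Euclidean distance between two points (lists of numbers).
sq_dist([], [], 0).
sq_dist([X|Xs], [Y|Ys], D) :-
    sq_dist(Xs, Ys, D0),
    D is D0 + (X - Y) * (X - Y).

% 0-based index of the centroid closest to Point (ties go to the lower index).
nearest(Centroids, Point, Index) :-
    findall(D-I, (nth0(I, Centroids, C), sq_dist(Point, C, D)), Pairs),
    keysort(Pairs, [_-Index|_]).

% Coordinate-wise mean of a non-empty list of points.
mean_point([P|Ps], Mean) :-
    foldl(add_point, Ps, P, Sum),
    length([P|Ps], N),
    maplist(divide_by(N), Sum, Mean).

add_point(P, Acc0, Acc) :- maplist(add, P, Acc0, Acc).
add(X, Y, Z) :- Z is X + Y.
divide_by(N, X, Y) :- Y is X / N.

% New position of centroid I: the mean of the points assigned to it,
% or its old position if no point was assigned (assumption of this sketch).
update_centroid(Points, Assignments, I, Old, New) :-
    findall(P, (nth0(J, Assignments, I), nth0(J, Points, P)), Members),
    (   Members == []
    ->  New = Old
    ;   mean_point(Members, New)
    ).

% One Lloyd iteration: assign each point to its nearest centroid,
% then move each centroid to the mean of its points.
lloyd_step(Points, Centroids0, Assignments, Centroids) :-
    maplist(nearest(Centroids0), Points, Assignments),
    length(Centroids0, K),
    Last is K - 1,
    numlist(0, Last, Indices),
    maplist(update_centroid(Points, Assignments), Indices, Centroids0, Centroids).

% Repeat lloyd_step/4 until the centroids stop changing or
% MaxIterations steps have been taken.
lloyd(Points, Centroids0, MaxIterations, Assignments, Centroids) :-
    lloyd_step(Points, Centroids0, Assignments1, Centroids1),
    (   ( Centroids1 == Centroids0 ; MaxIterations =< 1 )
    ->  Assignments = Assignments1,
        Centroids = Centroids1
    ;   Remaining is MaxIterations - 1,
        lloyd(Points, Centroids1, Remaining, Assignments, Centroids)
    ).
```

With these definitions, the query `lloyd([[1.0,1.0],[1.0,2.0],[9.0,9.0],[10.0,9.0]], [[0.0,0.0],[10.0,10.0]], 10, A, C).` binds `A = [0,0,1,1]` and `C = [[1.0,1.5],[9.5,9.0]]`. The naive strategy compares every point against every centroid in each iteration; the other predicates on this page exist to avoid part of that work.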
dualTreeKMeans/12
Runs k-means using the dual-tree algorithm for the Lloyd iteration.
%% part of the predicate definition
dualTreeKMeans( +integer, +integer, +integer,
+pointer(float_array), +integer, +integer,
+integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer, -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
initialPartition | +string | Sets the initialPartitionPolicy: "SampleInitialzation", "RandomPartition" | SampleInitialzation |
emptyCluster | +string | Sets the emptyClusterPolicy: "MaxVarianceNewCluster", "KillEmptyCluster", "AllowEmptyCluster" | AllowEmptyCluster |
data | +matrix | Input dataset to perform clustering on. | - |
clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
assignments | -vector | Vector to store cluster assignments in. | - |
centroids | -matrix | Matrix in which centroids are stored. | - |
elkanKMeans/12
Runs k-means using Elkan's algorithm for the Lloyd iteration.
%% part of the predicate definition
elkanKMeans( +integer, +integer, +integer,
+pointer(float_array), +integer, +integer,
+integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer, -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
initialPartition | +string | Sets the initialPartitionPolicy: "SampleInitialzation", "RandomPartition" | SampleInitialzation |
emptyCluster | +string | Sets the emptyClusterPolicy: "MaxVarianceNewCluster", "KillEmptyCluster", "AllowEmptyCluster" | AllowEmptyCluster |
data | +matrix | Input dataset to perform clustering on. | - |
clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
assignments | -vector | Vector to store cluster assignments in. | - |
centroids | -matrix | Matrix in which centroids are stored. | - |
hamerlyKMeans/12
Runs k-means using Hamerly's algorithm for the Lloyd iteration.
%% part of the predicate definition
hamerlyKMeans( +integer, +integer, +integer,
+pointer(float_array), +integer, +integer,
+integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer, -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
initialPartition | +string | Sets the initialPartitionPolicy: "SampleInitialzation", "RandomPartition" | SampleInitialzation |
emptyCluster | +string | Sets the emptyClusterPolicy: "MaxVarianceNewCluster", "KillEmptyCluster", "AllowEmptyCluster" | AllowEmptyCluster |
data | +matrix | Input dataset to perform clustering on. | - |
clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
assignments | -vector | Vector to store cluster assignments in. | - |
centroids | -matrix | Matrix in which centroids are stored. | - |
pellegMooreKMeans/12
Runs k-means using the Pelleg-Moore algorithm for the Lloyd iteration.
%% part of the predicate definition
pellegMooreKMeans( +integer, +integer, +integer,
+pointer(float_array), +integer, +integer,
+integer,
-pointer(float_array), -integer,
-pointer(float_array), -integer, -integer).
Parameters
Name | Type | Description | Default |
---|---|---|---|
maxIterations | +integer | Maximum number of iterations before k-means terminates. | 1000 |
initialPartition | +string | Sets the initialPartitionPolicy: "SampleInitialzation", "RandomPartition" | SampleInitialzation |
emptyCluster | +string | Sets the emptyClusterPolicy: "MaxVarianceNewCluster", "KillEmptyCluster", "AllowEmptyCluster" | AllowEmptyCluster |
data | +matrix | Input dataset to perform clustering on. | - |
clusters | +integer | Number of clusters to find (0 autodetects from initial centroids). | 0 |
assignments | -vector | Vector to store cluster assignments in. | - |
centroids | -matrix | Matrix in which centroids are stored. | - |
Connected Links/Resources
For a more detailed explanation, see the Python documentation, which usually describes how the methods work and what the parameters do.