K-Means Clustering

An implementation of several strategies for efficient k-means clustering. Given a dataset and a value of k, this computes and returns a k-means clustering on that data.

Available Predicates

links/resources

naiveKMeans/9

Runs k-means using the naive algorithm for each Lloyd iteration.

%% part of the predicate definition
naiveKMeans(  +integer, 
              +string, 
              +string, 
              +pointer(float_array), +integer, +integer,
              +integer, 
              -pointer(float_array), -integer, 
              -pointer(float_array), -integer, -integer).

Parameters

Name	Type	Description	Default
maxIterations	+integer	Maximum number of iterations before k-means terminates.	1000
initialPartition	+string	Sets the initialPartitionPolicy: "sampleInitialization", "randomPartition"	sampleInitialization
emptyCluster	+string	Sets the emptyClusterPolicy: "maxVarianceNewCluster", "killEmptyCluster", "allowEmptyCluster"	allowEmptyCluster
data	+matrix	Input dataset to perform clustering on.	-
clusters	+integer	Number of clusters to find (0 autodetects from initial centroids).	0
assignments	-vector	Vector to store cluster assignments in.	-
centroids	-matrix	Matrix in which centroids are stored.	-
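To make the term "naive Lloyd iteration" concrete, the following is a minimal pure-Python sketch, not the code behind the predicate. The function name `naive_kmeans`, the list-of-points data layout, and the `seed` argument are assumptions made for illustration. It picks k sample points as initial centroids and keeps the previous centroid when a cluster becomes empty, which is only one possible reading of the "sampleInitialization" and "allowEmptyCluster" defaults.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def naive_kmeans(data, k, max_iterations=1000, seed=0):
    """Hypothetical Lloyd iteration; returns (assignments, centroids).

    data is a list of points, each a list of floats (an assumed layout).
    """
    rng = random.Random(seed)
    # Assumed sample initialization: k distinct points become centroids.
    centroids = [list(p) for p in rng.sample(data, k)]
    assignments = [None] * len(data)

    for _ in range(max_iterations):
        # Naive assignment step: compare every point with every centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in data
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result is stable
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(data, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]

    return assignments, centroids
```

Every iteration costs one distance computation per point and centroid. The other predicates on this page run the same iteration with algorithms that aim to skip part of that work.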

dualTreeKMeans/9

Runs k-means using the dual-tree algorithm for each Lloyd iteration.

%% part of the predicate definition
dualTreeKMeans(  +integer, 
                 +string, 
                 +string, 
                 +pointer(float_array), +integer, +integer,
                 +integer, 
                 -pointer(float_array), -integer, 
                 -pointer(float_array), -integer, -integer).

Parameters

Name	Type	Description	Default
maxIterations	+integer	Maximum number of iterations before k-means terminates.	1000
initialPartition	+string	Sets the initialPartitionPolicy: "sampleInitialization", "randomPartition"	sampleInitialization
emptyCluster	+string	Sets the emptyClusterPolicy: "maxVarianceNewCluster", "killEmptyCluster", "allowEmptyCluster"	allowEmptyCluster
data	+matrix	Input dataset to perform clustering on.	-
clusters	+integer	Number of clusters to find (0 autodetects from initial centroids).	0
assignments	-vector	Vector to store cluster assignments in.	-
centroids	-matrix	Matrix in which centroids are stored.	-

elkanKMeans/9

Runs k-means using Elkan's algorithm for each Lloyd iteration.

%% part of the predicate definition
elkanKMeans(  +integer, 
              +string, 
              +string, 
              +pointer(float_array), +integer, +integer,
              +integer, 
              -pointer(float_array), -integer, 
              -pointer(float_array), -integer, -integer).

Parameters

Name	Type	Description	Default
maxIterations	+integer	Maximum number of iterations before k-means terminates.	1000
initialPartition	+string	Sets the initialPartitionPolicy: "sampleInitialization", "randomPartition"	sampleInitialization
emptyCluster	+string	Sets the emptyClusterPolicy: "maxVarianceNewCluster", "killEmptyCluster", "allowEmptyCluster"	allowEmptyCluster
data	+matrix	Input dataset to perform clustering on.	-
clusters	+integer	Number of clusters to find (0 autodetects from initial centroids).	0
assignments	-vector	Vector to store cluster assignments in.	-
centroids	-matrix	Matrix in which centroids are stored.	-
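As background, Elkan's variant avoids distance computations by applying the triangle inequality. The sketch below shows only that pruning test, not the full algorithm (which also maintains per-point bounds across iterations) and not the code behind the predicate. The helper names `centroid_distances` and `assign_with_pruning` are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid_distances(centroids):
    """Pairwise centroid distances, computed once and shared by all points."""
    return [[euclidean(ci, cj) for cj in centroids] for ci in centroids]


def assign_with_pruning(point, centroids, centroid_dist):
    """Return (index of nearest centroid, number of skipped distance computations)."""
    best = 0
    best_dist = euclidean(point, centroids[0])
    skipped = 0
    for j in range(1, len(centroids)):
        # Triangle inequality: if d(c_best, c_j) >= 2 * d(x, c_best),
        # then d(x, c_j) >= d(x, c_best), so c_j cannot be strictly closer.
        if centroid_dist[best][j] >= 2 * best_dist:
            skipped += 1
            continue
        d = euclidean(point, centroids[j])
        if d < best_dist:
            best, best_dist = j, d
    return best, skipped
```

The pruned search returns the same nearest centroid as a full scan. The savings grow when centroids are far apart relative to the distance between a point and its current centroid.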

hamerlyKMeans/9

Runs k-means using Hamerly's algorithm for each Lloyd iteration.

%% part of the predicate definition
hamerlyKMeans(  +integer, 
                +string, 
                +string, 
                +pointer(float_array), +integer, +integer,
                +integer, 
                -pointer(float_array), -integer, 
                -pointer(float_array), -integer, -integer).

Parameters

Name	Type	Description	Default
maxIterations	+integer	Maximum number of iterations before k-means terminates.	1000
initialPartition	+string	Sets the initialPartitionPolicy: "sampleInitialization", "randomPartition"	sampleInitialization
emptyCluster	+string	Sets the emptyClusterPolicy: "maxVarianceNewCluster", "killEmptyCluster", "allowEmptyCluster"	allowEmptyCluster
data	+matrix	Input dataset to perform clustering on.	-
clusters	+integer	Number of clusters to find (0 autodetects from initial centroids).	0
assignments	-vector	Vector to store cluster assignments in.	-
centroids	-matrix	Matrix in which centroids are stored.	-

pellegMooreKMeans/9

Runs k-means using the Pelleg-Moore algorithm for each Lloyd iteration.

%% part of the predicate definition
pellegMooreKMeans(  +integer, 
                    +string, 
                    +string, 
                    +pointer(float_array), +integer, +integer,
                    +integer, 
                    -pointer(float_array), -integer, 
                    -pointer(float_array), -integer, -integer).

Parameters

Name	Type	Description	Default
maxIterations	+integer	Maximum number of iterations before k-means terminates.	1000
initialPartition	+string	Sets the initialPartitionPolicy: "sampleInitialization", "randomPartition"	sampleInitialization
emptyCluster	+string	Sets the emptyClusterPolicy: "maxVarianceNewCluster", "killEmptyCluster", "allowEmptyCluster"	allowEmptyCluster
data	+matrix	Input dataset to perform clustering on.	-
clusters	+integer	Number of clusters to find (0 autodetects from initial centroids).	0
assignments	-vector	Vector to store cluster assignments in.	-
centroids	-matrix	Matrix in which centroids are stored.	-

Connected Links/Resources

For a more detailed explanation, see the Python documentation. It usually gives a good account of how the methods work and what the parameters do.

Some of the links from the Python documentation have been added here.
