Principal Components Analysis
An implementation of several strategies for principal components analysis (PCA), a common preprocessing step. Given a dataset and a desired new dimensionality, this can reduce the dimensionality of the data using the linear transformation determined by PCA.
Available Predicates
pca/13
Apply Principal Component Analysis to the provided data set.
```prolog
%% part of the predicate definition
pca( +integer, +string,
     +pointer(float_array), +integer, +integer,
     -pointer(float_array), -integer, -integer,
     -pointer(float_array), -integer,
     -pointer(float_array), -integer, -integer).
```
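Each matrix argument is passed as a triple: a pointer to a flat float array, followed by the number of rows and the number of columns. Outputs come back the same way. The sketch below shows one way to convert between a Prolog list of rows and such a flat list. The helper names `matrix_to_flat/4` and `flat_to_matrix/4` are hypothetical, and the row-major ordering is an assumption. Check how the binding actually lays out the array before relying on it.

```prolog
:- use_module(library(lists)).

%% matrix_to_flat(+Rows, -Flat, -NumRows, -NumCols)
%  Hypothetical helper: flattens a non-empty list of equal-length rows
%  into one row-major list plus its dimensions (assumed layout).
matrix_to_flat(Rows, Flat, NumRows, NumCols) :-
    Rows = [FirstRow|_],
    length(Rows, NumRows),
    length(FirstRow, NumCols),
    % every row must have the same number of columns
    forall(member(Row, Rows), length(Row, NumCols)),
    append(Rows, Flat).

%% flat_to_matrix(+Flat, +NumRows, +NumCols, -Rows)
%  Hypothetical inverse: cuts a row-major list back into NumRows rows.
flat_to_matrix([], 0, _, []).
flat_to_matrix(Flat, NumRows, NumCols, [Row|Rows]) :-
    NumRows > 0,
    length(Row, NumCols),
    append(Row, Rest, Flat),
    NumRows1 is NumRows - 1,
    flat_to_matrix(Rest, NumRows1, NumCols, Rows).
```

For example, `?- matrix_to_flat([[1,2,3],[4,5,6]], F, R, C).` binds `F = [1,2,3,4,5,6]`, `R = 2` and `C = 3`.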
Parameters
Name | Type | Description | Default |
---|---|---|---|
scaleData | +integer(bool) | Whether or not to scale the data. | 0 (false) |
decompositionPolicy | +string | Decomposition policy to use: "exact", "randomized", "randomized-block-krylov", "quic" | exact |
data | +matrix | Input dataset to perform PCA on. | - |
transformedData | -matrix | Matrix to put results of PCA into. | - |
eigenValues | -vector | Vector to put eigenvalues into. | - |
eigenVectors | -matrix | Matrix to put eigenvectors (loadings) into. | - |
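Conceptually, transformedData is obtained by centering each dimension on its mean and then projecting the centered points onto the eigenvectors. The following pure-Prolog sketch illustrates that step on small lists. It assumes one point per row and eigenvectors given as a list of vectors, and it skips the optional scaling controlled by scaleData. It is not the library's implementation, and `pca_transform/3` and its helpers are hypothetical names. Passing only the first k eigenvectors yields k output dimensions.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

%% add_vectors(+A, +B, -C): element-wise sum.
add_vectors([], [], []).
add_vectors([A|As], [B|Bs], [C|Cs]) :- C is A + B, add_vectors(As, Bs, Cs).

%% sub_vectors(+A, +B, -C): element-wise difference A - B.
sub_vectors([], [], []).
sub_vectors([A|As], [B|Bs], [C|Cs]) :- C is A - B, sub_vectors(As, Bs, Cs).

%% dot_product(+Xs, +Ys, -D): inner product of two vectors.
dot_product([], [], 0).
dot_product([X|Xs], [Y|Ys], D) :- dot_product(Xs, Ys, D0), D is D0 + X * Y.

divide_by(N, S, M) :- M is S / N.

%% column_means(+Rows, -Means): per-dimension mean, one point per row.
column_means([R|Rs], Means) :-
    foldl(add_vectors, Rs, R, Sums),
    length([R|Rs], N),
    maplist(divide_by(N), Sums, Means).

%% project_point(+Eigvecs, +Means, +Point, -Coords)
%  Centers Point and takes its coordinate along each eigenvector.
project_point(Eigvecs, Means, Point, Coords) :-
    sub_vectors(Point, Means, Centered),
    maplist(dot_product(Centered), Eigvecs, Coords).

%% pca_transform(+Rows, +Eigvecs, -Transformed)
%  Hypothetical helper; no scaling step (scaleData is ignored here).
pca_transform(Rows, Eigvecs, Transformed) :-
    column_means(Rows, Means),
    maplist(project_point(Eigvecs, Means), Rows, Transformed).
```

In the real predicate, the eigenvectors come from the decomposition selected by decompositionPolicy. Here they are simply supplied by the caller.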
pcaDimReduction/10
Use PCA for dimensionality reduction on the given dataset.
This keeps the newDimension largest principal components of the data and discards the rest. The returned value (retainedVar) is the fraction of the data's variance that is retained, a value between 0 and 1. For instance, a value of 0.9 indicates that 90% of the variance present in the data was retained.
```prolog
%% part of the predicate definition
pcaDimReduction( +integer, +string,
                 +pointer(float_array), +integer, +integer,
                 +integer,
                 -pointer(float_array), -integer, -integer,
                 [-float32]).
```
Parameters
Name | Type | Description | Default |
---|---|---|---|
scaleData | +integer(bool) | Whether or not to scale the data. | 0 (false) |
decompositionPolicy | +string | Decomposition policy to use: "exact", "randomized", "randomized-block-krylov", "quic" | exact |
data | +matrix | Input dataset to perform PCA on. | - |
newDimension | +integer | Desired dimensionality of output dataset. If 0, no dimensionality reduction is performed. | 0 |
transformedData | -matrix | Matrix to put results of PCA into. | - |
retainedVar | -float | Fraction of the variance that is retained, between 0 and 1. | - |
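The retained-variance figure follows from the eigenvalues. It is the sum of the newDimension largest eigenvalues divided by the sum of all eigenvalues. Below is a small sketch of that computation, assuming the eigenvalues are already sorted in descending order. `retained_variance/3` is a hypothetical helper, not part of the binding.

```prolog
:- use_module(library(lists)).

%% retained_variance(+EigVals, +NewDim, -Retained)
%  Hypothetical helper. EigVals must be sorted in descending order and
%  NewDim must not exceed their number. NewDim = 0 is treated as
%  "no reduction", following the newDimension description above.
retained_variance(_, 0, 1.0) :- !.
retained_variance(EigVals, NewDim, Retained) :-
    length(Kept, NewDim),
    append(Kept, _, EigVals),          % Kept = the NewDim largest eigenvalues
    sum_list(Kept, KeptSum),
    sum_list(EigVals, Total),
    Retained is KeptSum / Total.
```

With eigenvalues 5, 3, 1.5 and 0.5, keeping two dimensions retains (5 + 3) / 10 = 0.8 of the variance.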
pcaVarianceDimReduction/10
Use PCA for dimensionality reduction on the given dataset.
This keeps as many dimensions as necessary to retain at least the given amount of variance (parameter varToRetain). The amount should be between 0 and 1. If it is 0, only one dimension is retained. If it is 1, all dimensions are retained.
The predicate returns the amount of variance actually retained, which is always greater than or equal to varToRetain.
```prolog
%% part of the predicate definition
pcaVarianceDimReduction( +integer, +string,
                         +pointer(float_array), +integer, +integer,
                         +float32,
                         -pointer(float_array), -integer, -integer,
                         [-float32]).
```
Parameters
Name | Type | Description | Default |
---|---|---|---|
scaleData | +integer(bool) | Whether or not to scale the data. | 0 (false) |
decompositionPolicy | +string | Decomposition policy to use: "exact", "randomized", "randomized-block-krylov", "quic" | exact |
data | +matrix | Input dataset to perform PCA on. | - |
varToRetain | +float | Amount of variance to retain; should be between 0 and 1. If 1, all variance is retained. | 0 |
transformedData | -matrix | Matrix to put results of PCA into. | - |
retainedVar | -float | Amount of variance actually retained, between 0 and 1. | - |
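The selection rule described above amounts to taking the smallest number of leading eigenvalues whose share of the total reaches varToRetain. The sketch below shows that selection, again assuming eigenvalues sorted in descending order. `dims_for_variance/4` is a hypothetical name, not the library's code. At least one dimension is always kept, so a target of 0 yields one dimension. The check for the last eigenvalue keeps every dimension for a target of 1 even if floating-point rounding leaves the ratio marginally below 1.

```prolog
:- use_module(library(lists)).

%% dims_for_variance(+EigVals, +VarToRetain, -K, -Retained)
%  Hypothetical helper. EigVals must be non-empty and sorted in
%  descending order. K is the smallest count of leading eigenvalues
%  whose share of the total is at least VarToRetain (K >= 1).
dims_for_variance(EigVals, VarToRetain, K, Retained) :-
    sum_list(EigVals, Total),
    grow_dims(EigVals, Total, VarToRetain, 0, 0, K, Retained).

% grow_dims(+Rest, +Total, +Target, +K0, +Sum0, -K, -Retained)
grow_dims([E|Es], Total, Target, K0, Sum0, K, Retained) :-
    K1 is K0 + 1,
    Sum1 is Sum0 + E,
    Ratio is Sum1 / Total,
    (   ( Ratio >= Target ; Es == [] )   % target met, or nothing left to add
    ->  K = K1,
        Retained = Ratio
    ;   grow_dims(Es, Total, Target, K1, Sum1, K, Retained)
    ).
```

For eigenvalues 5, 3, 1.5 and 0.5 with varToRetain = 0.85, three dimensions are kept. The reported retained variance is 0.95, which is at least the requested 0.85.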
Connected Links/Resources
For a more detailed explanation, see the corresponding Python documentation, which usually explains how the methods work and what their parameters do.