Principal Components Analysis
An implementation of several strategies for principal components analysis (PCA), a common preprocessing step. Given a dataset and a desired new dimensionality, it reduces the dimensionality of the data using the linear transformation determined by PCA.
:- use_module('path/to/.../src/methods/pca/pca.pl').
%% usage example
TrainData = [5.1,3.5,1.4, 4.9,3.0,1.4, 4.7,3.2,1.3, 4.6,3.1,1.5],
pca(0, randomized, TrainData, 3, TDataList, _, EigValList, EigVecList, _).
Available Predicates
pca/9
Apply Principal Component Analysis to the provided data set.
%% predicate definition
pca(ScaleData, DecompositionPolicy, DataList, DataRows, TransformedList, TDataCols, EigValList, EigVecList, ZCols) :-
    convert_list_to_float_array(DataList, DataRows, array(Xsize, Xrows, X)),
    pcaI(ScaleData, DecompositionPolicy, X, Xsize, Xrows, TData, TDataCols, TDataRows, Y, Ysize, Z, ZCols, ZRows),
    convert_float_array_to_2d_list(TData, TDataCols, TDataRows, TransformedList),
    convert_float_array_to_list(Y, Ysize, EigValList),
    convert_float_array_to_2d_list(Z, ZCols, ZRows, EigVecList).
%% foreign c++ predicate definition
foreign(pca, c, pcaI(+integer,
                     +string,
                     +pointer(float_array), +integer, +integer,
                     -pointer(float_array), -integer, -integer,
                     -pointer(float_array), -integer,
                     -pointer(float_array), -integer, -integer)).
Parameters
Name | Type | Description | Default |
---|---|---|---|
scaleData | +integer(bool) | Whether or not to scale the data. | 0 (false) |
decompositionPolicy | +string | Decomposition policy to use: "exact", "randomized", "randomized_block_krylov", "quic" | exact |
data | +matrix | Input dataset to perform PCA on. | - |
transformedData | -matrix | Matrix to put results of PCA into. | - |
eigenValues | -vector | Vector to put eigenvalues into. | - |
eigenVectors | -matrix | Matrix to put eigenvectors (loadings) into. | - |
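
The table above lists the accepted decompositionPolicy values. A small guard like the following can catch a misspelled policy before the foreign call is made. This is only a sketch: check_pca_options/2 and its helper predicates are hypothetical names that are not part of the wrapper, and treating scaleData as strictly 0 or 1 is an assumption based on the integer(bool) type.

```prolog
% Hypothetical helper, not part of the wrapper: succeeds only for the
% decomposition policies listed in the parameter table.
valid_decomposition_policy(exact).
valid_decomposition_policy(randomized).
valid_decomposition_policy(randomized_block_krylov).
valid_decomposition_policy(quic).

% Hypothetical helper: assumes the boolean-like flag is either 0 or 1.
valid_scale_flag(0).
valid_scale_flag(1).

% check_pca_options(+ScaleData, +Policy)
% True if ScaleData is 0 or 1 and Policy is one of the listed atoms.
check_pca_options(ScaleData, Policy) :-
    integer(ScaleData),
    atom(Policy),
    valid_scale_flag(ScaleData),
    valid_decomposition_policy(Policy).
```

With these clauses loaded, `check_pca_options(0, randomized)` succeeds, while `check_pca_options(0, svd)` fails because svd is not one of the listed policies.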
pcaDimReduction/8
Use PCA for dimensionality reduction on the given dataset.
This keeps the newDimension largest principal components of the data and discards the rest. The returned value (Variance) is the fraction of the data's variance that is retained, a value between 0 and 1. For instance, a value of 0.9 indicates that 90% of the variance present in the data was retained.
%% predicate definition
pcaDimReduction(ScaleData, DecompositionPolicy, DataList, DataRows, NewDim, TransformedList, TDataCols, Variance) :-
    NewDim > 0,
    convert_list_to_float_array(DataList, DataRows, array(Xsize, Xrows, X)),
    pcaDimReductionI(ScaleData, DecompositionPolicy, X, Xsize, Xrows, NewDim, TData, TDataCols, TDataRows, Variance),
    convert_float_array_to_2d_list(TData, TDataCols, TDataRows, TransformedList).
%% foreign c++ predicate definition
foreign(pcaDimReduction, c, pcaDimReductionI(+integer,
                                             +string,
                                             +pointer(float_array), +integer, +integer,
                                             +integer,
                                             -pointer(float_array), -integer, -integer,
                                             [-float32])).
Parameters
Name | Type | Description | Default |
---|---|---|---|
scaleData | +integer(bool) | Whether or not to scale the data. | 0 (false) |
decompositionPolicy | +string | Decomposition policy to use: "exact", "randomized", "randomized_block_krylov", "quic" | exact |
data | +matrix | Input dataset to perform PCA on. | - |
newDimension | +integer | Desired dimensionality of the output dataset. Must be greater than 0; the predicate fails otherwise. | - |
transformedData | -matrix | Matrix to put results of PCA into. | - |
retainedVar | -float | Fraction of the variance retained, in [0, 1]. | - |
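
The retained variance described above can be illustrated with the usual PCA definition: the sum of the kept eigenvalues divided by the sum of all eigenvalues. The sketch below is plain Prolog and does not call the wrapper. retained_variance/3 and its helpers are hypothetical names, and it assumes the eigenvalue list is sorted in descending order; whether the library computes the value in exactly this way is also an assumption.

```prolog
% sum_values(+List, -Sum): sum of a list of numbers.
sum_values([], 0).
sum_values([X|Xs], Sum) :-
    sum_values(Xs, Rest),
    Sum is X + Rest.

% take_first(+N, +List, -Prefix): the first N elements of List,
% or the whole list if it has fewer than N elements.
take_first(0, _, []) :- !.
take_first(_, [], []) :- !.
take_first(N, [X|Xs], [X|Ys]) :-
    N > 0,
    N1 is N - 1,
    take_first(N1, Xs, Ys).

% retained_variance(+EigVals, +NewDim, -Ratio)
% Assumes EigVals is sorted in descending order and has a positive sum.
retained_variance(EigVals, NewDim, Ratio) :-
    integer(NewDim),
    NewDim > 0,
    sum_values(EigVals, Total),
    Total > 0,
    take_first(NewDim, EigVals, Kept),
    sum_values(Kept, KeptSum),
    Ratio is KeptSum / Total.
```

For example, with the made-up eigenvalues `[6.0, 3.0, 1.0]`, the query `retained_variance([6.0, 3.0, 1.0], 2, R)` gives `R = 0.9`, matching the 90% case mentioned in the description.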
pcaVarianceDimReduction/8
Use PCA for dimensionality reduction on the given dataset.
This will save as many dimensions as necessary to retain at least the given amount of variance (specified by parameter varRetained). The amount should be between 0 and 1; if the amount is 0, then only 1 dimension will be retained. If the amount is 1, then all dimensions will be retained.
The predicate returns the actual fraction of variance retained (Variance), which will always be greater than or equal to the varRetained parameter.
%% predicate definition
pcaVarianceDimReduction(ScaleData, DecompositionPolicy, DataList, DataRows, VarRetained, TransformedList, TDataCols, Variance) :-
    VarRetained >= 0.0,
    VarRetained =< 1.0,
    convert_list_to_float_array(DataList, DataRows, array(Xsize, Xrows, X)),
    pcaVarianceDimReductionI(ScaleData, DecompositionPolicy, X, Xsize, Xrows, VarRetained, TData, TDataCols, TDataRows, Variance),
    convert_float_array_to_2d_list(TData, TDataCols, TDataRows, TransformedList).
%% foreign c++ predicate definition
foreign(pcaVarianceDimReduction, c, pcaVarianceDimReductionI(+integer,
                                                             +string,
                                                             +pointer(float_array), +integer, +integer,
                                                             +float32,
                                                             -pointer(float_array), -integer, -integer,
                                                             [-float32])).
Parameters
Name | Type | Description | Default |
---|---|---|---|
scaleData | +integer(bool) | Whether or not to scale the data. | 0 (false) |
decompositionPolicy | +string | Decomposition policy to use: "exact", "randomized", "randomized_block_krylov", "quic" | exact |
data | +matrix | Input dataset to perform PCA on. | - |
varRetained | +float | Amount of variance to retain; should be between 0 and 1. If 1, all variance is retained. | 0 |
transformedData | -matrix | Matrix to put results of PCA into. | - |
retainedVar | -float | Fraction of the variance actually retained, in [0, 1]. | - |
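
The selection rule described above (keep as many dimensions as needed to reach the requested variance, and at least one) can be sketched in plain Prolog. This is an illustration only and does not call the wrapper: dims_for_variance/4 and its helpers are hypothetical names, the eigenvalue list is assumed to be sorted in descending order, and the library's internal procedure may differ.

```prolog
% sum_values(+List, -Sum): sum of a list of numbers.
sum_values([], 0).
sum_values([X|Xs], Sum) :-
    sum_values(Xs, Rest),
    Sum is X + Rest.

% dims_for_variance(+EigVals, +VarRetained, -Dims, -Actual)
% Dims is the smallest number of leading eigenvalues whose share of the
% total is at least VarRetained; Actual is that share. At least one
% dimension is always kept.
dims_for_variance([E|Es], VarRetained, Dims, Actual) :-
    VarRetained >= 0.0,
    VarRetained =< 1.0,
    sum_values([E|Es], Total),
    Total > 0,
    accumulate(Es, E, 1, Total, VarRetained, Dims, Actual).

% accumulate(+Rest, +KeptSum, +Count, +Total, +Target, -Dims, -Actual)
% Stops once the kept fraction reaches Target or no eigenvalues remain.
accumulate([], KeptSum, Count, Total, _, Count, Actual) :-
    Actual is KeptSum / Total.
accumulate([E|Es], KeptSum, Count, Total, Target, Dims, Actual) :-
    Fraction is KeptSum / Total,
    (   Fraction >= Target
    ->  Dims = Count,
        Actual = Fraction
    ;   KeptSum1 is KeptSum + E,
        Count1 is Count + 1,
        accumulate(Es, KeptSum1, Count1, Total, Target, Dims, Actual)
    ).
```

With the made-up eigenvalues `[6.0, 3.0, 1.0]`, a target of 0.8 keeps two dimensions and retains 0.9 of the variance, a target of 0.0 keeps one dimension, and a target of 1.0 keeps all three.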
Connected Links/Resources
For a more detailed explanation, see the Python documentation, which usually gives a good description of how the methods work and what the parameters do.
Some of the links from the Python documentation have been added here.