Principal Components Analysis

An implementation of several strategies for principal components analysis (PCA), a common preprocessing step. Given a dataset and a desired new dimensionality, this can reduce the dimensionality of the data using the linear transformation determined by PCA.

:- use_module('path/to/.../src/methods/pca/pca.pl').

%% usage example
TrainData = [5.1,3.5,1.4, 4.9,3.0,1.4, 4.7,3.2,1.3, 4.6,3.1,1.5],
pca(0, randomized, TrainData, 3, TDataList, _, EigValList, EigVecList, _).

Available Predicates

pca/9
pcaDimReduction/8
pcaVarianceDimReduction/8
Connected Links/Resources

pca/9

Apply Principal Component Analysis to the provided data set.

%% predicate definition 
pca(ScaleData, DecompositionPolicy, DataList, DataRows, TransformedList, TDataCols, EigValList, EigVecList, ZCols) :-
        convert_list_to_float_array(DataList, DataRows, array(Xsize, Xrows, X)),
        pcaI(ScaleData, DecompositionPolicy, X, Xsize, Xrows, TData, TDataCols, TDataRows, Y, Ysize, Z, ZCols, ZRows),
        convert_float_array_to_2d_list(TData, TDataCols, TDataRows, TransformedList),
        convert_float_array_to_list(Y, Ysize, EigValList),
        convert_float_array_to_2d_list(Z, ZCols, ZRows, EigVecList).

%% foreign c++ predicate definition 
foreign(pca, c, pcaI( +integer, 
                      +string,
                      +pointer(float_array), +integer, +integer,
                      -pointer(float_array), -integer, -integer,
                      -pointer(float_array), -integer,
                      -pointer(float_array), -integer, -integer)).

Parameters

Name	Type	Description	Default
scaleData	+integer(bool)	Whether or not to scale the data.	0 (false)
decompositionPolicy	+string	Decomposition policy to use: "exact", "randomized", "randomized_block_krylov", "quic"	exact
data	+matrix	Input dataset to perform PCA on.	-
transformedData	-matrix	Matrix to put results of PCA into.	-
eigenValues	-vector	Vector to put eigenvalues into.	-
eigenVectors	-matrix	Matrix to put eigenvectors (loadings) into.	-

pcaDimReduction/8

Use PCA for dimensionality reduction on the given dataset.

This keeps the newDimension largest principal components of the data and discards the rest. The returned value (Variance) is the fraction of the data's variance that is retained, a value between 0 and 1. For instance, a value of 0.9 indicates that 90% of the variance present in the data was retained.
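The retained-variance figure can be illustrated with a small standalone sketch. It assumes the usual PCA definition, in which the retained fraction is the sum of the kept eigenvalues divided by the sum of all eigenvalues, and it assumes the eigenvalues are given in descending order. The predicates retained_variance/3, sum_values/2 and first_k/3 are hypothetical helper names used only for illustration; they are not part of this module.

```prolog
%% sum_values(+Values, -Sum)
%% Sum of a list of numbers.
sum_values([], 0.0).
sum_values([V|Vs], Sum) :-
        sum_values(Vs, Rest),
        Sum is V + Rest.

%% first_k(+K, +List, -Prefix)
%% Prefix holds the first K elements of List (or all of them if List is shorter).
first_k(0, _, []) :- !.
first_k(_, [], []) :- !.
first_k(K, [V|Vs], [V|Kept]) :-
        K > 0,
        K1 is K - 1,
        first_k(K1, Vs, Kept).

%% retained_variance(+EigVals, +K, -Fraction)
%% Fraction of the total variance carried by the K largest eigenvalues.
%% Assumption: EigVals is sorted in descending order.
retained_variance(EigVals, K, Fraction) :-
        sum_values(EigVals, Total),
        Total > 0.0,
        first_k(K, EigVals, Kept),
        sum_values(Kept, KeptSum),
        Fraction is KeptSum / Total.

%% Example query:
%% ?- retained_variance([4.0, 3.0, 2.0, 1.0], 2, F).
%% F = 0.7
```

Here the two largest eigenvalues sum to 7.0 out of a total of 10.0, so keeping two components would retain 70% of the variance.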

%% predicate definition 
pcaDimReduction(ScaleData, DecompositionPolicy, DataList, DataRows, NewDim, TransformedList, TDataCols, Variance) :-
        NewDim > 0,
        convert_list_to_float_array(DataList, DataRows, array(Xsize, Xrows, X)),
        pcaDimReductionI(ScaleData, DecompositionPolicy, X, Xsize, Xrows, NewDim, TData, TDataCols, TDataRows, Variance),
        convert_float_array_to_2d_list(TData, TDataCols, TDataRows, TransformedList).

%% foreign c++ predicate definition 
foreign(pcaDimReduction, c, pcaDimReductionI( +integer, 
                                              +string,
                                              +pointer(float_array), +integer, +integer,
                                              +integer,
                                              -pointer(float_array), -integer, -integer,
                                              [-float32])).

Parameters

Name	Type	Description	Default
scaleData	+integer(bool)	Whether or not to scale the data.	0 (false)
decompositionPolicy	+string	Decomposition policy to use: "exact", "randomized", "randomized_block_krylov", "quic"	exact
data	+matrix	Input dataset to perform PCA on.	-
newDimension	+integer	Desired dimensionality of output dataset. Must be greater than 0; the predicate fails otherwise.	-
transformedData	-matrix	Matrix to put results of PCA into.	-
retainedVar	-float	Fraction of variance retained, in [0,1].	-

pcaVarianceDimReduction/8

Use PCA for dimensionality reduction on the given dataset.

This will save as many dimensions as necessary to retain at least the given amount of variance (specified by parameter varRetained). The amount should be between 0 and 1; if the amount is 0, then only 1 dimension will be retained. If the amount is 1, then all dimensions will be retained.

The predicate returns the actual amount of variance retained (Variance), which will always be greater than or equal to the varRetained parameter.
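The selection rule described above can be sketched in standalone Prolog. This is only an illustration of the behaviour stated in the text (keep at least one dimension, then keep adding dimensions until the requested fraction is reached); it is not taken from the underlying implementation. The predicates dims_for_variance/4, eig_total/3 and accumulate/6 are hypothetical helper names, and the eigenvalues are assumed to be sorted in descending order.

```prolog
%% eig_total(+Values, +Sum0, -Sum)
%% Left-to-right sum of a list of numbers.
eig_total([], Sum, Sum).
eig_total([V|Vs], Sum0, Sum) :-
        Sum1 is Sum0 + V,
        eig_total(Vs, Sum1, Sum).

%% accumulate(+Values, +Target, +Count0, +Sum0, -Count, -Sum)
%% Takes values until their running sum reaches Target or the list ends.
%% At least one value is always taken.
accumulate([V|Vs], Target, Count0, Sum0, Count, Sum) :-
        Count1 is Count0 + 1,
        Sum1 is Sum0 + V,
        (   ( Sum1 >= Target ; Vs == [] )
        ->  Count = Count1,
            Sum = Sum1
        ;   accumulate(Vs, Target, Count1, Sum1, Count, Sum)
        ).

%% dims_for_variance(+EigVals, +VarRetained, -Dims, -Actual)
%% Dims is the smallest number of leading eigenvalues whose share of the
%% total is at least VarRetained; Actual is the share actually reached.
dims_for_variance(EigVals, VarRetained, Dims, Actual) :-
        VarRetained >= 0.0,
        VarRetained =< 1.0,
        eig_total(EigVals, 0.0, Total),
        Total > 0.0,
        Target is VarRetained * Total,
        accumulate(EigVals, Target, 0, 0.0, Dims, KeptSum),
        Actual is KeptSum / Total.

%% Example query:
%% ?- dims_for_variance([4.0, 3.0, 2.0, 1.0], 0.8, Dims, Actual).
%% Dims = 3, Actual = 0.9
```

Requesting 0.8 keeps three dimensions and actually retains 0.9, because two dimensions would only reach 0.7. Requesting 0.0 keeps a single dimension, and requesting 1.0 keeps all four.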

%% predicate definition 
pcaVarianceDimReduction(ScaleData, DecompositionPolicy, DataList, DataRows, VarRetained, TransformedList, TDataCols, Variance) :-
        VarRetained >= 0.0,
        VarRetained =< 1.0,
        convert_list_to_float_array(DataList, DataRows, array(Xsize, Xrows, X)),
        pcaVarianceDimReductionI(ScaleData, DecompositionPolicy, X, Xsize, Xrows, VarRetained, TData, TDataCols, TDataRows, Variance),
        convert_float_array_to_2d_list(TData, TDataCols, TDataRows, TransformedList).

%% foreign c++ predicate definition 
foreign(pcaVarianceDimReduction, c, pcaVarianceDimReductionI( +integer, 
                                                              +string,
                                                              +pointer(float_array), +integer, +integer,
                                                              +float32,
                                                              -pointer(float_array), -integer, -integer,
                                                              [-float32])).

Parameters

Name	Type	Description	Default
scaleData	+integer(bool)	Whether or not to scale the data.	0 (false)
decompositionPolicy	+string	Decomposition policy to use: "exact", "randomized", "randomized_block_krylov", "quic"	exact
data	+matrix	Input dataset to perform PCA on.	-
varRetained	+float	Amount of variance to retain; must be between 0 and 1. If 1, all variance is retained.	0
transformedData	-matrix	Matrix to put results of PCA into.	-
retainedVar	-float	Fraction of variance actually retained, in [0,1].	-

Connected Links/Resources

For a more detailed explanation, see the Python documentation, which usually explains how the methods work and what the parameters do.

The link below is taken from the Python documentation:

Principal component analysis on Wikipedia
