Neighborhood Components Analysis (NCA)
An implementation of neighborhood components analysis (NCA), a distance metric learning technique that can be used for preprocessing. Given a labeled dataset, NCA learns a distance metric that improves k-nearest-neighbor classification, and the predicate returns that learned metric.
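For orientation, the objective that NCA optimizes can be sketched as below. This is the standard formulation from the NCA literature rather than something taken from this wrapper's source. The symbols are notation introduced here: $A$ is the learned linear transformation, $x_i$ are the data points, and $C_i$ is the set of points that share the label of $x_i$.

```latex
p_{ij} = \frac{\exp\left(-\lVert A x_i - A x_j \rVert^2\right)}
              {\sum_{k \neq i} \exp\left(-\lVert A x_i - A x_k \rVert^2\right)},
\qquad p_{ii} = 0

p_i = \sum_{j \in C_i} p_{ij},
\qquad f(A) = \sum_i p_i
```

Maximizing $f(A)$ maximizes the expected number of points that a stochastic nearest-neighbor rule classifies correctly. The returned distance matrix presumably corresponds to $A$.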
:- use_module('path/to/.../src/methods/nca/nca.pl').
%% usage example
TrainData = [5.1,3.5,1.4, 4.9,3.0,1.4, 4.7,3.2,1.3, 4.6,3.1,1.5],
nca(sgd, 0.01, 500000, 0.00001, 1, 5, 0.0001, 0.9, 50, 0.000000001, 100000000, 50, TrainData, 3, [0,1,0,1], DistancesList, _).
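The usage example passes the data as one flat list together with the number of values per data point (12 values, 3 per point, 4 labels). If the data is held as a list of rows instead, a helper along the following lines could produce both arguments. This is a minimal sketch: `rows_to_flat/3` and its helper predicates are hypothetical names that are not part of the nca module, and it assumes the third-from-last data argument really is the number of values per point, as the example suggests.

```prolog
:- use_module(library(lists)).

%% rows_to_flat(+Rows, -Flat, -RowSize)
%% Hypothetical helper (not part of the nca module).
%% Flattens a non-empty list of equally sized rows into one flat list
%% and reports the common row length.
rows_to_flat([Row|Rows], Flat, RowSize) :-
    length(Row, RowSize),
    same_length_rows(Rows, RowSize),
    flatten_rows([Row|Rows], Flat).

%% Succeeds when every remaining row has exactly RowSize elements.
same_length_rows([], _).
same_length_rows([Row|Rows], RowSize) :-
    length(Row, RowSize),
    same_length_rows(Rows, RowSize).

%% Concatenates the rows in order.
flatten_rows([], []).
flatten_rows([Row|Rows], Flat) :-
    append(Row, Rest, Flat),
    flatten_rows(Rows, Rest).
```

For example, `rows_to_flat([[5.1,3.5,1.4],[4.9,3.0,1.4]], Flat, RowSize)` binds `Flat` to `[5.1,3.5,1.4,4.9,3.0,1.4]` and `RowSize` to `3`, which can then be passed as the data list and row-size arguments of `nca/17`.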
Available Predicates
nca/17
Initializes the NCA model and the selected optimizer, then runs NCA on the given data and returns the learned distance matrix.
%% predicate definition
nca(OptimizerType, StepSize, MaxIterations, Tolerance, Shuffle, NumBasis, ArmijoConstant, Wolfe, MaxLine, MinStep, MaxStep, BatchSize,
DataList, DataRows, PredictionList, DistanceList, ZCols) :-
StepSize > 0,
MaxIterations >= 0,
Tolerance >= 0,
NumBasis > 0,
ArmijoConstant > 0,
Wolfe > 0,
MaxLine > 0,
MinStep > 0,
MaxStep > 0,
MaxStep >= MinStep,
BatchSize > 0,
convert_list_to_float_array(DataList, DataRows, array(Xsize, Xrows, X)),
convert_list_to_float_array(PredictionList, array(Ysize, Y)),
ncaI(OptimizerType, StepSize, MaxIterations, Tolerance, Shuffle, NumBasis, ArmijoConstant, Wolfe, MaxLine, MinStep, MaxStep, BatchSize,
X, Xsize, Xrows, Y, Ysize, Z, ZCols, ZRows),
convert_float_array_to_2d_list(Z, ZCols, ZRows, DistanceList).
%% foreign c++ predicate definition
foreign(nca, c, ncaI( +string,
+float32, +integer, +float32,
+integer,
+integer, +float32, +float32, +integer, +float32, +float32, +integer,
+pointer(float_array), +integer, +integer,
+pointer(float_array), +integer,
-pointer(float_array), -integer, -integer)).
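The guards in the clause shown above cover only the numeric hyperparameters; nothing in that clause checks that the label list fits the data. A caller-side check could look like the sketch below. `labels_match_data/3` is a hypothetical name, not part of the module, and the check assumes that `DataRows` is the number of values per data point, as the usage example (12 values, 3, 4 labels) suggests.

```prolog
%% labels_match_data(+DataList, +DataRows, +Labels)
%% Hypothetical pre-check (not part of the nca module).
%% Succeeds when DataList splits evenly into points of DataRows values
%% each and Labels holds exactly one label per point.
labels_match_data(DataList, DataRows, Labels) :-
    integer(DataRows),
    DataRows > 0,
    length(DataList, NumValues),
    NumValues mod DataRows =:= 0,
    NumPoints is NumValues // DataRows,
    length(Labels, NumPoints).
```

Calling `labels_match_data(TrainData, 3, [0,1,0,1])` before `nca/17` makes a shape mismatch fail early on the Prolog side instead of inside the foreign call.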
Parameters
Name | Type | Description | Default |
---|---|---|---|
optimizerType | +string | Optimizer to use; "sgd" or "lbfgs". | sgd |
stepSize | +float | Step size for stochastic gradient descent (alpha). | 0.01 |
maxIterations | +integer | Maximum number of iterations for SGD or L-BFGS (0 indicates no limit). | 500000 |
tolerance | +float | Maximum tolerance for termination of SGD or L-BFGS. | 1e-7 |
shuffle | +integer(bool) | Whether to shuffle the order in which data points are visited for SGD (1) or not (0). | |
numBasis | +integer | Number of memory points to be stored for L-BFGS. | 5 |
armijoConstant | +float | Armijo constant for L-BFGS. | 0.0001 |
wolfe | +float | Wolfe condition parameter for L-BFGS. | 0.9 |
maxLineSearchTrials | +integer | Maximum number of line search trials for L-BFGS. | 50 |
minStep | +float | Minimum step of line search for L-BFGS. | 1e-20 |
maxStep | +float | Maximum step of line search for L-BFGS. | 1e+20 |
batchSize | +integer | Batch size for mini-batch SGD. | 50 |
data | +matrix | Input dataset to run NCA on. | - |
labels | +vector | Labels for input dataset. | - |
distance | -matrix | Learned distance matrix (output). | - |
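Since `nca/17` takes every hyperparameter positionally, the defaults from the table can be collected in one place and applied by a thin wrapper. The following is a sketch only: `nca_default/2` and `nca_with_defaults/5` are hypothetical names that do not exist in the module, the value for `shuffle` is an assumption taken from the usage example because the table lists no default, and the wrapper can only be called once `nca/17` has been loaded.

```prolog
%% nca_default(?Name, ?Value)
%% Hypothetical lookup of the default values listed in the parameter table.
nca_default(optimizerType, sgd).
nca_default(stepSize, 0.01).
nca_default(maxIterations, 500000).
nca_default(tolerance, 1.0e-7).
nca_default(shuffle, 1).              % assumption: no default in the table
nca_default(numBasis, 5).
nca_default(armijoConstant, 0.0001).
nca_default(wolfe, 0.9).
nca_default(maxLineSearchTrials, 50).
nca_default(minStep, 1.0e-20).
nca_default(maxStep, 1.0e20).
nca_default(batchSize, 50).

%% nca_with_defaults(+DataList, +DataRows, +Labels, -DistanceList, -ZCols)
%% Hypothetical wrapper: calls nca/17 with every hyperparameter set to
%% its default, so only the data-related arguments have to be supplied.
nca_with_defaults(DataList, DataRows, Labels, DistanceList, ZCols) :-
    nca_default(optimizerType, Optimizer),
    nca_default(stepSize, StepSize),
    nca_default(maxIterations, MaxIterations),
    nca_default(tolerance, Tolerance),
    nca_default(shuffle, Shuffle),
    nca_default(numBasis, NumBasis),
    nca_default(armijoConstant, ArmijoConstant),
    nca_default(wolfe, Wolfe),
    nca_default(maxLineSearchTrials, MaxLine),
    nca_default(minStep, MinStep),
    nca_default(maxStep, MaxStep),
    nca_default(batchSize, BatchSize),
    nca(Optimizer, StepSize, MaxIterations, Tolerance, Shuffle, NumBasis,
        ArmijoConstant, Wolfe, MaxLine, MinStep, MaxStep, BatchSize,
        DataList, DataRows, Labels, DistanceList, ZCols).
```

With this in place, a call such as `nca_with_defaults(TrainData, 3, [0,1,0,1], DistancesList, _)` would run NCA with the tabulated defaults.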
Connected Links/Resources
For a more detailed explanation, see the Python documentation, which usually explains how the methods work and what the parameters do.