Commit bd81af3b authored by Marc Feger

Add more text for experiments

parent 228fe6d5
Bilder/battle.png (60.6 KiB)

@@ -25,6 +25,16 @@ As the \textit{Netflix-Prize} has shown, \textit{research} and \textit{validatio
Before actually conducting the experiment, the authors took a closer look at the given baselines. In the process, they noticed some \textit{systematic overlaps}, which can be taken from the \textit{table} below.
\input{overlaps}
From the three overlaps listed, it can be seen that the models are fundamentally similar and that the main differences arise from different setups and learning procedures.
Thus, before conducting the actual experiment, the authors examined the two learning methods \textit{stochastic gradient descent (SGD)} and \textit{Bayesian learning} in combination with \textit{biased matrix-factorization}. For $b_u = b_i = 0$ the biased model is equivalent to \textit{regularized matrix-factorization (RSVD)}. In addition, for $\alpha = \beta = 1$ the \textit{weighted regularized matrix-factorization (WR)} is equivalent to \textit{RSVD}. Thus, the remaining differences are explained entirely by the different adjustments of the methods.
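For reference, the \textit{biased matrix-factorization} objective can be written in the following standard form (a reconstruction for readability, not a quotation from the paper; $\mathcal{K}$ denotes the set of observed \textit{ratings}):
\begin{equation*}
\min_{P, Q, b} \sum_{(u,i) \in \mathcal{K}} \left(r_{ui} - \mu - b_u - b_i - p_u^\top q_i\right)^2 + \lambda \left(\lVert p_u \rVert^2 + \lVert q_i \rVert^2 + b_u^2 + b_i^2\right)
\end{equation*}
Setting $b_u = b_i = 0$ removes the \textit{bias} terms and leaves exactly the \textit{RSVD} objective; analogously, $\alpha = \beta = 1$ collapses the weighting of \textit{WR}, which then also reduces to \textit{RSVD}.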
To prepare the two learning procedures, their parameters were initialized with a \textit{Gaussian normal distribution} $\mathcal{N}(\mu, 0.1^2)$. The \textit{standard deviation} of 0.1 is the default value suggested by the \textit{factorization machine library libFM}. In addition, \citet{Rendle13} achieved good results with this value on the \textit{Netflix-Prize-dataset}. Nothing is said about the parameter $\mu$; however, it can be assumed to lie around the \textit{global average} of the \textit{ratings}, since the \textit{initialization} should already \textit{generate} plausible \textit{ratings}.
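A minimal sketch of such an \textit{initialization} in Python/NumPy (our own illustration; \texttt{init\_factors} is a hypothetical helper, and the choice of $\mu$ as the global rating average is the assumption discussed above):
\begin{verbatim}
import numpy as np

def init_factors(n_users, n_items, f, ratings, sigma=0.1):
    # sigma = 0.1 follows the libFM default mentioned in the text;
    # mu as the global rating average is an assumption, not stated in the paper
    mu = np.mean([r for _, _, r in ratings])
    rng = np.random.default_rng(0)
    P = rng.normal(mu, sigma, size=(n_users, f))  # user factors ~ N(mu, 0.1^2)
    Q = rng.normal(mu, sigma, size=(n_items, f))  # item factors ~ N(mu, 0.1^2)
    return mu, P, Q
\end{verbatim}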
For both approaches the number of \textit{sampling steps} was then set to \textit{128}. Since \textit{SGD} has two additional \textit{hyperparameters}, the \textit{regularization strength} $\lambda$ and the \textit{learning rate} $\gamma$, these had to be determined as well. Overall, the \textit{MovieLens10M-dataset} was evaluated by a \textit{10-fold cross-validation} over \textit{random, global} and \textit{non-overlapping 90:10 splits}: in each split, \textit{90\%} of the data was used for \textit{training} and the remaining \textit{10\%} for \textit{evaluation}. To determine the \textit{hyperparameters}, \textit{95\%} of the \textit{training data} of each split was used for \textit{training} and the remaining \textit{5\%} for \textit{validation}. The \textit{hyperparameter search} was performed as mentioned in \textit{section} \ref{sec:sgd} using the \textit{grid} $(\lambda \in \{0.02, 0.03, 0.04, 0.05\}, \gamma \in \{0.001, 0.003\})$, which was inspired by findings during the \textit{Netflix-Prize} \citep{Kor08, Paterek07}. This search yielded the parameters $\lambda=0.04$ and $\gamma=0.003$. Afterwards, both \textit{learning methods} and their settings were compared: the \textit{RMSE} was plotted against the used \textit{dimension} $f$ of $p_u, q_i \in \mathbb{R}^f$. \textit{Figure} \ref{fig:battle} shows the corresponding results.
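The described \textit{hyperparameter search} can be sketched as follows (a hypothetical illustration: \texttt{train\_sgd}, \texttt{rmse} and the data splits stand in for routines that the paper does not spell out):
\begin{verbatim}
from itertools import product

lambdas = [0.02, 0.03, 0.04, 0.05]   # regularization strengths
gammas  = [0.001, 0.003]             # learning rates

best = (float("inf"), None, None)
for lam, gamma in product(lambdas, gammas):
    # fit on 95% of the training split, validate on the held-out 5%
    model = train_sgd(train_part, lam=lam, gamma=gamma, steps=128)
    score = rmse(model, validation_part)
    if score < best[0]:
        best = (score, lam, gamma)
# reported outcome of this search: lambda = 0.04, gamma = 0.003
\end{verbatim}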
\input{battle}
As a \textit{first intermediate result} of the preparation, it can be stated that both \textit{SGD} and the \textit{Gibbs sampler} achieve better \textit{RMSE} values with increasing \textit{embedding dimension}.
In addition, learning via the \textit{Bayesian approach} turns out to be better than learning via \textit{SGD}. Even if the results could differ under more efficient setups, it is still surprising that \textit{SGD} is worse than the \textit{Bayesian approach}, although the \textit{exact opposite} was reported for \textit{MovieLens10M}. For example, \textit{figure} \ref{fig:reported_results} shows that the \textit{Bayesian approach BPMF} achieved an \textit{RMSE} of \textit{0.8187} while the \textit{SGD approach Biased MF} performed better with \textit{0.803}. The fact that the \textit{Bayesian approach} outperforms \textit{SGD} had already been reported and validated by \citet{Rendle13} and \citet{Rus08} for the \textit{Netflix-Prize-dataset}. Looking more closely at \textit{figures} \ref{fig:reported_results} and \ref{fig:battle}, the \textit{Bayesian approach} scores better than the reported \textit{BPMF} and \textit{Biased MF} for every \textit{embedding dimension}. Moreover, it even beats all reported baselines and new methods. Building on this, the authors proceeded to a detailed examination of the methods and baselines.
\subsubsection{Experiment Implementation}
\subsection{Observations}
\subsubsection{Stronger Baselines}
......
\begin{figure}[!ht]
\centering
\includegraphics[scale=0.37]{Bilder/battle.png}
\caption{Comparison of \textit{matrix-factorization} learned by \textit{Gibbs sampling (Bayesian learning)} and by \textit{stochastic gradient descent (SGD)} for \textit{embedding dimensions} from \textit{16} to \textit{512}.}
\label{fig:battle}
\end{figure}
@@ -8,8 +8,8 @@
\hline
\textbf{Methods} & \textbf{Overlaps} \\ \hline
\textit{Biased MF}, \textit{RSVD} & Same method with the only difference being a different setup of the hyperparameters. \\ \hline
- \textit{ALS-WR}, \textit{Biased MF}, \textit{RSVD} & Same models learned through different approaches. \\ \hline
- \textit{BPMF}, \textit{RSVD}, \textit{ALS-WR} & Same models learned through different approaches. \\ \hline
+ \textit{ALS-WR}, \textit{Biased MF}, \textit{RSVD} & Same models, learned through different approaches (\textit{SGD} and \textit{ALS}). \\ \hline
+ \textit{BPMF}, \textit{RSVD}, \textit{ALS-WR} & A completely different learning approach, but fundamentally the same model. \\ \hline
\end{tabular}%
}
\caption{\textit{Systematic consistency} of the \textit{baselines} used on \textit{MovieLens10M}.}
......
@@ -53,6 +53,7 @@ As a second possibility, \textit{implicit influence} can be included. This can i
An important point that does not emerge from the above is how the individual components $p_u, q_i, b_u, b_i$ are actually constructed. In the following, the three most common methods are presented.
\subsubsection{Stochastic Gradient Descent}
\label{sec:sgd}
The best-known and most common method in \textit{machine learning} is \textit{stochastic gradient descent (SGD)}. The goal of \textit{SGD} is to \textit{minimize} the \textit{error} of a given \textit{objective function}; thus, the estimators mentioned in section \ref{sec:mf} can be used as \textit{objective functions}. In the field of \textit{recommender systems}, \citet{Funk06} presented a \textit{modified} variant of \textit{SGD} in the context of the \textit{Netflix-Prize}. \textit{SGD} can be applied to \textit{regularized matrix-factorization} both with and without \textit{bias}. The method can be described by the following pseudo code:
\begin{algorithm}\label{alg:sgd}
\caption{SGD of Funk}
......
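Since the \textit{pseudo code} of the \textit{algorithm} environment is collapsed in this diff, a minimal runnable sketch of \textit{Funk}-style \textit{SGD} for \textit{biased matrix-factorization} may help (our own illustration, not the authors' implementation; hyperparameter defaults follow the values determined above):
\begin{verbatim}
import random
import numpy as np

def sgd_funk(ratings, n_users, n_items, f=16,
             lam=0.04, gamma=0.003, steps=128):
    # ratings: mutable list of (user, item, rating) triples
    mu = np.mean([r for _, _, r in ratings])   # global average rating
    rng = np.random.default_rng(0)
    P = rng.normal(0.0, 0.1, (n_users, f))     # user factors (mean 0 for simplicity)
    Q = rng.normal(0.0, 0.1, (n_items, f))     # item factors
    b_u = np.zeros(n_users)                    # user biases
    b_i = np.zeros(n_items)                    # item biases
    for _ in range(steps):
        random.shuffle(ratings)                # stochastic: random visiting order
        for u, i, r in ratings:
            e = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])   # prediction error
            # gradient steps with regularization lam and learning rate gamma
            b_u[u] += gamma * (e - lam * b_u[u])
            b_i[i] += gamma * (e - lam * b_i[i])
            P[u], Q[i] = (P[u] + gamma * (e * Q[i] - lam * P[u]),
                          Q[i] + gamma * (e * P[u] - lam * Q[i]))
    return mu, P, Q, b_u, b_i
\end{verbatim}
Fixing $b_u = b_i = 0$ (i.e.\ skipping the two bias updates) recovers the variant without \textit{bias}.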
@@ -158,3 +158,21 @@ title = {The BellKor solution to the Netflix Grand Prize}
address = {New York, NY, USA},
keywords = {Datasets, MovieLens, ratings, recommendations},
}
@article{Rendle13,
author = {Steffen Rendle},
title = {Scaling Factorization Machines to Relational Data},
journal = {Proceedings of the VLDB Endowment},
volume = {6},
pages = {337--348},
year = {2013},
doi = {10.14778/2535573.2488340}
}
@inproceedings{Paterek07,
author = {Arkadiusz Paterek},
title = {Improving Regularized Singular Value Decomposition for Collaborative Filtering},
booktitle = {Proceedings of KDD Cup and Workshop},
year = {2007}
}
\ No newline at end of file