The core idea of \textit{matrix factorization} is to complete the only partially filled \textit{rating matrix} $\mathcal{R}$. For this purpose, \textit{users} and \textit{items} are mapped to a joint \textit{latent feature space} of \textit{dimensionality} $f$. A \textit{user} is represented by the vector $p_u \in \mathbb{R}^{f}$ and an \textit{item} by the vector $q_i \in \mathbb{R}^{f}$. The \textit{missing ratings}, and thus the \textit{user-item interactions}, are then estimated via the \textit{inner product} $\hat{r}_{ui} = q_i^{T}p_u$ of the corresponding vectors \citep{Kor09}. In the following, the four most common matrix factorization approaches are described in detail. Afterwards, the concrete learning methods with which the vectors are learned are presented. In addition, the set of \textit{training data} for which a \textit{concrete rating} is available is denoted by $\mathcal{B} = \lbrace (u,i) \mid r_{ui} \in \mathcal{R} \rbrace$.
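To make this concrete, the following minimal sketch (in Python; the factor values and the \textit{dimensionality} $f = 3$ are purely illustrative assumptions) shows how a single \textit{missing rating} is estimated as the inner product of two latent vectors:
\begin{verbatim}
import numpy as np

# Illustrative latent vectors in a feature space with f = 3
p_u = np.array([0.8, 0.2, 0.5])  # user factors
q_i = np.array([1.2, 0.4, 0.9])  # item factors

# Estimated rating: the inner product q_i^T p_u
r_hat = q_i @ p_u
print(r_hat)  # approx. 1.49
\end{verbatim}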
\subsubsection{Basic Matrix-Factorization}
...
...
Thus, \textit{implicit data} can also be included.
First of all, it should be mentioned that \textit{temporal dynamics} can also be included.
On the one hand, it is not realistic to assume that a \textit{user} never changes his taste; on the other hand, the properties of an \textit{item} remain constant. Therefore, \textit{missing ratings} can also be determined in a \textit{time-based} manner. A \textit{missing rating} is then estimated by $\hat{r}_{ui} = \mu + b_i(t) + b_u(t) + q_i^{T}p_u(t)$ \citep{Kor09}.
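A minimal sketch of this time-aware prediction (assuming the time-dependent terms are supplied as functions of $t$; the concrete parameterization of $b_i(t)$, $b_u(t)$, and $p_u(t)$ is left open, as in \citet{Kor09}, and all names are illustrative):
\begin{verbatim}
def predict_temporal(mu, b_i, b_u, q_i, p_u, t):
    """Time-aware prediction: mu + b_i(t) + b_u(t) + q_i^T p_u(t).

    b_i, b_u and p_u are callables that return the item bias,
    the user bias and the user factor vector at time t;
    q_i and the result of p_u(t) are NumPy vectors."""
    return mu + b_i(t) + b_u(t) + q_i @ p_u(t)
\end{verbatim}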
As a second possibility, \textit{implicit influence} can be included. This can involve the \textit{properties} of the \textit{items} a \textit{user} has interacted with. A \textit{missing rating} can then be determined by $\hat{r}_{ui} = \mu + b_i + b_u + q_i^{T}\left(p_u + |\mathcal{I}_u|^{-\frac{1}{2}}\sum_{j \in \mathcal{I}_u} y_j\right)$, where $y_j \in \mathbb{R}^{f}$ denotes the \textit{feature vector} of an \textit{item} $j \in \mathcal{I}_u$ that has been rated by \textit{user} $u$. The corresponding \textit{minimization problems} can be adjusted as mentioned in the sections above \citep{Kor08}.
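A minimal sketch of this prediction rule (assuming the vectors are NumPy arrays and the rows of \texttt{Y\_u} are the vectors $y_j$ of the items in $\mathcal{I}_u$; all names are illustrative assumptions):
\begin{verbatim}
import numpy as np

def predict_implicit(mu, b_i, b_u, q_i, p_u, Y_u):
    """Prediction with implicit influence:
    mu + b_i + b_u + q_i^T (p_u + |I_u|^(-1/2) * sum_j y_j).

    Y_u is a (|I_u|, f) array; its rows are the vectors y_j."""
    implicit = Y_u.sum(axis=0) / np.sqrt(len(Y_u))
    return mu + b_i + b_u + q_i @ (p_u + implicit)
\end{verbatim}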
\subsection{Optimization and Learning}
An important point that has not been addressed so far is the question of how the individual components $p_u, q_i, b_u, b_i$ are actually learned. In the following, the three most common methods are presented.
\subsubsection{Stochastic Gradient Descent}
The best-known and most common method in \textit{machine learning} is \textit{stochastic gradient descent (SGD)}. The goal of \textit{SGD} is to \textit{minimize} the \textit{error} of a given \textit{objective function}; thus, the estimators mentioned in section \ref{sec:mf} can be used as \textit{objective functions}. In the field of \textit{recommender systems}, \citet{Funk06} presented a \textit{modified} variant of \textit{SGD} in the context of the \textit{Netflix Prize}. It can be applied to \textit{regularized matrix factorization} both with and without a \textit{bias}. For each training example $(u,i) \in \mathcal{B}$, the \textit{prediction error} $e_{ui} = r_{ui} - q_i^{T}p_u$ is computed, and the parameters are then moved by a magnitude proportional to a \textit{learning rate} $\gamma$ in the opposite direction of the \textit{gradient} \citep{Kor09}.
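The following is a minimal Python sketch of these update steps, using the rules $q_i \leftarrow q_i + \gamma(e_{ui} \cdot p_u - \lambda \cdot q_i)$ and $p_u \leftarrow p_u + \gamma(e_{ui} \cdot q_i - \lambda \cdot p_u)$ given in \citet{Kor09}; the default values for $\gamma$ and the \textit{regularization parameter} $\lambda$ as well as all variable names are illustrative assumptions:
\begin{verbatim}
import numpy as np

def sgd(B, P, Q, gamma=0.005, lam=0.02, epochs=20):
    """Regularized matrix factorization via SGD.

    B: training triples (u, i, r_ui), i.e. the pairs in B
       together with their known ratings.
    P: (num_users, f) array of user vectors p_u.
    Q: (num_items, f) array of item vectors q_i."""
    for _ in range(epochs):
        for u, i, r_ui in B:
            e_ui = r_ui - Q[i] @ P[u]  # prediction error
            p_old = P[u].copy()        # keep p_u for the q_i update
            P[u] += gamma * (e_ui * Q[i] - lam * P[u])
            Q[i] += gamma * (e_ui * p_old - lam * Q[i])
    return P, Q
\end{verbatim}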