diff --git a/recommender.tex b/recommender.tex
index f70837f5e86e0bfa42e3d4108f3aea936b6242d8..4e1e933da09933aed0a3ae667c29620fdadb9125 100644
--- a/recommender.tex
+++ b/recommender.tex
@@ -24,10 +24,11 @@ Figure \ref{fig:cf} shows a sketch of the general operation of the \textit{colla
 The core idea of \textit{matrix factorization} is to supplement the not completely filled out \textit{rating-matrix} $\mathcal{R}$. For this purpose the \textit{users} and \textit{items} are to be mapped to a joined \textit{latent feature space} with \textit{dimensionality} $f$. The \textit{user} is represented by the vector $p_u \in \mathbb{R}^{f}$ and the item by the vector $q_i \in \mathbb{R}^{f}$. As a result, the \textit{missing ratings} and thus the \textit{user-item interaction} are to be determined via the \textit{inner product} $\hat{r}_{ui}=q_i^Tp_u$ of the corresponding vectors \citep{Kor09}. In the following, the four most classical matrix factorization approaches are described in detail. Afterwards, the concrete learning methods with which the vectors are learned are presented. In addition, the \textit{training data} for which a \textit{concrete rating} is available should be referred to as $\mathcal{B} = \lbrace(u,i) | r_{ui} \in \mathcal{R}\rbrace$.
 \subsubsection{Basic Matrix-Factorization}
-The first and easiest way to solve \textit{matrix-factorization} is to connect the \textit{feature vectors} of the \textit{users} and the \textit{items} using the \textit{inner product}. The result is the \textit{user-item interaction}. In addition, the \textit{error} should be as small as possible. Therefore, $\min_{p_u, q_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2}}$ is defined as an associated \textit{minimization problem} for the \textit{RMSE}.
+The first and simplest approach to \textit{matrix-factorization} is to combine the \textit{feature vectors} of the \textit{users} and the \textit{items} using the \textit{inner product}. The result is the \textit{user-item interaction}. In addition, the \textit{error} should be as small as possible. Therefore, $\min_{p_u, q_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2}}$ is defined as the associated \textit{minimization problem}.
 \subsubsection{Regulated Matrix-Factorization}
-This problem extends the \textit{basic matrix factorization} by a \textit{regulation factor} $\lambda$ in the corresponding \textit{minimization problem}. Since $\mathcal{R}$ is thinly occupied, the effect of \textit{overfitting} may occur due to learning from the few known values. The problem with \textit{overfitting} is that the generated \textit{ratings} are too tight. To counteract this, the magnitudes of the previous vectors is taken into account. High magnitudes are punished by a factor $\lambda(\lVert q_i \rVert^2 + \lVert p_u \lVert^2)$ in the \textit{minimization problem}. Overall, the \textit{minimization problem} $\min_{p_u, q_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2}} + \lambda(\lVert q_i \rVert^2 + \lVert p_u \lVert^2)$ is to be solved.
+This approach extends the \textit{basic matrix-factorization} by a \textit{regulation factor} $\lambda$ in the corresponding \textit{minimization problem}. Since $\mathcal{R}$ is sparsely populated, \textit{overfitting} may occur due to learning from the few known values. The problem with \textit{overfitting} is that the learned vectors fit the few known \textit{ratings} too closely and generalize poorly to the missing ones. To counteract this, the magnitudes of the vectors are taken into account. High magnitudes are penalized by a term $\lambda(\lVert q_i \rVert^2 + \lVert p_u \rVert^2)$ in the \textit{minimization problem}. Overall, the \textit{minimization problem} $\min_{p_u, q_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2}} + \lambda(\lVert q_i \rVert^2 + \lVert p_u \rVert^2)$ is to be solved.
+The idea is that large entries in $q_i$ or $p_u$ cause the norms $\lVert q_i \rVert$ and $\lVert p_u \rVert$ to become larger. Squaring the norms penalizes large values disproportionately, whereas small values contribute little to the objective. In addition, the influence of this penalty can be regulated by $\lambda$.
 \subsubsection{Weighted Regulated Matrix-Factorization}
 A \textit{regulation factor} $\lambda$ is introduced in analogy to \textit{regulated matrix-factorization}. Additional \textit{weights} $\alpha$ and $\beta$ are introduced to take into account the individual magnitude of a vector. The \textit{minimization problem} then corresponds to $\min_{p_u, q_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2}} + \lambda(\alpha\lVert q_i \rVert^2 + \beta\lVert p_u \lVert^2)$.
@@ -39,4 +40,11 @@ In addition, the \textit{missing rating} is no longer determined only by the \te
 Furthermore, $b_u = \mu_u - \mu$ and $b_i = \mu_i - \mu$. Here $\mu_u$ denotes the \textit{average} of all \textit{assigned ratings} of the \textit{user} $u$. Similarly, $\mu_i$ denotes the \textit{average} of all \textit{received ratings} of an \textit{item} $i$. Thus $b_u$ indicates the \textit{deviation} of the \textit{average assigned rating} of a \textit{user} from the \textit{global average}. Similarly, $b_i$ indicates the \textit{deviation} of the \textit{average rating} of an item from the \textit{global average}.
-In addition, the \textit{minimization problem} can be extended by the \textit{bias}. Accordingly, the \textit{minimization problem} is then $\min_{p_u, q_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2}} + \lambda(\lVert q_i \rVert^2 + \lVert p_u \lVert^2 + b_u^2 + b_i^2)$.
+In addition, the \textit{minimization problem} can be extended by the \textit{bias}. Accordingly, the \textit{minimization problem} is then $\min_{p_u, q_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2}} + \lambda(\lVert q_i \rVert^2 + \lVert p_u \rVert^2 + b_u^2 + b_i^2)$. Analogous to the \textit{regulated matrix-factorization}, the values $b_u$ and $b_i$ are penalized in addition to $\lVert q_i \rVert$ and $\lVert p_u \rVert$. In this case, $b_u$ and $b_i$ are penalized more strongly the larger their magnitude, that is, the further they deviate from the \textit{global average}.
+
+\subsection{Advanced Matrix-Factorization}
+This section shows that there are \textit{other approaches} to \textit{matrix-factorization}.
+In particular, \textit{temporal dynamics} and \textit{implicit data} can also be included.
+First, \textit{temporal dynamics} are considered.
+On the one hand, it is unrealistic to assume that a \textit{user} never changes their taste. On the other hand, the properties of an \textit{item} remain constant. Therefore, the \textit{biases} and the \textit{user vector} are modeled as functions of the time $t$, while $q_i$ stays fixed, so that \textit{missing ratings} can be determined \textit{time-based}. A \textit{missing rating} is then determined by $\hat{r}_{ui}=\mu + b_i(t) + b_u(t) + q_i^{T}p_u(t)$ \citep{Kor09}.
+As a second possibility, \textit{implicit influence} can be included. This can involve the \textit{properties} of the \textit{items} a \textit{user} is dealing with. A \textit{missing rating} can be determined by $\hat{r}_{ui}=\mu + b_i + b_u + q_i^{T}(p_u + |\mathcal{I}_u|^{-\frac{1}{2}}\sum_{j \in \mathcal{I}_u}{y_j})$. Here, $y_j \in \mathbb{R}^{f}$ denotes the \textit{feature vector} of an \textit{item} $j \in \mathcal{I}_u$ that has been evaluated by \textit{user} $u$. The corresponding \textit{minimization problems} can be adjusted as mentioned in the sections above \citep{Kor08}.
diff --git a/references.bib b/references.bib
index df05aebe6ccf0e4450d68c164cd9d47b28fb820e..a391b459d67f74023cf0a68941ddf473250fcbb3 100644
--- a/references.bib
+++ b/references.bib
@@ -70,4 +70,13 @@ pages = {30-37},
 title = {Matrix factorization techniques for recommender systems},
 volume = {42},
 journal = {Computer}
+}
+@inproceedings{Kor08,
+author = {Koren, Yehuda},
+year = {2008},
+month = {08},
+pages = {426-434},
+title = {Factorization meets the neighborhood: A multifaceted collaborative filtering model},
+booktitle = {Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08)},
+doi = {10.1145/1401890.1401944}
 }
\ No newline at end of file
diff --git a/submission.pdf b/submission.pdf
index 40918daa327a14f5ad122b9dacdcf980edfa02d6..81b080d59625d2f5f6108bd185d0ba094e2df2d6 100644
Binary files a/submission.pdf and b/submission.pdf differ