diff --git a/recommender.tex b/recommender.tex index 36946e20d37e355bf9fd2ffbd1ab79f2200fa496..10bc2221d2cfd78ab037067940cc39c3f00c921b 100644 --- a/recommender.tex +++ b/recommender.tex @@ -20,7 +20,7 @@ Figure \ref{fig:cf} shows a sketch of the general operation of the \textit{colla \input{content-based-collaborative-filtering-comparison} -\subsection{Matrix-Factorization} +\subsection{Matrix-Factorization}\label{sec:mf} The core idea of \textit{matrix factorization} is to complete the only partially filled \textit{rating-matrix} $\mathcal{R}$. For this purpose, the \textit{users} and \textit{items} are mapped into a joint \textit{latent feature space} of \textit{dimensionality} $f$. A \textit{user} is represented by the vector $p_u \in \mathbb{R}^{f}$ and an \textit{item} by the vector $q_i \in \mathbb{R}^{f}$. The \textit{missing ratings}, and thus the \textit{user-item interactions}, are then estimated via the \textit{inner product} $\hat{r}_{ui}=q_i^Tp_u$ of the corresponding vectors \citep{Kor09}. In the following, four classical matrix-factorization approaches are described in detail; afterwards, the concrete learning methods with which the vectors are learned are presented. The \textit{training data} for which a \textit{concrete rating} is available is denoted by $\mathcal{B} = \lbrace(u,i) \mid r_{ui} \in \mathcal{R}\rbrace$. \subsubsection{Basic Matrix-Factorization} @@ -48,3 +48,9 @@ Thus, \textit{implicit data} can also be included. First of all, \textit{temporal dynamics} can be taken into account. On the one hand, it is unrealistic to assume that a \textit{user}'s taste never changes; on the other hand, the intrinsic properties of an \textit{item} remain largely constant, while its popularity may still vary over time. Therefore, \textit{missing ratings} can also be determined in a \textit{time-dependent} manner. A \textit{missing rating} is then given by $\hat{r}_{ui}=\mu + b_i(t) + b_u(t) + q_i^{T}p_u(t)$ \citep{Kor09}. 
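The time-aware prediction formula above can be illustrated with a minimal Python sketch. It assumes the biases $b_u(t)$, $b_i(t)$ and the user factor $p_u(t)$ have already been evaluated at the time $t$ of the rating; all concrete values below are hypothetical toy numbers, not learned parameters.

```python
import numpy as np

def predict_time_aware(mu, b_i_t, b_u_t, q_i, p_u_t):
    """Time-aware prediction r_hat = mu + b_i(t) + b_u(t) + q_i^T p_u(t).

    b_i_t, b_u_t: item/user biases already evaluated at time t (floats);
    q_i, p_u_t:   latent vectors in R^f (p_u evaluated at time t).
    """
    return mu + b_i_t + b_u_t + float(np.dot(q_i, p_u_t))

# Toy example with f = 2 latent dimensions (all values hypothetical):
r_hat = predict_time_aware(mu=3.5, b_i_t=0.2, b_u_t=-0.1,
                           q_i=np.array([0.3, -0.5]),
                           p_u_t=np.array([1.0, 0.4]))
```

Here the inner product contributes $0.3 \cdot 1.0 - 0.5 \cdot 0.4 = 0.1$, so the predicted rating is $3.5 + 0.2 - 0.1 + 0.1 = 3.7$.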
As a second possibility, \textit{implicit influence} can be included. This takes into account which \textit{items} a \textit{user} has interacted with. A \textit{missing rating} can then be determined by $\hat{r}_{ui}=\mu + b_i + b_u + q_i^{T}(p_u + |\mathcal{I}_u|^{-\frac{1}{2}}\sum_{i \in \mathcal{I}_u}{y_i})$, where $y_i \in \mathbb{R}^{f}$ denotes the \textit{feature vector} of \textit{item} $i \in \mathcal{I}_u$, i.e.\ of an item that has been rated by \textit{user} $u$. The corresponding \textit{minimization problems} can be adjusted as described in the sections above \citep{Kor08}. + +\subsection{Optimization and Learning} +A question that has been left open so far is how the individual components $p_u, q_i, b_u, b_i$ are actually learned. In the following, the three most common methods are presented. + +\subsubsection{Stochastic Gradient Descent} +The best-known and most widely used optimization method in \textit{machine learning} is \textit{stochastic gradient descent (SGD)}. The goal of \textit{SGD} is to \textit{minimize} the \textit{error} of a given \textit{objective function}; thus, the minimization problems mentioned in section \ref{sec:mf} can be used as \textit{objective functions}. In the field of \textit{recommender systems}, \citet{Funk06} presented a \textit{modified} variant of \textit{SGD} in the context of the \textit{Netflix Challenge}. It can be applied to \textit{regularized matrix factorization} both with and without \textit{bias} terms. 
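The prediction with implicit influence can likewise be sketched in a few lines of Python. The sketch assumes the feature vectors $y_i$ of the rated items are stacked as rows of a matrix; all numeric values are hypothetical toy inputs.

```python
import numpy as np

def predict_implicit(mu, b_i, b_u, q_i, p_u, Y_u):
    """Prediction with implicit influence:
    r_hat = mu + b_i + b_u + q_i^T (p_u + |I_u|^{-1/2} * sum_{j in I_u} y_j).

    Y_u: matrix whose rows are the feature vectors y_j of the |I_u|
         items the user has rated (shape |I_u| x f).
    """
    implicit = Y_u.sum(axis=0) / np.sqrt(len(Y_u))  # |I_u|^{-1/2} * sum y_j
    return mu + b_i + b_u + float(np.dot(q_i, p_u + implicit))

# Toy example with f = 2 and |I_u| = 4 rated items (values hypothetical):
Y_u = np.array([[0.2, 0.0], [0.0, 0.2], [0.2, 0.2], [0.0, 0.0]])
r_hat = predict_implicit(mu=3.5, b_i=0.2, b_u=-0.1,
                         q_i=np.array([0.5, -0.5]),
                         p_u=np.array([1.0, 0.4]), Y_u=Y_u)
```

The implicit term evaluates to $[0.2, 0.2]$, shifting $p_u$ to $[1.2, 0.6]$ before the inner product with $q_i$ is taken, which yields a predicted rating of $3.9$.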
This method can be described by the following pseudocode: \ No newline at end of file diff --git a/references.bib b/references.bib index a391b459d67f74023cf0a68941ddf473250fcbb3..88581aa95b68e0ed1a3a227e8ec49850d1c915fe 100644 --- a/references.bib +++ b/references.bib @@ -79,4 +79,23 @@ pages = {426-434}, title = {Factorization meets the neighborhood: A multifaceted collaborative filtering model}, journal = {Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD'08)}, doi = {10.1145/1401890.1401944} +} +@inproceedings{Kor11, +author = {Yehuda Koren and Robert Bell}, +year = {2011}, +month = {01}, +pages = {145--186}, +title = {Advances in Collaborative Filtering}, +booktitle = {Recommender Systems Handbook}, +editor = {P.B. Kantor and F. Ricci and L. Rokach and B. Shapira}, +publisher={Springer}, +doi = {10.1007/978-0-387-85820-3_4} +} +@misc{Funk06, + author = {Simon Funk}, + title = {Netflix Update: Try This at Home}, + howpublished = {\url{https://sifter.org/~simon/journal/20061211.html}}, + note = {Accessed: 2019-12-12}, + year = {2006}, + month = {12} } \ No newline at end of file diff --git a/submission.pdf b/submission.pdf index 4255cb7c64b34e284853306b1928a10b18358a61..bec9a7f5524dbafdf42b131cc3625dd70b574300 100644 Binary files a/submission.pdf and b/submission.pdf differ diff --git a/submission.tex b/submission.tex index 6ec8f08af01ae4bee1d7b50762c8e6f9c4c44fb3..5fd0b2aa88c8da1ce7a595bc6fee7692cced1fef 100644 --- a/submission.tex +++ b/submission.tex @@ -48,7 +48,7 @@ \hypersetup{ colorlinks, citecolor=hhuUniBlau, - linkcolor=hhuUniBlau, + linkcolor=black, urlcolor=hhuUniBlau} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -62,6 +62,9 @@ A Study on Recommender Systems} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{document} \input{frontpage} +\newpage +\tableofcontents +\newpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Hier beginnt der Inhalt! % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
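The pseudocode announced in recommender.tex is not part of this diff. As a rough illustration of what that learning step looks like, the Funk-style SGD updates for regularized matrix factorization with biases can be sketched as follows; the hyperparameters `gamma` (learning rate), `lam` (regularization strength), and `f` are hypothetical choices, not values taken from the paper.

```python
import numpy as np

def sgd_train(ratings, n_users, n_items, f=10, gamma=0.005, lam=0.02,
              n_epochs=20, seed=0):
    """SGD for r_hat = mu + b_u + b_i + q_i^T p_u with L2 regularization.

    ratings: list of (u, i, r) triples, i.e. the training set B.
    """
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])        # global mean rating
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    P = rng.normal(0.0, 0.1, (n_users, f))          # user factors p_u
    Q = rng.normal(0.0, 0.1, (n_items, f))          # item factors q_i
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + b_u[u] + b_i[i] + Q[i] @ P[u])
            # gradient steps, each shrunk by the regularization term
            b_u[u] += gamma * (err - lam * b_u[u])
            b_i[i] += gamma * (err - lam * b_i[i])
            p_old = P[u].copy()
            P[u] += gamma * (err * Q[i] - lam * P[u])
            Q[i] += gamma * (err * p_old - lam * Q[i])
    return mu, b_u, b_i, P, Q
```

On a toy rating set, the training error of the learned model falls well below that of predicting the global mean alone, which is the behavior the bias-aware minimization problems in the paper are after.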