The \textit{Netflix-Prize} made it clear that even the \textit{simplest methods} are \textit{not trivial} and that a \textit{reasonable investigation} and \textit{evaluation} require an \textit{immense effort} from within the \textit{community}.
\subsubsection{MovieLens}
In the \textit{non-commercial sector} of \textit{recommender systems} the \textit{MovieLens10M-dataset} is used most often. It consists of \textit{10,000,054 ratings} and was published by the research group \textit{GroupLens} in \textit{2009} \citep{Harper15}. In most cases a \textit{global} and \textit{random} \textit{90:10 split} of the data is used to evaluate the \textit{RMSE}: \textit{90\%} of the data, selected at random, is used for \textit{training} and the remaining \textit{10\%} is used for \textit{testing}. Over the last \textit{five years} a large number of algorithms have been evaluated on this dataset and the results have been published at \textit{well-known conferences} such as \textit{ICML}, \textit{NeurIPS}, \textit{WWW}, \textit{SIGIR} and \textit{AAAI}. \textit{Figure}~\ref{fig:reported_results} shows the \textit{results obtained} over the last \textit{five years} on the \textit{MovieLens10M-dataset}.
It can be clearly stated that the \textit{existing baselines} have been \textit{beaten} and \textit{newer methods} have made \textit{steady progress}.
With this paper \citet{Rendle19} addresses the highly experienced reader. The simple structure of the paper is convincing in the clear and direct way in which the problem is identified. Additionally, the paper can be seen as an \textit{addendum} to the \textit{Netflix-Prize}.
The problem addressed by \citet{Rendle19} is already known from other fields like \textit{information-retrieval} and \textit{machine learning}. For example, \citet{Armstrong09} described the phenomenon observed by \citet{Rendle19}, namely that \textit{too weak baselines} are used, in the context of \textit{information-retrieval systems}. He also sees that \textit{experiments} are \textit{misinterpreted} because of \textit{misunderstood indicators} such as \textit{statistical significance}. In addition, \citet{Armstrong09} sees that the \textit{information-retrieval community} lacks an adequate overview of results. In this context, he proposes a collection of works that is reminiscent of the \textit{Netflix-Leaderboard}. \citet{Lin19} also observed the problem of \textit{too weak baselines} for \textit{neural networks}. Likewise, the observation that \textit{too weak baselines} arise from empirical evaluation is not unknown in the field of \textit{recommender systems}. \citet{Ludewig18} already observed the same problem for \textit{session-based recommender systems}. Such systems only work with data generated during a \textit{session} and try to predict the next \textit{user} selection. They also managed to achieve better results using \textit{session-based matrix-factorization}, which was inspired by the work of \citet{Rendle09} and \citet{Rendle10}. The authors see the problem in the fact that there are \textit{too many datasets} and \textit{different measures} of evaluation for \textit{scientific work}. In addition, \citet{Dacrema19} take up the problem addressed by \citet{Lin19} and show that \textit{neural approaches} to solving the \textit{recommender-problem} can also be beaten by the simplest methods. They see the main problem in the \textit{reproducibility} of publications and suggest a \textit{rethinking} of the \textit{verification} of results in this field. Furthermore, they do not refrain from taking a closer look at \textit{matrix-factorization} in this context.
As the listed work shows, it is not unknown that in some subject areas \textit{baselines} are \textit{too weak} and lead to \textit{stagnant development}. Especially considering that \textit{information-retrieval} and \textit{machine learning} are the \textit{cornerstones} of \textit{recommender systems}, it is not surprising to observe similar phenomena. Nevertheless, the work published by \citet{Rendle19} stands out from the others. Using the insights gained during the \textit{Netflix-Prize}, it underlines the problem of the \textit{lack of standards} and \textit{unity} for \textit{scientific experiments} in the work mentioned above.
However, the work published by \citet{Rendle19} also clearly stands out from the above-mentioned work. In contrast to them, it does not only recognize the problem for the \textit{MovieLens10M-dataset} in combination with \textit{matrix-factorization}. Rather, the problem is lifted one level higher. Thus, it succeeds in gaining a global and reflective, but still distanced, view of \textit{best practice} in the field of \textit{recommender systems}.
Besides calling for \textit{uniform standards}, \citet{Rendle19} criticizes the way the \textit{scientific community} thinks. \citet{Rendle19} recognizes the \textit{publication-bias} addressed by \citet{Sterling59}. The so-called \textit{publication-bias} describes a \textit{statistical distortion} of the state of research within a \textit{scientific topic area}, since only successful or modern papers are published. \citet{Rendle19} clearly abstracts this problem from the presented experiment. The authors see the problem in the fact that a scientific paper is subject to a \textit{pressure to perform} which is based on the \textit{novelty} of such a paper. This thought can be transferred to the \textit{file-drawer-problem} described by \citet{Rosenthal79}: many \textit{scientists}, out of concern about not meeting \textit{publication standards} such as \textit{novelty} or the question of the \textit{impact on the community}, do not submit their results at all and prefer to \textit{keep them in a drawer}. Although the problems mentioned above are not directly addressed, they can be abstracted thanks to the detailed presentation. In contrast to the other works, an intended or unintended abstraction and naming of concrete, comprehensible problems is thus achieved.
Nevertheless, criticism must also be made of the work published by \citet{Rendle19}. Despite the high standard of the work, the problems mentioned above can be identified but are not directly addressed by the authors. The work of \citet{Rendle19} even lacks an embedding in the context above. Only the experienced reader who is familiar with the problems addressed by \citet{Armstrong09}, \citet{Sterling59} and \citet{Rosenthal79} becomes aware of the contextual and historical embedding and value of the work. In contrast, \citet{Lin19}, published in the same period, succeeds in this embedding in the contextual problem and in the previous work. Moreover, it is questionable whether the problem addressed can actually lead to a change in \textit{long-established thinking}, especially if one takes into account that many scientists are also investigating the \textit{transferability} of new methods to the \textit{recommender problem}. Thus, the call for research into \textit{better baselines} must be viewed from two perspectives. On the one hand, \textit{too weak baselines} can lead to a false understanding of new methods. On the other hand, the call could merely turn numerical evaluation into a competitive process to find the best method, as was the case with the \textit{Netflix-Prize}. However, in the spirit of \citet{Sculley18}, it should always be remembered that \textit{"the goal of science is not wins, but knowledge"}.
As the authors \citet{Rendle} and \citet{Koren} were significantly \textit{involved} in this competition, the points mentioned above are backed by the experience they have gained. With their results they support the simple but not trivial statement that finding good \textit{baselines} requires an \textit{immense effort}, and that this has to be \textit{promoted} much more in a \textit{scientific context}. This implies a change in the \textit{long-established thinking} about the evaluation of scientific work. At this point it is questionable whether it is possible to change existing thinking, especially because the scientific sector, unlike the industrial sector, cannot provide financial motivation due to limited resources. On the other hand, the individual focus of each work must also be taken into account. Thus, it is \textit{questionable} whether the \textit{scientific sector} is able to create such unity with regard to a \textit{common goal} as \textit{Netflix} did during the competition.
It should be clearly emphasized that it is immensely important to use strong \textit{baselines} as guidelines. However, in a \textit{scientific context} the \textit{goal} is not as \textit{precisely defined} as it was in the \textit{Netflix-Prize}. Rather, a large part of the work is aimed at investigating whether new methods such as \textit{neural networks} are applicable to the \textit{recommender problem}.
\subsection{Collaborative-Filtering}
Unlike the \textit{content-based recommender}, the \textit{collaborative-filtering (CF) recommender} not only considers individual \textit{users} and \textit{feature vectors}, but rather a \textit{like-minded neighborhood} of each \textit{user}.
Missing \textit{user ratings} can be derived from this \textit{neighborhood} by combining its \textit{ratings}. It is assumed that a \textit{missing rating} of the considered \textit{user} $u$ for an unknown \textit{item} $i$ will be similar to the \textit{rating} of a \textit{user} $v$ as soon as $u$ and $v$ have rated some \textit{items} similarly. The similarity of the \textit{users} is determined by the \textit{community ratings}. This type of \textit{recommender system} is also known by the term \textit{neighborhood-based recommender} \citep{DeKa11}. The main focus of \textit{neighborhood-based methods} is on the application of iterative methods such as \textit{k-nearest-neighbors} or \textit{k-means}.
A \textit{neighborhood-based recommender} can be viewed from two perspectives: The first and best known problem is the so-called \textit{user-based prediction}. Here, the \textit{missing ratings} of a considered \textit{user} $u$ are to be determined from his \textit{neighborhood} $\mathcal{N}_i(u)$.
$\mathcal{N}_i(u)$ denotes the subset of all \textit{users} who rate in a similar manner to $u$ and have rated the \textit{item} $i$. The second problem is that of \textit{item-based prediction}. Analogously, the similarity of the \textit{items} is determined by their received \textit{ratings}.
This kind of problem considers the \textit{neighborhood} $\mathcal{N}_u(i)$ of all \textit{items} which the \textit{user} $u$ has rated similarly to the \textit{item} $i$. The similarity between the objects of a \textit{neighborhood} is determined by \textit{distance functions} such as \textit{mean-squared-difference}, \textit{Pearson-correlation} or \textit{cosine-similarity}.
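To make the \textit{user-based prediction} concrete, the following sketch predicts a \textit{missing rating} as a similarity-weighted average over the \textit{neighborhood} $\mathcal{N}_i(u)$, using \textit{cosine-similarity}. It is a toy example with an assumed rating matrix, not an implementation from the literature discussed here:
\begin{verbatim}
import numpy as np

# Toy rating matrix R: rows = users, columns = items, NaN = missing rating.
R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, np.nan, 1.0],
              [1.0, 1.0, np.nan, 5.0],
              [1.0, np.nan, 4.0, 4.0]])

def cosine_sim(a, b):
    """Cosine similarity over the items both users have rated."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def predict_user_based(R, u, i, k=2):
    """Estimate r_ui from the k most similar users who rated item i."""
    candidates = [v for v in range(R.shape[0])
                  if v != u and not np.isnan(R[v, i])]
    # Neighborhood N_i(u): the k most similar raters of item i.
    sims = sorted(((cosine_sim(R[u], R[v]), v) for v in candidates),
                  reverse=True)[:k]
    num = sum(s * R[v, i] for s, v in sims)
    den = sum(abs(s) for s, v in sims)
    return num / den if den > 0 else np.nan

print(predict_user_based(R, u=0, i=2))  # -> 4.0 (only user 3 rated item 2)
\end{verbatim}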
Figure \ref{fig:cf} shows a sketch of the general operation of a \textit{collaborative-filtering recommender}.
The core idea of \textit{matrix-factorization} is to complete the only partially filled \textit{rating-matrix} $\mathcal{R}$. For this purpose the \textit{users} and \textit{items} are mapped to a joint \textit{latent feature space} with \textit{dimensionality} $f$. The \textit{user} is represented by the vector $p_u \in\mathbb{R}^{f}$ and the \textit{item} by the vector $q_i \in\mathbb{R}^{f}$. As a result, the \textit{missing ratings}, and thus the \textit{user-item interaction}, are determined via the \textit{inner product} $\hat{r}_{ui}=q_i^Tp_u$ of the corresponding vectors \citep{Kor09}.
\newpage
In the following, the four classic \textit{matrix-factorization} approaches are described in detail. Afterwards, the concrete learning methods with which the vectors are learned are presented. In addition, the \textit{training data} for which a \textit{concrete rating} is available will be referred to as $\mathcal{B}=\lbrace (u,i) \mid r_{ui}\in\mathcal{R} \rbrace$.
\subsubsection{Basic Matrix-Factorization}
The first and easiest way to solve \textit{matrix-factorization} is to connect the \textit{feature vectors} of the \textit{users} and the \textit{items} using the \textit{inner product}. The result is the \textit{user-item interaction}. In addition, the \textit{error} should be as small as possible. Therefore, $\min_{p_u, q_i}{\sum_{(u,i)\in\mathcal{B}}(r_{ui}-\hat{r}_{ui})^{2}}$ is defined as an associated \textit{minimization problem}.
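A minimal sketch of this objective on assumed toy data might look as follows; how $p_u$ and $q_i$ are actually learned is deferred to the section on optimization and learning:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, f = 4, 5, 2                  # f: latent dimensionality
P = rng.normal(scale=0.1, size=(n_users, f))   # user vectors p_u
Q = rng.normal(scale=0.1, size=(n_items, f))   # item vectors q_i

# B: the observed training ratings r_ui as (u, i, r) triples.
B = [(0, 1, 5.0), (0, 3, 1.0), (2, 2, 4.0), (3, 4, 2.0)]

def predict(u, i):
    return Q[i] @ P[u]                         # r_hat_ui = q_i^T p_u

def basic_loss(B):
    """Sum of squared errors over the observed ratings in B."""
    return sum((r - predict(u, i)) ** 2 for u, i, r in B)

print(basic_loss(B))
\end{verbatim}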
This problem extends the \textit{basic matrix-factorization} by a \textit{regularization factor} $\lambda$ in the corresponding \textit{minimization problem}. Since $\mathcal{R}$ is only sparsely populated, the effect of \textit{overfitting} may occur due to learning from the few known values. The problem with \textit{overfitting} is that the model adheres too closely to the few known \textit{ratings} and generalizes poorly. To counteract this, the magnitudes of the above vectors are taken into account. High magnitudes are penalized by a term $\lambda(\lVert q_i \rVert^2+\lVert p_u \rVert^2)$ in the \textit{minimization problem}. Overall, the \textit{minimization problem} $\min_{p_u, q_i}{\sum_{(u,i)\in\mathcal{B}}(r_{ui}-\hat{r}_{ui})^{2}+\lambda(\lVert q_i \rVert^2+\lVert p_u \rVert^2)}$ is to be solved.
The idea is that especially large entries in $q_i$ or $p_u$ cause $\lVert q_i \rVert$ and $\lVert p_u \rVert$ to become larger. Accordingly, $\lVert q_i \rVert$ and $\lVert p_u \rVert$ increase the larger their entries become. These values are then additionally penalized by squaring them. Small values are rewarded and large values are penalized. Additionally, the influence of this term can be regulated by $\lambda$.
A \textit{regularization factor} $\lambda$ is introduced in analogy to \textit{regularized matrix-factorization}. Additional \textit{weights} $\alpha$ and $\beta$ are introduced to take into account the individual magnitude of each vector. The \textit{minimization problem} then corresponds to $\min_{p_u, q_i}{\sum_{(u,i)\in\mathcal{B}}(r_{ui}-\hat{r}_{ui})^{2}+\lambda(\alpha\lVert q_i \rVert^2+\beta\lVert p_u \rVert^2)}$.
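Continuing the toy sketch from above, both regularized variants can be expressed as one loss function. The penalty is placed inside the sum, i.e. applied per observed rating, which is one common reading of the formula; with $\alpha=\beta=1$ the plain \textit{regularized matrix-factorization} is recovered:
\begin{verbatim}
def weighted_regularized_loss(B, P, Q, lam=0.1, alpha=1.0, beta=1.0):
    """Squared error plus lam * (alpha*||q_i||^2 + beta*||p_u||^2)."""
    total = 0.0
    for u, i, r in B:
        err = r - Q[i] @ P[u]
        total += err ** 2 + lam * (alpha * (Q[i] @ Q[i])
                                   + beta * (P[u] @ P[u]))
    return total

print(weighted_regularized_loss(B, P, Q))            # alpha = beta = 1
print(weighted_regularized_loss(B, P, Q, alpha=0.5, beta=2.0))
\end{verbatim}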
A major advantage of \textit{matrix-factorization} is the ability to model simple relationships according to the application. Thus, an excellent data source cannot always be assumed. Due to the \textit{natural interaction} of the \textit{users} with the \textit{items}, \textit{preferences} arise. Such \textit{preferences} lead to \textit{behaviour patterns} which manifest themselves in the form of a \textit{bias} in the data. A \textit{bias} is not bad in itself, but it must be taken into account when modeling the \textit{recommender system}.
The most popular model that takes \textit{bias} into account is called \textit{biased matrix-factorization}.
In addition, the \textit{missing rating} is no longer determined only by the \textit{inner product} of the two vectors $q_i$ and $p_u$. Rather, the \textit{bias} is also considered. Accordingly, a \textit{missing rating} is calculated by $\hat{r}_{ui}= b_{ui}+ q_i^Tp_u$, where $b_{ui}$ is the \textit{bias} of a \textit{user} $u$ and an \textit{item} $i$. The \textit{bias} is determined by $b_{ui}=\mu+ b_u + b_i$. The parameter $\mu$ is the \textit{global average} of all \textit{ratings} $r_{ui}\in\mathcal{R}$.
Furthermore, $b_u =\mu_u -\mu$ and $b_i =\mu_i -\mu$, where $\mu_u$ denotes the \textit{average rating} of \textit{user} $u$ and $\mu_i$ the \textit{average rating} of \textit{item} $i$.
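As a sketch, these \textit{bias} terms can be read directly off the known ratings of a toy matrix; in practice they are often learned together with $p_u$ and $q_i$ \citep{Kor09}:
\begin{verbatim}
import numpy as np

R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, np.nan, 1.0],
              [1.0, 1.0, np.nan, 5.0],
              [1.0, np.nan, 4.0, 4.0]])

mu  = np.nanmean(R)                 # global average of all known ratings
b_u = np.nanmean(R, axis=1) - mu    # user bias  b_u = mu_u - mu
b_i = np.nanmean(R, axis=0) - mu    # item bias  b_i = mu_i - mu

def predict_biased(u, i, P, Q):
    """r_hat_ui = mu + b_u + b_i + q_i^T p_u; with all-zero latent
    factors this reduces to the pure bias model b_ui."""
    return mu + b_u[u] + b_i[i] + Q[i] @ P[u]
\end{verbatim}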
As a second possibility, \textit{implicit influence} can be included. This can involve the \textit{properties} of the \textit{items} a \textit{user} is dealing with. A \textit{missing rating} can be determined by $\hat{r}_{ui}=\mu+ b_i + b_u + q_i^{T}(p_u + |\mathcal{I}_u|^{-\frac{1}{2}}\sum_{i \in\mathcal{I}_u}{y_i})$. $y_i \in\mathbb{R}^{f}$ describes the \textit{feature vectors} of the \textit{items} $i \in\mathcal{I}_u$ which have been rated by \textit{user} $u$. The corresponding \textit{minimization problems} can be adjusted as mentioned in the sections above \citep{Kor08}.
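A sketch of this prediction rule, assuming the \textit{implicit} factors $y_i$ (like $p_u$ and $q_i$) have already been learned:
\begin{verbatim}
import numpy as np

def predict_implicit(u, i, P, Q, Y, I_u, mu, b_u, b_i):
    """r_hat_ui = mu + b_i + b_u
                  + q_i^T (p_u + |I_u|^(-1/2) * sum_{j in I_u} y_j).
    I_u: indices of the items rated by user u.
    Y:   implicit item factors y_j, one row per item."""
    if len(I_u) > 0:
        implicit = Y[I_u].sum(axis=0) / np.sqrt(len(I_u))
    else:
        implicit = 0.0
    return mu + b_i[i] + b_u[u] + Q[i] @ (P[u] + implicit)
\end{verbatim}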
\subsection{Optimization and Learning}
An important point that does not emerge from the above sections is the question of how the individual components $p_u, q_i, b_u, b_i$ are constructed. In the following, the three most common methods are presented.
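To anticipate the flavor of such a learning method, the following sketch shows a single \textit{stochastic-gradient} update for the \textit{biased}, \textit{regularized} model, following the update rules given by \citet{Kor09}; the step size $\gamma$ and the \textit{regularization factor} $\lambda$ are illustrative:
\begin{verbatim}
def sgd_step(u, i, r, P, Q, b_u, b_i, mu, gamma=0.005, lam=0.02):
    """One stochastic-gradient update for a single training rating r_ui."""
    err = r - (mu + b_u[u] + b_i[i] + Q[i] @ P[u])   # prediction error e_ui
    b_u[u] += gamma * (err - lam * b_u[u])
    b_i[i] += gamma * (err - lam * b_i[i])
    p_old = P[u].copy()
    P[u] += gamma * (err * Q[i] - lam * P[u])
    Q[i] += gamma * (err * p_old - lam * Q[i])
\end{verbatim}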