Commit 25952712 authored by Marc Feger

Refactor

parent 55aed132
@@ -41,7 +41,7 @@ For the actual execution of the experiment, the authors used the knowledge they
As the \textit{Netflix-Prize} showed, the use of \textit{implicit data} such as \textit{time} or \textit{dependencies} between \textit{users} or \textit{items} can immensely improve existing models. In addition to the two \textit{simple matrix factorizations}, \textit{table} \ref{table:models} shows the authors' extensions regarding the \textit{Bayesian approach}.
\input{model_table}
As it turned out that the \textit{bayesian approach} gave more promising results, the given models were trained with it. For this purpose, the \textit{dimensional embedding} as well as the \textit{number of sampling steps} for the models were examined again. Again the \textit{gaussian-distribution} was used for \textit{initialization} as indicated in \textit{section} \ref{sec:experiment_preparation}. \textit{Figure} \ref{fig:bayes_evaluation} shows the corresponding results.
As it turned out that the \textit{Bayesian approach} gave more promising results, the given models were trained with it. For this purpose, the \textit{embedding dimension} as well as the \textit{number of sampling steps} were examined again for these models. As indicated in \textit{section} \ref{sec:experiment_preparation}, the \textit{Gaussian distribution} was used for \textit{initialization}. \textit{Figure} \ref{fig:bayes_evaluation} shows the corresponding results.
\input{bayes_evaluation}
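To make the compared baselines more tangible, the following sketch illustrates a plain \textit{biased matrix-factorization} trained with stochastic gradient descent and initialized from a \textit{Gaussian distribution}, as indicated in \textit{section} \ref{sec:experiment_preparation}. It is a minimal illustration, not the authors' implementation: all function names and hyper-parameter values are assumptions, and the \textit{Bayesian approach} evaluated above would additionally average predictions over sampled model parameters (across the examined \textit{number of sampling steps}) instead of relying on this single point estimate.
\begin{verbatim}
import numpy as np

def train_biased_mf(ratings, n_users, n_items, dim=64, epochs=30,
                    lr=0.005, reg=0.05, init_std=0.1, seed=0):
    """Illustrative SGD training of a biased matrix factorization."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])            # global rating mean
    b_u, b_i = np.zeros(n_users), np.zeros(n_items)     # user / item biases
    P = rng.normal(0.0, init_std, size=(n_users, dim))  # Gaussian-initialized
    Q = rng.normal(0.0, init_std, size=(n_items, dim))  # user / item embeddings
    for _ in range(epochs):
        for u, i, r in ratings:                         # (user, item, rating)
            err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, b_u, b_i, P, Q
\end{verbatim}
Here \texttt{dim} corresponds to the examined \textit{embedding dimension}, and \texttt{ratings} is assumed to be a list of \texttt{(user, item, rating)} index triples.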
\subsection{Observations}
@@ -2,21 +2,21 @@
\section{Critical Assessment}
With this paper, \citet{Rendle19} addresses the highly experienced reader. The simple structure of the paper is convincing due to the clear and direct way in which the problem is identified. Additionally, the paper can be seen as an \textit{addendum} to the \textit{Netflix-Prize}.
The problem addressed by \citet{Rendle19} is already known from other topics like \textit{information-retrieval} and \textit{machine learning}. For example, \citet{Armstrong09} described the phenomenon in the context of \textit{information-retrieval systems}, that too \textit{weak baselines} are used. He also sees that \textit{experiments} are \textit{misinterpreted} by giving \textit{misunderstood indicators} such as \textit{statistical significance}. In addition, \citet{Armstrong09} also sees that the \textit{information-retrieval community} lacks an adequate overview of results. In this context, he proposes a collection of works that is reminiscent of the \textit{Netflix-Leaderboard}. \citet{Lin19} also observed the problem of \textit{baselines} for \textit{neural-networks} that are \textit{too weak}. Likewise, the actual observation that \textit{too weak baselines} exist due to empirical evaluation is not unknown in the field of \textit{recommender systems}. \citet{Ludewig18} already observed the same problem for \textit{session-based recommender systems}. Such systems only work with data generated during a \textit{session} and try to predict the next \textit{user} selection. They also managed to achieve better results using \textit{session-based matrix-factorization}, which was inspired by the work of \citet{Rendle09} and \citet{Rendle10}. The authors see the problem in the fact that there are \textit{too many datasets} and \textit{different measures} of evaluation for \textit{scientific work}. In addition, \citet{Dacrema19} take up the problem addressed by \citet{Lin19} and shows that \textit{neural approaches} to solving the \textit{recommender-problem} can also be beaten by simplest methods. They see the main problem in the \textit{reproducibility} of publications and suggest a \textit{rethinking} in the \textit{verification} of results in this field of work. Furthermore, they do not refrain from taking a closer look at \textit{matrix-factorization} in this context.
The problem addressed by \citet{Rendle19} is already known from other fields such as \textit{information-retrieval} and \textit{machine learning}. For example, \citet{Armstrong09} described the phenomenon that \textit{too weak baselines} are used in the context of \textit{information-retrieval systems}. He also observes that \textit{experiments} are \textit{misinterpreted} because \textit{misunderstood indicators} such as \textit{statistical significance} are reported. In addition, \citet{Armstrong09} notes that the \textit{information-retrieval community} lacks an adequate overview of results. In this context, he proposes a collection of works that is reminiscent of the \textit{Netflix-Leaderboard}. \citet{Lin19} also observed the problem of \textit{too weak baselines} for \textit{neural networks}. Likewise, the observation that \textit{too weak baselines} result from empirical evaluation is not unknown in the field of \textit{recommender systems}. \citet{Ludewig18} already observed the same problem for \textit{session-based recommender systems}. Such systems only work with data generated during a \textit{session} and try to predict the next \textit{user} selection. They also managed to achieve better results using \textit{session-based matrix-factorization}, which was inspired by the work of \citet{Rendle09} and \citet{Rendle10}. The authors see the problem in the fact that there are \textit{too many datasets} and \textit{different measures} of evaluation for \textit{scientific work}. In addition, \citet{Dacrema19} take up the problem addressed by \citet{Lin19} and show that \textit{neural approaches} to solving the \textit{recommender-problem} can also be beaten by the simplest methods. They see the main problem in the \textit{reproducibility} of publications and call for a \textit{rethinking} of how results are \textit{verified} in this field of work. Furthermore, they do not refrain from taking a closer look at \textit{matrix-factorization} in this context.
As the listed work shows, it is not unknown that \textit{baselines} in some subject areas are \textit{too weak} and lead to \textit{stagnant development}. Especially considering that \textit{information-retrieval} and \textit{machine learning} are the \textit{cornerstones} of \textit{recommender systems}, it is not surprising to observe similar phenomena there. Nevertheless, the work published by \citet{Rendle19} stands out from the others. Using the insights gained during the \textit{Netflix-Prize}, it underlines the problem of the \textit{lack of standards} and \textit{unity} for \textit{scientific experiments} noted in the work mentioned above.
Moreover, in contrast to the above-mentioned work, \citet{Rendle19} does not only recognize the problem for the \textit{MovieLens10M-dataset} in combination with \textit{matrix-factorization}. Rather, the problem is raised one level higher. Thus, the paper succeeds in gaining a global and reflective, yet distanced, view of \textit{best practice} in the field of \textit{recommender systems}.
Besides calling for \textit{uniform standards}, \citet{Rendle19} criticizes the way the \textit{scientific community} thinks. \citet{Rendle19} recognizes the \textit{publication-bias} addressed by \citet{Sterling59}. The so-called \textit{publication-bias} describes the problem that the data situation within a \textit{scientific topic area} is \textit{statistically distorted}, since only successful or modern papers are published. \citet{Rendle19} clearly abstracts this problem from the presented experiment. The authors see the problem in the fact that a scientific paper is subject to a \textit{pressure to perform} based on its \textit{novelty}. This thought can be transferred to the \textit{file-drawer-problem} described by \citet{Rosenthal79}: out of concern about not meeting \textit{publication standards} such as \textit{novelty} or the question of the \textit{impact on the community}, many \textit{scientists} do not submit their results at all and prefer to \textit{keep them in a drawer}. Although the problems mentioned above are not directly named, they can be abstracted from the detailed presentation. In contrast to the other works, an intended or unintended abstraction and naming of concrete and comprehensible problems is thus achieved.
Nevertheless, criticism must also be made of the work published by \citet{Rendle19}. Despite the high standard of the work, it must be said that the problems mentioned above can be identified but are not directly addressed by the authors. The work of \citet{Rendle19} even lacks an embedding in the context above. Thus, the experienced reader who is familiar with the problems addressed by \citet{Armstrong09}, \citet{Sterling59} and \citet{Rosenthal79} becomes aware of the contextual and historical embedding and value of the work. In contrast, \citet{Lin19} and \citet{Dacrema19}, published in the same period, succeed in this embedding in the contextual problem and in the previous work. Moreover, it is questionable whether the problem addressed can actually lead to a change in \textit{long-established thinking}. Especially if one takes into account that many scientists are also investigating the \textit{transferability} of new methods to the \textit{recommender problem}. Thus, the call for research into \textit{better baselines} must be viewed from two perspectives. On the one hand, it must be noted that \textit{too weak baselines} can lead to a false understanding of new methods. On the other hand, it must also be noted that this could merely trigger the numerical evaluation in a competitive process to find the best method, as was the case with the \textit{Netflix-Prize}. However, in the spirit of \citet{Sculley18}, it should always be remembered that: \textit{"the goal of science is not wins, but knowledge"}.
Nevertheless, criticism must also be made of the work published by \citet{Rendle19}. Despite the high standard of the work, it must be said that the problems mentioned above can be identified but are not directly addressed by the authors. The work of \citet{Rendle19} even lacks an embedding in the context outlined above. Thus, only the experienced reader who is familiar with the problems addressed by \citet{Armstrong09}, \citet{Sterling59} and \citet{Rosenthal79} becomes aware of the contextual and historical embedding and value of the work. In contrast, \citet{Lin19} and \citet{Dacrema19}, published in the same period, succeed in embedding their work in the contextual problem and in the previous literature. Moreover, it is questionable whether the problem addressed can actually lead to a change in \textit{long-established thinking}, especially if one takes into account that many scientists are also investigating the \textit{transferability} of new methods to the \textit{recommender problem}. Thus, the call for research into \textit{better baselines} must be viewed from two perspectives. On the one hand, \textit{too weak baselines} can lead to a false understanding of new methods. On the other hand, such a call could merely turn the numerical evaluation into a competitive process to find the best method, as was the case with the \textit{Netflix-Prize}. However, in the spirit of \citet{Sculley18}, it should always be remembered that \textit{"the goal of science is not wins, but knowledge"}.
As the authors \citet{Rendle} and \citet{Koren} were significantly \textit{involved} in this competition, the points mentioned above are convincing due to the experience they have gained. With their results they support the simple but not trivial statement that finding good \textit{baselines} requires an \textit{immense effort} and that this effort has to be \textit{promoted} much more in a \textit{scientific context}. This implies a change in the \textit{long-established thinking} about the evaluation of scientific work. At this point it is questionable whether it is possible to change existing thinking, especially because the scientific sector, unlike the industrial sector, cannot provide financial motivation due to limited resources. On the other hand, the individual focus of each work must also be taken into account. Thus, it is \textit{questionable} whether the \textit{scientific sector} is able to create such a large unity regarding a \textit{common goal} as \textit{Netflix} did during the competition.
It should be clearly emphasized that it is immensely important to use sharp \textit{baselines} as guidelines. However, in a \textit{scientific context} the \textit{goal} is not as \textit{precisely defined} as it was in the \textit{Netflix-Prize}. Rather, a large part of the work is aimed at investigating whether new methods such as \textit{neural networks} etc. are applicable to the \textit{recommender problem}.
Regarding the results, however, it has to be said that they clearly support a \textit{rethinking}, even if this only concerns a \textit{small part} of the work.
On the website \textit{Papers with Code}\footnote{\url{https://paperswithcode.com/sota/collaborative-filtering-on-movielens-10m}} the \textit{public leaderboard} regarding the results obtained on the \textit{MovieLens10M-dataset} can be viewed. The source analysis of \textit{Papers with Code} also identifies the results given by \citet{Rendle19} as leading.
In addition, \textit{future work} should focus on a more \textit{in-depth source analysis} which, besides the importance of the \textit{MovieLens10M-dataset} for the \textit{scientific community}, also examines whether and to what extent \textit{other datasets} are affected by this phenomenon.
Due to the recent publication in spring \textit{2019}, this paper has not yet been cited frequently. So time will tell what impact it will have on the \textit{community}.
In addition, \textit{future work} should focus on a more \textit{in-depth source analysis} which, besides the importance of the \textit{MovieLens10M-dataset} for the \textit{scientific community}, also examines whether and to what extent \textit{other datasets} are affected by this phenomenon.
Due to the recent publication in spring \textit{2019}, this paper has not yet been cited frequently. So time will tell what impact it will have on the \textit{community}.
Nevertheless, \citet{Dacrema2019} were able to base their own work on this article and expand it.
Accordingly, \citet{Rendle} seems to have recognized an elementary and previously overlooked problem and made it public.