diff --git a/baselines.tex b/baselines.tex
index 1a876f56dd715d300f641a5414725a3a73b9364b..8ec20f3ae40bf494e779d98807ddbaa0cec3d680 100644
--- a/baselines.tex
+++ b/baselines.tex
@@ -3,6 +3,16 @@ This section reviews the \textit{main part} of the work represented by \citet{Re
 \subsection{Motivation and Background}
 As in many other fields of \textit{data-science}, a valid \textit{benchmark-dataset} is required for a proper execution of experiments. In the field of \textit{recommender systems}, the best known \textit{datasets} are the \textit{Netflix-} and \textit{MovieLens-dataset}. This section introduces both \textit{datasets} and shows the relationship of \citet{Koren}, one of the authors of this paper, to the \textit{Netflix-Prize}, in addition to the existing \textit{baselines}.
 \subsubsection{Netflix-Prize}
+The topic of \textit{recommender systems} first gained wide attention through the \textit{Netflix-Prize}. On \textit{October 2nd 2006}, \textit{Netflix} launched the competition with the \textit{goal} of beating its self-developed \textit{recommender system Cinematch}, which achieved an \textit{RMSE} of \textit{0.9514}, by at least \textit{10\%}.
+The \textit{Netflix-dataset} was divided into three parts that can be grouped into two categories: \textit{training} and \textit{qualification}. In addition to a \textit{probe-dataset} for \textit{training} the algorithms, two further datasets were held back to qualify the winners. The \textit{quiz-dataset} was used to calculate the \textit{score} of the \textit{submitted solutions} on the \textit{public leaderboard}. In contrast, the \textit{test-dataset} was used to determine the \textit{actual winners}. Each part contained around \textit{1,408,000 ratings} and had \textit{similar statistical properties}. Because the public score and the final ranking were computed on different parts, an improvement could not be achieved by \textit{simple hill-climbing-algorithms} tuned against the leaderboard.
+It took a total of \textit{three years} and \textit{several hundred models} until the team \textit{``BellKor's Pragmatic Chaos''} was declared the \textit{winner} on \textit{21st September 2009}. The team achieved an \textit{RMSE} of \textit{0.8554} and thus an absolute \textit{improvement} of \textit{0.096} over \textit{Cinematch} (about \textit{10.1\%}). This result is remarkable, considering that it took \textit{one year} of intensive research to reduce the \textit{RMSE} from \textit{0.8712 (progress award 2007)} to \textit{0.8616 (progress award 2008)}, a gain of only \textit{0.0096}.
+\citet{Koren}, a \textit{co-author} of the present paper, was significantly involved in the work of this team. Since the beginning of the event, \textit{matrix-factorization methods} have been regarded as promising approaches. Even with the simplest \textit{SVD} methods, \citet{Kurucz07} achieved \textit{RMSE values} of \textit{0.94}.
+The \textit{breakthrough} came with \citet{Funk06}, who achieved an \textit{RMSE} of \textit{0.93} with his \textit{FunkSVD}.
+Building on this, more and more effort was invested in research on simple \textit{matrix-factorization methods}.
+Thus, \citet{Zh08} presented an \textit{ALS variant} with an \textit{RMSE} of \textit{0.8985} and \citet{Koren09} presented an \textit{SGD variant} with an \textit{RMSE} of \textit{0.8995}.
+\textit{Implicit data} were also used. For example, \citet{Koren09} achieved an \textit{RMSE} of \textit{0.8762} by extending \textit{SVD++} with a \textit{time variable}. The resulting model was called \textit{timeSVD++}.
+
+The \textit{Netflix-Prize} made it clear that even the \textit{simplest methods} are \textit{not trivial} and that a \textit{sound investigation} and \textit{evaluation} require an \textit{immense effort} from the \textit{community}.
 \subsubsection{MovieLens}
 \subsection{Experiment Realization}
 \subsubsection{Experiment Preparation}
diff --git a/references.bib b/references.bib
index fe1e2b8e32226b4a315ac54794eab077c48bd804..8dc61fa9ce8a9b89628ac2301f4225e6b77c4187 100644
--- a/references.bib
+++ b/references.bib
@@ -122,4 +122,19 @@ doi = {10.1145/1390156.1390267}
 title = {Biography of Yehuda Koren},
 howpublished = {\url{https://ieeexplore.ieee.org/author/37414256700}},
 note = {Accessed: 2019-12-21},
+}
+@article{Kurucz07,
+author = {Miklós Kurucz and András Benczúr and Károly Csalogány},
+year = {2007},
+month = {01},
+pages = {},
+title = {Methods for large scale SVD with missing values},
+journal = {ACM KDDCup 2007}
+}
+@article{Koren09,
+author = {Yehuda Koren},
+year = {2009},
+month = {09},
+pages = {},
+title = {The BellKor solution to the Netflix Grand Prize}
 }
\ No newline at end of file
diff --git a/submission.pdf b/submission.pdf
index d4df5269967ce0012925d68f94f86c59fc97d446..f5f4961d3e05dd18255a63b3474a905375bd8c21 100644
Binary files a/submission.pdf and b/submission.pdf differ